An analysis of about 4 million web sites has turned up some interesting statistics about how sites are built. Did you know 5% of sites have either a Twitter or Facebook link? Or that 28% of sites run Google Analytics? Or that 12% of them run Google AdSense? Now you do!

The core data comes from CommonCrawl, a non-profit group set up to crawl the web and provide the data for anyone to use. Gil Elbaz is a founder of both CommonCrawl and Factual, a start-up that creates tables of structured information from data found on the open web (see Factual: Parting The Curtains Of The Invisible Web).

Factual found the stats I cited above after examining 4 million web sites. In particular:

* 28% of sites have Google Analytics on them
* 12% of sites have AdSense
* 5% of sites have EITHER a Twitter or Facebook link but…
* 2% of sites have BOTH a Twitter and a Facebook link
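
For readers curious how percentages like these could be tallied, here is a minimal Python sketch. It is not Factual's method, which isn't described here. It assumes each site is represented by a single HTML string and that a feature can be spotted by a simple substring match on a well-known domain. The `MARKERS` table and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical substring markers; NOT Factual's actual detection rules.
MARKERS = {
    "google_analytics": "google-analytics.com",
    "adsense": "googlesyndication.com",
    "twitter": "twitter.com",
    "facebook": "facebook.com",
}


def detect_features(html):
    """Return the set of feature names whose marker appears in the HTML."""
    lowered = html.lower()
    return {name for name, marker in MARKERS.items() if marker in lowered}


def feature_percentages(pages):
    """Compute the share of sites (as percentages) showing each feature.

    `pages` is assumed to be a list of HTML strings, one per site.
    """
    total = len(pages)
    if total == 0:
        return {}

    counts = {name: 0 for name in MARKERS}
    either = 0  # sites with a Twitter link, a Facebook link, or both
    both = 0    # sites with a Twitter link and a Facebook link

    for html in pages:
        found = detect_features(html)
        for name in found:
            counts[name] += 1
        has_twitter = "twitter" in found
        has_facebook = "facebook" in found
        if has_twitter or has_facebook:
            either += 1
        if has_twitter and has_facebook:
            both += 1

    stats = {name: 100.0 * count / total for name, count in counts.items()}
    stats["twitter_or_facebook"] = 100.0 * either / total
    stats["twitter_and_facebook"] = 100.0 * both / total
    return stats
```

Note that the sketch counts a site with both links toward "either" as well. Whether the 5% figure above was counted that way isn't stated, so treat that choice as an assumption too.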

There’s also a chart that shows other interesting stats but without precise percentages. I’ll estimate as best I can:

* About 20% of sites have Flash
* About 19% of sites have an RSS feed
* About 6% of sites have a sitemaps file
* About 1% of sites have a Google Webmaster Central verification code
* About 1% of sites have Quantcast tracking code
* About 0.5% of sites have a Creative Commons attribution