Friday, July 15, 2005
Weighing the Internet
In 1798 Henry Cavendish, known for his scientific brilliance and terrible fear of women, ran an experiment that yields the gravitational constant (G) by measuring the gravitational attraction between lead spheres of known mass. In essence, he was able to "weigh the earth" by measuring the pull between objects whose masses he already knew.
This got me thinking about weighing the internet -- calculating the number of users online. Since I am by no means a brilliant scientist and am horribly attracted to women everywhere, there were obviously roadblocks in my path that Henry did not have to deal with.
Want to know how many internet users there are? Curious about how many people read a site like Slashdot every day? Read on!
Tools
Alexa has done a nice job collecting the browsing statistics for a sizeable sample of internet users. It’s not a perfect sample, as it relies on a browser plugin that requires a voluntary install, but it’s about as good a sample as is available.
Using Alexa, you can find the percentage of internet users that visited a particular site on a particular day. If we know the actual number of visitors that come to a particular site, and compare this with the Alexa data, we can extrapolate the total number of users on the internet for that day.
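The extrapolation described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name and the round numbers in the usage line are made up for the example, and it assumes the Alexa sample is representative of all internet users.

```python
def estimate_total_users(daily_visitors: float, reach_per_million: float) -> float:
    """Extrapolate total daily internet users from one site's known traffic.

    daily_visitors: actual unique visitors the site reports for a day.
    reach_per_million: Alexa-style reach, i.e. how many users out of every
        million sampled visited the site that day.
    """
    if reach_per_million <= 0:
        raise ValueError("reach_per_million must be positive")
    fraction_of_users = reach_per_million / 1_000_000
    return daily_visitors / fraction_of_users


# Hypothetical round numbers, not real site data:
# 10,000 visitors at a reach of 50 per million.
example_total = estimate_total_users(10_000, 50)
print(f"{example_total:,.0f} estimated users")
```

The whole method hinges on the reach figure, so any bias in who installs the plugin carries straight through to the result.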
Reference Data
Now, measuring the gravitational attraction between relatively small masses is very difficult, because the actual gravitational force between tiny objects is infinitesimal. The larger the mass, the more measurable its gravitational effect. In other words, Cavendish needed some really large balls to weigh the earth. 350-pound balls, to be precise.
There are some similarities here to weighing the internet. Alexa data is only really valid for the top 100,000 sites, so you need the stats for a relatively large site to even attempt a measurement. Not a lot of sites in the top 100,000 are keen on divulging their stats. This kind of information is what you might call "shock value bragging material," so it’s typically saved for special introductions and dinner party conversations. So when Vince and Eliot were podcasting about the number of daily hackaday viewers, I realized we now had the missing piece of the puzzle.
Results
The two of them seemed to be in a bit of a disagreement as to the number of page hits hackaday receives daily, with Eliot figuring 65k and Vince figuring 80k. Assuming they both were making a reasonable estimate, I’m going to average that to a whopping 72,500 page hits a day on hackaday.
This was sometime around July 8th. According to Alexa, around that same time they had a reach of about 110 people per million users. On average, people who visited hackaday viewed roughly 1.4 pages.
So we can figure out the number of people who view hackaday by dividing 72,500 by 1.4, which gives us roughly 51,800 daily viewers. The 110 per million figure tells us that they get about 0.011 percent of the internet’s users, or 0.00011 as a fraction. 51,800 divided by 0.00011 leaves us with a result of about 471 million internet users.
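For anyone who wants to check the arithmetic, here is the same calculation as a short Python sketch. The variable names are mine; the inputs are the figures quoted above. Without rounding the intermediate visitor count, the result comes out at roughly 470.8 million, which agrees with the "about 471 million" figure.

```python
# Inputs taken from the post.
page_hits_low = 65_000      # Eliot's estimate of daily page hits
page_hits_high = 80_000     # Vince's estimate of daily page hits
pages_per_visitor = 1.4     # Alexa: average pages viewed per visitor
reach_per_million = 110     # Alexa: visitors per million internet users

# Average the two estimates: 72,500 page hits a day.
avg_page_hits = (page_hits_low + page_hits_high) / 2

# Convert page hits to people: roughly 51,786 daily visitors.
daily_visitors = avg_page_hits / pages_per_visitor

# 110 per million is 0.00011 as a fraction (0.011 percent).
reach_fraction = reach_per_million / 1_000_000

# Extrapolate to the whole internet.
total_users = daily_visitors / reach_fraction

print(f"Average page hits: {avg_page_hits:,.0f}")
print(f"Daily visitors:    {daily_visitors:,.0f}")
print(f"Total users:       {total_users / 1e6:,.1f} million")
```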
With this knowledge, you can easily estimate the traffic to other sites. If we go by the 471 million estimate, Slashdot gets a whopping 380,000 daily readers.
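Going the other direction is just a multiplication. The sketch below assumes the 471 million total and uses a hypothetical helper name. Note that the post does not state Slashdot's Alexa reach, so the reach value here is back-calculated from the 380,000 figure purely for illustration and is not an Alexa number.

```python
def estimate_site_visitors(total_users: float, reach_per_million: float) -> float:
    """Estimate a site's daily visitors from its reach and a total-user estimate."""
    return total_users * reach_per_million / 1_000_000


TOTAL_USERS = 471_000_000   # estimate derived from the hackaday numbers
slashdot_readers = 380_000  # figure quoted in the post

# Reach implied by the quoted figure (an assumption, not Alexa data).
implied_reach = slashdot_readers / TOTAL_USERS * 1_000_000

# Round trip: plugging the implied reach back in recovers the reader count.
recovered = estimate_site_visitors(TOTAL_USERS, implied_reach)

print(f"Implied reach: about {implied_reach:.0f} per million")
print(f"Estimated daily readers: {recovered:,.0f}")
```

To estimate any other site, swap in its reach per million from Alexa for the same day.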
A Perplexing Conclusion
Unless my math is wrong, the result is way off from the 880+ million users Nielsen/NetRatings reports. Even if we go with Vince’s 80k/day estimate, that still leaves us with only 519 million users. It could be that the Alexa reach is exaggerated because hackaday readers have the Alexa toolbar installed more often than average, but I highly doubt that. I ran the numbers with BlogCadre (a statistically much smaller sample) from when we got boingboinged and it came out similarly, around 520 million users.
It would appear that either Nielsen's figure is pretty inflated, or the Alexa toolbar is unavailable to a very large population of people -- a population that disproportionately doesn't read hackaday when compared to the population the toolbar is exposed to.
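To see how far apart the two sources are, the sketch below reruns the estimate for each page-hit guess and compares it with the 880 million Nielsen figure. It also works backward to the page hits hackaday would need for the method to reproduce Nielsen's number. The names are mine, and 880 million is used as a stand-in for "880+ million".

```python
PAGES_PER_VISITOR = 1.4
REACH_FRACTION = 110 / 1_000_000
NIELSEN_USERS = 880_000_000  # stand-in for the "880+ million" report


def total_users_from_hits(page_hits: float) -> float:
    """Estimate total internet users from daily page hits on hackaday."""
    return page_hits / PAGES_PER_VISITOR / REACH_FRACTION


scenarios = {"Eliot (65k)": 65_000, "Average (72.5k)": 72_500, "Vince (80k)": 80_000}
estimates = {name: total_users_from_hits(hits) for name, hits in scenarios.items()}

for name, users in estimates.items():
    share = users / NIELSEN_USERS
    print(f"{name:16s} {users / 1e6:6.1f} million ({share:.0%} of Nielsen)")

# Page hits per day that would make the estimate match Nielsen.
hits_needed = NIELSEN_USERS * REACH_FRACTION * PAGES_PER_VISITOR
print(f"Page hits needed to match Nielsen: {hits_needed:,.0f}")
```

Even the highest guess lands well under Nielsen's figure, so the gap is not just a matter of which podcast host was right.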
Cavendish’s measurement with the 350-pound lead spheres was so accurate that it stood for over a century. It had only a 1% error. It makes me jealous.
But who is off? Alexa? Nielsen? Both? Is hackaday still not a large enough site to produce statistically valid results? Are the available web use statistics really that inaccurate? I’m looking forward to your comments.
Source: http://www.blogcadre.com/blog/jason_striegel/weighing_the_internet_2005_07_13_03_37_07