Further Exploration of Private Browsing

In Statistical Analysis, Statistics on November 27, 2010 by David

Mozilla Labs and the Metrics Team, together with the growing Mozilla Research initiative, are hosting an Open Data Visualization Competition based on Test Pilot data. I really enjoy reading their blog posts, and now that they’ve opened up their dataset, I wanted to have a go at it. On the Mozilla website, there is an option to enroll in a data-collection study on how individuals use their browsers. In addition to usage statistics, such as how many tabs users have open and how frequently they use their browser, there is a survey of demographics and self-described interests.

There was a blog post on how people use Private Browsing Mode based on usage data. I wanted to see if I could go one step further by testing their conclusions and cross-referencing individual usage data with the survey responses. I was able to confirm some of their conclusions, such as the fact that most people spend about 10 minutes in Private Browsing Mode, but because location data was stripped from the dataset, I was not able to verify the spikes according to time of day. In PDT, it looks like there is a large peak throughout the afternoon, but this is probably skewed by the different numbers of users in each time zone.

A Greater Proportion of Male Users than Female Users Use Private Browsing Mode

The sample is heavily skewed towards males (94% male), but on most metrics the two genders look alike. Males and females use roughly the same number of extensions, have similar age distributions, and self-report very similar numbers of hours in front of the computer. Females do seem more modest when self-reporting their proficiency with a computer. The most striking difference was in the use of Private Browsing Mode: the proportion of males who use it is almost four times the proportion of females who do. Further statistical analysis based on gender, whether of the duration or the frequency of Private Browsing Mode use, seems suspect due to the small sample size.
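As a rough illustration of the comparison above, here is a minimal Python sketch that computes the proportion of Private Browsing users within each gender and the ratio between the two. The records are hypothetical toy data, not the Test Pilot dataset, and the function name `usage_proportions` is my own invention for this example.

```python
from collections import Counter

# Hypothetical toy records: (gender, used_private_browsing).
# These are NOT Test Pilot data; they only illustrate the calculation.
records = [
    ("male", True), ("male", False), ("male", True), ("male", False),
    ("male", False), ("male", True), ("male", False), ("male", False),
    ("female", False), ("female", True), ("female", False), ("female", False),
    ("female", False), ("female", False), ("female", False), ("female", False),
]

def usage_proportions(rows):
    """Return {gender: (users, total, proportion)} for Private Browsing use."""
    totals = Counter(gender for gender, _ in rows)
    users = Counter(gender for gender, used in rows if used)
    return {g: (users[g], totals[g], users[g] / totals[g]) for g in totals}

props = usage_proportions(records)

# Ratio of the male proportion to the female proportion.
ratio = props["male"][2] / props["female"][2]

for gender, (n_users, n_total, share) in props.items():
    # Printing the raw counts next to the share keeps the sample size visible.
    print(f"{gender}: {n_users}/{n_total} = {share:.3f}")
print(f"male/female ratio: {ratio:.2f}")
```

Keeping the counts alongside each proportion matters here: when one group is very small, a ratio like this can swing a lot with just a few individuals.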

Younger People Tend to Spend More Time in Private Browsing Mode

In addition to gender, there appears to be a slight, admittedly weak, relationship between the age of the individual and the average time spent in Private Browsing Mode. The data is colored based on gender, with blue for males and red for females. There appears to be a slight bump in the 18 to 25 age category, although this could be due to differences in sample size across ages. Note: This plot only includes individuals who use Private Browsing Mode. If examining the population at large, there would be a ton of data points with a duration of 0.

Self-Identification Affects Private Mode Usage

Question 12 of the survey asked “What are your most frequently visited websites?”. The survey allowed for a variety of responses ranging from “Search engines” and “Social networking sites” to “Adult pages” and “Gambling and online betting”. I was curious whether this self-characterization would be a good metric to identify individuals who use Private Browsing Mode more often. I was able to separate the survey responders based on whether they chose each of the different website categories. For example, I subsetted the entire survey into individuals who chose “Social networking sites” vs. individuals who did not choose “Social networking sites”. A priori, if this self-identification did not matter, there should be little to no difference in the average time in Private Browsing Mode between the two populations.

For each category, here is the absolute difference between the two populations in average time spent in Private Browsing Mode.

[1] 5.594305
[1] 3.84167
[1] 0.2490019
[1] 0.7180658
[1] 3.601866
[1] 0.3737534
[1] 1.408243
[1] 1.525824
[1] 3.27257
[1] 6.79226
[1] 4.651107
[1] 5.669116
[1] 1.027911
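A comparison of this kind can be sketched in Python as follows. This is not the code that produced the numbers above; the users, field names (`categories`, `avg_private_time`), the assumed unit of minutes, and the helper `abs_mean_difference` are all hypothetical and only illustrate the subsetting step.

```python
from statistics import mean

# Hypothetical toy data, NOT Test Pilot data. Each user has the set of
# website categories they chose and an average time in Private Browsing
# Mode (units assumed to be minutes for this example).
users = [
    {"categories": {"Search engines", "News sites"}, "avg_private_time": 8.0},
    {"categories": {"Social networking sites"}, "avg_private_time": 10.0},
    {"categories": {"News sites", "Forums"}, "avg_private_time": 14.0},
    {"categories": {"Search engines", "Forums"}, "avg_private_time": 16.0},
]

def abs_mean_difference(rows, category):
    """Absolute difference in mean time between users who chose
    `category` and users who did not. Returns None if either group is empty."""
    chose = [r["avg_private_time"] for r in rows if category in r["categories"]]
    other = [r["avg_private_time"] for r in rows if category not in r["categories"]]
    if not chose or not other:
        return None
    return abs(mean(chose) - mean(other))

# Every category that appears in at least one response.
categories = sorted(set().union(*(r["categories"] for r in users)))

# One absolute difference per category, keyed by the category name.
differences = {c: abs_mean_difference(users, c) for c in categories}

for category, diff in differences.items():
    print(category, diff)
```

Keying the results by category name, rather than printing bare numbers, makes it easier to match each difference to its category afterwards.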

The smallest differences were between individuals who do and do not claim to use the internet for “News sites”, “Social networking sites”, and “Shopping”, while the differences appear larger for “Forums”, “Adult pages”, and “Gambling and online betting”. There appears to be a noticeable difference in usage between individuals who selected any of the three “riskiest” categories and individuals who did not.

