Vol. 12 No. 4, October 2007


 

Concentration of Web users' online information behaviour


Chun-Yao Huang
Department of Business Administration, National Taiwan University, Taipei, Taiwan
Yung-Cheng Shen
Department of Business Administration, Yuan Ze University, Taoyuan, Taiwan
I-Ping Chiang
Department of Information Management, National Taipei University, Taipei, Taiwan
Chen-Shun Lin
Department of Statistics, Renmin University of China, Beijing, China



Abstract
Introduction. Focusing on Web users' behavioural concentration across Websites they have visited, we investigate heterogeneity in Web users' online information behaviour.
Method. The Gini coefficient is used to measure the degree of a Web user's online information behavioural concentration in terms of both page-views and visit duration. We explore how the behavioural dimensions of the number of sites visited, the number of page-views per site and the duration per page predict online information behavioural concentration.
Analysis. Data from an online panel are analysed using multiple regression models, which reveal that the three dimensions of online information behaviour predict more than three quarters of the variance in behavioural concentration.
Results. The number of sites visited and the number of page-views per site positively predict the degree of behavioural concentration (in terms of both page-views and visit duration), while the speed dimension of online information behaviour positively predicts the degree of behavioural concentration in terms of page-views but negatively predicts that in terms of visit duration. The relative importance of variables in the explanation of Web users' degree of behavioural concentration is also analysed.
Conclusion. The quantitative analytical framework presented herein gives insight into the heterogeneity of online information behaviour. This paper is a stepping-stone for a more comprehensive understanding of online information behaviour from a macro perspective.




Introduction

Being a hyperlinked hypermedia information system (Huberman et al. 1998; Kari and Savolainen 2003), the World Wide Web (hereafter, the Web) is an interactive environment as well as a multi-channel communication tool (Wikgren 2001). The Web can act as an instrument of communication, education, business, entertainment, finance, staying informed, passing time, relaxing, escape, socialization, work, surveillance, etc. (Kari and Savolainen 2003; Song et al. 2004). On the Web, users follow links, enter queries, and scan pages (Pharo 2004). In everyday life, Web users search and retrieve, browse, monitor, unfold, exchange, dress, instruct, and publish content in various guises of information (Hektor 2003). The Web, to borrow from Savolainen (1995), has gradually become rooted in its users' 'way of life' as well as their 'mastery of life'.

Although the literature on online information behaviour has been growing with the prevalence of the Internet, most published studies focus on micro-level behaviour confined to specific domains or tasks (e.g., Koufaris 2002; D'Ambra and Wilson 2004; Hope and Li 2004; Jansen et al. 2005; Tombros et al. 2005), study a specific segment of Web users (e.g., Hsieh-Yee 1998; Rieh and Belkin 2000; Zhang et al. 2004; Andrews et al. 2005), explore the motives of Web users (e.g., Bilal 2002; Savolainen 2004; Song et al. 2004), or propose conceptual models of online information seeking behaviour (e.g., Hektor 2003; Kari and Savolainen 2003; McKenzie 2003). By its nature, the Web is able to keep every user's behavioural details and, therefore, is (at least potentially) a perfect platform for an in-depth analysis of traceable human information behaviour. In a commercial setting, descriptive statistics of visitors' online information actions (such as page-views, visit duration, visit sessions, etc.) at competing sites are regularly reported. However, owing to the difficulty of accessing the client-side data from which such cross-site reports are compiled, little research has so far been done by academics to explore empirically, in a systematic way, the behavioural patterns of Web users' everyday Web usage, apart from efforts such as Toms et al. (2003), Spink and Jansen (2004), and Jansen and Spink (2006), which examine Web searching across multiple systems and domains.

Given the established literature as a foundation, this paper studies online information behaviour from the perspective of 'behavioural concentration' and aims to serve as a stepping-stone for further in-depth understanding of how the Web is used by heterogeneous users. In terms of behavioural concentration, we focus on how a Web user distributes his/her page-views and visit duration among the set of sites that s/he visited during a period of time. For both page-views and visit duration, behavioural concentration is measured by the Gini coefficient. Coupling informetrics with an online panel in the empirical study, we are able to provide a holistic picture that parsimoniously quantifies how users consume information on the Web and how they differ from one another in this respect.

Background

Online information behaviour

According to Wilson et al. (1999), information behaviour is behaviour engaged in by persons in relation to information sources and channels, which cover information seeking behaviours that may be either active or passive. Spink and Cole (2004) further elaborate the concept of information behaviour as 'a broad term covering all aspects of information seeking, including passive or undetermined information behavior'. Therefore, online information behaviour, by definition, includes all activities that users conduct on the Web, be it goal-directed searching or just surfing without a specific purpose.

The literature also indicates that the multi-faceted Web, to its heterogeneous users, is taken as an information environment where hypermedia and hypertext documents are linked to one another (Kari and Savolainen 2003), an information space which the user has navigated before or can navigate (Pharo 2004), an information horizon which is an imaginary field or subjective map of source preferences in which information sources are given various positions (Sonnenwald 1999; Savolainen and Kari 2004), and an information world that individuals resort to when facing information needs (Bruce et al. 2004). Therefore, the Web has rich cognitive and affective, as well as conative, implications for its multifarious users.

To analyse online information behaviour, various conceptual models have been proposed with a focus on behavioural or attitudinal typology. Hektor (2003) suggests eight forms of information activities in a model of online information behaviour in everyday non-work life. Synthesizing Marchionini's (1995) and Wilson's (1997) typologies of browsing modes and Ellis's (1989) general information seeking model, Choo et al. (1999) propose an integrated conceptual model of browsing and searching on the Web in the form of an analytical matrix that typifies behavioural modes and moves of information behaviour on the Web. Following the uses and gratifications paradigm, Song et al. (2004) attribute multiple gratifications that Web users seek online to the dichotomy of process and content gratifications. Empirically, most recent studies of online information behaviour focus on specific sites (e.g., Bucklin and Sismeiro 2003; Mat-Hassan and Levene 2005; Zhang et al. 2004), particular contexts (e.g., D'Ambra and Wilson 2004; Kim 2001; Rieh 2004; Savolainen and Kari 2004; Spink et al. 2004), or segments of users (e.g., Song et al. 2004; Cole et al. 2005) through experiments (e.g., Kim 2001; Lin 2005; Tombros et al. 2005), surveys (e.g., D'Ambra and Wilson 2004; Slone 2003; Song et al. 2004; Tabatabai and Shore 2005), and clickstream data analysis focusing on the interaction at information aggregators (e.g., Silverstein et al. 1999; Jansen et al. 2000; Anick 2003; Mat-Hassan and Levene 2005; Park et al. 2005; Agichtein et al. 2006) for a specific set of Websites.

The most relevant studies for our research in the growing body of online information behaviour literature are those focusing on observable browsing strategies and tactics (Catledge and Pitkow 1995; Tauscher and Greenberg 1997; Huberman et al. 1998; Nicholas et al. 2003; Bucklin and Sismeiro 2003; Zauberman 2003; Johnson et al. 2003; Kari 2004). Most of this line of research looks at micro-level online information behaviour. At a relatively early stage of the Web's development, Catledge and Pitkow (1995) found that more than 90% of page downloads by users in front of a browser were made through hyperlink clicking and the browser's built-in back function, a finding empirically echoed almost a decade later by Kari (2004). They also established that users tend to navigate only a small area within a particular site in a hub-and-spoke manner. Tauscher and Greenberg (1997) support this finding and further indicate that Web users use a few pages frequently and use short, repeated navigation paths. Huberman et al. (1998) use the inverse Gaussian distribution to model the strong regularities found in Web surfing. Nicholas et al. (2003) characterize online information behaviour as 'one of bouncing in which users seldom penetrate a site to any depth... and seldom return to sites they [visitors] once visited'. According to them, the online information behaviour of Web users is therefore 'seemingly shallow, promiscuous and dynamic'. Bucklin and Sismeiro (2003), Zauberman (2003), and Johnson et al. (2003), using various approaches, support the hypotheses that online information behaviour is guided by cost-benefit tradeoffs and, therefore, that people may lock themselves into a small set of Websites.

Research questions

In spite of the scope of recent conceptual and empirical studies on online information behaviour, there are gaps in the literature awaiting systematic investigation for a more comprehensive understanding of how the Web is used. Although there have been efforts to classify users (e.g., Jaillet 2002; Sheehan 2002), studies focusing on observable browsing strategies and tactics, as discussed above, tend to neglect differences in users' overall online information behaviour. People may differ in the number of Websites navigated, the number of page downloads within a site, and/or the speed at which they use the Web for whatever information is needed. Overall, people are likely to have very different behavioural profiles on the Web. Exploring such differences and the underlying factors leading to them may provide valuable insight into Web users' online information behaviour. Owing to the difficulty of accessing client-side data collected across sites, however, heterogeneity in Web users' Web usage patterns has seen little discussion in the literature.

The fact that heterogeneity in Web users' usage patterns has not yet been systematically analysed results in ambiguities that obstruct our in-depth understanding of online information behaviour. For example, it is reported that people seldom go back to Websites they visited before (Nicholas et al. 2003: 24). It has also been established that Web users develop a hub and spoke (Catledge and Pitkow 1995) structure of navigation paths within a site in a repeated way (Tauscher and Greenberg 1997). However, little is known about the more holistic picture. Does the typical cross-site behaviour of a user's everyday Web usage also include a core set of sites on which people concentrate most of their online activities and do they use this core set of sites to anchor their online navigation and exploration? If so, how do people differ in the size and the relative importance of the core?

Studies supporting lock-in online information behaviour (e.g., Bucklin and Sismeiro 2003; Zauberman 2003; Johnson et al. 2003) claim that because the switching cost increases with online experience, experienced Web users are likely to be locked into a very small set of familiar Websites for the fulfilment of a specific information need. If this holds, then, facing increasing switching costs, these locked-in Web users are unlikely to make substantial efforts to explore the Web and will stick to familiar sites. On the other hand, in everyday life a typical Web user has multifarious information needs. Aggregating online activities for the fulfilment of these needs, what kind of pattern emerges in Web users' overall online information behaviour?

To clarify these ambiguities, we propose to look at online information behaviour from the perspective of behavioural concentration. By behavioural concentration, we mean the degree to which a Web user concentrates his/her page-view downloads and visit duration among the set of sites visited (we illustrate behavioural concentration and operationalise the concept in the Method section below). The concept is derived from the work of researchers in author productivity studies who have determined that 'a large portion of the productivity of a whole domain is "concentrated" on a relatively small number of authors', such that observing the degree of concentration in a domain serves to characterize the heterogeneity of author productivity in that domain (Yoshikane et al. 2003: 521). Following this idea, the differences in Web users' online behavioural profiles can be parsimoniously characterized by heterogeneity in their degrees of behavioural concentration. Because a Web user typically visits multiple Websites over a substantial period of time, s/he tends to spend more time on, and read more pages from, a few of the sites s/he visits. Focusing on this concentration may open new avenues to understanding online information behaviour that complement existing ones.

Focusing on behavioural concentration, by the analytical approach proposed below we aim to address the following questions:

  1. How do Web users (people who conduct any activity on the Web which leads to page downloads from Websites) differ in their behavioural concentration?
  2. How does one account for heterogeneity in such online information behaviour concentration? In other words, what factors predict the degree of behavioural concentration at the individual level?
  3. What insight can be gained from empirically analysing heterogeneity in such behaviour concentration?
  4. Given that behavioural concentration is a new perspective for analysing online information behaviour, what pattern emerges from such an analysis to complement our understanding of that behaviour?

Method

Concentration and the Gini Coefficient

Given the research questions, we are interested in exploring the degrees of concentration of Web users' online information behaviour and finding explanations of heterogeneity in degrees of concentration.

Online information behaviour here is operationally measured by every additional mouse click or keystroke made by a user while online. A user's online information behaviour thus encompasses all activities that s/he undertakes on the Internet for the sake of purchasing, occupationally-related task completion, information bank building, self-expression, maintaining interpersonal relationships, pleasure pursuing or experiencing interests, and so forth (Hoffman and Novak 1996; Kari and Savolainen 2003; Song et al. 2004).

The notion of concentration plays an important role in modern informetrics, as it is implied in the historical law of Lotka, which is the basis for author productivity studies (Egghe 2005). Being a concept that can be objectively measured so as to render an overall picture of the distributional pattern of the subject matter of concern, concentration has long been studied by economists (e.g., Stigler 1964; Allen 1981; Koeller 1995) and informetricians (e.g., Drott 1978; Egghe and Rousseau 1991; Yoshikane et al. 2003). However, researchers have not yet recognized the potential of concentration studies to enrich our understanding of information behaviour.

There are various methods to measure concentration. Among these, the Gini coefficient is a very popular one, used in both economics (e.g., Stigler 1964; Allen 1981; Koeller 1995) and informetrics (e.g., Drott 1978; Egghe and Rousseau 1991; Yoshikane et al. 2003). Technically, it has been suggested that, owing to certain properties, the Gini coefficient is 'a good concentration measure' (Egghe 2005: 939). Yoshikane et al. (2003) further point out that, unlike some other concentration measures that are sensitive to extreme cases, the Gini coefficient is more robust.


Figure 1: The Geometric Illustration of the Gini Coefficient

The Gini coefficient is therefore adopted in this paper to measure the degree of online information behavioural concentration. Geometrically, as Figure 1 illustrates, the Gini coefficient is defined as twice the area between a Lorenz curve and the 45-degree line (i.e. 2*A in Figure 1; Egghe 2005). For user j who downloaded pages from multiple sites, we calculate his/her Gini coefficient of information consumption following Yoshikane et al. (2003):

$$G_j = \frac{\sum_{l=1}^{V_j}\sum_{m=1}^{V_j}\left|f_l - f_m\right|}{2\,V_j^{2}\,P_j}$$

where Vj represents the number of sites Web user j visited, fl and fm represent units (number of page-views or duration in seconds) of information consumed in sites l and m, respectively, and Pj represents user j's average number of page-views per site across the Vj sites. The Gini coefficient thus has a range between 0 and 1.
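The formula can be checked against the geometric definition (twice the area between the 45-degree line and the Lorenz curve). The Python sketch below is ours, not the authors' code; the function names and the five-site usage vector are hypothetical. It computes G_j from the pairwise formula and from a piecewise-linear Lorenz curve (trapezoid rule), and for such a curve the two coincide.

```python
from itertools import product


def gini_pairwise(f):
    """Gini coefficient of one user's consumption units f_1..f_V (page-views or
    seconds per site): sum_l sum_m |f_l - f_m| / (2 * V^2 * P)."""
    v = len(f)
    p = sum(f) / v  # P: average units per visited site
    total_abs_diff = sum(abs(a - b) for a, b in product(f, repeat=2))
    return total_abs_diff / (2 * v * v * p)


def gini_lorenz(f):
    """Twice the area between the 45-degree line and the piecewise-linear
    Lorenz curve (2*A in Figure 1), using the trapezoid rule."""
    v = len(f)
    total = sum(f)
    cum_share = 0.0
    area_under_curve = 0.0
    for units in sorted(f):  # sites ordered from least- to most-used
        prev = cum_share
        cum_share += units / total  # cumulative share of consumption
        area_under_curve += (prev + cum_share) / (2 * v)  # trapezoid of width 1/V
    return 2 * (0.5 - area_under_curve)


usage = [30, 5, 3, 1, 1]  # hypothetical page-views from five visited sites
print(round(gini_pairwise(usage), 4), round(gini_lorenz(usage), 4))  # 0.62 0.62
```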

We use the simplified example in Figure 2 to illustrate what we mean by the concentration of online information behaviour (in terms of page-views). Each case in the figure represents the distribution of all page downloads that a Web user has made from all the sites s/he visited during a (short) period of time. In case (A), the user visited ten Websites and made ten page-views from each site within the period. Across the ten sites, this user's behaviour is evenly distributed and not concentrated at all, so the Gini coefficient is zero. In case (B), the user also visited ten Websites and made a total of 100 page downloads, but this time s/he made twelve page-views from each of six sites and seven page-views from each of the other four sites. This user's online information behaviour is thus slightly more concentrated (on the six sites from which more page-views were made) than in case (A), with a Gini coefficient of 0.12. In case (C), the user visited just five sites and made a total of fifty page-views: thirty-six page-views were made from three sites, and the other fourteen pages were downloaded from the remaining two sites. Again, the Gini coefficient is 0.12. In case (D), the user visited ten Websites and made a total of 100 page downloads. Of these, however, ninety-one pages were downloaded from a single site and the other nine from the remaining nine sites, a much more concentrated case than the first three, with a Gini coefficient of 0.81. Because the calculation of the Gini coefficient counts only the sites that a Web user actually visited (each with at least one page-view), a Web user's maximum attainable Gini coefficient is less than one. In case (D), where the Web user visited ten sites and made a total of 100 page-views, 0.81 is in fact the maximum attainable Gini coefficient.
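The four illustrative cases can be reproduced directly from the equation. In this sketch (ours, in Python) each case is a list of page-views per site; for case (C) we assume the split implied by its being a half-scale copy of case (B), i.e. twelve page-views at each of three sites and seven at each of two. The helper `max_gini` is our own closed form for the constraint stated above that every counted site has at least one page-view: V-1 sites get one page-view each and the remaining T-V+1 go to a single site, giving (V-1)(T-V)/(VT).

```python
from itertools import product


def gini(f):
    # Pairwise Gini: sum_l sum_m |f_l - f_m| / (2 * V^2 * P), with P = mean of f
    v = len(f)
    p = sum(f) / v
    return sum(abs(a - b) for a, b in product(f, repeat=2)) / (2 * v * v * p)


def max_gini(total, sites):
    # Most concentrated allocation when each counted site has >= 1 page-view
    return (sites - 1) * (total - sites) / (sites * total)


cases = {
    "A": [10] * 10,            # ten sites, ten page-views each
    "B": [12] * 6 + [7] * 4,   # ten sites, 100 page-views
    "C": [12] * 3 + [7] * 2,   # five sites, 50 page-views (assumed split)
    "D": [91] + [1] * 9,       # one dominant site among ten
}
for name, f in cases.items():
    print(name, round(gini(f), 2))   # A 0.0, B 0.12, C 0.12, D 0.81

print(round(max_gini(100, 10), 2))   # 0.81, the same allocation as case D
```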

An interesting observation is that case (B) is a scaled-up version of case (C): the number of sites visited and the number of page-views have both doubled without changing the distribution of page-views per site. The two cases have the same Gini coefficient, so their degrees of behavioural concentration are identical. More generally, it follows from the equation above that the Gini coefficient remains the same if (1) the number of sites visited and the number of page-views change by the same factor without changing the distribution of page-views per site, or (2) the page-views at every site visited change by the same factor without changing the number of sites visited. This property of the Gini coefficient is important in the empirical analysis that follows.
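Both invariances can be verified exactly with rational arithmetic. The sketch below (ours; variable names are hypothetical) replicates case (C) to obtain case (B), multiplies every site's page-views by three, and, for contrast, adds one page-view to every site. Because P_j = total / V_j, the denominator 2 V_j^2 P_j simplifies to 2 V_j times the total, which the code uses.

```python
from fractions import Fraction
from itertools import product


def gini(f):
    # Exact pairwise Gini: sum |f_l - f_m| / (2 * V * total), since P = total / V
    v, total = len(f), sum(f)
    num = sum(abs(a - b) for a, b in product(f, repeat=2))
    return Fraction(num, 2 * v * total)


base = [12, 12, 12, 7, 7]         # case C
replicated = base * 2             # (1) sites and page-views both doubled: case B
rescaled = [3 * x for x in base]  # (2) every site's page-views tripled, same sites
shifted = [x + 1 for x in base]   # adding a constant is NOT a rescaling

print(gini(base), gini(replicated), gini(rescaled))  # 3/25 3/25 3/25
print(gini(shifted))                                 # 6/55, lower concentration
```

The additive shift shows that the invariance is specifically multiplicative: giving every site the same extra page-views makes the distribution more even.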


Figure 2: Illustrative cases of behavioural concentration

Cases (F) and (G) are real-world examples drawn from the empirical study reported below. The panelist in case (F) visited 332 sites during the period of observation. The degree of page-view concentration across these 332 sites, measured by the Gini coefficient, is 0.78. The panelist in case (G) visited far fewer (126) sites in the same period and concentrated most of the page-views made across those 126 sites on just a handful of sites. More specifically, this Web user made 4,086 page-views across 126 sites in a month, and almost half of them (1,978) were made on two major portal sites. The Gini coefficient for this highly concentrated case is 0.92.

Data

A month-long, client-side, user-centred panel dataset was obtained from InsightXplorer Limited, Taiwan, for our empirical study. InsightXplorer provides a syndicated service (like those provided by NetRatings and MediaMetrix) for the Taiwanese Internet market. It maintains an online panel for the local cybermarket and tracks panelists' every online clickstream across the Websites they visit by installing proprietary client-side software on panelists' computers. The software also avoids tracking problems, such as caching and proxy-server downloads, that are usually encountered when analysing visiting activities with site-centric data. Panel participants allowed InsightXplorer to track their every clickstream online and in return were compensated through regular sweepstakes programmes. They could uninstall the client-side software, and thereby stop being tracked by InsightXplorer, at will.

Data from 2,022 Internet users who made at least one visit to any Website during July 2003 were made available for this study. For each panelist, the sites visited and pages downloaded in July 2003, together with their timing, were recorded by the client-side software and transferred online to InsightXplorer in the form of raw logs. Comparing the demographics of the sample with the estimates provided by TWNIC (a non-profit organisation in charge of domain name registration and IP allocation in Taiwan), the sample's average age (31 years) is close to the average age of Taiwanese Web users (30 years) estimated by TWNIC, whereas the sample's male proportion (72%) is higher than the TWNIC estimate (61%).

From the raw logs, we summarized each panelist's number of sites visited, number of page-views made per site, and time spent per page-view. We use these data to calculate the Gini coefficients of page-view distribution and duration distribution for each panelist.
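The aggregation step can be sketched as follows. The log format here is a hypothetical flattened one, with one `(panelist_id, site, seconds)` tuple per page-view; InsightXplorer's actual raw-log schema is not described in the paper. The treatment of a panelist with zero recorded seconds is also our assumption. For each panelist the sketch derives the three behavioural dimensions and both Gini coefficients.

```python
from collections import defaultdict
from itertools import product


def gini(units):
    # Pairwise Gini, sum |f_l - f_m| / (2 V^2 P); since P = total / V this is / (2 V total)
    v, total = len(units), sum(units)
    if total == 0:
        return 0.0  # assumption: no recorded duration means no measurable concentration
    return sum(abs(a - b) for a, b in product(units, repeat=2)) / (2 * v * total)


def summarize(log):
    """Aggregate page-view records into per-panelist behavioural dimensions.
    `log` is an iterable of (panelist_id, site, seconds) tuples, one per
    page-view; this flattened format is hypothetical."""
    pages = defaultdict(lambda: defaultdict(int))    # panelist -> site -> page-views
    secs = defaultdict(lambda: defaultdict(float))   # panelist -> site -> seconds
    for pid, site, seconds in log:
        pages[pid][site] += 1
        secs[pid][site] += seconds
    summary = {}
    for pid, per_site in pages.items():
        n_sites = len(per_site)
        n_pages = sum(per_site.values())
        summary[pid] = {
            "sites": n_sites,
            "pages_per_site": n_pages / n_sites,
            "seconds_per_page": sum(secs[pid].values()) / n_pages,
            "gini_page": gini(list(per_site.values())),
            "gini_duration": gini([secs[pid][s] for s in per_site]),
        }
    return summary


toy_log = [("u1", "a.tw", 30), ("u1", "a.tw", 20), ("u1", "b.tw", 10), ("u2", "a.tw", 5)]
for pid, row in summarize(toy_log).items():
    print(pid, row)
```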

The 2,022 panelists under study made 4,138,423 page-views which lasted for a total of 151,100,519 seconds from 29,042 different Websites. Figure 3 presents the distribution of the number of sites the panelists visited during the observation month. On average, a panelist visited ninety-four different Websites in July 2003.
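A few pooled ratios follow directly from these totals. They are our own arithmetic on the reported figures, not statistics reported by the authors. Because they are ratios of totals rather than averages of per-panelist ratios, they need not match the per-panelist means underlying Figures 4 and 5.

```python
panelists = 2_022
page_views = 4_138_423
seconds = 151_100_519

print(f"page-views per panelist: {page_views / panelists:.1f}")        # 2046.7
print(f"seconds per page-view (pooled): {seconds / page_views:.1f}")    # 36.5
print(f"hours online per panelist: {seconds / panelists / 3600:.1f}")   # 20.8
```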


Figure 3: Distribution of the number of sites visited among the 2,022 panelists

Figure 4 and Figure 5 present, respectively, the distributions of the number of page-views made and of the average duration per page-view across the 2,022 panelists during the month-long observation period.

Figure 4: Distribution of the number of page-views
Figure 5: Distribution of duration per page

Figure 6 and Figure 7 present the distributions of the panelists' behavioural concentration in terms of page-views made across the sites visited and time spent at those sites, respectively. The average Gini coefficient is 0.75 in terms of page-views and 0.78 in terms of visit duration.

Figure 6: Distribution of behavioural concentration in page-views
Figure 7: Distribution of behavioural concentration in visit duration

Given these descriptive statistics and distributional information, it is apparent that these dimensions of online information behaviour on the Web are highly heterogeneous. Furthermore, the relatively high degrees of behavioural concentration found here (Figures 6 and 7) indicate that most Web users in the panel have a core set of sites in which they may dig deeply to extract information, from where they may orient their online information gathering, on which they may anchor their navigation, and to where they may return frequently. Outside of this core set, there is a much larger set of sites for which the Web user just makes occasional limited visits.

Analysis

The descriptive summaries reported above show high heterogeneity of online information behaviour across the panelists. In what follows, we establish the quantitative relationships between behavioural concentration, the focus of this study, and the dimensions of online information behaviour. The analysis has two phases. In the first phase, to get a sense of the relationship between behavioural concentration and the total number of online actions, we explore how behavioural concentration in terms of page-views across sites is associated with the total number of page-views, with the aim of specifying the functional form best suited to describing that association. In the second phase, we run multiple regressions to establish the relationships between behavioural concentration (in terms of both page-views and visit duration) and the dimensions of online information activity (number of sites visited, page-views per site, and time spent per page-view).

Behavioural concentration and total page-views

We first plot each panelist's number of page-views made in the month against his/her Gini coefficient of page-views across the sites s/he visited (Figure 8). In Figure 8, where each point represents one panelist's total number of page-views and Gini coefficient, the relationship between the two variables is clearly non-linear. Note that the upper and lower boundaries of the data in Figure 8 are a property of the data themselves, not of the limits on the Gini coefficient attainable for a given number of page-views. For example, with 2,000 page-views, the maximum attainable Gini coefficient under the equation above is about 0.96 (when a visitor visited forty-five sites, making 1,956 page downloads from one of them and a single download from each of the other forty-four), which is higher than the upper boundary in Figure 8 at around 2,000 page-views. The minimum Gini coefficient is 0 (for example, when a visitor visited just one site), which is lower than the lower boundary in Figure 8 at around 2,000 page-views.
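Under the equation above, the largest Gini coefficient attainable for a fixed total of T page-views depends on how many sites are counted, because every counted site contributes at least one page-view. The sketch below is our own derivation, not the authors'; it searches every feasible site count V for T = 2,000, using the closed form (V-1)(T-V)/(VT), and cross-checks that closed form against the pairwise formula. Note that with only two sites the bound is below 0.5.

```python
from itertools import product


def gini(f):
    # Pairwise Gini: sum |f_l - f_m| / (2 V^2 P) = sum |f_l - f_m| / (2 V total)
    return sum(abs(a - b) for a, b in product(f, repeat=2)) / (2 * len(f) * sum(f))


def max_gini(total, sites):
    # One page-view on each of sites-1 sites, the remainder on a single site
    return (sites - 1) * (total - sites) / (sites * total)


def max_gini_for_total(total):
    # Search all feasible site counts 1..total; returns (max Gini, site count)
    return max((max_gini(total, v), v) for v in range(1, total + 1))


best_g, best_v = max_gini_for_total(2000)
print(best_v, round(best_g, 3))          # 45 0.956
print(round(max_gini(2000, 2), 4))       # 0.4995 with only two sites

# Cross-check the closed form against the pairwise formula at the optimum
direct = gini([2000 - best_v + 1] + [1] * (best_v - 1))
print(abs(direct - best_g) < 1e-12)      # True
```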


Figure 8: Plotting the number of total page-views against Gini

A series of trial transformations indicates that taking the double logarithm of the number of page-views best linearizes its relationship with the Gini coefficient. Figure 9 shows the relationship after the transformation. A simple regression of users' Gini coefficients on log(log(number of page-views)) yields a coefficient of determination (R2) of 0.82. The association is thus statistically very strong: the more page-views a Web user makes, the higher the proportion of those page-views concentrated on a relatively small set of anchoring or core Websites.


Figure 9: Plotting log(log(number of total page-views)) against Gini
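The simple regression behind Figure 9 can be sketched with NumPy as ordinary least squares on the transformed regressor. Two points are assumptions on our part: natural logarithms (the paper does not state the base), and inputs strictly greater than 1, since log(log(x)) is undefined at x = 1 and the paper does not say how such panelists were handled. The toy data are generated exactly on a double-log line, so the fit should recover its parameters with R2 = 1; they are not the panel data.

```python
import numpy as np


def fit_loglog(page_views, gini):
    """OLS of Gini on log(log(page_views)); returns (intercept, slope, R^2).
    Natural logs assumed; every value must exceed 1 so log(log(x)) is defined."""
    x = np.log(np.log(np.asarray(page_views, dtype=float)))
    y = np.asarray(gini, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # intercept + transformed regressor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return coef[0], coef[1], r2


# Toy check (not the panel data): points lying exactly on a double-log line
pv = np.array([5, 20, 100, 500, 2000, 8000])
g = 0.1 + 0.3 * np.log(np.log(pv))
print(fit_loglog(pv, g))  # approximately (0.1, 0.3, 1.0)
```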

Behavioural concentration and dimensions of online information behaviour

We next investigate how the dimensions (i.e., number of sites visited, number of page-views per site, and time spent per page-view) of online information behaviour influence behavioural concentration by multiple regressions. As we have mentioned, we focus on both the concentration of page-views and that of duration across Websites visited. Therefore, the backbone models of the analysis can be described by the following two regressions.

Page Model:
$$\text{Gini\_page}_i = a_0 + a_1\,\#\text{Site}_i + a_2\,\#\text{page-views per site}_i + a_3\,\text{Time spent per page}_i$$

Duration Model:
$$\text{Gini\_duration}_i = b_0 + b_1\,\#\text{Site}_i + b_2\,\#\text{page-views per site}_i + b_3\,\text{Time spent per page}_i$$

Gini_page_i refers to the Gini coefficient of the page-view distribution across the sites that Web user i visited, and Gini_duration_i refers to user i's Gini coefficient of the duration distribution. In our empirical setting, i indexes the 2,022 panelists (i = 1, ..., 2,022).
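Because the Page and Duration models share the same three regressors, one design matrix serves both. The sketch below fits each model by OLS and reports coefficients, t-values and R2, the quantities shown in Table 2. It uses synthetic panelists (not the InsightXplorer data) whose "true" coefficients are loosely modelled on Table 2 purely for illustration, applies the double-log transformation with natural logs, and uses classical (homoskedastic) standard errors. The paper does not state its log base or standard-error method, so both are assumptions.

```python
import numpy as np


def ols_with_t(X, y):
    """OLS with an intercept; returns (coefficients, t-values, R^2).
    Classical (homoskedastic) standard errors are an assumption."""
    X1 = np.column_stack([np.ones(len(y)), X])
    n, k = X1.shape
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    sigma2 = resid @ resid / (n - k)                           # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))  # standard errors
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, coef / se, r2


# Synthetic panel (NOT the real data): one row per hypothetical panelist
rng = np.random.default_rng(0)
n = 2022
sites = np.exp(rng.uniform(np.log(3), np.log(400), n))          # sites visited
pages_per_site = np.exp(rng.uniform(np.log(2), np.log(60), n))  # page-views per site
secs_per_page = np.exp(rng.uniform(np.log(5), np.log(120), n))  # seconds per page-view

# Shared design matrix: log(log(.)) of the three behavioural dimensions
X = np.log(np.log(np.column_stack([sites, pages_per_site, secs_per_page])))
gini_page = 0.30 + X @ np.array([0.21, 0.27, -0.06]) + rng.normal(0, 0.02, n)
gini_duration = 0.27 + X @ np.array([0.23, 0.15, 0.05]) + rng.normal(0, 0.02, n)

for name, y in [("Page", gini_page), ("Duration", gini_duration)]:
    coef, t, r2 = ols_with_t(X, y)
    print(name, np.round(coef, 3), np.round(t, 1), round(r2, 3))
```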

We again look for the appropriate transformation of the independent variables in the regressions by trying various transformations. The entries in Table 1 are the R2 values obtained under each transformation of the independent variables. As Table 1 shows, the double logarithm transformation again gives the best fit. The signs and significance (at the 0.05 level) of the model coefficients, however, do not change across these transformations. In the Results section below, we report results from the models with the independent variables taking a double logarithm.


Table 1: Model fit (R2) with various specifications of the independent variables

Specification of the independent variables     Page Model    Duration Model
(1) Without transformation                      0.40          0.38
(2) Taking a logarithm                          0.80          0.73
(3) Taking a double logarithm                   0.86          0.76
(4) Taking a triple logarithm                   0.75          0.63
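The specification search summarized in Table 1 can be sketched as fitting the same regression under each candidate transformation and comparing R2. In this sketch the data are synthetic and generated from a double-log model, so the comparison only illustrates the procedure and is not a replication. Natural logs are assumed, and the triple logarithm is defined only for values above e (about 2.718), so every synthetic value is kept above that. The paper does not say how it handled values outside these domains.

```python
import numpy as np


def r_squared(X, y):
    # R^2 of an OLS fit of y on X plus an intercept
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


# Candidate transformations of the independent variables (natural logs assumed)
TRANSFORMS = {
    "none": lambda x: x,
    "log": np.log,
    "double log": lambda x: np.log(np.log(x)),
    "triple log": lambda x: np.log(np.log(np.log(x))),  # needs x > e
}

# Synthetic data (NOT the panel data); columns: sites, pages/site, secs/page
rng = np.random.default_rng(1)
n = 2022
raw = np.exp(rng.uniform(np.log([3, 3, 5]), np.log([400, 60, 120]), size=(n, 3)))
y = 0.3 + np.log(np.log(raw)) @ np.array([0.21, 0.27, -0.06]) + rng.normal(0, 0.01, n)

fits = {name: r_squared(f(raw), y) for name, f in TRANSFORMS.items()}
for name, r2 in fits.items():
    print(f"{name:>10}: R^2 = {r2:.3f}")
print("best:", max(fits, key=fits.get))
```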

Results

From the model

Running multiple regressions for both the Page and the Duration models as specified in the Analysis section, with the independent variables taking a double logarithm, yields the results summarized in Table 2. All three dimensions of online information behaviour (number of sites visited, number of page-views per site visited, and time spent per page-view) contribute significantly (at the 0.05 level) to explaining the degree of behavioural concentration, in terms of both page-view concentration and duration concentration. Number of sites visited and number of page-views per site are positively associated with both page-view concentration and duration concentration. Time spent per page-view is also positively associated with duration concentration, but is negatively associated with page-view concentration across sites.


Table 2: Model results from the best-fit specification (** p < 0.01)

                                 Page Model               Duration Model
Independent variable             Coefficient  t-value     Coefficient  t-value
Intercept                         0.2943**     17.08       0.2706**     13.36
log(log(no. of sites))            0.2146**     64.75       0.2256**     57.88
log(log(no. of pages/site))       0.2667**     58.05       0.1539**     28.49
log(log(duration/page))          -0.0569**     -4.60       0.0506**      3.48
R2                                0.8639                   0.7632
Adjusted R2                       0.8637                   0.7629
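Because each regressor enters as log(log(x)), a coefficient in Table 2 translates into a change in predicted Gini of coefficient × [log(log(x_to)) − log(log(x_from))], holding the other variables fixed. The sketch below applies this to two Table 2 coefficients. We assume natural logarithms: the coefficients are tied to whatever base the authors used, which the text does not state. The example values of x are hypothetical.

```python
import math


def predicted_change(coef, x_from, x_to):
    # Change in predicted Gini when one regressor moves from x_from to x_to,
    # others held fixed, in a model linear in log(log(x)) (natural logs assumed)
    return coef * (math.log(math.log(x_to)) - math.log(math.log(x_from)))


# Page Model, log(log(no. of sites)): coefficient 0.2146 from Table 2
print(round(predicted_change(0.2146, 50, 100), 4))   # doubling sites from 50 to 100
print(round(predicted_change(0.2146, 100, 200), 4))  # doubling again: a smaller step
# Page Model, log(log(duration/page)): coefficient -0.0569, 20 s -> 40 s per page
print(round(predicted_change(-0.0569, 20, 40), 4))
```

The shrinking step for each successive doubling reflects the strong concavity of the double-log transformation.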

Further analysis

To further investigate the role that a Web user's various behavioural dimensions play in his/her behavioural concentration, we use the same model specification reported above and run a series of stepwise regressions, now focusing on the partial R2. We want to see how the relative contributions of the independent variables to predicting the dependent variable change as the data structure changes. At the same time, since the dependent variable has a maximum attainable value for given values of the independent variables, we would also like to check the robustness of the regression results reported above, so as to further validate the analytical framework.

We first run stepwise regressions on the whole dataset (the result is identical to that reported in Table 2). Second, we run stepwise regressions on the dataset excluding panelists who visited fewer than five sites during the observation month. Third, we run stepwise regressions on the dataset excluding panelists who visited fewer than ten sites during the observation month. The process continues with successive regressions that drop panelists who visited fewer than 15, 20, 25, ..., 100 sites, so that twenty-one stepwise regressions in total are run for each model. For each stepwise regression, the total R2 is divided into the partial R2 values of the independent variables. The partial R2 that each independent variable carries indicates the relative contribution of that variable to explaining the dependent variable.
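The procedure can be sketched as follows. We read "partial R2" as the increase in R2 when a variable enters a forward-selection sequence, so that the increments sum to the full-model R2; this is our interpretation. Our sketch also omits the significance tests for entry and removal that stepwise software normally applies, which seems harmless here because all three predictors were significant in every run. The data are synthetic, not the panel data, and the thresholds follow the text: the whole sample, then minimum site counts of 5, 10, ..., 100, giving twenty-one runs.

```python
import numpy as np


def r_squared(cols, y):
    # R^2 of OLS of y on an intercept plus a list of 1-D predictor arrays
    X1 = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def forward_partial_r2(predictors, y):
    """Forward selection without entry tests: at each step add the predictor that
    raises R^2 most and record that increase as its partial R^2."""
    chosen, parts, current = [], {}, 0.0
    remaining = dict(predictors)
    while remaining:
        base = [predictors[c] for c in chosen]
        name = max(remaining, key=lambda nm: r_squared(base + [remaining[nm]], y))
        new = r_squared(base + [remaining.pop(name)], y)
        parts[name], current = new - current, new
        chosen.append(name)
    return parts


# Synthetic data (NOT the panel data)
rng = np.random.default_rng(2)
n = 2022
sites = np.exp(rng.uniform(np.log(2), np.log(400), n))
pages_per_site = np.exp(rng.uniform(np.log(2), np.log(60), n))
secs_per_page = np.exp(rng.uniform(np.log(5), np.log(120), n))
gini_page = (0.30 + 0.21 * np.log(np.log(sites)) + 0.27 * np.log(np.log(pages_per_site))
             - 0.06 * np.log(np.log(secs_per_page)) + rng.normal(0, 0.02, n))

# Whole sample, then drop panelists with fewer than 5, 10, ..., 100 sites: 21 runs
thresholds = [0] + list(range(5, 101, 5))
results = {}
for min_sites in thresholds:
    keep = sites >= min_sites
    preds = {name: np.log(np.log(v[keep])) for name, v in
             [("sites", sites), ("pages/site", pages_per_site), ("secs/page", secs_per_page)]}
    results[min_sites] = forward_partial_r2(preds, gini_page[keep])
print(len(results), results[0])
```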

For both the Page Model and the Duration Model, neither the signs of the coefficients nor their statistical significance (at the 0.05 level) changes across the different subsamples. The analysis also shows that the speed dimension of online information behaviour has a relatively minor influence on the degree of behavioural concentration (partial R² below 0.01). Interestingly, for the whole dataset, number of sites visited is more important than number of page-views per site in terms of partial R². However, as more panelists who visited only a small number of sites are dropped, the importance of the latter grows and that of the former declines. For both models, once the light Web users who visited fewer than ten sites during July 2003 are dropped from the dataset, number of page-views per site explains more of the variance than number of sites visited.

Figure 10: Relative importance of number of sites visited and page-views made per site in the Page Model

Figure 11: Relative importance of number of sites visited and page-views made per site in the Duration Model

Figures 10 and 11 plot the results of this analysis, focusing on how the partial R² values change as the sample is restricted.

Findings

Summarizing the results discussed above, we arrive at a series of empirical findings that reveal patterns in the heterogeneity of Web users' behavioural concentration relating to their number of sites visited, number of page-views per site, and time spent per page-view.

  1. The larger the number of sites that a Web user visits, the higher his/her degree of behavioural concentration. Relative to Web users who visited fewer Websites, those who visited more Websites are more likely to make most of their page-views at, and spend most of their online time on, just a small proportion of the sites they visited.
  2. The larger the number of page-views that a Web user makes per site visited, the higher his/her degree of behavioural concentration. Relative to Web users who downloaded only a few pages from the sites they visited, those who made more page-views per site visited are more likely to make most of their page-views at, and spend most of their online time on, just a small proportion of the sites they visited.
  3. The faster the speed of a Web user's online information behaviour, the higher his/her degree of behavioural concentration in terms of page-views. Relative to Web users who browse slowly, those who browse at a faster pace (i.e., with a shorter duration per page-view) are more likely to make most of their page-views at just a small proportion of the sites they visited.
  4. The faster the speed of a Web user's online information behaviour, the lower his/her degree of behavioural concentration in terms of duration. Relative to Web users who browse slowly, those who browse at a faster pace (i.e., with a shorter duration per page-view) are less likely to spend most of their online time on just a small proportion of the sites they visited.
  5. Except for Web users who visit very few sites (i.e., fewer than ten Websites during a month), number of page-views per site is the most important variable in explaining Web users' degree of behavioural concentration among the Websites visited.

Conclusion

This paper analyses how Web users differ in their online information behavioural patterns from the angle of their behavioural concentration. To capture a Web user's behavioural concentration, we examine the degree to which his/her page-views and visit duration are concentrated among the Websites s/he visited, and we measure this with the Gini coefficient. We also investigate how number of sites visited, number of page-views per site and time spent per page-view explain heterogeneity across people in the concentration of their online information behaviour. Using online panel data that record panelists' every move on the Web over a whole month, we find that these three online information behavioural dimensions explain a substantial part of behavioural concentration in terms of both page-views and duration (R² of 0.86 and 0.76, respectively, in Table 2).
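As an illustration of the measurement step, the sketch below computes a Gini coefficient for one user's page-views (or seconds) across the sites s/he visited. The paper's exact Gini formula is not reproduced in this section. This sketch assumes the common mean-absolute-difference form, G = Σ_i Σ_j |x_i − x_j| / (2 n² x̄), computed through the equivalent sorted-rank expression. Under this form, G is 0 when activity is spread evenly, and it cannot exceed (n − 1)/n for n sites. The attainable maximum therefore depends on the number of sites visited. The function name and the per-site counts are hypothetical.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative per-site activity (page-views or seconds).

    Uses G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with x sorted in
    ascending order and ranks i = 1..n; this equals the mean-absolute-difference form.
    """
    xs = sorted(float(v) for v in values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one site")
    if xs[0] < 0:
        raise ValueError("activity must be non-negative")
    total = sum(xs)
    if total == 0:
        raise ValueError("need positive total activity")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical page-view counts per site for two users (not panel data).
    spread_out = [4, 5, 3, 4, 4]                    # light user, evenly spread
    anchored = [120, 35, 2, 1, 1, 1, 1, 1, 1, 1]    # heavy user with anchor sites
    print(round(gini(spread_out), 3), round(gini(anchored), 3))
```

A user who makes most page-views at one or two anchoring sites gets a value close to the upper bound. A user who spreads activity evenly over a few sites gets a value near 0, mirroring the heavy-versus-light contrast described below.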

From the empirical findings, a clear behavioural pattern emerges that distinguishes heavy users from light users. Web users who visited more sites and made more page-views from these sites, i.e., heavy Web users, are more likely to concentrate most of their online activities on a small set of core, anchoring Websites. By contrast, light Web users who visited a limited number of sites and downloaded fewer Web pages are found to distribute their online activities more evenly among the few sites they visited. Following Yoshikane et al. (2003), we can distinguish 'absolute concentration' from 'relative concentration'. In our context, absolute concentration directly indicates the number of sites visited out of the vast population of sites, whereas relative concentration is exactly what we measure here with the Gini coefficient. On this distinction, we have empirically established that, for Web users' online information behaviour, relative behavioural concentration is positively associated with absolute behavioural concentration.

Our findings in this study raise a series of questions whose clarification will further enhance our understanding of online information behaviour. First, it would be of great interest to identify the set of core, anchoring Websites and then analyse the degree of overlap among Web users' core sets of sites. At the aggregate level, a substantial portion of online activity is clearly concentrated on a few sites (such as big portals and search engines). However, very little research has so far addressed individual-level heterogeneity in the sites on which Web users concentrate their online activities, and an individual-level analysis may enrich our understanding of heterogeneous online information behaviour. For example, some Web users' core sets of sites may consist of Web-based services such as e-mail. Since the Web is merely a convenient channel for checking e-mail, Web usage for checking e-mail should be distinguished from other types of Web-specific online information behaviour.

Second, it has now been quantitatively established that relatively heavy Web users concentrate their online information behaviour on a small set of sites. Two matters therefore need further study: the role this supposedly important set of core, anchoring Websites plays in users' everyday online information lives, and how it fits in with their way of life and mastery of life (Savolainen 1995). Third, it can be hypothesized that the many sites outside a Web user's core, anchoring set, which s/he visits only once or infrequently, are actually visited for complementary or exploratory purposes. The existence of these sites in Web users' online experiences bears on the site- and context-specific studies supporting the lock-in argument, in which experienced Web users resort to a very small set of Websites to fulfil a specific information need (Bucklin and Sismeiro 2003; Zauberman 2003; Johnson et al. 2003). It indicates that these studies are at best partially substantiated and need further elaboration. Fourth, as has been established (Schmittlein et al. 1993), users' degree of behavioural concentration may change over time under certain circumstances. If a longitudinal study is feasible, it would be of considerable value to track how the number of sites visited, the number of page-views per site and the duration per page, as well as the concentration of Web users' online information behaviour, evolve over time.

Focusing on Web users' behavioural concentration across the Websites they have visited, this paper investigates heterogeneity in Web users' online information behaviour. The analytical approach proposed herein provides a picture of online information behaviour that is important for understanding Web users' overall Internet usage and its heterogeneity, and that has so far been missing from the related literature.

Acknowledgement

Research funding from the National Science Council, Taiwan (No. 94-2416-H-007-006) is acknowledged by the first author.

References


How to cite this paper

Huang, C-Y., Shen, Y-C., Chiang, I-P. and Lin, C-S. (2007). "Concentration of Web users' online information behaviour" Information Research, 12(4) paper 324. [Available at http://InformationR.net/ir/12-4/paper324.html]
© the authors, 2007.
Last updated: 18 August, 2007