published quarterly by the university of borås, sweden

vol. 26 no. 4, December, 2021

What kind of sources do I need? Critical search for information on the Web

Martha Vidal-Sepúlveda, Cristian Olivares-Rodríguez, and Luis Cárcamo-Ulloa

Introduction. Previous studies have shown that students have a high confidence in search engines. This poses a significant risk in learning processes if students do not have critical skills for document selection. This study provides clues about the quality of the information sources that university students access in their internet searches, and highlights critical thinking as a key competence in personalised information searches.
Method. A quasi-experimental study was conducted with a sample of fifty-eight university students who solved four information tasks. The sources were categorised according to the quality of their editorial process.
Analysis. We examine critical thinking in a realistic setting, considering both the interactions of university students with insufficiently validated sources and the type of information task they were working on.
Results. The main finding of this study is that students mostly access sources with a corporate publisher (52.9%) and alternative sources (40%). In addition, the publisher type of the source is not related to how the results are ranked.
Conclusions. Greater access to information does not ensure its quality or veracity, which confirms the need to promote the development of critical thinking.

DOI: https://doi.org/10.47989/irpaper913


Currently, new technologies have permeated all public and private spaces. The classroom is no stranger to this reality, and therefore new challenges arise for the education system. Knowledge no longer belongs only to the teacher; instead, it flows and converges in the classroom through different technological devices. The opportunity to access a large amount and variety of information (which grows in unimaginable volumes but, at the same time, does not ensure veracity or quality) complicates the distinction between the useful and the disposable. We move from the certainty of the teacher’s words and the physical book to the questioning of those words and the uncertainty of online information.

The amount of information offered by the Internet is endless. The indexed web contains at least five billion web pages (De Kunder, 2019), many with repetitive content of dubious quality, but with immediate access to hundreds of results. This hinders the selection of rigorous information, and such overload causes anxiety in students, affecting their decisions (Bawden and Robinson, 2009). Also, the customisation algorithms created to facilitate access to information of interest to us adapt the results pages to a profile that search engines create based on our clicks (Pariser, 2017). For example, our Google search is influenced by our Gmail profile information, geographic location and search history.

Therefore, given the increase in information available on technological platforms, it is necessary to have the critical competencies that allow us to perceive it correctly. Today's students need to develop critical thinking skills to: a) integrate new and changing knowledge, b) form their own thinking matrix, avoiding the risk of hegemony and homogenization, and c) foster the creation of creative ideas in their area of study and daily life (Olivares-Rodríguez and Guenaga, 2015).

In a personalised search environment, which some people find limiting, it is urgent to characterise the information sources that can be accessed, and thus counteract beliefs about new generations and their mastery of technologies. For example, although new generations demonstrate ease and familiarity with computers, they are dependent on search engine results (Rowlands et al., 2008).

The objective of this study was to determine the quality of the sources of information encountered by students when using a search engine to solve a task, and highlight the need for critical thinking as a key competence in personalised information searches. To do this, two groups of first-year university students searched for information on the web to solve factual and research / exploratory tasks. To control the bias of previous knowledge, tasks were formulated from different areas and outside the field of study of the participants. Information sources that were accessed through the search engine were categorised according to quality and veracity.

In the first stage of the study, we categorised the information sources according to their content edition in three categories: websites whose content is declared to be edited by corporations or institutions (Corporate), websites with non-validated alternative edited content (Alternative), and websites with collaborative edited content (Collaborative). In the second stage, we analysed the ranking of the results to compare the different websites in response to the task queries using the RankFlow metric. The study was conducted using the GoNSA2 technology platform (Olivares-Rodríguez et al., 2018), which implicitly registers all user actions. This quasi-experimental study made it possible to analyse results pages from a search engine in the same period, comparing the same task developed by students with similar demographic characteristics (age, sex, language, career, and geographical location) but including the singularity of the personal elaboration of the queries.

Algorithmic challenges for autonomous learning

The following section summarizes previous findings on critical thinking, bringing together its main definitions and its relationship with student learning. We explain the operation of the search engines and the complexities in the interaction between the student and the personalization algorithms implemented in these.

Critical thinking as a learning competence

Although there is no consensus on the concept of critical thinking, it is recognised as a superior way of thinking and a necessary competence in the educational field (da Silva and Rodrigues, 2011). Critical thinking has been defined as a complex type of thinking, useful in academia and everyday life. It is also defined as a type of thinking that allows one to evaluate and process information in an analytical and reflexive way, forming judgments and accepting or rejecting information of a scientific or everyday nature (Yang, 2012). It also allows the creation of new forms of knowledge, and it can be used to make everyday decisions (Black, 2012; Halpern, 2006).

Psychology highlights the cognitive and self-regulation components of critical thinking, defining it as a superior way of thinking that possesses a set of intellectual skills, aptitudes, and dispositions, the latter of an affective nature (Ennis 2005, 2011). The critical thinker is described as a subject who has developed appropriate standards to evaluate his or her own thinking and who, in turn, uses them constantly to improve their quality and to develop the intellectual characteristics that define the individual as a critical thinker (Elder and Paul, 1994). Scriven and Paul point out that critical thinking is made up of a set of skills to generate and process information, beliefs, and actions based on intellectual commitment (Scriven and Paul, 1992, 2003).

Related to this second approach, Ennis (2005, 2011) defines a concept of critical thinking in which skills are cognitive components and dispositions are affective components, and establishes two dispositions: a) the concern for solidity and b) the concern for justice. In the concern for solidity, the critical thinker seeks the truth and seeks to ensure that his or her decisions are justified. In the concern for justice, the critical thinker presents his or her position and that of others, showing the foundations and bases of his or her convictions. Ennis (2002) proposes as skills: basic and advanced clarification, fundamentals for decision making, inference, assumption and integration, and auxiliary critical skills, offering a scheme of intellectual operations that are exercised on the information previously collected and selected. Kuhn and Weinstock (2000) also point to metacognitive and epistemological evaluation competencies as essential components in the development of critical thinking. It is no longer enough just to have intellectual competencies and dispositions; the student must also be able to think about his or her own thinking. Daniel (2002) also proposes a critical thinking structure that contains the logical, creative, responsible and metacognitive dimensions.

A Delphi project defined an agreement on the essential skills of critical thinking: analysis, interpretation, self-regulation, inference, explanation, and evaluation (Facione, 1990). According to this, the difference between a critical thinker and a non-critical thinker is given by these cognitive skills and dispositions. The dispositions are defined by experts as the attitudes that lead us to use our cognitive abilities to focus on critical thinking. It is worth mentioning that a subject may or may not use his or her critical skills, based on his or her dispositions or motivation (Facione, 2000; Saiz and Rivas, 2008; Valenzuela and Nieto, 2008). This could answer how, when and why people decide to use critical thinking skills, what triggers their use or, conversely, why they choose not to use them. According to this, there are some people with a high disposition to critical thinking and others with a low disposition. The activation of critical thinking would be subject to the personal decision of the subject and not to the absence of the cognitive functions necessary to apply it (Saiz and Rivas, 2008; Saiz et al., 2015). Therefore, the disposition to critical thinking would determine different ways of facing life, and there would be people who make more thoughtful decisions and others who would not use their full intellectual potential.

A third approach is proposed by the American Library Association (Eisenberg and Berkowitz, 2003) who analyse critical thinking based on skills related to search, location and use of information. ALA defines nine standards for critical thinking, organized into three groups of basic standards: information literacy, independent learning, and social responsibility. According to this:

Therefore, the American Library Association's interest is focused on information literacy as a crucial competence for the subject to access the evidence and construct the arguments that allow the individual to reason logically, away from personal biases that could influence problem-solving. We can see that critical thinking goes beyond scientific thinking, as it must incorporate the affective aspects of the subjects. We conclude that critical thinking is a superior way of thinking, which requires the direct action of the person who is predisposed to intellectual work. Persons who think critically are willing to evaluate, reflect and judge, accept or reject ideas or concepts based on evidence and not on their own ideas or personal convenience. In addition, they constantly monitor the influence of their emotions, applying logical criteria and standards, which they are constantly willing to question.

Information search for autonomous learning

Today, access to information is primarily achieved through technology. When students consult a search engine, the information available on the Web is retrieved through linked indexed electronic documents or resources, which are displayed according to their relevance (Page et al., 1999). However, the relevance of search output in relation to the query made depends on technological and human variables, which together complicate the search for information.

According to the report of the Organization for Economic Cooperation and Development (OECD, 2018), the diversity of pages, the inherent ambiguity and redundancy of the queries, and the low levels of reading comprehension cause difficulties in the search process. This reduces the exploration of the search output and increases frustration, especially in children and young people (Foss and Druin, 2014). It should be noted that in the search process, cognitive, physical and emotional aspects of the students are involved (Kuhlthau, 1991), as well as the ability of technology to respond to the needs of users (Marchionini, 2006).

Models that provide query recommendation (Duarte Torres et al., 2012) and that detect the user's purpose (Sadikov et al., 2010; Santos et al., 2003) also determine the efficiency of the search process. These, in theory, emerge as alternatives to increase the efficiency of user search against the growing cloud information, reducing the frustration caused by hundreds of output pages with diverse information. However, search personalisation based on user history reduces the search space with filters that determine which results are close to our interests or areas of study, leaving out those that are less related and that present antagonistic or contradictory information.

This ‘bubble’ (Pariser, 2017) limits access to quality information. A student who explores the Web with the purpose of solving a learning task does not have access to the full diversity of contents (Pariser, 2017), as these will be limited to the sites that he or she frequently visits or to similar ones. Therefore, if the student never or almost never explores quality sources, the possibility of accessing them when required is reduced. For Jiang (2014b), the ranking of search outputs depends on user clicks, but also on language, writing style, site popularity, and geolocation. This last factor is decisive in the search output received by the user, as there are significant differences in the information received depending on the geographic location of the user (Jiang, 2014a, 2014b; Cano-Orón, 2019).

It is also necessary to highlight the importance of advertising in the search engine business, which coordinates the interests of both content distributors and advertisers (Rieder and Sire, 2014). In this sense, the delivery of search output through content ranking is a biased model, because the algorithm determines the priority of some contents over others (Lewandowski, 2017; Rieder and Sire, 2014; Jiang, 2014a, 2014b), influencing access to information. Another factor to consider in search engine biases is media influence, as this also affects the decision making of the search algorithm (Cano-Orón, 2019).

For Haim, Graefe and Brosius (2018), the bubble does not isolate internet users, because personalisation does not limit access to all possible options. However, studies that have stressed the bubble have limitations as they are not conducted in real contexts: the information sources are news from a single newspaper, the study considers only a particular type of diversity (Möller et al., 2018), or participants are simulated with profiling algorithms (Haim et al., 2017).

Finally, the search for information on the Web is a fundamental skill in media literacy; however, most students do not have the necessary skills to access information efficiently (Druin et al., 2009; Qureshi et al., 2015; Şendurur and Yildirim, 2015).

Research questions

The opportunity to access quality information on the Web is conditioned by technology and the critical competencies of the student. If the student makes a greater effort to explore the Web, formulate relevant queries, access various sources, and select and read these sources, he or she manages to escape the bubble. Therefore, a student who has critical competencies when facing search output pages will select the information to be used according to logical criteria, focusing on the accomplishment of the task and privileging the veracity and quality of the sources. This student will take advantage of information search technology to achieve his or her learning objectives. This is why promoting critical thinking is key in hypermediated learning contexts.

The two questions this study seeks to answer are the following:

Material and methods


The sample was composed of first-year engineering students, between 18 and 19 years old. Informed consent was signed by fifty-eight students out of a total of sixty-five. Of these students, fifty-three were male and only five were female. Data collected were treated confidentially, anonymously and in an aggregated way, and participants were told they could withdraw at any time and without any reprisals.

Design of information search tasks

Participants received two factual and two research or exploratory tasks (Table 1). Factual tasks refer to specific data search tasks that do not require elaboration by the student but are satisfactorily resolved from sources of authority in the subject. Research or exploratory tasks require greater intellectual effort as the student must look for different sources to solve the task (Marchionini, 2006).

To solve both types of tasks, students had to look for information on the Web. They had to evaluate the veracity and quality of the sources reached with their queries. Students had five minutes for each factual task and twenty-five minutes for each research or exploratory task. To control bias from previous learning, the topics of the tasks were proposed in different areas of knowledge and outside the formal learning of the participants, considering tasks that have been previously used in the literature (Arguello et al., 2012; Wu et al., 2012; Kules and Shneiderman, 2008). In addition, tasks were designed in accordance with the proposal of Wildemuth and Freund (2012).

Table 1: Information search tasks

Task | Description | Type
Sea depth | Find the name of the deepest point in the ocean. | Factual
HIV in Chile | Find the number of people with HIV in Chile. | Factual
Crime in Chile | Determine the components that support a plan to combat crime in Chile. | Exploratory
Build a car | Identify the necessary steps to build a car. | Exploratory


Searches were performed using the GoNSA2 technology platform (Olivares-Rodríguez et al., 2018), which interacts with the Microsoft Bing search engine. GoNSA2 is a platform that supports the design of information tasks, the performance of the search, the integration of student behaviour and the evaluation of the solutions developed. Figure 1 shows the information task resolution interface presented to the students, divided in a) the challenge with the goal and description (left), b) the search box and search output (middle), c) the solution box (right-upper) and d) the personal library (right-lower). The main contribution of GoNSA2 is the generation of detailed information obtained from the search of information, the queries, the documents reached with such queries, and the solutions delivered by the students.

GoNSA2 records all actions performed by users when searching for information to solve tasks. For each task, GoNSA2 records the queries issued by each student, and for each query, it stores the documents and sources delivered by the search engine. It also records the actions that were executed on the results, e.g., if they were seen, stored in the personal library of the task or if they were deleted. In addition, GoNSA2 records the intermediate solutions developed by the students and the time in which the student performed a specific action, from the beginning to the end of the task.
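The logging described above can be pictured as a nested record: task, then query, then results and the actions taken on them. The following is a minimal sketch under that assumption. All class names, field names and action labels are hypothetical, since the actual GoNSA2 schema is not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultAction:
    """One action on a search result (illustrative names, not GoNSA2's)."""
    url: str
    action: str          # assumed vocabulary: "seen", "stored", "deleted"
    timestamp_s: float   # seconds elapsed since the task started


@dataclass
class QueryRecord:
    """A query issued by a student and the ranked results returned for it."""
    text: str
    timestamp_s: float
    result_urls: List[str] = field(default_factory=list)  # in ranked order
    actions: List[ResultAction] = field(default_factory=list)


@dataclass
class TaskLog:
    """Everything recorded for one student working on one task."""
    student_id: str
    task_name: str
    queries: List[QueryRecord] = field(default_factory=list)
    intermediate_solutions: List[str] = field(default_factory=list)

    def stored_urls(self) -> List[str]:
        """URLs saved to the personal library and not deleted afterwards."""
        events = sorted(
            (a for q in self.queries for a in q.actions),
            key=lambda a: a.timestamp_s,
        )
        library: List[str] = []
        for event in events:
            if event.action == "stored" and event.url not in library:
                library.append(event.url)
            elif event.action == "deleted" and event.url in library:
                library.remove(event.url)
        return library
```

Replaying the actions in time order, as `stored_urls` does, is one simple way to reconstruct the final state of a student's personal library from such a log.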


Figure 1: GoNSA2, task resolution interface.


We presented the functionalities of the technology platform at the beginning of the session. Then, the students had five minutes to solve each of the factual tasks, and twenty-five minutes to perform each of the exploratory tasks. At the end of the session, students had the possibility to solve six optional tasks.

Source categorization

To develop a source categorization, we considered the publication model behind the websites as a proxy for information quality assurance. First, we categorised the sources according to the process of content publication into institutional or corporate publisher (Corporate), alternative publisher (Alternative), and collaborative publisher (Collaborative). Corporate sources correspond to university sites, research centers, specialised press and scientific journals, among others. Alternative sources correspond to informational sites or sites for the dissemination of miscellaneous content, blogs, etc. that do not have a formal editing process. Finally, Collaborative sources correspond to sites such as Wikipedia. In a second stage, we analysed the ranking of the results found for each task, to compare the fluctuation of the websites in the queries made, using the RankFlow metric (Rieder, 2014).
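Once each website has been reviewed and labelled by hand, applying the labels to result URLs can be sketched as a simple domain lookup. The example below is only an illustration: the lookup table, the function name `categorise` and the fallback label "Unlabelled" are assumptions, and the three entries are hypothetical examples rather than the labels used in the study.

```python
from urllib.parse import urlparse

# Hypothetical hand-assigned labels; in the study each website's
# publication process was reviewed manually.
CATEGORY_BY_DOMAIN = {
    "uach.cl": "Corporate",           # a university site
    "wikipedia.org": "Collaborative",
    "example-blog.com": "Alternative",
}


def categorise(url, labels=CATEGORY_BY_DOMAIN):
    """Return the category of a URL by matching its host against the labels."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try progressively shorter suffixes: es.wikipedia.org -> wikipedia.org
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in labels:
            return labels[candidate]
    return "Unlabelled"
```

Matching on domain suffixes means that language editions or subdomains of a labelled site inherit its category, which is a design choice of this sketch and not something stated in the study.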


Quality of the information sources according to their process of content edition

To categorise the quality of all the information sources accessed by the students, we reviewed the process of content publication of each of the websites. They were categorised as: Corporate, Alternative or Collaborative. The analysis of source quality was carried out for each of the two participating groups and for each type of task proposed (Figure 2).

Considering all the proposed tasks, 52.1% of the information sources corresponded to the Corporate category, followed by 40.7% of Alternative, and 7.2% of Collaborative. When we analysed the quality of the sources according to the type of task requested, the type of source that the students accessed did not seem to depend on the type of task requested, but on the media agenda of the country (Cano-Orón, 2019). For the ‘HIV in Chile’ task, in which the students had to deliver the exact number of people living with HIV in Chile, we obtained 82.1% of Corporate sources (Figure 2). This can be explained because this study was carried out in a period of great discussion of the subject, after the media covered the increased HIV contagion and questioned the preventive campaigns in the country. This resulted in increased corporate content (government, specialised press, etc.). In contrast, in the factual task ‘sea depth’, in which students were asked to find the deepest point in the ocean, 54.4% were Alternative sources (Figure 2).
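Percentages such as these are category shares over the labelled sources retrieved for a task. A minimal sketch of that arithmetic follows; the function name `category_shares` is an assumption, and the input used in the usage note is toy data, not the study's data.

```python
from collections import Counter


def category_shares(categories):
    """Percentage share of each category, rounded to one decimal place."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}
```

For example, a toy list with five Corporate, four Alternative and one Collaborative source yields shares of 50.0, 40.0 and 10.0 respectively.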


Figure 2: Source type characterization.

On the other hand, for research or exploratory tasks we raised issues of media interest such as crime and other topics in the scientific and technological fields. For the ‘crime in Chile’ task, students had to investigate the Web to propose a solution, obtaining 64.1% of Corporate sources, 31.3% of Alternative sources and 4.6% of Collaborative sources. However, for the ‘build a car’ task, mostly Alternative sources were obtained (56%), followed by 34.6% of Corporate sources (Figure 2). It is interesting that Collaborative sources had a low presence in the search output rankings for all types of tasks, with only a slight increase in the scientific and technological tasks such as ‘sea depth’ and ‘build a car’. Possibly, the characteristics of the topics do not place them among the most popular searches on the Web.

Website ranking

To analyse the fluctuations of the search output rankings between the different tasks proposed, we compared the rankings using the RankFlow metric (Rieder, 2014), which measures the average deviation distance with a maximum of 1. We considered the first twenty search outputs obtained in each student’s search, for each proposed task. For the ‘HIV in Chile’ task (Figure 3), we analysed twenty-two students who issued an identical query. The website ranking in Bing showed low fluctuation, as deviations were observed for only two students. The website positioned in first place was the same for all students. This low fluctuation can be explained because GoNSA2 connects directly and uniquely with Bing. Therefore, diversity and fluctuation can only be explained by the personal decisions of each participant, expressed through their search strategies.
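To make the idea of an average deviation distance bounded by 1 concrete, the sketch below compares two top-k rankings. It is an illustrative stand-in and not the RankFlow computation, whose exact formula is not given here. The assumptions are: a URL present in both lists contributes its absolute position difference divided by k, a URL missing from the second list contributes the maximum of 1, and the contributions are averaged over the reference list.

```python
def average_rank_deviation(reference, other, k=20):
    """Illustrative rank deviation in [0, 1]; NOT the RankFlow formula.

    0.0 means the top-k lists are identical; 1.0 means they share no URL.
    """
    ref = reference[:k]
    oth = other[:k]
    if not ref:
        return 0.0
    position_in_other = {}
    for i, url in enumerate(oth):
        position_in_other.setdefault(url, i)  # keep the first occurrence
    total = 0.0
    for i, url in enumerate(ref):
        if url in position_in_other:
            total += abs(i - position_in_other[url]) / k
        else:
            total += 1.0  # assumed maximum penalty for a missing URL
    return total / len(ref)
```

Under these assumptions, two students who receive the same ranking for an identical query score 0, while swapping the first two of four results with k=4 gives 0.125.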


Figure 3: Variation of the search outputs for the task: ‘HIV in Chile’

For the ‘build a car’ task, sixteen students searched ‘how to build a car’. In this case, the website rankings showed only minor fluctuations, with an average deviation distance of 0.09 out of a maximum of 1 (Figure 4).


Figure 4: Variation of the search outputs for the task: ‘How to build a car’

One of the unexpected findings of this study was a student who made thirteen different queries to solve the ‘build a car’ task: seven in Spanish, five in English and a final one in Spanish. For the same task, the average per student was four queries, favouring the mother tongue, with 226 queries in Spanish and only six in English. The average deviation distance for the thirteen queries of this student was 0.9 (Figure 5). Beyond the fluctuation, the relevance of this finding is that the student elaborated queries that reached a significant number of new results each time. The student also followed a clearly defined search strategy: first querying generic key terms in Spanish, then specific terms in English, and finally specific terms in Spanish again. This strategy provided access to a more diverse set of ranked results than other participants obtained, mainly when English was used (B8-B12). Indeed, the student obtained different results even in the first position of the Web ranking, and reached a greater proportion of Corporate sources than the sample (student: 40.5%, sample: 34.6%). For this task, the sample obtained 56% of Alternative sources. Therefore, search strategy determines the diversity of results, but users are responsible for deciding to explore more deeply by questioning the quality of information.
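The notion of reaching new results with each query can be sketched as a running count of URLs not seen in any earlier query of the same student. This is a minimal illustration, assuming each query's results are available as a list of URLs; the function name is hypothetical and the study does not state how novelty was counted.

```python
def new_results_per_query(result_lists):
    """For each query, in order, count the URLs not returned by earlier queries."""
    seen = set()
    counts = []
    for results in result_lists:
        fresh = set(results) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts
```

A sequence of queries that keeps returning high counts, for instance after switching language, indicates that the student is still widening the set of sources available for evaluation.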


Figure 5: Search output variation in the 13 queries performed by a student to solve a research or exploratory task

Discussion and outlook

The objective of this research was to determine the opportunity for students to access quality sources when they search on the Internet, considering the biases of the customization algorithms.

We found that most of the websites obtained from the task searches correspond to Corporate sources (52.1%), followed by Alternative sources (40.7%), and lastly by Collaborative sources (7.2%). Although students mostly accessed Corporate sources, there is a high presence of non-validated information, which tests students' critical thinking to select the most appropriate source. It would be interesting to analyse why there is a low presence of Collaborative sources in search outputs, considering that the growth of this type of information source is sustained by the democratization of digital content creation; however, they are not relevant to the algorithm of the search engine. Therefore, user explorations, analysis and decisions mediated by critical dispositions are crucial to improve the relevance of more qualified sources, such as Collaborative or authority websites, because the relevance algorithm is based on user actions.

When analysing the quality of the sources versus the type of task, the differences observed did not depend on the difficulty of the task (factual or exploratory), but on its topic. We found that the sources reached were mostly of the Corporate type for topics highly covered by the media, such as ‘HIV in Chile’ and ‘Crime in Chile’. In contrast, Alternative sources predominated for ‘sea depth’ and ‘build a car’. Coincidentally, both of these topics correspond to the area of science and technology. As this is a priority area in Chilean education, it is worrying that most of the information that students access is not of high quality. However, we have categorised the sources based on their publication process as a quality proxy, and information of high quality and relevance can also be found in authority websites classified as Alternative sources. Therefore, user decisions mediated by critical dispositions are crucial to increase the relevance of quality sources by performing deeper explorations of results.

We analysed the fluctuations of website rankings (RankFlow) produced by the search engine Bing, considering only the first twenty websites in order of appearance. For the ‘HIV in Chile’ task, we observed a low fluctuation that, in terms of source type, showed a greater percentage of Corporate sources, which should ensure quality and veracity in the information. A low variety of sources explains the lower variation in the rankings, which remained practically the same for all participating students. On the other hand, for the ‘build a car’ task, which showed a high deviation in the website ranking, the diversity of the available sources was higher, but this does not ensure the quality or veracity of the contents because most of the websites corresponded to Alternative (non-validated) sources.

An unexpected finding was a student who elaborated thirteen queries to solve the ‘build a car’ task. His or her actions reached different rankings for all the queries, despite their semantic proximity. In addition, the student conducted queries in Spanish and English, which expanded the possibilities of accessing diverse information and increased access to Corporate sources, exceeding the performance of both student groups. In the same task, another student made five different queries with only one of them in English, but the low number of queries seemed to prevent significant fluctuations in the website ranking. The same result was observed for the general average of the sample, which only reached four queries. This led us to conclude that the number of queries affects the possibility of reaching a greater variety in the ranking of results, because each new query reflects the user's perception that the information found so far is insufficient.

We have shown that the type of edition of the source is not relevant for the search output ranking, unlike other variables such as user history, site popularity, geolocation, language (Jiang, 2014b) and topic media coverage (Suárez-Gonzalo, 2017). Therefore, it is essential to encourage critical thinking in students to counteract the belief in search engine rankings. This is because students usually accept the first sources that appear in the ranking as valid (Noble, 2018), considering them closer to their query and associating them with quality content and veracity at the snippet level, without clicking through to the websites.

Personal search strategy defines the diversity and quality of the sources provided by the algorithms of search engines. Such a strategy is mainly based on personal decisions about which terms are used, which sources are explored, how many queries are submitted and which languages are used. These interactive decisions are taken into account by the relevance algorithms of search engines, which provide more diverse results in response. Therefore, users who apply critical dispositions when analysing sources provide strong relevance feedback that supports quality sources not previously explored.

Considering that other studies confirm the use of search engines by young people to solve their academic tasks and perform leisure activities (García-Marín and Cantón-Mayo, 2019; Rowlands et al., 2008), it is key to take actions to improve their critical thinking skills. This will allow students to overcome the personalisation of search engines and access quality content. It seems particularly necessary to educate about information search in the first years of university.

We have defined some pedagogical guidelines: a) define a first-year course at degree level about information search with a critical view, focused on media and information literacy, source credibility and the algorithmic decisions of search engines, b) encourage students to analyse critically the personalisation and recommendation algorithms behind search engines, c) encourage students to make proper use of quality sources from the first year, and d) promote, during English and Spanish courses, an inter-lingual search strategy to reach a higher diversity of results.

To continue with this study, we will expand the sample to university students from other careers and other countries, and will compare the results obtained with other areas of knowledge and with the geographical location of the queries. We also plan to conduct a new experiment in another search engine such as Google, to study the possible differences in the students’ search results, its influence in Web browsing behaviour and in document selection criteria and solution elaboration. Finally, we are planning a longitudinal study in order to analyse the evolution of search strategies through the different levels of a degree programme.


The authors sincerely appreciate the support provided by the Universidad Austral de Chile, through the Instalación 2020 programme.

About the authors

Martha Vidal-Sepúlveda is an academic in education and social communication at the Universidad Austral de Chile. She is a PhD student in Communication from the Universidad Austral de Chile and Universidad de la Frontera. Her research interests include media literacy, critical thinking and information search. She can be contacted at: martha.vidal@uach.cl | ORCID iD: https://orcid.org/0000-0002-0929-8179
Cristian Olivares-Rodríguez is an academic in computer science at the Universidad Austral de Chile. He holds a PhD in Engineering from the Universidad de Deusto. His research interests include learning analytics, educational data science and information behavior. He can be contacted at: cristian.olivares@uach.cl | ORCID iD: https://orcid.org/0000-0002-4991-5784
Luis Cárcamo-Ulloa is an academic in social communication at the Universidad Austral de Chile. He holds a PhD in Perception, Communication and Time from the Universidad Autónoma de Barcelona. His research interests include news and data science, media literacy and technologies, and media psychology. He can be contacted at: lcarcamo@uach.cl | ORCID iD: https://orcid.org/0000-0003-0633-9606



How to cite this paper

Vidal-Sepúlveda, M., Olivares-Rodríguez, C., & Cárcamo-Ulloa, L. (2021). What kind of sources do I need? Critical search for information on the Web. Information Research, 26(4), paper 913. Retrieved from http://InformationR.net/ir/26-4/paper913.html (Archived by the Internet Archive at https://bit.ly/3m79eeJ) https://doi.org/10.47989/irpaper913
