This week in DITA we scratched the surface of databases with structured information and information retrieval (IR) with unstructured or semi-structured information. This could also be described as covering how good we are at the ‘put’ and ‘pull’ of information management: where shall we put it, and how do we pull it out when we need it again?
Our lecture started with a bit more Foucault, The Hitchhiker’s Guide to the Galaxy, Withnail and I and Shakespeare. These were all intended to get us thinking about the meaning of things, and about how both the meaning and the ordering of things are always cultural and always contextual. How we search for things always depends on where we are starting from: who we are, where we come from and what we already know, even when we think we are searching ‘objectively’. No search is ever neutral.
The Seeker, their Interests and their Needs
There is a long literature on information seeking that has attempted, and continues to attempt, to classify information needs and types of query into a search taxonomy. There is a good overview of some information seeking models and approaches to modelling information seeking in Chapter 3 of Marti Hearst’s 2009 book Search User Interfaces, which is freely available online for personal use. This overview includes the classic search model adapted for the web by Andrei Broder in his 2002 article A Taxonomy of Web Search. It is this article that we used to classify web information needs:
For a navigational query the only need is to get to a particular place on the web; the searcher may know more about the task to be performed once there. It is a “how do I get to …?” type of question. For a transactional query the need is to access a particular service and/or perform a particular task. It is a “where can I do this?” type of question. Informational broadly covers all other types of searching, and its aim is to “acquire information” (Broder, 2002). It is an “I would like to find out about …” type of question. Whilst this type of query is broad, Broder suggests it was responsible for less than 50% of web queries, so it cannot be assumed to be the primary need when searching on the web. A review of more recent literature might reveal whether Broder’s taxonomy is as enduring as his process model.
With this taxonomy Broder links information needs more closely to task intent. This may be even more important for non-text and media retrieval. The Hearst book has a good section on query intent including a reference to a 2006 study A Goal-based Classification of Web Information Tasks by Keller et al. They found four main information tasks:
- Fact Finding
- Information Gathering
- Browsing
- Transactions
In our class we used this alternative scheme:
A known-item task involves an information need about a particular thing. A fact task involves finding out unknown information about a particular thing. A subject task involves an information need about things within a particular topic. An exploratory task involves understanding the information, topics or things that may exist within a particular information sphere.
Once the user has an information need, whether it is well understood and verbalised or not, the second part of information retrieval is asking a question within a search ecosystem. This could mean running a query against a structured database using a language such as Structured Query Language (SQL), or using different search techniques over unstructured or semi-structured information bases such as the web.
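As a minimal sketch of the structured ‘put’ and ‘pull’, the following uses Python’s built-in sqlite3 module; the table, columns and records here are illustrative assumptions, not anything from the lecture:

```python
import sqlite3

# An in-memory database: the structured side of 'put' and 'pull'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")

# 'Put': store records against a known schema.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("The Hitchhiker's Guide to the Galaxy", "Douglas Adams", 1979),
        ("Hamlet", "William Shakespeare", 1603),
    ],
)

# 'Pull': retrieve with a precise SQL query rather than a free-text search.
rows = conn.execute(
    "SELECT title FROM books WHERE year < 1900 ORDER BY year"
).fetchall()
print(rows)  # [('Hamlet',)]
```

The point of the contrast is that the question here must match the schema exactly; the search techniques below are for information that has no such schema.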
Search techniques include:
- free text searching (with or without “” phrase operators)
- using boolean operators
- using advanced search functionality where provided
- query modification, i.e. changing the question based on initial results
- using facets where available
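A toy sketch makes the boolean operators concrete. The following illustrative Python example (my own, not from the lecture) builds a tiny inverted index over three made-up ‘documents’ and evaluates boolean AND and OR as set intersection and union:

```python
# Three toy 'documents'; the texts are invented for illustration.
docs = {
    1: "information retrieval on the web",
    2: "structured query language for databases",
    3: "web search engines and information needs",
}

# Inverted index: term -> set of document ids containing that term.
index: dict[str, set[int]] = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def boolean_and(*terms: str) -> set[int]:
    """Documents containing every term (AND as set intersection)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(*terms: str) -> set[int]:
    """Documents containing any term (OR as set union)."""
    return set().union(*(index.get(t, set()) for t in terms))

print(boolean_and("information", "web"))  # {1, 3}
print(boolean_or("databases", "search"))  # {2, 3}
```

Real search engines do far more (phrase matching, stemming, ranking), but the set operations above are the core of what a boolean query asks for.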
Asking the Web Questions
As information has proliferated on the web, search engines have become one of the primary interfaces for mediating between questions and information.
They compile vast indexes of web pages and of the mesh of links between them; they map patterns and make new connections by mining those indexes, and apply complicated and secret ranking algorithms to order the results when asked a question.
They also pay attention to our web browsing history and anticipate our information needs based on our previous information seeking behaviour. On the plus side this can boost relevance, but the downside is the risk of ‘filter bubbles’.
They are increasingly good at answering natural language queries as the questions we submit get lazier and the underlying technology gets smarter.
They are getting better by analysing how millions of us seek information. They are learning from us every time we get answers from them. They are deriving explicit knowledge from tacit information by following what we do. At the moment web searches return results as lists of links; in the future they could provide answers to questions you haven’t yet thought of.
“We can use the Knowledge Graph to answer questions you never thought to ask and help you discover more.” – Google via The Observer
Google is building a ‘Knowledge Graph’ from this insight that uses real-world searches to meaningfully associate “the 500 million most searched for people, places and things in the Google world”. In the Knowledge Graph it is not only these 500 million things that are important; the connections between them are meaningful things too. This is the essence of the semantic web, and efforts like the Knowledge Graph are seeking fully semantic, fully human search methods.
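One simple way to picture connections-as-data (an illustrative sketch only, not Google’s actual data model) is as subject–predicate–object triples, the representation used by the semantic web, where the relationships are queryable just like the entities:

```python
# A toy knowledge-graph fragment as subject-predicate-object triples.
# The entities and relations are illustrative, not real Knowledge Graph data.
triples = [
    ("Douglas Adams", "wrote", "The Hitchhiker's Guide to the Galaxy"),
    ("Douglas Adams", "born_in", "Cambridge"),
    ("Cambridge", "located_in", "England"),
]

def related(entity: str) -> list[tuple[str, str]]:
    """Everything directly connected to an entity, in either direction."""
    out = [(p, o) for s, p, o in triples if s == entity]
    out += [(p, s) for s, p, o in triples if o == entity]
    return out

print(related("Cambridge"))
# [('located_in', 'England'), ('born_in', 'Douglas Adams')]
```

Because the edges themselves carry meaning, a system holding such triples can answer questions that were never stored as explicit facts, by walking the connections.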
Is this creepy? Exciting? Dangerous? Welcome? It may be all of these things but it is certainly a pathway the architects of web search and connected knowledge are happily exploring.
Featured image credit: Library – the original search engine by Enokson. Source: Flickr. (CC BY)