Developing an information search strategy: Restrictions
Metadata and possible restrictions
Picture: Examples of selections and perspectives offered following a request on the press database Factiva.
Each category corresponds to a metadata field, in this case a type of topic, a source name, or a date.
The documents also have:
- a title
- an author
- an abstract
- a text
- many other metadata fields: topics, languages, APE code, press item, etc.
Each metadata field added to a document adds one more way to select and restrict, and offers an extra perspective on the database and the topic explored.
Other possible selections
- Searching for <Université de Brest or Université de Bretagne Occidentale> in the Address field (Corporate author in the Pascal database) will retrieve all documents whose authors gave this address (hence the importance of signing your documents with your name and home institution).
- Searching for <Brest> in the Conference Information field of the Pascal database will only display conferences held in Brest (or possibly dealing with Brest).
- A search in SUDOC restricted to "theses" as the document type and "Brest SCD" as the location filter will only display theses defended at Brest University.
- Searching for "Bill Clinton" in the "topic" field will only display books written about Bill Clinton. To display books written by Bill Clinton, you need to use the "author" field.
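The difference between the "topic" field and the "author" field can be sketched in a few lines of Python. This is only an illustration: the two catalogue records are invented, and the field names and the search function are assumptions, not the interface of any real database.

```python
# Invented catalogue records; titles and field names are hypothetical.
CATALOGUE = [
    {"title": "Memoir (invented)", "author": "Bill Clinton", "topic": "Autobiography"},
    {"title": "Biography (invented)", "author": "Jane Doe", "topic": "Bill Clinton"},
]


def search(records, term, field):
    """Return the records whose chosen field contains the term (case-insensitive)."""
    term = term.lower()
    return [rec for rec in records if term in rec[field].lower()]


# Same term, two different fields, two different result sets.
about_clinton = search(CATALOGUE, "Bill Clinton", "topic")
by_clinton = search(CATALOGUE, "Bill Clinton", "author")

print([rec["title"] for rec in about_clinton])
print([rec["title"] for rec in by_clinton])
```

The same term returns a different record depending on the field searched, which is exactly what a field restriction does in a bibliographic database.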
Reducing noise on Google
Google works on full text. All (or nearly all) the words are entry points. Its index contains billions of completely heterogeneous pages which do not undergo any standardised editing process.
Google's main criterion for ranking them is their popularity, which is proportional to the number of links pointing to each page.
This criterion can be effective, but it is sometimes insufficient or even crippling, especially for requests that return over a thousand results (most requests).
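As a rough illustration of ranking by popularity, the sketch below counts the links pointing to each page in a tiny invented link graph and sorts the pages accordingly. It is a deliberately simplified model: the page names are made up, and Google's actual ranking is far more elaborate than a plain count of inbound links.

```python
from collections import Counter

# Invented link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": [],
}


def rank_by_inbound_links(links):
    """Return (page, inbound link count) pairs, most linked-to page first."""
    inbound = Counter({page: 0 for page in links})
    for targets in links.values():
        for target in targets:
            inbound[target] += 1
    # Sort by descending count, then alphabetically to break ties.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_inbound_links(LINKS)
print(ranking)
```

A page that nobody links to ends up last, however relevant its content may be, which is why popularity alone can be an insufficient criterion.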
To reduce this noise, use the fields that are common to all web pages and that Google can exploit (advanced search):
- The page title: which are the pages whose title contains this/these word(s)?
- The URL: which are the pages whose URL contains this/these word(s)?
- The server name (e.g. http://www.univ-brest.fr/): which pages hosted on this server contain this/these word(s)?
- The domain name (.fr, .org, .ca etc.).
- The anchors (clickable words connected by a link to another page).
- Links: which pages link to this page?
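These fields can also be typed directly into the search box as operators. The sketch below builds such a query string in Python. The operator names (intitle:, inurl:, site:, inanchor:) do not come from this course; they are the ones commonly documented for Google and their availability may change over time. The build_query helper and the example words are hypothetical. The links criterion is left out because no operator is assumed for it here.

```python
from urllib.parse import urlencode


def build_query(words, title=None, url=None, site=None, anchor=None):
    """Combine plain words with optional field restrictions into one query string."""
    parts = list(words)
    if title:
        parts.append(f"intitle:{title}")    # word must appear in the page title
    if url:
        parts.append(f"inurl:{url}")        # word must appear in the URL
    if site:
        parts.append(f"site:{site}")        # server name or domain, e.g. univ-brest.fr or .fr
    if anchor:
        parts.append(f"inanchor:{anchor}")  # word must appear in links pointing to the page
    return " ".join(parts)


query = build_query(["bibliothèque"], title="horaires", site="univ-brest.fr")
print(query)

# The query string can then be placed in a search URL.
print("https://www.google.com/search?" + urlencode({"q": query}))
```

Each restriction narrows the set of candidate pages before ranking is applied, which is how these fields reduce noise.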
You also need to enter several words and choose them carefully, preferring discriminating terms (rare and/or precise) to common and generic terms.
For instance, a Google request for the term "socio-technique" leads straight to the work of the Centre de sociologie de l'innovation (innovation sociology centre) at the Paris-based Ecole des Mines. It is an excellent term for spotting, in a single request, part of their work or the work of those who refer to it.
Using character strings: " "
By enclosing a string of words in quotation marks, you ask Google to treat these words as one phrase.
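The effect of quotation marks can be illustrated with a toy comparison between matching every word separately and matching the exact phrase. The two page texts and both functions are invented for the illustration; they model the idea only, not the way Google actually matches pages.

```python
# Invented page texts used only for the illustration.
PAGES = [
    "Developing an information search strategy",
    "A strategy to search for information",
]


def matches_all_words(text, query):
    """True if every query word appears somewhere in the text, in any order."""
    words = text.lower().split()
    return all(word in words for word in query.lower().split())


def matches_phrase(text, query):
    """True if the query words appear together, in the same order."""
    return query.lower() in text.lower()


query = "information search strategy"
without_quotes = [page for page in PAGES if matches_all_words(page, query)]
with_quotes = [page for page in PAGES if matches_phrase(page, query)]

print(without_quotes)
print(with_quotes)
```

Both pages contain all three words, but only the first contains them as one phrase, so the quoted request returns fewer and more precise results.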