Importance of Language

Let's get some terminology under our belts before we begin talking about searching...

Natural Language

Natural language is the language that we speak in everyday life. Since we know our native language quite well, we can consider ourselves experts in natural language. In fact, many of us perform most of our Web searches in natural language. This isn't necessarily a bad thing. But because of regionalisms (how many different words can you use to describe a sandwich on a long roll?), cultural differences, slang, and the sheer variety of languages, natural language is quite complex and not always predictable. Since we know that subscription databases are different from the Web, we may not want to rely on natural language when we search for information in a database.

FYI: In a research study of the way people label concepts and objects, "it was found that two persons agreed on the same term for an object less than 20% of the time, and that an average of 15 terms was needed to achieve an average of 80% agreement" (Wellisch 48).

Wellisch, Hans H. (1995). "Automatic Indexing." In Indexing from A to Z. 2nd ed. New York: H. W. Wilson. 41-52.

ACTIVITY: Open ProQuest, EBSCO, and Google. Enter the following question in each one: What is the effect of moving on students? How do the results differ?

Controlled Vocabulary

Many databases employ controlled vocabulary to organize and index their materials. Controlled vocabulary is not an unnatural language; it uses many of the same terms as natural language, but it consists of a restricted list of terms chosen by human indexers. When you see a list of subject headings in a Yahoo! subject directory, suggested search terms at the bottom of a Google search, entries in a thesaurus, a back-of-the-book index, or a listing of subject headings within an article retrieved from a subscription database, you are most likely looking at a controlled vocabulary chosen by people!
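
To make the idea concrete, here is a minimal sketch of how a controlled vocabulary might map several natural language words onto one preferred heading. It borrows the sandwich example from earlier; the heading "submarine sandwiches" and the function name preferred_term are assumptions made up for illustration, not terms taken from any real database.

```python
# Hypothetical controlled vocabulary: each preferred heading lists the
# natural language variants an indexer has decided to file under it.
CONTROLLED_VOCABULARY = {
    "submarine sandwiches": {"sub", "hoagie", "grinder", "hero"},
}

def preferred_term(word):
    """Return the preferred heading for a word, or None if it is not indexed."""
    word = word.lower()
    for heading, variants in CONTROLLED_VOCABULARY.items():
        if word == heading or word in variants:
            return heading
    return None

print(preferred_term("Hoagie"))  # submarine sandwiches
print(preferred_term("wrap"))    # None
```

A searcher who types "hoagie" and a searcher who types "grinder" both land on the same heading, which is the main benefit of having indexers choose one term for each concept.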

Uncontrolled Vocabulary

Uncontrolled vocabulary consists of search terms freely chosen from the full text or abstract of an article. The term also covers tagging, where users of a system create their own organization of information. Examples of uncontrolled vocabulary are common on social networking sites: Flickr, Facebook, and LibraryThing all use tagging.

QUESTION: Which is better: having information organized by the users of a system or by one or two indexers?

Stop Words

Stop words are common words that a system removes from natural language searches; the list of stop words is controlled by human indexers. These words can include "a," "the," "for," and "of." Databases and search engines do not always employ the same list of stop words. Sometimes, filtering out stop words can hurt a search: imagine searching for information on Vitamin A in a system that filters out the stop word "a." Some systems, however, can search for a stop word even if they usually filter it out. If the searcher puts the phrase in quotation marks or uses other search operators, the system will search for the stop word. A searcher who does not know this may not retrieve any relevant information. (Check the "help" section of the database or search engine for more information on stop words.)
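
The Vitamin A problem can be sketched in a few lines of code. This is only an illustration under stated assumptions: the stop word list below is hypothetical, the function name filter_query is made up, and real databases and search engines each handle stop words and quotation marks in their own way.

```python
import re
from typing import List

# Hypothetical stop word list; every real system uses its own.
STOP_WORDS = {"a", "the", "for", "of"}

def filter_query(query: str) -> List[str]:
    """Drop stop words from a query, but keep quoted phrases intact."""
    terms = []
    # Group 1 captures a phrase inside quotation marks, group 2 a bare word.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(phrase.lower())
        elif word.lower() not in STOP_WORDS:
            terms.append(word.lower())
    return terms

print(filter_query('vitamin a'))    # ['vitamin']
print(filter_query('"vitamin a"'))  # ['vitamin a']
```

Without quotation marks the "a" disappears and the search becomes a search for "vitamin" in general; with quotation marks the whole phrase survives.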


Developing Keywords

Developing keywords is probably the most important searching strategy. Searchers need to recognize which words are most helpful in the context in which they are searching. Predicting how authors use vocabulary and how systems organize and index language takes a lot of skill, and developing this skill proves to be difficult for students. Examine the example below to see how we can help students develop good keywords after they have chosen a topic or developed an essential question:
  • Essential Question: What is the effect of moving on students?
    • Choose words within the actual essential question or topic statement that are good keywords to search for.
    • What are some synonyms for words in the essential question or topic statement?
    • Are there any related words?
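
The three questions above can be sketched as a small script that gathers keywords, synonyms, and related words into one search statement. Everything in it is an assumption for illustration: the synonym lists are example guesses rather than terms from a real thesaurus, build_query is a made-up name, and although many databases accept the Boolean operators AND and OR, the "help" section of a given system should confirm what it supports.

```python
# Illustrative only: these synonyms and related words are example guesses
# for the essential question "What is the effect of moving on students?"
concepts = {
    "moving": ["moving", "relocation", "mobility"],
    "students": ["students", "children", "adolescents"],
    "effect": ["effect", "impact", "influence"],
}

def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts.values():
        # Put multi-word terms in quotation marks so they stay together.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

print(build_query(concepts))
# (moving OR relocation OR mobility) AND (students OR children OR adolescents) AND (effect OR impact OR influence)
```

Students can swap terms in and out of each list and rerun the search to see which vocabulary the authors and indexers in a particular database actually use.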