Thursday, February 19, 2009

How the Internet is Indexed

ENC's Associate Director of Instructional Resources explains how teachers can help students use web indexes, directories, and multi-threaded search engines to navigate the Internet.
At the end of this article, see:
· Roempler's Recommended Resources
by Kimberly S. Roempler, ENC Instructional Resources
You have heard the World Wide Web has it all--lesson plans, activities, professional development ideas, real data, software, newsgroups, and much more. But how do you get to it? Searching the Internet can be frustrating and time consuming for you and your students. However, once you know how to search, the possibilities are limitless.
I realized firsthand the importance of good search techniques through an assignment I gave to college students taking my introductory science course. The assignment was suggested by a picture of a light bulb with a list of all the chemical resources that are needed to produce it. I asked my students to choose some other common item and to search the web to create the same sort of list. What better place than the Internet to find such information?
A week later, I asked how everybody was doing on the assignment. Talk about frustrated! They were ready to give up. One student was particularly vocal. He had chosen something simple--a match--but his search of the Net had revealed nothing about the chemical components of a match. However, he had learned how to make plastique and how much it would take to blow up a school bus. Other students had similar stories. I realized then that I needed to know much more about searching the Internet before I could help my students use it productively.
Understanding the Challenges
Unfortunately, no single Internet search tool can be truly comprehensive because of the lack of a common indexing scheme for Internet materials. In other words, universally accepted cataloging standards, such as those found in libraries, do not exist for the Internet. Since some search tools are more effective for certain topics than others, you will have to use several tools. There are many similarities among the tools, but each has its own strengths, weaknesses, and peculiarities.
Another challenge is that Internet searches frequently result in the "all or nothing" dilemma--either far too much or no material at all is retrieved. This problem can be reduced by following appropriate search strategies.
The first step in becoming a proficient Internet researcher is understanding the different search tools. Search tools fall into two general categories: indexes and directories. In addition, multi-threaded search tools combine the functions of indexes and directories.
Web Indexes
If you think of the Internet as a gigantic book, web indexes perform the same function as a book's index, referencing names, technical terms, and concepts found therein. Keep in mind, however, that the book is so big and is growing so fast that no index can keep up. Indexes offer the most comprehensive compilation of web documents. Searching indexes tends to return extensive lists of resources, and the task of sorting through them can be overwhelming.
Indexes require you to develop expertise in mastering search language. One of the biggest problems with indexes is narrowing your search sufficiently to retrieve a manageable number of documents relevant to your topic.
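To make the book-index analogy concrete, here is a minimal sketch of how an index can map words to pages and how adding a second search term narrows the results. The three pages and their text are invented for illustration, and real web indexes are far more elaborate; this only shows the basic idea of a Boolean AND search.

```python
from collections import defaultdict

# Hypothetical mini "web": page name -> page text (invented for illustration).
PAGES = {
    "page1": "safety match chemistry phosphorus sulfur",
    "page2": "football match results",
    "page3": "chemistry of the light bulb tungsten",
}

def build_index(pages):
    """Map each word to the set of pages that contain it, like a book's index."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search_all(index, *words):
    """Return the pages that contain every word (a Boolean AND search)."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []

index = build_index(PAGES)
broad = search_all(index, "match")                # one term: more pages come back
narrow = search_all(index, "match", "chemistry")  # two terms: fewer, more relevant pages
print(broad)
print(narrow)
```

Searching for "match" alone returns the sports page along with the chemistry page; adding "chemistry" leaves only the page the student in the story actually wanted.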
Web Directories
To stay with the book analogy, web directories are like a table of contents, which helps you locate major sections and subsections of a book. In their simplest form, web directories are merely catalogs of links to other web sites. They contain main subject headings and several levels of subheadings.
Some web directories are so vast that they have tools to search their own contents. Yahoo!, which started off as two graduate students' list of favorite web sites, is probably the best-known web directory. Two other directories are Magellan and the Argus Clearinghouse. Use of these three tools illustrates the vastly different policies for selection criteria in different web directories.
Choosing Among the Tools
A major distinction between directories and indexes is the way their content is compiled. Indexes cast a broad net using automated computer programs with descriptive names like crawler, bot, worm, spider, and wanderer. These programs capture all information that meets their data-collection criteria. Directories tend to be more discriminating. Human editors normally sift through web documents and list those that meet a site's selection criteria.
Directories tend to produce the most relevant results when you are searching for a general topic, but they may not be as comprehensive as indexes. In addition, directories usually are not as up-to-date as indexes since document selection is not automated.
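The difference between automated collection and human selection can be sketched in a few lines. The link structure and the editor's picks below are hypothetical, and actual crawlers fetch live pages rather than reading a prepared table; the sketch only contrasts "follow every link" with "list what an editor approved."

```python
from collections import deque

# Hypothetical link structure: page -> pages it links to (invented for illustration).
LINKS = {
    "home": ["science", "sports"],
    "science": ["chemistry", "home"],
    "sports": [],
    "chemistry": ["science"],
    "orphan": ["home"],  # no page links here, so a crawler starting at "home" never finds it
}

def crawl(start, links):
    """Collect every page reachable from start by following links, breadth-first."""
    visited = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in visited:
                visited.append(target)
                queue.append(target)
    return visited

# Index-style collection: everything the program can reach.
collected = crawl("home", LINKS)

# Directory-style collection: only pages a (hypothetical) human editor approved.
EDITOR_PICKS = {"science", "chemistry"}
directory = [page for page in collected if page in EDITOR_PICKS]

print(collected)
print(directory)
```

The crawler gathers every page it can reach, including ones of no interest to a science class, while the directory list is shorter but depends on someone having reviewed each page.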
Confused? One solution is the use of multi-threaded search tools that simultaneously search multiple directories and indexes. These are useful when the topic is obscure and you are not having luck with your search. These tools are also helpful when you want to find as much as you can with a single search statement and your search is not complex. The better multi-threaded search engines remove duplicate files and provide some information along with the document title.
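As a rough illustration of what a multi-threaded search tool does with its results, the sketch below merges lists from two tools and drops duplicates. The result lists and addresses are made up, and treating two addresses as the same when they differ only by a trailing slash is a simplifying assumption, not a description of how any particular tool works.

```python
def merge_results(*result_lists):
    """Combine (title, address) results from several tools, skipping repeated addresses."""
    seen = set()
    merged = []
    for results in result_lists:
        for title, address in results:
            key = address.rstrip("/")  # assumption: a trailing slash does not change the page
            if key not in seen:
                seen.add(key)
                merged.append((title, address))
    return merged

# Hypothetical results returned by one index and one directory for the same query.
index_hits = [
    ("Match chemistry", "http://example.org/match"),
    ("Fire basics", "http://example.org/fire/"),
]
directory_hits = [
    ("Fire basics", "http://example.org/fire"),
    ("Science links", "http://example.org/links"),
]

merged = merge_results(index_hits, directory_hits)
for title, address in merged:
    print(title, address)
```

The page found by both tools appears once in the merged list, with its title shown alongside the address, much as the better multi-threaded tools present it.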
Understanding the different Internet search tools is the first step in developing productive searches. To help my students finish their assignment, I gave them the information in this column and required them to use each search tool, entering the same query information in each. The diversity of the results of their searches was amazing. We also discussed search techniques specific to each search tool.
As you and your students improve your web-searching skills, keep in mind another important issue: quality control. Since the material on the Internet is not checked for accuracy, we all need to view the information with a critical eye.
Reference
Owston, R. (1998). Making the Link: Teacher professional development on the Internet. Portsmouth, NH: Heinemann. (ENC-012911)
Roempler's Recommended Resources
Web Indexes
Alta Vista: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.altavista.com/ Provides a very large full-text database searchable by keywords, phrase, or field; performs complex Boolean searches.
Infoseek: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://infoseek.go.com/ Searches the full text of a relatively large database using implied Boolean logic with field-search options; retrieves results quickly.
HotBot: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.hotbot.com/ Searches a very large database with field and media search options using a fill-in template to guide your query.
Excite: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.excite.com/ Provides an up-to-date database searchable with Boolean logic, keywords, or natural language; provides concept searching when you don't know what terms to use.
Lycos: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.lycos.com/ Contains a very large database with a variety of Boolean and term proximity options; you can also control the relevancy ranking of your results.
Web Directories
Yahoo!: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.yahoo.com/ Provides broad but unevaluated subject coverage allowing an overview of what is available on the Internet on your topic.
Magellan: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://magellan.excite.com/ Identifies generally good quality sites, thereby reducing the need to wade through sites of lesser quality. (The database of Reviewed Sites is currently not being kept up to date.)
Argus Clearinghouse: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.clearinghouse.net/ Finds a collection of Internet resources on specific topics recommended by specialists.
Multi-threaded Search Engines
MetaCrawler: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.go2net.com/search.html/ Processes results fast, removes duplicates, and presents results in order of relevance.
Inference Find: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.infind.com/ Searches multiple search engines and groups results by concept and Internet site.
Ask Jeeves: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.aj.com/ Allows you to ask a question in plain English, and after interacting with you to confirm the question, takes you to one web site that provides an answer.
Dogpile: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.dogpile.com/ Lets you set the order in which the search engines are searched.
Web Sites on How to Search the Internet
University at Albany Libraries Searching the Internet: Recommended Sites and Search Techniques http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.albany.edu/library/internet/search.html Provides strengths and weaknesses for each index and directory listed as well as information on the search syntax (language) used.
Searching the Web: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://quest.arc.nasa.gov:80/common/web-search.html Collects some of the most useful search tools on the web and categorizes them based on what information is available (e.g., Internet catalogs, software, people, publications).
Spider's Apprentice: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.monash.com/spidap.html Evaluates different search tools. It has come up with the following rating system for search tools:
· Biggest, Fastest, Coolest
· Most comprehensive results
· Highest overall usability rating
· Most relevant results
· Most likely to find a hit when others can't
For current ratings, visit Spider's Apprentice: http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.monash.com/spidap.html
Another example of a web searching tutorial is Searching the Web at http://www.enc.org/redirect/0,1366,0,0.shtm?Url=http://www.niti.org/enc/
Citation information
Roempler, Kimberly S. April 1999. How the Internet Is Indexed. ENC Focus 6(1).
