August 11, 2004

A Note on Information Retrieval, Search Engines, and Current Page Rank Technologies


There is an ever-growing depth of information on the Internet across many bodies of knowledge. The problem is not whether information exists but how to find specific information within larger bodies of data. Whether for an academic university site or a subject-centred website, the questions concern not only 'metadata' (i.e. data describing data) but also the different search methodologies and page rank technologies. By looking under the hood of various search engines, it is possible to glimpse some of the parameters of information retrieval within larger information spaces.

To begin, different search engines employ different 'page rank' technologies. Google uses networked PCs rather than mainframes to crawl for information. Page rank technology here relies not so much on how many times a particular keyword appears on a particular page as on how many other pages link to that page. Google puts it this way:

Instead of counting direct links, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives.
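To make the 'vote' idea concrete, here is a minimal sketch in Python. The first function simply counts inbound links as votes, as the quote describes. The second is a simplified version of the published PageRank iteration, in which a vote from a highly ranked page counts for more; that weighting, the damping value of 0.85, and the tiny four-page link graph are assumptions for illustration, not a description of how Google actually runs things.

```python
def inbound_link_counts(links):
    """Count one vote for each distinct page that links to a page."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # a page does not vote for itself
                counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page splits its rank among the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: each key links to the pages in its list.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
votes = inbound_link_counts(links)
ranks = pagerank(links)
```

In this toy graph, page C collects the most votes and also ends up with the highest rank, while page D, which nobody links to, stays at the bottom.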

Yahoo, by contrast, allows web developers to suggest the category under which a particular site should reside. For example, Regions | Cuba | Social Science | History | The Cuban Rafter Phenomenon describes a path through a hierarchical taxonomic tree that classifies subject areas on a genus/species model.
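A directory path like the one above can be pictured as a walk down a tree of nested categories. The following Python sketch is only an illustration of that idea: the function names `add_path` and `find_paths` and the second directory entry are hypothetical, and nothing here reflects how Yahoo stores its directory.

```python
def add_path(tree, path, separator="|"):
    """Insert a 'A | B | C' style category path into a nested dict."""
    node = tree
    for label in (part.strip() for part in path.split(separator)):
        node = node.setdefault(label, {})
    return tree


def find_paths(tree, term, prefix=()):
    """Return the full path of every category whose label contains term."""
    matches = []
    for label, children in tree.items():
        current = prefix + (label,)
        if term.lower() in label.lower():
            matches.append(" | ".join(current))
        matches.extend(find_paths(children, term, current))
    return matches


directory = {}
add_path(directory, "Regions | Cuba | Social Science | History | The Cuban Rafter Phenomenon")
add_path(directory, "Regions | Cuba | Government")  # hypothetical second entry

rafter_paths = find_paths(directory, "rafter")
```

Each level narrows the one above it, so a search for 'rafter' returns the whole chain of parent categories along with the matching entry.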

Most search engines also include a paid inclusion model and make a distinction between search engine 'submission' (i.e. making sure your site is listed) and search engine 'optimization' (i.e. altering your site so it ranks well for particular terms). Yahoo also has human editors compiling its results, as opposed to most other engines, which use 'crawlers' or bots. All of these factors should be taken into consideration when either listing a site or beginning to promote, market, or place it.

In terms of metadata, the only hard and fast rule is that the various search engines employ metadata (i.e. keywords, tags) differently. For librarian-type standards, there are more rigorous schemes such as the "Dublin Core Metadata Initiative", which provides a more robust methodology for classifying information. It is not yet widely implemented or used by engines or webmasters, though it is gaining acceptance among a smaller group of systems librarians. As of summer/fall 2004, search engine ranking is less science than art, and information retrieval methodologies are still in healthy flux, with engines vying both for superior search strategies in terms of precision and recall and for market share among internet applications.
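As a rough illustration of the two kinds of metadata mentioned above, the Python sketch below reads the meta tags from a page and separates ordinary keyword-style tags from Dublin Core ones. It assumes the common convention of writing Dublin Core elements in HTML as meta tags whose names begin with "DC."; the sample page and the names `MetaTagCollector` and `split_metadata` are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def split_metadata(html):
    """Return (plain meta tags, Dublin Core meta tags) found in the page."""
    collector = MetaTagCollector()
    collector.feed(html)
    dublin_core = {k: v for k, v in collector.meta.items() if k.startswith("dc.")}
    plain = {k: v for k, v in collector.meta.items() if not k.startswith("dc.")}
    return plain, dublin_core


# Hypothetical page head, for illustration only.
sample_page = """
<html><head>
<title>The Cuban Rafter Phenomenon</title>
<meta name="keywords" content="Cuba, rafters, history">
<meta name="description" content="A subject-centred history site.">
<meta name="DC.title" content="The Cuban Rafter Phenomenon">
<meta name="DC.subject" content="Cuba; History">
</head><body></body></html>
"""

plain_tags, dublin_core_tags = split_metadata(sample_page)
```

Whether a given engine pays attention to either group of tags is exactly the point on which engines differ, so a sketch like this shows only what a page declares, not how it will rank.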

Posted August 11, 2004 4:04 PM