Live By Search Engine

Fri, Nov 3, 2006, by Sameer Shrestha

Search Engines

Search Engine Concepts

Every time one goes online, the very next page one is likely to open is www.google.com, the search engine. Search engines are highly popular among Internet users. Searching is one of the first activities people try when they start using the Internet, and most users quickly become comfortable with it. Users paint a very rosy picture of their online search experiences: they are happy with the results they find, and nearly all report that they usually succeed in finding what they are looking for. Searchers are also very trusting of search engines. The vast majority say that search engines are a fair and unbiased source of information, and they feel confident in their own searching skills.

Technically, a search engine is the software and algorithms used to search for data that match given criteria. A search engine provides links to relevant information based on your query. Whenever one comes across something new and wants to learn about it, the first, most reliable and easiest place to turn is a search engine. However, most Internet users are naïve about search engines: how they work, and which ones are available.

How a search engine works

A search engine operates in the following order:

  1. Web Crawling
  2. Indexing
  3. Storing
  4. Searching

Crawling is the process of following links across the web from site to site and gathering the contents of those sites for storage in the search engine's databases. It is done by a web crawler (sometimes also called a web spider or web robot): an automated web browser that follows every link it sees. Usually, search engines crawl only a few (three or four) levels deep from the home page of a website. The term deep crawl denotes a crawler or spider that can index pages many levels deep; Google is an example of a deep crawler. Crawlers follow guidelines that website owners specify for them using the Robots Exclusion Protocol (a robots.txt file). The robots.txt file lists the files or folders that the owner does not want crawled, so that they stay out of the search engine's database.
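To make the crawling step concrete, here is a minimal sketch of a depth-limited, breadth-first crawler that honours robots.txt. It is an illustration under stated assumptions, not how any particular engine crawls: the "web" is a hypothetical in-memory dictionary (WEB) instead of real HTTP downloads, and the URLs, the robots.txt contents and the user-agent name "examplebot" are all made up. Only RobotFileParser comes from Python's standard library.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download each page over HTTP instead.
WEB = {
    "http://example.com/": ("home page", ["http://example.com/a",
                                          "http://example.com/private/x"]),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", ["http://example.com/c"]),
    "http://example.com/c": ("page c", []),
    "http://example.com/private/x": ("secret page", []),
}

# Hypothetical robots.txt: the site owner keeps all crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def crawl(seed, max_depth, user_agent="examplebot"):
    """Breadth-first crawl from `seed`, at most `max_depth` links deep,
    skipping any URL that robots.txt disallows. Returns {url: text}."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    store = {}                   # stands in for the search engine's database
    seen = {seed}
    queue = deque([(seed, 0)])   # (url, depth measured from the seed page)
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue             # the owner asked crawlers to stay out
        text, links = WEB.get(url, ("", []))
        store[url] = text
        if depth < max_depth:    # a shallow crawler stops following links here
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return store

print(sorted(crawl("http://example.com/", max_depth=2)))
# → ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

Raising max_depth turns this into a "deeper" crawl (page c is reached at depth 3), while the /private/ page is never stored at any depth because robots.txt excludes it.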

The contents of each page are then analyzed to determine how the page should be indexed. Like the index of a book, a search engine's index is a catalog of all the words that appear on each web page, along with details such as how many times each word appears on that page. Because keyword searches are run against the index, it has to be kept in computer memory so that search results can be returned quickly.
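The "index of a book" idea is usually called an inverted index: instead of listing the words on each page, it lists, for each word, the pages it appears on. A minimal sketch, assuming pages have already been reduced to plain text and that splitting on whitespace is good enough for illustration (real engines tokenize far more carefully):

```python
from collections import Counter, defaultdict

def build_index(pages):
    """Build an inverted index: word -> {page URL: occurrences on that page}.

    `pages` maps a URL to the plain text extracted from that page.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][url] = count
    return dict(index)

# Hypothetical example pages.
pages = {
    "http://example.com/1": "search engines index the web",
    "http://example.com/2": "the web is big and the web keeps growing",
}
index = build_index(pages)
print(index["web"])  # → {'http://example.com/1': 1, 'http://example.com/2': 2}
```

Looking up a keyword is then a single dictionary access rather than a scan of every stored page, which is why the index needs to sit in fast memory.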

Indexing starts with parsing the website's content. The parser extracts the relevant information from a web page by discarding very common words (such as a, an and the, known as stop words), HTML tags, JavaScript code and other unwanted characters. A good parser can also eliminate content that recurs across a site's pages (such as navigation links) so that it is not counted as part of each page's content.
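A minimal sketch of such a parser, using Python's standard html.parser module: it keeps visible text, skips everything inside script and style elements, and drops stop words. The tiny STOP_WORDS set is a hypothetical illustration (real stop-word lists are longer and vary by engine), and removing repeated navigation content is not attempted here.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is"}

class TextExtractor(HTMLParser):
    """Collects visible text; ignores markup and <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # how many <script>/<style> elements we are inside

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def extract_terms(html):
    """Return the indexable words of an HTML page, minus stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z0-9]+", text)  # keep letters/digits only
    return [w for w in words if w not in STOP_WORDS]

page = ("<html><head><script>var x = 1;</script></head>"
        "<body><h1>The Search Engine</h1><p>An index of the web.</p></body></html>")
print(extract_terms(page))  # → ['search', 'engine', 'index', 'web']
```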

Once indexing is complete, the results are stored in an index database for use in later queries. Because disk storage has become cheap, search engines' storage capacity is enormous, often running into terabytes of data. However, retrieving this data quickly and efficiently requires special distributed and scalable storage. Indexes are updated periodically as new content is crawled. Some indexes are used to build a dictionary (lexicon) of all the words available for searching. A lexicon also helps correct mistyped words by suggesting corrected versions alongside the search results. Part of a search engine's success lies in how its indexes are built and used: various algorithms optimize the indexes so that relevant results can be found easily without excessive use of computing resources.
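As a rough illustration of lexicon-based spelling correction, the sketch below suggests lexicon words that look similar to a mistyped query word. It uses the string-similarity matcher in Python's standard difflib module as a stand-in; this is an assumption for illustration only, not how any particular engine corrects spelling. The example lexicon and the 0.8 similarity cutoff are arbitrary choices.

```python
import difflib

def build_lexicon(indexed_words):
    """The lexicon: every distinct word that can be searched for."""
    return sorted({w.lower() for w in indexed_words})

def suggest(word, lexicon, max_suggestions=3):
    """Return [word] if it is in the lexicon, else the closest lexicon words."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # difflib scores similarity between 0 and 1; keep only close matches.
    return difflib.get_close_matches(word, lexicon, n=max_suggestions, cutoff=0.8)

lexicon = build_lexicon(["search", "engine", "index", "crawler", "web"])
print(suggest("serch", lexicon))  # → ['search']
```

A word with no close match simply yields an empty list, so the engine can show results for the original query without a "Did you mean" suggestion.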

In addition to indexing web content, some search engines, such as Google, store all or part of the source page (referred to as a cache) along with information about the page, while others, such as AltaVista, store every word of every page they find. Because the cached page is the version that was actually indexed, it always contains the search terms, which makes it very useful when the live page has since been updated and the terms are no longer in it. This problem can be considered a mild form of link rot, and Google's handling of it improves usability by meeting users' expectation that the search terms will appear on the returned page, in keeping with the principle of least astonishment. Beyond improving search relevance, cached pages are also useful because they may contain data that is no longer available elsewhere.

Once a user enters search keywords, the search engine's search algorithm looks up matches for those keywords in its indexes. When it finds matches, the search engine returns a list of the web pages that best match according to its criteria, usually with a short summary containing each document's title and sometimes parts of its text. Most search engines support the Boolean operators AND, OR and NOT to refine a query. A more advanced feature is proximity search, which lets you specify how close to each other the keywords must appear.
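Boolean operators map naturally onto set operations over the pages that contain each word, and proximity search becomes possible if the index also records where each word occurs on a page. The sketch below assumes a hypothetical positional index (word → page → list of word positions) built from made-up example pages; the function names are illustrative, not any engine's API.

```python
import re
from collections import defaultdict

def build_positional_index(pages):
    """word -> {url: [positions of the word on that page]}"""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][url].append(pos)
    return index

def docs(index, word):
    """Set of pages containing `word` (empty set if the word is unknown)."""
    return set(index.get(word, {}))

def near(index, w1, w2, max_distance):
    """Proximity search: pages where w1 and w2 are at most max_distance words apart."""
    hits = set()
    for url in docs(index, w1) & docs(index, w2):
        if any(abs(p1 - p2) <= max_distance
               for p1 in index[w1][url] for p2 in index[w2][url]):
            hits.add(url)
    return hits

pages = {
    "p1": "search engines rank web pages",
    "p2": "the search for a good engine",
    "p3": "web pages without the keyword",
}
idx = build_positional_index(pages)
print(docs(idx, "search") & docs(idx, "web"))  # search AND web → {'p1'}
print(docs(idx, "search") - docs(idx, "web"))  # search NOT web → {'p2'}
print(near(idx, "search", "engine", 4))        # within 4 words → {'p2'}
```

OR is the set union (docs(idx, "search") | docs(idx, "web")). Note that this naive tokenizer treats "engines" and "engine" as different words; real engines typically normalize word forms.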

The usefulness of a search engine depends on the relevance of the results it returns. While millions of web pages may contain a particular word or phrase, some pages are more relevant, popular or authoritative than others. Search engines use various algorithms to judge relevance, and this relevance matching is the bread and butter of a search engine's popularity.
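One classic, textbook way to score how relevant a page is to a query is TF-IDF: a query term counts for more when it is frequent on the page (term frequency) and rare across all pages (inverse document frequency). It is shown here only as an illustrative stand-in; the article does not say which formulas any engine uses, and commercial ranking combines many more signals. The example pages are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_rank(pages, query):
    """Return page URLs that match the query, most relevant first."""
    docs = {url: tokenize(text) for url, text in pages.items()}
    n_docs = len(docs)
    terms = tokenize(query)
    # Document frequency: how many pages contain each query term.
    df = {t: sum(1 for words in docs.values() if t in words) for t in terms}
    scores = {}
    for url, words in docs.items():
        counts = Counter(words)
        score = 0.0
        for t in terms:
            if counts[t] == 0:
                continue
            tf = counts[t] / len(words)      # how prominent the term is on this page
            idf = math.log(n_docs / df[t])   # rarer terms count for more
            score += tf * idf
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)

pages = {
    "p1": "search engine ranking explained",
    "p2": "search search engine",
    "p3": "cooking recipes",
}
print(tfidf_rank(pages, "engine ranking"))  # → ['p1', 'p2']
```

Page p1 wins because it contains the rare term "ranking"; p3 matches neither term and is left out entirely.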

Advanced search engines, like Google, use a page-ranking system to show the "best" results first. How a search engine decides which pages are the best matches, and in what order to show them, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.
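Google's actual ranking is proprietary and changes over time, but one published ingredient, described by Larry Page and Sergey Brin, is PageRank: a page is considered more authoritative when many authoritative pages link to it. The following is a simplified power-iteration sketch, not Google's implementation. The link graph is hypothetical, 0.85 is the commonly quoted damping factor, and spreading the rank of pages with no outgoing links evenly over all pages is one common convention assumed here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: page -> list of pages it links to. Returns page -> score (sums to 1)."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical three-page web: both a and b link to c.
scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(max(scores, key=scores.get))  # → c
```

In practice a link-based score like this would be combined with query relevance (such as the term-based scoring above) rather than used on its own.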

Most web search engines are commercial ventures supported by advertising revenue and, as a result, some employ the controversial practice of allowing advertisers to pay money to have their listings ranked higher in search results.

Top Choices

The search engines below are all excellent choices to start with when searching for information.

Google
http://www.google.com/

Google began as a Stanford University project called BackRub, created by students Larry Page and Sergey Brin. By 1998 the name had been changed to Google, and the project moved off campus to become the private company Google. The company went public in August 2004.

The crawler-based service provides both comprehensive coverage of the web and highly relevant results. It is highly recommended as a first stop when hunting for whatever you are looking for.

Yahoo
http://www.yahoo.com/

Launched in 1994, Yahoo is the web’s oldest "directory," a place where human editors organize web sites into categories. However, in October 2002, Yahoo made a giant shift to crawler-based listings for its main results. These came from Google until February 2004.

Technology from AltaVista and AllTheWeb was combined with that of Inktomi, a crawler-based search engine that grew out of UC Berkeley and launched as its own company in 1996, to create the current Yahoo crawler. Yahoo purchased Inktomi in March 2003.

Microsoft
http://www.msn.com/

The most recent major search engine is MSN Search, owned by Microsoft, which previously relied on others for its search listings. In 2004 it debuted a beta version of its own results, powered by its own web crawler (called msnbot). In early 2005 it began showing its own results live.


Strongly Consider

The search engines below are other good choices to consider when searching the web.

Ask Jeeves http://www.askjeeves.com/

AllTheWeb.com http://www.alltheweb.com/

AOL Search http://aolsearch.aol.com/ (internal)
http://search.aol.com/ (external)

HotBot http://www.hotbot.com/

Teoma http://www.teoma.com/

AltaVista http://www.altavista.com/

Apart from these, there are many more search engines. However, Google stands out in both popularity and users' satisfaction with the results it produces.

Search engines offer users vast and impressive amounts of information, available with a speed and convenience few people could have imagined a decade ago, and their capabilities are expanding practically by the day. Without search engines, most of the information on the Information Highway, the Internet, would have remained hidden from the majority of users, since it is practically impossible to know or memorize the URLs of every page related to one's interests.

Today’s Internet users are very positive about what search engines already do, and they feel good about their experiences when searching the Internet. They are comfortable and confident as searchers and are satisfied with the results they find. They trust search engines to be fair and unbiased in returning results. The way a search engine answers virtually any query a searcher poses almost makes us believe that search engines are the gods of the IT sector, with the attributes of a real god: omnipresent, omnipotent and omniscient.

As technology grows more sophisticated each day, it will soon be routine to search the contents of vast libraries of books; to find selected portions of video streams or audio recordings; and to benefit from personalized searches that remember a user’s preferences and keep track of changing geographic locations. Audio searching and audio search results will be available to the blind, and “implicit searching” will anticipate users’ queries and have answers ready.

This odd situation, in which a growing population of users relies on a technology that most of them do not understand, knowing little about how engines operate or about the financial tensions that shape how engines perform their searches and present the results served up to them, highlights the responsibility placed on search engine companies. They are businesses, in many cases extremely successful ones, but their effects on society are far more than merely commercial. One unexpected implication is that search engines are attaining the status of other institutions (legal, medical, educational, governmental and journalistic) whose performance the public judges by unusually high standards, because the public is unusually reliant on them for principled performance. Hence, in this technological age, search engines, apart from delivering quick access to information, should be accountable for the accuracy of the information they impart.
