A search engine is a software program or script available through the Internet that searches documents and files for keywords and returns the files that contain those keywords. Today, there are thousands of different search engines available on the Internet, each with its own abilities and features. Archie, which was used to search for FTP files, is considered the first search engine ever developed, and Veronica is considered the first text-based search engine. Today, the most popular and well-known search engine is Google.
Because large search engines contain millions and sometimes billions of pages, many search engines not only search the pages but also order the results by importance. This importance is commonly determined by using various algorithms.
The picture gives an example of how a search engine works. As can be seen in the image, the starting point of all search engines is a spider or crawler, which visits the pages that will be included in the search and grabs the contents of each of those pages.
Once a page has been crawled, the data contained within the page is processed. Often this involves stripping out stop words and recording the location of each word in the page, how frequently each word occurs, links to other pages, images, and so on. This data is used to rank the page and is the primary method a search engine uses to determine whether a page should be shown and in what order.
Finally, once the data has been processed it is often broken up into one or more files, moved to different computers or servers, or loaded into memory where it can be accessed when users perform a search.
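The processing step described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular engine does it: the stop-word list, the function name process_page and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real engines use much larger ones.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}

def process_page(text):
    """Return (positions, frequencies) for the non-stop words in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions = {}
    for index, word in enumerate(words):
        if word in STOP_WORDS:
            continue  # stop words are stripped out
        positions.setdefault(word, []).append(index)
    # The frequency of a word is simply the number of recorded positions.
    frequencies = Counter({word: len(where) for word, where in positions.items()})
    return positions, frequencies

positions, frequencies = process_page(
    "The spider visits the page and the spider grabs the page"
)
print(positions)
print(frequencies)
```

The positions are kept alongside the counts because later features, such as phrase and proximity searching, need to know where in the page each word occurs.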
Some of the most common search engines are:
|Crawler-based search engine|
|AllTheWeb||Crawler-based search engine|
|Teoma||Crawler-based search engine|
|Inktomi||Crawler-based search engine|
|AltaVista||Crawler-based search engine|
|Open Directory||Human-Powered Directory|
|Yahoo||Human-Powered Directory, also provides crawler-based search results powered by Google|
|MSN Search||Human-Powered Directory powered by LookSmart, also provides crawler-based search results powered by Inktomi|
|AOL Search||Provides crawler-based search results powered by Google|
|AskJeeves||Provides crawler-based search results powered by Teoma|
|HotBot||Provides crawler-based search results powered by AllTheWeb, Google, Inktomi and Teoma, “4-in-1” search engine|
|Lycos||Provides crawler-based search results powered by AllTheWeb|
|Netscape Search||Provides crawler-based search results powered by Google|
Working of search engine:
Most of us use search engines every day to look up a particular query, and the search engine returns the results most relevant to that query. Here we will see how search engines work.
Search engines work in three different phases
- Web Crawling
A web search engine stores information about web pages, which it retrieves from the HTML of the pages themselves. These pages are retrieved by web crawlers, often called spiders. A web crawler crawls a site and follows the links it finds there. You can restrict these crawlers with a robots.txt file, which is written specifically for them. The crawlers then extract information about each web page from its title, meta tags, headings and content, and store it in a database.
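As a rough illustration of how a well-behaved crawler respects robots.txt, the sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the bot name ExampleBot and the example.com URLs are made up for the example; a real crawler would download the file from the site instead of using a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def build_parser(robots_text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/private/data.html")
print(allowed, blocked)
```

Note that robots.txt is a convention: it only restricts crawlers that choose to check it before fetching a page.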
When a user types a search query, the search engine first checks its database to see whether it has already encountered that query. Google uses a cache mechanism for this, storing the results of search queries to minimize response time. Increased search relevance makes cached pages very useful, beyond the fact that they may contain data that is no longer available elsewhere.
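The idea of caching query results can be sketched with Python's functools.lru_cache. This is a toy model under stated assumptions, not a description of how Google's cache works: the documents, the LOOKUPS counter and the search function are all hypothetical.

```python
from functools import lru_cache

# Hypothetical document collection.
DOCUMENTS = {
    "page1": "web crawler basics",
    "page2": "how a web crawler follows links",
    "page3": "cooking recipes",
}

# Counts how many times the (expensive) lookup is really performed.
LOOKUPS = {"count": 0}

@lru_cache(maxsize=1024)
def search(query):
    """Return the ids of documents containing every word of the query."""
    LOOKUPS["count"] += 1
    terms = query.lower().split()
    return tuple(sorted(
        doc_id for doc_id, text in DOCUMENTS.items()
        if all(term in text.lower().split() for term in terms)
    ))

first = search("web crawler")   # computed and stored in the cache
second = search("web crawler")  # answered from the cache
print(first, LOOKUPS["count"])
```

A cached answer can go stale when the underlying pages change, so a real system would also need some way to expire entries.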
When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document’s title and sometimes parts of the text. The index is built from the information stored with the data and the method by which the information is indexed. Not every search engine allows documents to be searched by date. Most search engines support the Boolean operators AND, OR and NOT to further specify the search query. Boolean operators are for literal searches that allow the user to refine and extend the terms of the search: the engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords. There is also concept-based searching, where the search involves statistical analysis of pages containing the words or phrases you search for. Natural language queries, in turn, allow the user to type a question in the same form one would ask it of a human. An example of such a site is ask.com.
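Proximity search can be illustrated with a short sketch. The function name within_distance and the sample sentence are assumptions for the example, and distance is measured here simply as the difference between word positions; actual engines may define it differently.

```python
def within_distance(text, first, second, max_distance):
    """True if the two keywords occur at most max_distance words apart."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(
        abs(a - b) <= max_distance
        for a in first_positions
        for b in second_positions
    )

sentence = "search engines rank web pages by relevance"
near = within_distance(sentence, "search", "pages", 4)
far = within_distance(sentence, "search", "pages", 3)
print(near, far)
```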
The working of a search engine is shown below.
Figure: How a search engine works
The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the “best” results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve. There are two main types of search engine that have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an “inverted index” by analyzing texts it locates. This second form relies much more heavily on the computer itself to do the bulk of the work.
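An inverted index maps each word to the documents that contain it, so a query can be answered without rescanning every page. The following is a minimal sketch; the three sample documents and the function name build_inverted_index are invented for illustration, and real indexes also store positions, frequencies and ranking data.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

documents = {
    1: "search engines crawl the web",
    2: "web directories are edited by humans",
    3: "crawlers feed search engines",
}
index = build_inverted_index(documents)
print(sorted(index["web"]))
print(sorted(index["search"] & index["engines"]))
```

Looking up a word is a single dictionary access, and combining words becomes a set operation on the stored document ids.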
Components of search engine
Every day, search engines handle an average of 300 million searches. According to a recent Forrester Research report, 81% of consumers on the Internet find products and services by using search engines. Search engine optimization allows you to achieve top search engine placement and tap into a new source of qualified visitors who are actively searching for products and services on the Internet.
1 identify keyword
The first way to identify keywords is to input the domain or page you are trying to search. Google will crawl the website and return a list of suggestions based on the content.
For example, if you are looking for keywords relevant to “songs”, enter a song name or a related term and click the Search button. A comprehensive list of relevant keywords, with their associated search volume, will be shown on your screen.
2 boolean AND and boolean OR
A type of search that allows users to combine keywords with operators such as AND, NOT and OR to produce more relevant results. For example, a Boolean search could be “hotel” AND “India”. This would limit the search results to only those documents containing both keywords.
Most search engines use the plus sign (+), which is equivalent to AND, and the minus sign (-), which is equivalent to AND NOT.
For example: internet+architecture+model.
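The plus and minus notation can be sketched as follows. This is an illustrative toy, not any engine's actual query parser: the documents and the function names words_of and plus_minus_search are assumptions, and OR is shown as a plain set union of two result sets.

```python
import re

# Hypothetical document collection.
DOCUMENTS = {
    1: "internet architecture model overview",
    2: "internet history",
    3: "hotel booking in india",
    4: "hotel reviews",
}

def words_of(text):
    return set(text.lower().split())

def plus_minus_search(query, documents):
    """Terms with '+' or no sign are required (AND); '-' terms are excluded (AND NOT)."""
    required, excluded = set(), set()
    for sign, term in re.findall(r"([+-]?)(\w+)", query.lower()):
        (excluded if sign == "-" else required).add(term)
    return {
        doc_id for doc_id, text in documents.items()
        if required <= words_of(text) and not (excluded & words_of(text))
    }

and_result = plus_minus_search("internet+architecture+model", DOCUMENTS)
not_result = plus_minus_search("hotel -india", DOCUMENTS)
# OR is the union of the individual result sets.
or_result = plus_minus_search("hotel", DOCUMENTS) | plus_minus_search("internet", DOCUMENTS)
print(and_result, not_result, or_result)
```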
3 phrase searching
A type of search that allows users to search for documents containing an exact sentence or phrase, rather than single keywords.
Enclose multi-word phrases in quotation marks. Either single or double quotes will work, as long as one matches the other. If no quote marks are used, Embase searches for all the words with the Boolean AND operator by default.
You can also bind multi-word phrases with hyphens.
heart attack retrieves heart AND attack (anywhere within an article)
‘heart attack’ retrieves heart attack (phrase)
heart-attack retrieves heart attack (phrase)
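The difference between the default AND behaviour and a phrase search can be shown with a small sketch. The helper names and sample sentences are hypothetical; the tokenizer treats a hyphen as a word separator, which is one simple way to make heart-attack behave like the quoted phrase.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_all(text, query):
    """Default behaviour: every word must occur somewhere (Boolean AND)."""
    return set(tokenize(query)) <= set(tokenize(text))

def contains_phrase(text, phrase):
    """Phrase search: the words must occur next to each other, in order."""
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    size = len(target)
    return any(words[i:i + size] == target for i in range(len(words) - size + 1))

first = "A heart attack is a medical emergency"
second = "The attack left him with a heavy heart"
print(contains_all(second, "heart attack"))     # both words are present
print(contains_phrase(second, "heart attack"))  # but not as a phrase
print(contains_phrase(first, "heart-attack"))   # hyphen binds the phrase
```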
4 truncation
The truncation/wildcard symbols are used to create searches where there are unknown characters, multiple spellings or various endings. Neither the wildcard nor the truncation symbol can be used as the first character in a search term. By using truncation/wildcard you will get more results.
Say you want information about politics, politicians, politician and political. These words all have the same stem, “politi”. Instead of searching for each word separately, you can search for them all at once by using truncation. You do this by entering the stem or root of the word followed by the truncation symbol. This is usually one of the following symbols: * (asterisk), ? (question mark), # (hash), + (plus) or ! (exclamation mark). The most common one is the * (asterisk). Use the help option on the database to find the relevant truncation symbol.
When you type politi* (stem + wildcard), you will retrieve all documents containing words such as politics, politician, politicians and political.
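Truncation can be approximated with Python's standard fnmatch module, where * matches any run of characters. The vocabulary list and the function name expand_truncation are assumptions for this sketch, and the symbol actually used depends on the database being searched.

```python
from fnmatch import fnmatchcase

def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a truncated search term."""
    if not pattern or pattern[0] in "*?":
        # The wildcard cannot be the first character of a search term.
        raise ValueError("a search term cannot start with a wildcard")
    return sorted(word for word in vocabulary if fnmatchcase(word, pattern))

vocabulary = ["politics", "politician", "politicians", "political", "police", "polite"]
matches = expand_truncation("politi*", vocabulary)
print(matches)
```

Words such as police and polite are not returned because they do not begin with the stem politi.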
5 title search
Title tags—technically called title elements—define the title of a document. Title tags are often used on search engine results pages (SERPs) to display preview snippets for a given page, and are important both for SEO and social sharing.
The title element of a web page is meant to be an accurate and concise description of a page’s content. This element is critical to both user experience and search engine optimization. It creates value in three specific areas: relevancy, browsing, and in the search engine results pages.
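To make the idea of a title search concrete, the sketch below pulls the title element out of a page with Python's standard html.parser module. The class name TitleParser, the function extract_title and the sample HTML are illustrative assumptions, not part of any search engine's code.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

page = "<html><head><title>GNDU college</title></head><body><h1>Welcome</h1></body></html>"
title = extract_title(page)
print(title)
```

A title search then only has to compare the query against the extracted titles rather than against the full page text.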
6 url search:
Web sites are found by their addresses on the World Wide Web. These addresses are known as URLs, or Uniform Resource Locators. Every web site has a URL assigned to it, so both searchers and Web servers can find it quickly and easily.
An example of a URL is http://www.computerhope.com/jargon/u/url.htm, which is the URL of a page on the Computer Hope website. Below is additional information about each of the sections of this http URL.
The http:// stands for HyperText Transfer Protocol and lets the browser know what protocol it is going to use to access the information specified in the domain. An alternative protocol you may see while on the Internet is FTP.
Next, www., which stands for World Wide Web, is used to distinguish the content. This portion of the URL is not required and can often be left out. For example, typing “http://computerhope.com” would still get you to the Computer Hope web page. This portion of the address can also be replaced by a subdomain that identifies an important section of the site. For example, http://support.microsoft.com is the support section of Microsoft’s site.
Next, computerhope.com is the domain name for the website. The last portion of the domain is known as the “domain suffix” or TLD and is used to identify the type or location of the website. For example, .com is short for commercial, .org is short for organization, and .co.uk is for the United Kingdom. There are dozens of other domain suffixes available.
Next, the jargon and u portions of the above URL are the directories in which the web page is located on the server. In this example, the web page is two directories deep, so if you were trying to find the file on the server it would be in the /public_html/jargon/u directory.
Finally, url.htm is the actual web page on the domain you’re viewing. The trailing .htm is the file extension of the web page, which indicates that the file is an HTML file. Other common file extensions on the Internet include .php, .asp, .cgi, .xml, .jpg, and .gif. Each of these file extensions performs a different function, just like all the different types of files on your computer.
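The sections of a URL described above can be separated with Python's standard urllib.parse module. The sample address below is assembled from the parts named in the text (the Computer Hope domain, the jargon and u directories and the url.htm file), and the function name describe_url is a hypothetical helper for this sketch.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the sections discussed above."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # Directory names between the domain and the file, without the root "/".
        "directories": [name for name in path.parent.parts if name != "/"],
        "file": path.name,
        "extension": path.suffix,
    }

info = describe_url("http://www.computerhope.com/jargon/u/url.htm")
print(info)
```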
Different Kinds of URLs
There are a wide variety of different kinds of URLs, as well as different terms to describe what a URL looks like. For example:
Messy: This is a URL with a lot of garbled numbers and letters in it that make little organizational sense, e.g., “http://www.abc.com/woeiruwoei909305820580”. Typically these URLs are generated by programs that create thousands of Web pages on the same domain name.
Dynamic: Dynamic URLs are where most “messy” URLs really come from. They are the end result of database queries that provide content output based on the result of that query. The URL ends up looking quite garbled, aka “messy”, and often includes the following characters: ?, &, %, +, =, $. Dynamic URLs are often found as part of consumer-driven websites: shopping, travel, or anything that requires changing answers for many different user queries.
Static: A static URL is the opposite of a dynamic URL. The URL is “hard-wired” into the Web page’s HTML coding and will not change depending on what the user requests.
Obfuscated: Obfuscated, or hidden, URLs are primarily used in phishing scams. Basically, a familiar URL is distorted in some way to make it seem legitimate. The user clicks on the obfuscated URL and is redirected to a malicious website.
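One simple, admittedly rough way to tell a dynamic URL from a static one is to check whether it carries a query string. The sketch below uses Python's standard urllib.parse module; the function names and the www.abc.com addresses are assumptions for illustration, and the check is only a heuristic because a dynamic page can also hide behind a clean-looking address.

```python
from urllib.parse import parse_qs, urlsplit

def looks_dynamic(url):
    """Heuristic: a URL with a query string (after '?') is treated as dynamic."""
    return bool(urlsplit(url).query)

def query_parameters(url):
    """Return the name/value pairs that drive the database query."""
    return parse_qs(urlsplit(url).query)

dynamic_url = "http://www.abc.com/search?item=42&sort=price"
static_url = "http://www.abc.com/about.html"
print(looks_dynamic(dynamic_url), looks_dynamic(static_url))
print(query_parameters(dynamic_url))
```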
7 domain search
The domain search allows the user to restrict results to a specified domain, such as websites from India (.in), commercial sites (.com) and network sites (.net).
Domain:in and title: “GNDU college”
Domain:com and title: “GNDU college”
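The two example queries above combine a domain restriction with a title phrase. A minimal sketch of that combination is shown below; the page list, the example.in, example.com and example.net addresses and the function name domain_title_search are all made up for the illustration.

```python
from urllib.parse import urlsplit

# Hypothetical indexed pages.
PAGES = [
    {"url": "http://www.example.in/about", "title": "GNDU college admissions"},
    {"url": "http://www.example.com/", "title": "GNDU college"},
    {"url": "http://www.example.net/", "title": "Network news"},
]

def domain_title_search(pages, domain, title_phrase):
    """Keep pages whose host ends with the domain and whose title contains the phrase."""
    suffix = "." + domain.lower().lstrip(".")
    phrase = title_phrase.lower()
    return [
        page["url"] for page in pages
        if (urlsplit(page["url"]).hostname or "").endswith(suffix)
        and phrase in page["title"].lower()
    ]

india_results = domain_title_search(PAGES, "in", "GNDU college")
com_results = domain_title_search(PAGES, "com", "GNDU college")
print(india_results, com_results)
```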
8 link search
As its name makes clear, a link search is a search made with the help of a particular link. All links related to your search will be displayed.
Search engine vs web directory
Search engines and directories are two different services available to the Web community. However, many people do not know the difference between them. Search engines have databases built up by “robots”, which visit websites and add information to their database. Directories, on the other hand, are human edited: their indexes are built by editors who visit websites and add to the directory the sites that they consider to be a valuable resource.
Some search engines and directories include both types of indexes, and are known as “hybrids”. Some examples of search engines are Google, Gigablast, and Alltheweb. These search engines use programs (known as robots), with the following functions:
1. To find web pages.
2. To scan the contents of a web page.
3. To return their findings to the search engine’s databases.
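The three functions of a robot (find pages, scan them, report back) can be sketched as a breadth-first crawl. To keep the example self-contained it crawls a small in-memory “web” rather than the real Internet; the page names, the WEB dictionary and the crawl function are hypothetical.

```python
from collections import deque

# Hypothetical miniature web: each page has some text and outgoing links.
WEB = {
    "a.html": {"text": "home page", "links": ["b.html", "c.html"]},
    "b.html": {"text": "about us", "links": ["a.html"]},
    "c.html": {"text": "contact", "links": ["b.html", "d.html"]},
}

def crawl(start, web):
    """Find pages by following links, scan them, and return the findings."""
    database = {}
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:
            continue  # broken link: nothing to scan
        database[url] = page["text"]  # scan the contents and record them
        for link in page["links"]:    # find further pages
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database

database = crawl("a.html", WEB)
print(database)
```

The seen set prevents the robot from visiting the same page twice when pages link back to each other.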
Most search engines update their databases frequently. When web searchers use a search engine to locate websites relevant to the keyword (or key phrases) searched, they are searching the search engine’s database. Therefore, a search engine with a frequently updated database should provide better search results.
The best-known directories are Yahoo, Business.com, Dmoz.org, and Looksmart. These directories employ human editors to review websites that are submitted for possible inclusion in their directory. The directories usually include only the main page of a website, while search engines can include many pages from a website. The process of adding sites to a directory manually is much slower than the automated work of robots. Therefore, most of the time there are many more websites indexed by a search engine than by a directory. However, directories have one advantage: the organization of their data.
Directories, unlike search engines, use a hierarchical tree structure to organize their database. This hierarchical organization allows the existence of specialized directories, by subject or by geographic location. One example is Checkhouston.com, a directory dedicated specifically to information and businesses in the area of Houston, Texas.
You may wonder how these services make money, as they are free for the user.
There are several sources of income:
1. Directories generally charge a fee for websites interested in being added to their database. This fee is to cover the costs of the human editors reviewing the site, and other directory expenses. There are exceptions such as joeant.com, webbeacon.com and dmoz.org, whose editors are volunteers working for free.
2. Search engines show sponsored links in addition to the natural results. These advertisements are very effective for two reasons: a) the advertiser’s website is highly relevant to the user’s search, and b) the advertiser only pays for the users who click on their ad.
3. There are “pay per click” search engines, in which all the results are paid placements. In these search engines, the first search result belongs to the advertiser bidding the highest amount for the search term being targeted. Although some may think that the results of these searches would not be very useful, most of the ads are actually for professional sites that are willing to pay to attract possible customers. The best-known “pay per click” search engine is Overture.
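The ordering used by a “pay per click” engine, where the highest bidder for a search term is listed first, can be sketched as a simple sort. The bid amounts, site names and the function name paid_results are invented for this illustration and do not reflect any real engine's data or rules.

```python
# Hypothetical bids: search term -> list of (advertiser site, bid per click).
BIDS = {
    "hotel india": [
        ("hotels-a.example", 0.40),
        ("hotels-b.example", 0.75),
        ("hotels-c.example", 0.10),
    ],
}

def paid_results(term, bids):
    """Order the advertisers for a term from the highest bid to the lowest."""
    ranked = sorted(bids.get(term, []), key=lambda entry: entry[1], reverse=True)
    return [site for site, _bid in ranked]

results = paid_results("hotel india", BIDS)
print(results)
```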
Let’s discuss some popular search engines:
Bing is a search engine from Microsoft that was launched on May 28, 2009. Microsoft calls it a “Decision Engine,” because it’s designed to return search results in a format that organizes answers to address your needs. When you search on Bing, in addition to providing relevant search results, the search engine also shows a list of related searches on the left-hand side of the search engine results page (SERP). You can also access a quick link to see recent search history. Bing uses technology from a company called Powerset, which Microsoft acquired.
Bing launched with several features that are unique in the search market. For example, when you mouse-over a Bing result a small pop-up provides additional information for that result, including a contact e-mail address if available. The main search box features suggestions as you type, and Bing’s travel search is touted as being the best on the net. Bing is expected to replace Microsoft Live Search.
Yahoo offers a toolbar and search engine, which are promoted via other free downloads. Once installed on your computer, it will add the Yahoo Toolbar and change your default search engine to Yahoo Search (search.yahoo.com).
Yahoo Search will display advertisements and sponsored links in your search results, and may collect search terms from your search queries.
A Google Custom Search Engine enables Web site authors to host a Web site (or Web) search box and search results on their site. Users can customize the search engine, which is built using Google’s core search technology. In creating your own Google Custom Search Engine, you can prioritize or restrict search results based on specific Web sites and pages you specify. Once you’ve defined your search engine, Google provides code for a search box that users can copy and then paste right into their own Web site or blog.
|Google Search homepage|
|Web address||Google.com (US)|
|Type of site||Web search engine|
|Available in||123 languages|
|Launched||September 15, 1997|
|Alexa rank||1 (April 2014)|