WWW



Introduction to WWW

The “Web”, short for “World Wide Web” (which gives us the acronym WWW), is the name for one of the ways that the Internet lets people browse documents connected by hypertext links.

 

The concept of the Web was developed at CERN (European Council for Nuclear Research) and made public in 1991 by a group of researchers that included Tim Berners-Lee, who first proposed it in 1989 and is today considered the father of the Web.

 

The principle of the Web is based on using hyperlinks to navigate between documents (called “web pages”) with a program called a browser. A web page is a simple text file written in a markup language (called HTML) that encodes the layout of the document, graphical elements, and links to other documents, all with the help of tags.
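To make this concrete, here is a small Python sketch (standard library only) that treats a web page as what it is, plain text with tags, and pulls out the hyperlinks it contains. The page content and the `LinkCollector` class are made up for illustration; real browsers use far more complete HTML parsers.

```python
from html.parser import HTMLParser

# A tiny, made-up web page: plain text whose tags describe structure and links.
PAGE = """<html>
  <head><title>My first page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>Read the <a href="http://info.cern.ch/">first website</a>
       or go to <a href="about.html">another page</a>.</p>
  </body>
</html>"""


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags, i.e. the page's hyperlinks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['http://info.cern.ch/', 'about.html']
```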

 

Besides the links which connect formatted documents to one another, the web uses the HTTP protocol to link documents hosted on distant computers (called web servers, as opposed to the client represented by the browser). On the Internet, documents are identified with a unique address, called a URL, which can be used to locate any resource on the Internet, no matter which server may be hosting it.

 

 

How Does the WWW Work?

Asking how the Internet works is not the same as asking how the World Wide Web works. The Internet and the World Wide Web are not one and the same, although the terms are often used as synonyms. While the Internet is an infrastructure providing interconnectivity between networked computers, the Web is one of the services that run on top of it. It is a collection of documents that can be shared across Internet-enabled computers.

 

The network of web servers serves as the backbone of the World Wide Web, and the Hypertext Transfer Protocol (HTTP) is used to access it. A web browser requests a particular web page from a web server, which responds with the requested page and its contents. The browser then displays the page, rendering its HTML and any other web languages (such as CSS or JavaScript) the page uses. Each resource on the Web is identified by a globally unique Uniform Resource Identifier (URI), and each web page has a unique address (its URL) through which a browser accesses it. Using the Domain Name System (DNS), a hierarchical naming system for computers and resources participating in the Internet, the host name in the URL is resolved into an IP address.
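The first two steps a browser takes can be sketched in Python with the standard library: pull the host name, port and path out of the URL, then ask DNS for the host's IP addresses. The function names `locate` and `resolve` are hypothetical, and `www.example.com` is a placeholder address; this is a simplified sketch, not how any particular browser is implemented.

```python
import socket
from urllib.parse import urlsplit

# Well-known default TCP ports; other schemes are not handled in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def locate(url):
    """Split a URL into (host name, TCP port, path) as a browser would use them."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path


def resolve(host):
    """Ask the system resolver (DNS, hosts file, ...) for the host's IP addresses."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


host, port, path = locate("http://www.example.com/docs/index.html")
print(host, port, path)  # www.example.com 80 /docs/index.html
# resolve(host) needs network access; its result depends on your DNS.
```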

 

www vs www2 vs www3

The hostnames www1, www2 and www3 usually point to mirrors of the original web server, which is typically named www. Many websites, such as those of governments and banks, and even major search engines like Google (www1.google.com or www2.google.com) and Yahoo! (www1.yahoo.com or www2.yahoo.com), use www1 and www2. The main purpose of this technique is to reduce server load. The original server sometimes needs to be updated or modified, but major websites such as .gov and .edu sites, google.com or yahoo.com cannot simply shut their main server down for hours while they update their systems. They therefore serve visitors from duplicates of the main server, such as www1 or www2.

For example, we’ve all seen www, www2 and www3: we type ‘www.company.com’ and are suddenly redirected to ‘www3.company.com’. What exactly is this, and how do we get there?

 

Starting the domain name of your website with www., www2. or www3. is a common convention and nothing more. Nothing in the HTTP specification says that a website’s host name must start with www. or any other prefix. It is simply a convention that began in the early days of the Web, when it was used to distinguish a company’s web server from its FTP server, gopher server, mail server, etc. Today no such distinction is necessary, but an organization can still use this part of the host name (‘www.’, ‘www2.’, and so on) to help decide how to route requests internally.

On shared hosting that runs the Apache web server, this type of configuration is usually stored in the .htaccess file located in the public_html directory.

 

Load balancing

 

Another example is role-based routing. For example, store.company.com and developer.company.com are both hosted at company.com, but serve different roles on the web; one is an online store, the other is a site with resources for programmers. (And each is probably also load-balanced in ways that don’t rewrite your URL.)

 

Some companies use ‘www2.’, ‘www3.’, etc. to perform ‘load balancing’: an initial request to the ‘www.’ server may be redirected to a less-busy server, such as ‘www2.’.

In some cases, the specific hostname may be obscured, creating the appearance that the user is viewing the “www” subdomain, even if they are actually viewing a mirror site.
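A minimal sketch of the redirect-based load balancing described above, assuming a hypothetical set of mirrors (www1/www2/www3.company.com) whose current load is known as a count of active requests. It picks the least-busy mirror and builds the raw HTTP redirect that the ‘www.’ server could send back. This least-busy policy is only one option; real sites often balance load with DNS or with a proxy that never changes the URL.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    host: str             # hypothetical mirror host name, e.g. "www2.company.com"
    active_requests: int  # current load, as tracked by the (hypothetical) balancer


def pick_mirror(mirrors):
    """Choose the least-busy mirror (fewest active requests)."""
    return min(mirrors, key=lambda m: m.active_requests)


def redirect_response(mirrors, path):
    """Build a raw HTTP redirect sending the client to the least-busy mirror."""
    target = pick_mirror(mirrors)
    location = f"http://{target.host}{path}"
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


mirrors = [
    Mirror("www1.company.com", 42),
    Mirror("www2.company.com", 7),
    Mirror("www3.company.com", 19),
]
print(redirect_response(mirrors, "/index.html"))
```

The browser follows the `Location` header automatically, which is why the address bar suddenly shows ‘www2.’ after you typed ‘www.’.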

 

In short, anything after the protocol (http://) and before the domain (‘company.com’) is managed by the host organization, for a variety of reasons such as load balancing, roles and marketing.

Note, however, that search engines may treat these host names (www., www2., etc.) as different sites.

Web browsing vs Web browser

Web browsing

Web browsing is the exploration of the World Wide Web by following one interesting link to another, usually with a definite objective but without a planned search strategy. In comparison, ‘surfing’ is exploration without a definite objective or search strategy, and ‘searching’ is exploration that is definite in both objective and strategy.

Web browser

 

A browser, short for web browser, is the software application (a program) that you’re using right now to search for, reach and explore websites. Whereas Excel® is a program for spreadsheets and Word® a program for writing documents, a browser is a program for exploring the Internet (hence the name of Microsoft’s Internet Explorer).

Browsers don’t get talked about much. A lot of people simply click the icon on their computer that takes them to the Internet, and that’s as far as it goes. In a way, that’s enough: most of us simply get in a car and turn the key. We don’t know what kind of engine it has or what features it offers; it takes us where we want to go. That’s why, when it comes to computers:

  • There are some computer users that can’t name more than one or two browsers
  • Many of them don’t know they can switch to another browser for free
  • There are some who go to Google’s webpage to “google” a topic and think that Google is their browser.

We will discuss web browsers in more detail in the next chapter.

 

Web page

Each day when browsing the Internet, we visit many websites, some complex, others just simple personal pages. The term “website” refers to the sum of all the content you have put online; each file contributes to what the website represents. The driving power behind a website, the pillars that hold it together, are its web pages.

Each web page (also written webpage) presents various types of information to the visitor in an aesthetic and readable manner. Most web pages are available on the World Wide Web, which makes them widely accessible to the Internet public. Others may also be online but restricted to a private network, such as a corporate intranet. The information in all those web pages is stored on remote web servers in the form of text, image or script files. A smaller number of web pages are intended for home or test use and are stored on local computers, which do not need an Internet connection to display them.

 

How do web pages work?

The information on a web page is displayed with the help of a web browser, which connects to the server hosting the website’s contents using the Hypertext Transfer Protocol (HTTP). For instance, if you look at the URL of the web page you are on at the moment, you may notice the prefix ‘http://’ (or ‘https://’, its encrypted variant), which tells the browser which protocol to use for that URL request.

 

Each web page’s contents are usually written in HTML or XHTML, which allows the information to be easily structured and then quickly read by the client’s web browser. With CSS (Cascading Style Sheets), designers can precisely control the page’s look and feel: layout, typography, color scheme and navigation. CSS instructions can either be embedded within an HTML page (applying only to that page) or placed in a separate external file (applying to every page that links to it, often the whole site).

 

The main operations on a web page are:

1 Opening a web page

2 Viewing a web page

3 Saving and printing a web page

4 Adding a web page to favorites/bookmarks

Opening a web page:

To open a web page, follow these steps:

1 Connect to the Internet (for example, set up a dial-up connection).

2 Open your browser (for example, Google Chrome).

3 Type the address of the website you want to open (for example, www.hcl.com) into the browser’s address bar.

4 Press the Enter key on the keyboard.

5 The browser loads the website, because the URL is the website’s address on the Internet.

6 You can open any other website you need in the same way, whatever its domain (.com, .edu, etc.).
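Typing just ‘www.hcl.com’ works because the browser fills in a protocol prefix for you. The same steps can be scripted with Python’s standard `webbrowser` module; the helper names `with_scheme` and `open_site` are hypothetical, and the `http` default is an assumption (modern browsers often try `https` first).

```python
import webbrowser
from urllib.parse import urlsplit


def with_scheme(address, default="http"):
    """Add a protocol prefix to a bare address such as 'www.hcl.com'."""
    if urlsplit(address).scheme:
        return address  # already a full URL, e.g. 'https://www.hcl.com'
    return f"{default}://{address}"


def open_site(address):
    """Open the address in the default browser; returns True if a browser was launched."""
    return webbrowser.open(with_scheme(address))


print(with_scheme("www.hcl.com"))  # http://www.hcl.com
# open_site("www.hcl.com")  # opens the page in your default browser
```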

 

 

Viewing a web page:

To view a web page, follow these steps:

1 Connect to the Internet (for example, set up a dial-up connection).

2 Open your browser (for example, Google Chrome).

3 Type the address of the website you want to view (for example, www.hcl.com) into the browser’s address bar and press Enter; the page is displayed in the browser window.

 

 

 

Saving and printing a web page:

To save a web page, follow these steps:

1 Go to the page you want to save (for example, www.hcl.com).

2 Open the browser’s menu (in Google Chrome, the three-dot icon at the upper right).

 

 

 

The menu shows many options. Click Print, and the print window opens.

 

To print the web page, click the Print button on the left side.

To save the page instead, click the Change button on the left side; the next window lets you choose Save as PDF.

 

 

After you click Save as PDF, the print window is updated.

 

Press the Save button on the left side; the browser asks where you want to save the page.

 

Choose a location and click Save. Your page is saved.
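Outside the browser, a page can also be saved with a short Python script. Note the difference: Save as PDF stores the rendered page, whereas this sketch stores the raw HTML source as sent by the server. The `save_page` name and the output file name are hypothetical.

```python
import urllib.request
from pathlib import Path


def save_page(url, destination):
    """Download the HTML source at url, write it to destination, return bytes written."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    Path(destination).write_bytes(data)
    return len(data)


# Example (needs network access):
# save_page("http://www.hcl.com", "hcl.html")
```

`urlopen` also accepts `file://` URLs, which is a convenient way to try the function without a network connection.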

 

 

Adding a web page to favorites/bookmarks

To bookmark a web page, follow these steps:

1 Go to the page you want to bookmark (for example, www.hcl.com).

2 Click the star icon at the upper right of the browser, at the end of the address bar.

 

Click Done. Your page is saved as a bookmark/favorite.

 

Viewing bookmarked pages:

Click the menu icon at the upper right of the browser.

 

 

In the drop-down menu that opens, choose Bookmarks to see all of your bookmarked pages.

 

Click a bookmark to go to that page.

 

HTTP concept:

 

HTTP is a very simple protocol: a client sends a request to a server; the server processes the request and replies with a response. Requests and responses are sent as messages over a TCP connection. The message format is essentially MIME (Multipurpose Internet Mail Extensions) [RFC 2045-2049], the format also used to transfer multimedia mail messages across the Internet. A MIME message consists of a set of headers and a body, also known as the message entity. The body is optional for HTTP messages; the standard refers to it as the HTTP entity. For HTTP, a request line or a response (status) line, respectively, is prepended to the message.

A request line consists of the request method, the resource locator and the protocol version. A resource can be anything: an HTML page, an image, a file, a database, a service, an application. It is identified by the resource locator, a path that makes it easy to locate the resource in a hierarchical structure such as a file system or Zope’s folder structure. HTTP uses the URL syntax for the resource locator. It is up to the receiving HTTP server to determine which resource the resource locator actually identifies.
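The message layout described above (request line, headers, blank line, optional body) can be assembled and taken apart in a few lines of Python. This is a simplified sketch: `build_request` and `parse_request` are hypothetical helpers, the path and host are placeholders, and real HTTP software also handles repeated headers, line folding and other edge cases ignored here.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.0 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.0"]
    if body:
        # The receiver needs the body length to know where the entity ends.
        headers = {**headers, "Content-Length": str(len(body))}
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # an empty line ends the header section
    return head.encode("ascii") + body


def parse_request(raw):
    """Split a raw request into (method, path, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, version, headers, body


raw = build_request("GET", "/www/index.html", {"Host": "www.example.com"})
print(raw.decode("ascii"))
print(parse_request(raw))
```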

 

 


 

HTTP commands

Here are the basic HTTP commands (also called methods):

GET      Request to read a Web page
HEAD     Request to read a Web page's header
PUT      Request to write a Web page
POST     Append to a named resource (e.g. a Web page)
DELETE   Remove the Web page
LINK     Connect two existing resources (early HTTP only; no longer standard)
UNLINK   Break an existing connection between two resources (early HTTP only; no longer standard)
PATCH    Apply partial modifications to a resource
TRACE    Echo the received request back to the client, for diagnostics (loop-back test)
CONNECT  Establish a tunnel through a proxy, typically to carry encrypted HTTPS traffic

GET is the most common HTTP method; it says “give me this resource”.

The semantics of the GET method change to a conditional GET if the request message includes an If-Modified-Since header field. A conditional GET requests that the identified resource be transferred only if it has been modified since the date given in the If-Modified-Since header. This is especially useful for proxies that keep a cache: if a proxy has a recent copy of the page and the page has not been modified since the proxy fetched it, the server replies with status 304 (Not Modified) and no body, so the proxy does not have to download the file again.
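The server-side decision behind a conditional GET fits in one small function. This is a sketch assuming the server knows when the resource last changed; the `conditional_get` name and the dates are hypothetical, and only the standard `email.utils` date helpers (HTTP dates use the same format as mail dates) are real library calls.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return the status a server would use for a (possibly conditional) GET.

    last_modified: timezone-aware datetime when the resource last changed.
    if_modified_since: the raw If-Modified-Since header value, or None.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored: treat as a plain GET
        if since is not None:
            if since.tzinfo is None:  # "-0000" style dates come back naive; treat as UTC
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return "304 Not Modified"  # cached copy is still fresh: no body sent
    return "200 OK"  # send the full resource


page_changed = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
cached_at = format_datetime(datetime(2024, 2, 1, tzinfo=timezone.utc), usegmt=True)
print(cached_at)                                 # Thu, 01 Feb 2024 00:00:00 GMT
print(conditional_get(page_changed, cached_at))  # 304 Not Modified
print(conditional_get(page_changed))             # 200 OK
```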

The HEAD method is identical to GET except that the server must not return any Entity-Body in the response. The meta-information contained in the HTTP headers in response to a HEAD request should be identical to the information sent in response to a GET request. This method can be used for obtaining meta-information about the resource identified by the Request-URI without transferring the Entity-Body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
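The difference between GET and HEAD is easy to observe with a throwaway local server built on Python’s standard `http.server` and `http.client` modules. The page content and the `PageHandler`/`fetch` names are hypothetical; the point is that both methods return the same headers, but only GET returns the entity body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def send_page_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self.send_page_headers()
        self.wfile.write(PAGE)  # GET: headers followed by the entity body

    def do_HEAD(self):
        self.send_page_headers()  # HEAD: the same headers, but no body

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]


def fetch(method):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/")
    response = conn.getresponse()
    body = response.read()
    length = response.getheader("Content-Length")
    conn.close()
    return response.status, length, body


get_result = fetch("GET")
head_result = fetch("HEAD")
print(get_result)
print(head_result)
server.shutdown()
server.server_close()
```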

The POST method gives the client the chance to send information to the server. It is used, for example, to submit forms, to post a message on a bulletin board, forum, mailing list or webmail service, or to upload your website through an HTML page instead of through the FTP service. With POST the main direction of data transfer is reversed, since the client sends the data and the server receives it, but the roles do not change: your browser is still the client and the host you are connected to is still the server.
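Here is a small sketch of a form being sent with POST to a local test server, using only Python’s standard library. The `/guestbook` path, the form fields and the `FormHandler` class are hypothetical. Notice that the client stays the client: it opens the connection and sends the data in the request body, and the server reads it using the Content-Length header.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode


class FormHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The client's data travels in the request body; Content-Length says how much.
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("utf-8"))
        reply = f"Thanks, {fields['name'][0]}!".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), FormHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Encode the form the same way a browser does for a default HTML form.
form = urlencode({"name": "Tim", "message": "Hello, forum!"})
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("POST", "/guestbook", body=form,
             headers={"Content-Type": "application/x-www-form-urlencoded"})
response = conn.getresponse()
reply = response.read().decode("utf-8")
conn.close()
server.shutdown()
server.server_close()
print(response.status, reply)  # 200 Thanks, Tim!
```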

 

 

SERVER RESPONSE CODES

After this request, the server responds to your browser with an initial line, called the status line, which also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:

HTTP/1.0 200 OK
HTTP/1.0 404 Not Found

The HTTP version has the same format as in the request line, “HTTP/x.x”. The status code is meant to be computer-readable; the reason phrase is meant to be human-readable and may vary. The status code is a three-digit integer, and the first digit identifies the general category of response:

1xx Indicates an informational message only
2xx Indicates success of some kind
3xx Redirects the client to another URL
4xx Indicates an error on the client’s part
5xx Indicates an error on the server’s part
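Because the status line is just three space-separated fields and the first digit carries the category, a client can interpret it with very little code. A minimal Python sketch; the function names and the category labels are hypothetical, chosen to match the list above.

```python
CATEGORIES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}


def parse_status_line(line):
    """Split 'HTTP/1.0 404 Not Found' into (version, code, reason phrase)."""
    parts = line.split(" ", 2)  # the reason phrase itself may contain spaces
    if len(parts) == 2:
        parts.append("")  # tolerate an empty reason phrase
    if len(parts) != 3:
        raise ValueError(f"not a status line: {line!r}")
    version, code, reason = parts
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a status line: {line!r}")
    return version, int(code), reason


def category(code):
    """The first digit of the status code gives the general class of response."""
    return CATEGORIES[str(code)[0]]


for line in ["HTTP/1.0 200 OK", "HTTP/1.0 404 Not Found"]:
    version, code, reason = parse_status_line(line)
    print(version, code, reason, "->", category(code))
```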

Status codes defined in HTTP/1.0:

200 OK
201 Created
202 Accepted
204 No Content
301 Moved Permanently
302 Moved Temporarily
304 Not Modified
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable

Note that HTTP/1.0 defines no 1xx codes: the category was reserved for future use. HTTP/1.1 later introduced 100 Continue and 101 Switching Protocols.

Web servers

A Web server is a program that, using the client/server model and the World Wide Web’s Hypertext Transfer Protocol (HTTP), serves the files that form Web pages to Web users (whose computers contain HTTP clients that forward their requests). Every computer on the Internet that contains a Web site must have a Web server program. Two leading Web servers are Apache, the most widely installed Web server, and Microsoft’s Internet Information Server (IIS). Other Web servers include Novell’s Web Server for users of its NetWare operating system and IBM’s family of Lotus Domino servers, primarily for IBM’s OS/390 and AS/400 customers.
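To see how little a basic Web server has to do, here is a sketch using Python’s built-in `http.server` module rather than Apache or IIS: it serves the files of a directory over HTTP, and a client then fetches one of them. The temporary directory and `index.html` content are made up; the `directory` argument of `SimpleHTTPRequestHandler` requires Python 3.7 or later.

```python
import functools
import http.client
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path

# A throwaway "site": one HTML file in a temporary directory.
site = Path(tempfile.mkdtemp())
(site / "index.html").write_text("<h1>Welcome</h1>", encoding="utf-8")

# SimpleHTTPRequestHandler serves files from `directory`.
Handler = functools.partial(SimpleHTTPRequestHandler, directory=str(site))
server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Act as the HTTP client and ask the server for the file.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/index.html")
response = conn.getresponse()
status, body = response.status, response.read().decode("utf-8")
conn.close()

server.shutdown()
server.server_close()
print(status, body)  # 200 <h1>Welcome</h1>
```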

Web servers often come as part of a larger package of Internet- and intranet-related programs for serving e-mail, handling File Transfer Protocol (FTP) download requests, and building and publishing Web pages. Considerations in choosing a Web server include how well it works with the operating system and other servers, its ability to handle server-side programming, its security characteristics, and the publishing, search engine and site building tools that may come with it.

Web clients

In the J2EE (Java 2 Platform, Enterprise Edition) application model, a client-tier component may be an application client or a Web client. A Web client consists of two parts: dynamic Web pages and the Web browser. Dynamic Web pages are produced by components that run in the Web tier, and the Web browser renders the Web pages received from the server.

A Web client is also known as a thin client because it does not execute heavy-duty operations such as querying databases, performing complex business tasks, or connecting to legacy applications. Heavy-duty operations are performed by the J2EE server, which is secure, fast, and reliable.

 

Example of web server and clients

Your web browser is an example of a web client. The remote machine containing the document you requested is called a web server. The client and server communicate using a special language (a “protocol”) called HTTP. Figure 1-1 demonstrates the relationship between web clients and web servers.

Figure 1-1. Client and server relationship

To keep ourselves honest, we should get a little more specific now. Although we commonly refer to the machine that contains the documents as the “server,” the server isn’t the hardware itself, but a program that runs on that machine. The web server listens on a network port and waits for client requests made with HTTP. After the server responds to a request (again using HTTP), the network connection is dropped (as in HTTP/1.0; HTTP/1.1 can keep the connection open for further requests), and the browser processes the data it received and displays it on your screen.
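The sequence just described (listen on a port, accept a client, answer with HTTP, drop the connection) can be shown with raw sockets in Python. This is a teaching sketch, not a usable server: `serve_once` is a hypothetical helper that answers exactly one HTTP/1.0 request with a fixed text body, and the client reads until the server closes the connection.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, answer it with HTTP/1.0, then drop the connection."""
    conn, _addr = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the blank line ending the headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"Hello from the server"
        conn.sendall(
            b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body
        )
    # Leaving the `with` block closes the connection, which ends the HTTP/1.0 response.


# Server side: bind to a port and listen for clients.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen()
port = listener.getsockname()[1]
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

# Client side: connect, send a request, read until the server closes the connection.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"GET / HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")
    response = b""
    while chunk := client.recv(1024):
        response += chunk
listener.close()
print(response.decode("ascii"))
```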

In practice, many clients can be using the same server at the same time, and one client can also use many servers at the same time (see Figure 4-1).

Figure 4-1. Multiple clients and servers

As you can see, at the core of the Web is HTTP. If you master HTTP, you can request documents from a server without needing to go through your browser. Similarly, you can return documents to web browsers without being limited to the functionality of an existing web server. HTTP programming takes you out of the realm of the everyday web user and into the world of the web power user.

 

 

 

 

 

Web site

A website (also called an Internet site or a home page in the case of a personal site) is a group of HTML files that are stored on a hosting computer which is permanently connected to the Internet (a web server).

A website is normally built around a central page, called a “welcome page”, which offers links to a group of other pages hosted on the same server, and sometimes “external” links, which lead to pages hosted by another server.

A URL looks something like this:

http://en.abc.net/www/wwwintro.php3

Let’s take a closer look at this address:

  • http:// indicates that we want to browse the Web using the HTTP protocol, the default protocol for browsing the Web. There are other protocols for other uses of the Internet.
  • en.abc.net corresponds to the address of the server that hosts the web pages. By convention, web servers often have a name that begins with www., to make it clear that they are dedicated web servers and to make the address easier to memorise. This second part of the address is called the domain name. A website can be hosted on several servers, all sharing the same name.
  • /www/wwwintro.php3 indicates where the document is located on the machine. In this case, it is the file wwwintro.php3 located in the /www directory.
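The same breakdown can be done with Python’s standard `urllib.parse.urlsplit`, which separates the protocol (scheme), the server name (network location) and the path; `posixpath.split` then separates the directory from the file name.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://en.abc.net/www/wwwintro.php3"
parts = urlsplit(url)
directory, filename = posixpath.split(parts.path)

print(parts.scheme)  # http
print(parts.netloc)  # en.abc.net
print(directory)     # /www
print(filename)      # wwwintro.php3
```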


 

 

 

 

URL, URI and URN

There are three distinct terms at play here: URI, URL and URN. It’s usually best to go to the source when discussing matters like these, so here are some excerpts, starting with RFC 3986, the URI specification co-authored by Tim Berners-Lee.

 

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

“A URI can be further classified as a locator, a name, or both. The term “Uniform Resource Locator” (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network “location”).”

 

“One can classify URIs as locators (URLs), or as names (URNs), or as both. A Uniform Resource Name (URN) functions like a person’s name, while a Uniform Resource Locator (URL) resembles that person’s street address. In other words: the URN defines an item’s identity, while the URL provides a method for finding it.”

So we get a few things from these descriptions:

  1. First of all, a URL is a type of URI. So if someone tells you that a URL is not a URI, they’re wrong. But that doesn’t mean all URIs are URLs: all butterflies fly, but not everything that flies is a butterfly.
  2. The part that makes a URI a URL is the inclusion of the “access mechanism”, or “network location”, e.g. http:// or ftp://.
  3. The URN is the “globally unique” part of the identification; it’s a unique name.
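As a rough illustration of these distinctions, the Python sketch below classifies an identifier by its scheme: `urn:` identifiers are names, and schemes that describe an access mechanism are locators. This is a deliberately simplified heuristic, not a rule from the specification; the `classify` function and the short `LOCATOR_SCHEMES` list are hypothetical, and anything not recognized is reported simply as a URI.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately small list of schemes that name an access mechanism.
LOCATOR_SCHEMES = {"http", "https", "ftp", "file", "mailto"}


def classify(uri):
    """Roughly classify an identifier as 'URN', 'URL' or plain 'URI' by its scheme."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "urn":
        return "URN"  # a name: identifies the item, says nothing about where it is
    if scheme in LOCATOR_SCHEMES:
        return "URL"  # a locator: includes how to reach the resource
    return "URI"      # still an identifier, but not recognized by this sketch


for example in ["urn:isbn:0451450523", "ftp://ftp.example.com/pub/file.txt",
                "tag:example.com,2024:note"]:
    print(example, "->", classify(example))
```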