4th December, 2008

Email And The World Wide Web


 

HTTP - The Guts of WWW

Created: 13th June 2008

 

Cacheable content

Towards the end of the 1990s, it became clear that HTTP needed to support much more than just clients and servers. HTTP transactions needed far more granular control, and with more and more clients requesting content from the same servers, bottlenecks appeared, causing slow responses.

The solution came in the form of proxy servers (also called cache content servers, web caches and so on). They act as 'proxies' between the server and client, storing commonly requested content for whichever clients are configured to use them. Many large companies that wanted to reduce the number of outgoing HTTP requests installed proxy servers to further control the flow of traffic.

A web cache, cache content server and proxy server all perform basically the same task. When a request arrives at a caching device, it checks to see if a 'local' copy is available. If it is, it serves the client its local copy instead of requesting it from the server. However, processes have to be in place to ensure the cache is supplying the client with the most up-to-date objects.
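As a rough illustration of the lookup described above, here is a minimal Python sketch of a cache that serves a local copy when it has one and only contacts the origin server otherwise. The names TinyCache and fake_origin are made up for this example, and the sketch deliberately ignores the freshness checks a real cache needs.

```python
from typing import Callable, Dict, Tuple


class TinyCache:
    """Toy in-memory web cache keyed by URL (hypothetical, for illustration)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # called only on a cache miss
        self._store: Dict[str, bytes] = {}   # URL -> stored response body

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return (body, "HIT") from the local copy, or (body, "MISS") after fetching."""
        if url in self._store:
            return self._store[url], "HIT"
        body = self._fetch(url)
        self._store[url] = body
        return body, "MISS"


origin_requests = []  # records every request that reaches the "server"


def fake_origin(url: str) -> bytes:
    """Stand-in for the real web server (an assumption of this sketch)."""
    origin_requests.append(url)
    return b"<html>hello</html>"


cache = TinyCache(fake_origin)
first = cache.get("http://www.example.com/")   # goes to the origin
second = cache.get("http://www.example.com/")  # served from the local copy
```

After the two calls, only one request has reached the stand-in origin server. That is the whole point of a cache, and also the reason it needs some way of noticing when its stored copy has gone stale.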

HTTP/1.0 had virtually no cache control apart from the 'If-Modified-Since' header, which allowed for basic checking of newly updated content. HTTP/1.1 addressed these shortcomings with an array of new headers (such as Cache-Control and ETag), though 'If-Modified-Since' is still very much one of the main (and simplest) ways a caching device can check.

If-Modified-Since comes into play once a browser realizes that it already has a local copy stored in its cache. All modern browsers have local cache stores (or Temporary Internet Files, as Internet Explorer calls them), and instead of requesting the whole page from the server again, the browser will use its local copy. But before it can do this, the browser must check that the local copy is 'fresh' enough. It does this by sending a request that includes an If-Modified-Since header.
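To make this concrete, the following Python sketch builds the text of such a conditional request. The function name build_conditional_get, the host and the date are assumptions chosen for illustration. In practice the date is whatever the server reported for the object when it was first fetched.

```python
def build_conditional_get(host: str, path: str, cached_date: str) -> str:
    """Build the text of a GET request carrying an If-Modified-Since header.

    cached_date is the HTTP date string stored alongside the cached copy.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"If-Modified-Since: {cached_date}",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


# Example values only; they are not taken from a real server.
request_text = build_conditional_get(
    "www.example.com", "/index.html", "Fri, 13 Jun 2008 10:00:00 GMT"
)
```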

The If-Modified-Since header carries the Last-Modified date that the server originally sent in its response when the object was first requested. If the Last-Modified date on the server is no newer than the one the client has, the server returns a 304 Not Modified response with no content, and the cache (or browser) knows it can safely serve its own copy. Otherwise, the server returns the updated object in full.
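The date comparison on the server side can be sketched in a few lines of Python. This is only an illustration under simple assumptions: conditional_status is a made-up helper name, the object's modification time is passed in as a timezone-aware datetime, and a missing or unreadable header is treated as an ordinary request.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # no conditional header: send the full object
    try:
        client_date = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unreadable date: ignore the header
    if client_date.tzinfo is None:
        client_date = client_date.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_date:
        return 304
    return 200


# Example values only.
modified = datetime(2008, 6, 13, 10, 0, 0, tzinfo=timezone.utc)
header_value = format_datetime(modified, usegmt=True)

unchanged = conditional_status(modified, header_value)                # 304
updated = conditional_status(modified.replace(hour=11), header_value)  # 200
```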

It sounds complicated to start with, but it really is just comparing dates. Things get more confusing when you use the other methods HTTP/1.1 introduced to resolve several issues. For more information on advanced caching methods, I'd highly recommend the book HTTP: The Definitive Guide by David Gourley and Brian Totty.

HTTP Transport

HTTP has come a long way from the days of retrieving a document in plain text. When the need for secure connections arose, HTTP had to once again adjust to meet the growing needs of the internet population.

HTTP in itself does not provide any security other than basic authentication, and even this does not actually protect the data from being stolen or manipulated on the wire. HTTP did have one advantage over others: it makes a great transport protocol. Sometime between HTTP/1.0+ and HTTP/1.1, the CONNECT method was added to HTTP. CONNECT allows HTTP to instruct intermediate devices to connect to resources using different ports and protocols. This came in handy because clients configured to use proxies had no choice about which port or server they connect to.

To see the problem, let's take a look at a typical connection to a secure site without a proxy:

A simple SSL connection
*A standard secure connection between a browser and web server.

The client initiates a TCP connection to port 443 on the server. The two then exchange data using a negotiated version of SSL or TLS - nothing to do with HTTP so far.

However, if the client uses a proxy, the connection to port 443 will most likely get rejected (unless the proxy is explicitly listening on port 443). Browsers get around this issue by changing their behavior depending on whether they are configured to use a proxy or not. The following diagram shows what the same request above looks like when the browser is configured to use a proxy:

HTTP CONNECT method used by browsers
*The browser sends an HTTP CONNECT request to the proxy, asking it to open a connection to the secure server.

The browser knows it's trying to connect to an HTTPS site (note the scheme in the URL), and it also knows it's configured to go via a proxy. Therefore, instead of creating a connection to port 443 on the server itself, it sends an HTTP CONNECT request to the proxy over the configured HTTP port. Proxies must know how to interpret the CONNECT method: a decent implementation opens a TCP connection to the requested server and port, replies to the browser with a 2xx status, and from then on relays bytes in both directions without interpreting them. There are two TCP connections, one from the client to the proxy and one from the proxy to the server, but the SSL handshake runs end to end between the browser and the server through this tunnel, so the data stays encrypted along the entire path.
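For a feel of what travels over the wire, here is a small Python sketch of the CONNECT request a browser might send, plus a check of the proxy's reply. The helper names build_connect_request and tunnel_established are hypothetical, and the host name is just an example. A real client would go on to start its SSL handshake over the same socket once the proxy has answered with a success status.

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """Build the bytes of an HTTP CONNECT request for host:port."""
    target = f"{host}:{port}"
    request = f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n"
    return request.encode("ascii")


def tunnel_established(status_line: str) -> bool:
    """Return True if the proxy's status line reports success (any 2xx code)."""
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        return False
    return parts[1].isdigit() and 200 <= int(parts[1]) < 300


# Example values only.
connect_bytes = build_connect_request("www.example.com")
accepted = tunnel_established("HTTP/1.1 200 Connection established")
refused = tunnel_established("HTTP/1.1 403 Forbidden")
```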

SSL is just one example of a protocol being tunneled over HTTP. Many other protocols can be carried over HTTP, such as XML-based protocols and WebDAV.

Conclusion

HTTP is probably the most dynamic, robust protocol in use today. It has had to evolve constantly to adapt to the ever-increasing demands of the web, and with HTTP/2.0 round the corner (actually it's been in development since 1997!), it's only going to get more and more functional.

For more information, you can view the RFCs (Requests for Comments). These documents are the manuals that define the protocol, and although they are a very heavy read, they will explain absolutely everything.

Hypertext Transfer Protocol -- HTTP/1.0 - Main RFC for HTTP/1.0, also covers HTTP/0.9
http://www.ietf.org/rfc/rfc1945.txt

Hypertext Transfer Protocol -- HTTP/1.1 - Main RFC for HTTP/1.1
http://www.w3.org/Protocols/rfc2616/rfc2616.html

© 2008 John Payne, The-serpent.co.uk