Introduction to HTTP

To understand how CGI works, you need some understanding of how HTTP works.

HTTP stands for HyperText Transfer Protocol, and (not very surprisingly) is the protocol used for transferring hypertext documents such as HTML pages on the World Wide Web.

For the purposes of this course, we will only be looking at HTTP version 1.0, which is specified in RFC 1945. Version 1.1, originally specified in RFC 2068 and later revised as RFC 2616, contains many more features, but none of them are necessary for a basic understanding of CGI programming. An HTTP cheat-sheet, containing some common terminology and a table of status codes, appears in Appendix E.

RFCs, or "Request For Comment" documents, can be obtained from the Internet Engineering Task Force (IETF) website or from mirrors such as the RFC mirror at Monash University.

A simple HTTP transaction, such as a request for a static HTML page, works as follows:

  1. The user types a URL into his or her browser, or specifies a web address by some other means, such as clicking on a link or choosing a bookmark

  2. The user agent (for example, the browser) connects to the HTTP server, normally on port 80, the default HTTP port

  3. The user agent sends a request line such as GET /index.html HTTP/1.0

  4. The user agent may also send request headers, and then sends a blank line to mark the end of the request

  5. The HTTP server receives the request and finds the requested file in its filesystem

  6. The HTTP server sends back some HTTP headers, followed by the contents of the requested file

  7. The HTTP server closes the connection
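The client side of these steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a robust client: build_request and fetch are hypothetical helper names, the server is assumed to speak plain (non-encrypted) HTTP, and no Host header is sent, which some servers hosting several sites may require.

```python
import socket

def build_request(path, headers=None):
    """Return the raw bytes of an HTTP/1.0 GET request for `path`."""
    lines = [f"GET {path} HTTP/1.0"]  # step 3: the request line
    # Step 4: optional request headers, one "Name: value" pair per line
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line marks the end of the request
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def fetch(host, path, port=80):
    """Send a GET request and read until the server closes the connection."""
    with socket.create_connection((host, port)) as sock:  # step 2
        sock.sendall(build_request(path, {"User-Agent": "sketch/0.1"}))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection (step 7)
                break
            chunks.append(data)
    # Step 6: status line, headers, a blank line, then the file contents
    return b"".join(chunks)
```

For example, fetch("www.example.com", "/index.html") would return the whole response as bytes; the host name here is only a placeholder.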

When a user requests a CGI program, however, the process changes slightly:

  1. The user agent sends a request as above

  2. The HTTP server receives the request as above

  3. The HTTP server finds the requested CGI program in its file system

  4. The HTTP server executes the program

  5. The program produces output

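  6. The output begins with HTTP headers (usually a Content-type header), followed by a blank line and then the body

  7. The HTTP server sends the program's output back to the user agent, normally preceded by a status line such as HTTP/1.0 200 OK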

  8. The HTTP server closes the connection
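As an illustration of steps 5 and 6, here is a hypothetical CGI program written in Python (the course's own examples may use another language). It prints a Content-type header, a blank line and an HTML body to standard output, which is where the server reads it from. The function name cgi_output is an assumption made for this sketch.

```python
#!/usr/bin/env python3
"""Hypothetical CGI program that reports the server's local time."""
import time

def cgi_output(now=None):
    """Build the complete output of the CGI program as a string."""
    # time.localtime(None) uses the current time
    stamp = time.strftime("%a %b %d %H:%M:%S %Y", time.localtime(now))
    # Step 6: HTTP headers come first and end with a blank line...
    header = "Content-type: text/html\n\n"
    # ...followed by the body, which the server passes on to the client
    body = f"<html><body><p>The local time is {stamp}</p></body></html>\n"
    return header + body

if __name__ == "__main__":
    # Step 5: the server runs this program and reads its standard output
    print(cgi_output(), end="")
```

Note that the program does not print a status line; for an ordinary CGI program the server adds that itself before sending the output on.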

Terminology

authentication

The process by which a client sends username and password information to the server, in an attempt to become authorized to view a restricted resource.

client

An application program that establishes connections for the purpose of sending requests.

Content-type

The media type of the body of the response, as given in the Content-type: header. Examples include text/html, text/plain, image/gif, etc.

method

Indicates what the server should do with a resource. Method names are case-sensitive. The methods covered in this course are GET, HEAD and POST.

request

An HTTP request message sent by a client to a server.

resource

A network data object or service which can be identified by a URI.

response

An HTTP response message sent by a server to a client.

server

An application program that accepts connections in order to service requests by sending back responses.

status code

A 3-digit integer indicating the result of the server's attempt to understand and satisfy the request. A table of status codes and their meanings appears below.

Uniform Resource Identifier (URI)

A formatted string that identifies a network resource by name, location, or any other characteristic.

Uniform Resource Locator (URL)

A web address: a URI that identifies a resource by its location. It may be expressed absolutely (e.g. http://www.example.com/services/index.html) or relative to a base URI (e.g. ../index.html). See also URI.

user agent

The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots) or other end-user tools.
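The difference between absolute and relative URLs can be tried out with Python's standard urllib.parse module. This sketch reuses the example URLs from the URL entry; contact.html is a hypothetical second link added for illustration.

```python
from urllib.parse import urljoin, urlsplit

base = "http://www.example.com/services/index.html"

# Split an absolute URL into its components
parts = urlsplit(base)
print(parts.scheme, parts.netloc, parts.path)  # http www.example.com /services/index.html

# Resolve relative URLs against the base URI, as a browser does for links
print(urljoin(base, "../index.html"))   # http://www.example.com/index.html
print(urljoin(base, "contact.html"))    # http://www.example.com/services/contact.html
```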

HTTP status codes

Table 2-1. HTTP status codes

Code  Meaning
----  ---------------------
200   OK
201   Created
202   Accepted
204   No Content
301   Moved Permanently
302   Moved Temporarily
304   Not Modified
400   Bad Request
401   Unauthorized
403   Forbidden
404   Not Found
500   Internal Server Error
501   Not Implemented
502   Bad Gateway
503   Service Unavailable
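The first digit of a status code gives its broad category: 2xx means success, 3xx redirection, 4xx an error in the client's request, and 5xx an error on the server. A minimal Python sketch that splits a status line and classifies its code follows; parse_status_line, category and the category labels are hypothetical names chosen for this example.

```python
# Broad categories given by the first digit of a status code
CATEGORIES = {
    "2": "Success",
    "3": "Redirection",
    "4": "Client error",
    "5": "Server error",
}

def parse_status_line(line):
    """Split a status line such as 'HTTP/1.0 404 Not Found' into its parts."""
    version, code, *rest = line.split(" ", 2)
    reason = rest[0] if rest else ""  # the reason phrase may be missing
    return version, int(code), reason

def category(code):
    """Map a 3-digit status code to its broad category."""
    return CATEGORIES.get(str(code)[0], "Other")
```

For example, parse_status_line("HTTP/1.0 404 Not Found") gives ("HTTP/1.0", 404, "Not Found"), and category(404) gives "Client error".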

HTTP Methods

GET

The GET method retrieves whatever information is identified by the request URI. If the request URI refers to a data-producing process (e.g. a CGI program), it is the produced data which is returned, and not the source text of the process.

HEAD

The HEAD method is identical to GET except that the server will only return the headers, not the body of the resource. The meta-information contained in the HTTP headers in response to a HEAD request should be identical to the information sent in response to a GET request. This method can be used to obtain meta-information about the resource without transferring the body itself.
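To show why the headers match, here is a minimal server-side sketch in Python. build_response is a hypothetical helper, not part of any real web server: it builds the same headers for GET and HEAD, including a Content-Length describing the full body, and simply omits the body for HEAD.

```python
def build_response(method, body):
    """Return (headers, payload) bytes for a GET or HEAD of the same resource."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The headers are identical for both methods...
    if method == "HEAD":
        return head, b""  # ...but a HEAD response carries no body
    return head, body
```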

POST

The POST method sends a block of data, enclosed in the body of the request, to the resource identified by the request URI, typically so that the server can create or modify a resource. Typical uses include:

  • Annotation of an existing resource

  • Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles

  • Providing data (such as the result of submitting a form) to a data-handling process

  • Updating a database
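Unlike GET, a POST request carries its data after the blank line that ends the headers. The Python sketch below builds a form submission like the one a browser might send; build_post, the path /cgi-bin/guestbook.cgi and the field names are hypothetical. The fields are encoded as application/x-www-form-urlencoded, the default encoding for HTML forms, and a Content-Length header is included because HTTP/1.0 requires one on POST requests.

```python
from urllib.parse import urlencode

def build_post(path, fields):
    """Return the raw bytes of an HTTP/1.0 POST request submitting form fields."""
    body = urlencode(fields).encode("ascii")  # e.g. name=Jo+Smith&msg=hi%21
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # tells the server how much body to read
        "\r\n"
    ).encode("ascii")
    return head + body
```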

Exercises

The HTTP request/response process is usually transparent to the user. To see what's going on, let's connect directly to the web server and see what happens.

Log in to the system as for the Introduction to Perl course:

  1. Open the telnet program, TeraTerm

  2. Connect to the training server (your instructor will give you the hostname or IP address)

  3. Log in using the username and password you were given

  4. From the Unix command line, type telnet localhost 80. This connects to port 80 of the server, where the HTTP daemon (the web server) is listening. You should see something like this:

    training:~> telnet localhost 80
    Trying 1.2.3.4
    Connected to training.netizen.com.au.
    Escape character is '^]'.

  5. Ask the web server for a static document by typing GET /index.html HTTP/1.0, then press Enter twice; the empty second line marks the end of the request. Note that this command is case-sensitive, so type it exactly as shown.

  6. Look at the response that comes back. Do you see the headers? They should look something like this:

    HTTP/1.1 200 OK
    Date: Tue, 28 Mar 2000 02:42:37 GMT
    Server: Apache/1.3.6 (Unix)
    Connection: close
    Content-Type: text/html

    This will be followed by a blank line, then the content of the file you asked for. Then you will see "Connection closed by foreign host", indicating that the HTTP server has closed the connection.

    Tip

    If you miss seeing the headers because the body is too long, try using the HEAD method instead of GET.

  7. Telnet to port 80 again and ask the web server for a CGI script's output by typing GET /cgi-bin/localtime.cgi HTTP/1.0, then pressing Enter twice

  8. Now let's get some status codes other than 200 OK from the web server:

    • GET /not_here.html HTTP/1.0 (a file which doesn't exist)

    • GET /unreadable.html HTTP/1.0 (a file whose permissions prevent the web server from reading it)

    • GET /protected.html HTTP/1.0 (a file protected by HTTP authentication, which we cover later today)

    • GET /redirected.html HTTP/1.0 (a file which is redirected to a different URL)

    • ENCRYPT /index.html HTTP/1.0 (a method which isn't known to our server)
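Once you have tried these by hand, the same requests can be scripted. The Python sketch below is one possible approach, not part of the course material: the function names are hypothetical, it assumes the server uses CRLF line endings, and it assumes the web server is reachable on localhost port 80 as in the exercise.

```python
import socket

# The request lines used in the exercise above
REQUEST_LINES = [
    "GET /index.html HTTP/1.0",
    "HEAD /index.html HTTP/1.0",
    "GET /cgi-bin/localtime.cgi HTTP/1.0",
    "GET /not_here.html HTTP/1.0",
    "GET /unreadable.html HTTP/1.0",
    "GET /protected.html HTTP/1.0",
    "GET /redirected.html HTTP/1.0",
    "ENCRYPT /index.html HTTP/1.0",
]

def split_response(raw):
    """Split raw response bytes into (status line, header lines, body)."""
    # The headers end at the first empty line (assumes CRLF line endings)
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    return lines[0], lines[1:], body

def send_request(request_line, host="localhost", port=80):
    """Send one request line plus an empty line; read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request_line.encode("ascii") + b"\r\n\r\n")
        chunks = []
        while chunk := sock.recv(4096):  # an empty read means the connection closed
            chunks.append(chunk)
    return b"".join(chunks)

def run_exercises(host="localhost", port=80):
    """Print the status line the server returns for each exercise request."""
    for line in REQUEST_LINES:
        status, _headers, _body = split_response(send_request(line, host, port))
        print(f"{line:40} -> {status}")
```

Calling run_exercises() on the training server prints one status line per request, which makes the different status codes easy to compare.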