To understand how CGI works, you need some understanding of how HTTP works.
HTTP stands for HyperText Transfer Protocol, and (not very surprisingly) is the protocol used for transferring hypertext documents such as HTML pages on the World Wide Web.
For the purposes of this course, we will only be looking at HTTP version 1.0, which is described in RFC 1945. The later version, HTTP/1.1, was first specified in RFC 2068 (later revised in RFC 2616; the current HTTP/1.1 specifications are RFCs 9110 and 9112) and contains many more features, but none of them are necessary for a basic understanding of CGI programming. An HTTP cheat-sheet, containing some common terminology and a table of status codes, appears in Appendix E.
RFCs, or "Request For Comment" documents, can be obtained from the Internet Engineering Task Force (IETF) website or from mirrors such as the RFC mirror at Monash University.
A simple HTTP transaction, such as a request for a static HTML page, works as follows:
The user types a URL into his or her browser, or specifies a web address by some other means such as clicking on a link, choosing a bookmark, etc.
The user agent connects to port 80 (the default HTTP port) of the HTTP server
The user agent sends a request line such as GET /index.html HTTP/1.0
The user agent may also send request headers (such as User-Agent), and ends the request with a blank line
The HTTP server receives the request and finds the requested file in its filesystem
The HTTP server sends back some HTTP headers, followed by the contents of the requested file
The HTTP server closes the connection
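The steps above can be sketched from the user agent's side with Python's standard `socket` module. This is a minimal illustration, not how a real browser works: it assumes a plain HTTP/1.0 exchange, and the helper names `build_request` and `fetch` and the `User-Agent` value are made up for the example.

```python
import socket
from typing import Dict, Optional


def build_request(method: str, path: str,
                  headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build an HTTP/1.0 request: request line, optional headers, blank line."""
    lines = [f"{method} {path} HTTP/1.0"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Two empty entries make the joined text end with CRLF CRLF: the blank
    # line that tells the server the request is complete.
    lines += ["", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Connect, send a GET request, and read until the server closes the connection."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request("GET", path, {"User-Agent": "sketch/0.1"}))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Assuming a web server is listening on port 80 of your machine, `fetch("localhost", "/index.html")` should return the raw bytes of the response: the headers, a blank line, then the file contents. The read loop relies on the server closing the connection, which is the last step listed above.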
When a user requests a CGI program, however, the process changes slightly:
The user agent sends a request as above
The HTTP server receives the request as above
The HTTP server finds the requested CGI program in its file system
The HTTP server executes the program
The program produces output
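The output begins with HTTP headers (at least a Content-type header), followed by a blank line and then the body
The HTTP server adds a status line and any other headers it needs, then sends the output back to the user agent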
The HTTP server closes the connection
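To make the last few steps more concrete, here is a rough sketch of what a server might do with a CGI program's output: separate the header lines from the body, then choose a status line. It assumes the conventions of the CGI specification (a `Status:` header overrides the default `200 OK`, and a `Location:` header with no status implies a redirect); the function name `split_cgi_output` is hypothetical, and real servers such as Apache handle many more cases.

```python
from typing import Dict, Tuple


def split_cgi_output(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split CGI output into (status line, lower-cased header dict, body)."""
    # The header block ends at the first blank line; CGI programs may
    # write either \n or \r\n line endings, so look for both.
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(pos, sep) for pos, sep in found if pos != -1]
    if not found:
        raise ValueError("CGI output has no blank line after its headers")
    pos, sep = min(found)
    header_block = raw[:pos].decode("latin-1")
    body = raw[pos + len(sep):]  # the body is left untouched (it may be binary)

    headers = {}
    for line in header_block.splitlines():
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()

    # Decide the status line the server will send in front of the output.
    if "status" in headers:
        status = headers.pop("status")  # e.g. "404 Not Found"
    elif "location" in headers:
        status = "302 Moved Temporarily"  # redirect implied by Location
    else:
        status = "200 OK"
    return "HTTP/1.0 " + status, headers, body
```

For example, a program that prints `Content-type: text/html`, a blank line and some HTML would be sent back as `HTTP/1.0 200 OK` followed by that header and body.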
Authentication: The process by which a client sends username and password information to the server in an attempt to become authorized to view a restricted resource.
Client: An application program that establishes connections for the purpose of sending requests.
Content type: The media type of the body of the response, as given in the Content-type: header. Examples include text/html, text/plain, image/gif, etc.
Method: Indicates what the server should do with a resource. Case sensitive. Valid methods include GET, HEAD and POST.
Request: An HTTP request message sent by a client to a server.
Resource: A network data object or service which can be identified by a URI.
Response: An HTTP response message sent by a server to a client.
Server: An application program that accepts connections in order to service requests by sending back responses.
Status code: A 3-digit integer indicating the result of the server's attempt to understand and satisfy the request. A table of status codes and their meanings appears below.
URI (Uniform Resource Identifier): A formatted string which identifies a network resource by its name, its location, or any other characteristic.
URL (Uniform Resource Locator): A web address. May be expressed absolutely (eg http://www.example.com/services/index.html) or relative to a base URI (eg ../index.html). See also URI.
User agent: The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots) or other end-user tools.
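The absolute and relative forms in the URL entry can be tried out with Python's standard `urllib.parse` module: `urljoin` resolves a relative reference against a base URI much as a browser does when following a link, and `urlsplit` breaks an absolute URL into its parts. The URLs are the example ones from the glossary.

```python
from urllib.parse import urljoin, urlsplit

base = "http://www.example.com/services/index.html"

# Resolve the relative reference "../index.html" against the base URI:
# the base's directory is /services/, so ".." climbs back to the site root.
absolute = urljoin(base, "../index.html")
print(absolute)  # http://www.example.com/index.html

# Break the base URL into the pieces a user agent needs to make a request.
# No port appears in the URL, so fall back to the HTTP default of 80.
parts = urlsplit(base)
print(parts.scheme, parts.hostname, parts.port or 80, parts.path)
# http www.example.com 80 /services/index.html
```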
Table 2-1. HTTP status codes
Code | Meaning |
---|---|
200 | OK |
201 | Created |
202 | Accepted |
204 | No Content |
301 | Moved Permanently |
302 | Moved Temporarily |
304 | Not Modified |
400 | Bad Request |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
500 | Internal Server Error |
501 | Not Implemented |
502 | Bad Gateway |
503 | Service Unavailable |
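The table can also be expressed as a lookup in code. This sketch copies the codes and HTTP/1.0 reason phrases from Table 2-1 into a dictionary (Python's own `http.HTTPStatus` uses the newer phrase "Found" for 302) and uses the first digit of each code to name its general class, following the five classes defined by the HTTP specification. `describe` is a hypothetical helper name.

```python
# Status codes from Table 2-1, with their HTTP/1.0 reason phrases.
STATUS_CODES = {
    200: "OK", 201: "Created", 202: "Accepted", 204: "No Content",
    301: "Moved Permanently", 302: "Moved Temporarily", 304: "Not Modified",
    400: "Bad Request", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found",
    500: "Internal Server Error", 501: "Not Implemented",
    502: "Bad Gateway", 503: "Service Unavailable",
}

# The first digit of a status code identifies its general class.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}


def describe(code: int) -> str:
    """Return a description such as '404 Not Found (Client Error)'."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    reason = STATUS_CODES.get(code, "Unknown")  # codes missing from the table
    return f"{code} {reason} ({CLASSES[code // 100]})"
```

Grouping by class is useful because a client that meets an unfamiliar code can still treat it like the x00 code of its class.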
The GET method means retrieve whatever information is identified by the request URI. If the request URI refers to a data-producing process (eg a CGI program), it is the produced data which is returned, and not the source text of the process.
The HEAD method is identical to GET except that the server will only return the headers, not the body of the resource. The meta-information contained in the HTTP headers in response to a HEAD request should be identical to the information sent in response to a GET request. This method can be used to obtain meta-information about the resource without transferring the body itself.
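To see how HEAD relates to GET, here is a hypothetical server-side helper that builds both responses from the same headers and simply leaves the body off for HEAD. It assumes a fixed `200 OK` status and a body already held in memory; the name `build_response` is invented for this sketch.

```python
def build_response(method: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Build an HTTP/1.0 response; HEAD gets the same headers as GET but no body."""
    # Methods are case sensitive, so "get" or "head" are rejected here.
    if method not in ("GET", "HEAD"):
        raise ValueError(f"unsupported method: {method}")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        # Content-Length describes the body a GET would return, even for HEAD.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head if method == "HEAD" else head + body
```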
The POST method is used to request that the server accept the data enclosed in the body of the request and pass it to the resource identified by the request URI, for purposes such as:
Annotation of an existing resource
Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles
Providing data (such as the result of submitting a form) to a data-handling process
Updating a database
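Unlike GET, a POST request carries its data in the request body rather than in the URI. The sketch below builds one by hand, assuming a hypothetical form handler at `/cgi-bin/guestbook.cgi` and made-up field names; `urlencode` from Python's `urllib.parse` produces the `application/x-www-form-urlencoded` format that HTML forms use by default. Under CGI, the server passes this body to the program on standard input and sets the `CONTENT_LENGTH` environment variable to its size.

```python
from urllib.parse import urlencode


def build_post_request(path: str, fields: dict) -> bytes:
    """Build an HTTP/1.0 POST request that submits form fields in the body."""
    body = urlencode(fields).encode("ascii")  # e.g. name=alice&comment=Hi
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # HTTP/1.0 servers rely on Content-Length to know where the body ends.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

Calling `build_post_request("/cgi-bin/guestbook.cgi", {"name": "alice", "comment": "Hello, world"})` yields a request whose body is `name=alice&comment=Hello%2C+world`.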
The HTTP request/response process is usually transparent to the user. To see what's going on, let's connect directly to the web server and see what happens.
Log in to the system as for the Introduction to Perl course:
Open the telnet program, TeraTerm
Connect to the training server (your instructor will give you the hostname or IP number)
Log in using the username and password you were given
From the Unix command line, type telnet localhost 80 -- this connects to port 80 of the server, where the HTTP daemon (aka the web server) is listening. You should see something like this:
    training:~> telnet localhost 80
    Trying 1.2.3.4
    Connected to training.netizen.com.au.
    Escape character is '^]'.
Ask the web server for a static document by typing GET /index.html HTTP/1.0, then press Enter twice: the blank line tells the server that the request is complete. Note that this command is case sensitive.
Look at the response that comes back. Do you see the headers? They should look something like this:
    HTTP/1.1 200 OK
    Date: Tue, 28 Mar 2000 02:42:37 GMT
    Server: Apache/1.3.6 (Unix)
    Connection: close
    Content-Type: text/html
This will be followed by a blank line, then the content of the file you asked for. Then you will see "Connection closed by foreign host", indicating that the HTTP server has closed the connection.
Tip: If you miss seeing the headers because the body is too long, try using the HEAD method instead of GET.
Telnet to port 80 again and ask the web server for a CGI script's output by typing GET /cgi-bin/localtime.cgi HTTP/1.0
Now let's get some status codes other than 200 OK from the web server:
GET /not_here.html HTTP/1.0 (a file which doesn't exist)
GET /unreadable.html HTTP/1.0 (a file with the permissions set wrong)
GET /protected.html HTTP/1.0 (a file protected by HTTP authentication - we cover this later on today)
GET /redirected.html HTTP/1.0 (a file which is redirected to a different URL)
ENCRYPT /index.html HTTP/1.0 (a method which isn't known to our server)
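If you would rather script this exercise than type each request into telnet, something like the following could send one request line and report the status that comes back. It is a sketch that assumes the server ends header lines with CRLF (as Apache does) and closes the connection after responding, as HTTP/1.0 servers normally do; `parse_response` and `probe` are hypothetical names.

```python
import socket
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[int, str, Dict[str, str], bytes]:
    """Split a raw HTTP response into status code, reason phrase, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    # Status line, e.g. "HTTP/1.1 404 Not Found": version, code, reason phrase.
    # Padding with "" copes with a status line that has no reason phrase.
    _version, code, reason = (lines[0].split(" ", 2) + [""])[:3]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


def probe(request_line: str, host: str = "localhost", port: int = 80) -> str:
    """Send one request line, as typed into telnet, and return 'code reason'."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request_line.encode("ascii") + b"\r\n\r\n")
        raw = b""
        while chunk := sock.recv(4096):  # read until the server closes
            raw += chunk
    code, reason, _headers, _body = parse_response(raw)
    return f"{code} {reason}"
```

With the training server running, calling `probe("GET /not_here.html HTTP/1.0")` and so on for each request line above should return the same status codes you saw in telnet.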