Authentication and access control for CGI scripts

A common question asked by new CGI programmers is "How do I protect my web site with a CGI script?" There are various ways to use CGI programs to ask for usernames and passwords and perform authentication, but in fact the best way to perform authentication and access control comes with your web server and doesn't require any programming at all.

Password protection is often associated with CGI programs because CGI programs are more likely to interact with the web server's underlying file system, backend databases, or other resources that need to be kept secure. Many programmers assume that because CGI can be used for password protection, it is the right choice for the job. This is not necessarily true.

One of the best ways to password protect web pages is by using the web server's own authentication and access control mechanisms. Since we're using the Apache web server, we'll look at how to do it with that.

Why is CGI authentication a bad idea?

Authentication (i.e. username and password checking) is hard to do correctly in CGI. Some common pitfalls include:

On the other hand, the main disadvantage of HTTP authentication is that the browser remembers the credentials and keeps sending them until the user shuts the browser down; there is no standard way to log out. This can be a problem in public computer labs and other locations where users may share PCs.

HTTP authentication

If a web page or CGI script requires a username and password to view it, the HTTP conversation between the client and the server goes like this:

  1. The user specifies a URL

  2. The user agent connects to the HTTP server (on port 80, unless the URL says otherwise)

  3. The user agent sends a request such as GET /index.html

  4. The user agent may also send other headers

  5. The HTTP server realises that authentication must be performed (usually by looking up configuration files)

  6. The HTTP server returns a status code 401, meaning "Unauthorized", and also a WWW-Authenticate: header giving the authentication scheme and the name of the authentication domain (the "realm"), for instance WWW-Authenticate: Basic realm="Acme Widget Co. Staff". This usually appears in the browser's dialog box as "Please provide a username and password for Acme Widget Co. Staff".

  7. The browser presents a dialog box or other means by which the user can enter their username and password. The user fills this in and clicks "OK"

  8. The browser sends a new request, this time including an extra Authorization: header carrying the credentials (for Basic authentication, the username and password joined by a colon and base64-encoded)

  9. If the HTTP server finds that the credentials are valid, it sends back the resource requested and closes the connection

  10. Otherwise, it sends back another response with status code 401 (and probably a body containing an error message), which the user agent should recognise as meaning that the authentication failed, and display the body.
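The exchange in steps 5 to 10 can be sketched in a few lines of Python. This is only an illustration of the Basic scheme, not how Apache is implemented: the USERS table and the function names are made up for the example, and a real server checks passwords against hashed entries in a password file rather than a plain-text dictionary.

```python
import base64

REALM = "Acme Widget Co. Staff"      # realm name from the example in step 6
USERS = {"alice": "s3cret"}          # hypothetical credentials, illustration only


def basic_credentials(username, password):
    """Build the Authorization header value a browser sends in step 8."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def handle_request(headers):
    """Return (status, response_headers) roughly as steps 5 to 10 describe."""
    challenge = {"WWW-Authenticate": f'Basic realm="{REALM}"'}
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic" or not token:
        return 401, challenge        # step 6: no credentials supplied yet
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except ValueError:               # malformed base64 or invalid UTF-8
        return 401, challenge
    username, _, password = decoded.partition(":")
    if USERS.get(username) == password:
        return 200, {}               # step 9: credentials accepted
    return 401, challenge            # step 10: credentials rejected


if __name__ == "__main__":
    print(handle_request({}))
    print(handle_request({"Authorization": basic_credentials("alice", "s3cret")}))
```

Note that base64 is an encoding, not encryption: anyone who can read the request can recover the username and password from the header.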

Access control

The way access control is handled varies from one web server to another. If your web server is not Apache, you will need to contact your web server administrator or read the documentation it came with, as only Apache is covered in this course.

Apache implements HTTP authentication using a password file together with configuration directives, which can be placed either in the main server configuration or in a .htaccess file in the web directory. Our server has been set up to allow you to use the .htaccess file.

A password file has already been set up for your use. It is /etc/apache/training.passwd and uses the same usernames and passwords as your login accounts. You can look at it by typing cat /etc/apache/training.passwd
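If you want to inspect such a file from a program, the following Python sketch may help. It assumes the file uses the usual Apache password file layout of one username:hashed-password entry per line; the sample entries below are made-up placeholders, not real hashes, and parse_password_file is a name invented for this example.

```python
def parse_password_file(text):
    """Map each username to its stored (hashed) password field."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        username, sep, hashed = line.partition(":")
        if sep and username:         # skip blank lines and lines without a colon
            entries[username] = hashed
    return entries


# Made-up sample data; a real file would contain real password hashes.
SAMPLE = "alice:placeholder-hash-1\n\nbob:placeholder-hash-2\n"

if __name__ == "__main__":
    for name in sorted(parse_password_file(SAMPLE)):
        print(name)
```

The passwords are stored hashed rather than in plain text, but the hashes are still worth protecting.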

To use this password file, create a file in your public_html directory called .htaccess, containing the following text:

AuthType Basic
AuthName "Secret stuff"
AuthUserFile /etc/apache/training.passwd
require valid-user

This authentication will apply to the directory in which the .htaccess file is placed and any subdirectories.
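Once the server has done the authentication, a CGI script in the protected directory does not need to check passwords itself. The server passes the authenticated username to the script in the standard CGI environment variable REMOTE_USER. Here is a minimal sketch in Python (the function name build_response is invented for this example, and your own scripts may be written in another language):

```python
import os
import sys


def build_response(environ):
    """Return a complete CGI response: headers, a blank line, then the body."""
    # REMOTE_USER is only set when the server has authenticated the request.
    user = environ.get("REMOTE_USER")
    if user:
        body = f"Hello, {user}. The web server has already checked your password.\n"
    else:
        body = "No authenticated user: is this directory protected?\n"
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

The script trusts REMOTE_USER because it is set by the web server, not by the browser.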

Exercises

  1. Create a .htaccess file in your public_html directory, as above

  2. Use your web browser to request one of your HTML files or CGI scripts, and observe the authentication process

  3. Why would it be a bad idea to put the password file in the same directory as the web pages or CGI scripts?