open() and friends - the gory details

Opening a file for reading, writing or appending

The open() function is used to open a file for reading or writing (or both, or as a pipe - more on that later).

The open() function is documented on pages 191-195 of the Camel book, and also in perldoc perlfunc. Read the documentation for open() before going any further.

In a typical situation, we might use open() to open and read from a file:

open(LOGFILE, "/var/log/httpd/access.log");

Note that the < (less than) used to indicate reading is assumed; we could equally well have said "</var/log/httpd/access.log".

You should always check for failure of an open() statement:

open(LOGFILE, "/var/log/httpd/access.log")
        || die "Can't open /var/log/httpd/access.log: $!";

$! is the special variable which contains the error message produced by the last system interaction. It is documented in chapter 2 of the Camel, on page 134.
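As a minimal sketch of the check described above, the helper below (try_open() is a hypothetical name, not something from the Camel) returns the text of $! when open() fails. It uses the low-precedence "or" rather than ||, which is a precaution and an assumption about style on our part: without parentheses around open()'s arguments, || would bind to the filename instead of to the result of open().

```perl
use strict;
use warnings;

# try_open() is a hypothetical helper for illustration only.
# It returns an empty string on success, or the text of $! on failure.
sub try_open {
    my ($path) = @_;
    # "or" has lower precedence than ||, so it tests open()'s result
    # whether or not the arguments are in parentheses.
    open(FH, $path) or return "$!";
    close FH;
    return "";
}

# The path below is an assumption: it is chosen because it should not exist.
my $error = try_open("/no/such/directory/example.log");
print "open failed: $error\n" if $error;
```

The exact wording of the message comes from the operating system, so it may differ between platforms.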

Once a file is opened for reading or writing, we can use the filehandle we specified (in this case LOGFILE) for a variety of useful purposes:

open(LOGFILE, "/var/log/httpd/access.log")
        || die "Can't open /var/log/httpd/access.log: $!";

# use the filehandle in the <> line input operator...
while (<LOGFILE>) {
        print if /netizen\.com\.au/;
}

close LOGFILE;

# open a new logfile for appending
open(SCRIPTLOG, ">>myscript.log") || die "Can't open myscript.log: $!";

# print() takes an optional filehandle argument - defaults to STDOUT
print SCRIPTLOG "Opened logfile successfully.\n";

close SCRIPTLOG;

Note that you should always close a filehandle when you're finished with it (though admittedly any open filehandles will be automatically closed when your script exits).

You can also use sysopen() and friends to open a file in a C-like way. For details, see page 229 of your Camel book or perldoc -f sysopen.
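To give a feel for the C-like style, here is a small sketch using sysopen() with flags from the standard Fcntl module. The helper name create_new() and the text it writes are our own inventions for illustration. It creates a file only if it does not already exist, which the ">" form of open() cannot express.

```perl
use strict;
use warnings;
use Fcntl;    # supplies O_WRONLY, O_CREAT, O_EXCL and friends

# create_new() is a hypothetical helper: create $path for writing, but
# fail (returning 0) if the file already exists.
sub create_new {
    my ($path) = @_;
    sysopen(NEWFILE, $path, O_WRONLY | O_CREAT | O_EXCL, 0644)
        or return 0;
    print NEWFILE "created by sysopen\n";
    close(NEWFILE) or die "Can't close $path: $!";
    return 1;
}
```

Calling create_new() twice with the same filename should succeed the first time and fail the second, with $! explaining why.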

Exercises

  1. Write a script which opens a file for reading. Use a while loop to print out each line of the file.

  2. Use the above script to open a Perl script. Use a regular expression to print out only those lines not beginning with a hash character (i.e. non-comment lines). (Answer: exercises/answers/delcomments.pl)

  3. Create a new script which opens a file for writing. Write out the numbers 1 to 100 into this file. (Answer: exercises/answers/100count.pl)

  4. Create a new script which opens a logfile for appending. Create a while loop which accepts input from STDIN and appends each line of input to the logfile. (Answer: exercises/answers/logfile.pl)

  5. Create a script which opens two files, reads input from the first, and writes it out to the second. (Answer: exercises/answers/readwrite.pl)

Reading directories

It is also possible to open directories (using opendir()) and read from them. However, it is not possible to read the contents of the files in a directory simply by opening it and looping through it. Opening a directory simply makes the filenames in that directory accessible via functions such as readdir().

opendir() is documented on page 195 of the Camel. readdir() is on page 202. Don't forget that function help is also available by typing perldoc -f opendir or perldoc -f readdir.

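my $dir = $ENV{HOME};
opendir(HOMEDIR, $dir) || die "Can't open directory $dir: $!";

my @files = readdir(HOMEDIR);

closedir HOMEDIR;

foreach (@files) {
        # readdir() returns bare names, so prepend the directory
        next unless -f "$dir/$_";       # skip ".", ".." and subdirectories
        open(THISFILE, "<$dir/$_") || die "Can't open file $dir/$_: $!";
        ...
        ...
        close THISFILE;
}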
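Because readdir() hands back bare names (including "." and ".."), it is often handy to turn them into full paths straight away. The sketch below is one way to do that. plain_files() is a hypothetical helper name, and keeping only plain files is our assumption about what you would want.

```perl
use strict;
use warnings;

# plain_files() is a hypothetical helper: return the full paths of the
# plain files found directly inside $dir.
sub plain_files {
    my ($dir) = @_;
    opendir(DIR, $dir) or die "Can't open directory $dir: $!";
    # Prepend the directory to each name, then keep only plain files.
    # This drops ".", ".." and any subdirectories.
    my @paths = grep { -f $_ } map { "$dir/$_" } readdir(DIR);
    closedir DIR;
    return @paths;
}
```

The list comes back in whatever order readdir() supplies it.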

Exercises

  1. Use opendir() and readdir() to obtain a list of files in a directory. What order are they in?

  2. Use the sort() function to sort the list of files asciibetically. (Answer: exercises/answers/dirlist.pl)

Opening files for simultaneous read/write

Files can be opened for simultaneous read/write by putting a + in front of the > or < sign. +< is almost always preferable, however, as +> would overwrite the file before you had a chance to read from it.

Read/write access to a file is not as useful as it sounds: you can overwrite existing bytes in place, but you can't insert new text into the middle of the file using this method, only add it onto the end. The main use for read/write access is to read the contents of a file and then append lines to the end of it.
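The read-then-append pattern might look something like the following sketch. append_count() and the summary line it writes are made up for illustration. Note the seek() between the reading and the writing, which perldoc -f seek recommends whenever you switch direction on the same filehandle.

```perl
use strict;
use warnings;

# append_count() is a hypothetical helper: read $path to count its lines,
# then add a summary line at the end through the same filehandle.
sub append_count {
    my ($path) = @_;
    open(RW, "+<$path") or die "Can't open $path for read/write: $!";
    my $count = 0;
    while (<RW>) {
        $count++;
    }
    # Seek to end of file (whence 2) before switching from read to write.
    seek(RW, 0, 2) or die "Can't seek in $path: $!";
    print RW "# $count lines above\n";
    close(RW) or die "Can't close $path: $!";
    return $count;
}
```

This assumes the file already ends with a newline; if it doesn't, the summary will be joined onto the last line.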

A more flexible way to read and write a file is to import the file into an array, manipulate the array, then output each element again.

# program to remove duplicate lines
open(INFILE, "file.txt") || die "Can't open file.txt for input: $!";
my @lines = <INFILE>;
close INFILE;

# dup-remover taken from The Perl Cookbook
my %seen;
my @unique = grep { ! $seen{$_}++ } @lines;

open(OUTFILE, ">file.txt") || die "Can't open file.txt for output: $!";
foreach (@unique) {
        print OUTFILE $_;
}

close OUTFILE;

Note

One thing to watch out for here is memory usage. Reading a ten megabyte file into an array uses at least ten megabytes of memory, and usually more, because Perl stores some overhead with every element.
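If memory is a concern, one alternative is to process the file a line at a time and write the results to a second file. The sketch below does this for the duplicate remover. dedupe_stream() is a hypothetical name, and writing to a separate output file rather than back to the original is our assumption. Only the distinct lines are remembered (as keys of %seen), so a file with many repeats uses far less memory than its size.

```perl
use strict;
use warnings;

# dedupe_stream() is a hypothetical helper: copy $in to $out, dropping
# repeated lines, without ever holding the whole file in an array.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %seen;
    open(INFILE, $in)      or die "Can't open $in for input: $!";
    open(OUTFILE, ">$out") or die "Can't open $out for output: $!";
    while (<INFILE>) {
        # $seen{$_}++ is false only the first time a line is met
        print OUTFILE $_ unless $seen{$_}++;
    }
    close INFILE;
    close(OUTFILE) or die "Can't close $out: $!";
}
```

Once the output has been written successfully, rename() could be used to move it over the original file.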

Exercises

  1. Open a file, reverse its contents (line by line) and write it back to the same filename. (Answer: exercises/answers/reversefile.pl)

Opening pipes

If the filename given to open() begins with a pipe symbol (|), the rest of the filename is interpreted as a command to which our output is to be piped. If the filename ends with a |, it is interpreted as a command whose output is piped to us as input.

This is often used when you want to take input from the system a line at a time. Here's an example which reads from the rot13 filter (a simple routine which rotates the letters of its input by 13 letters, providing a very simple cipher for encoding the answers to jokes, spoilers to movies, or other low-security information):

#!/usr/bin/perl -w

use strict;

open (ROT13, "rot13 < /etc/motd |") || die "Can't open pipe: $!";

while (<ROT13>) {
        print;
}

close ROT13;
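One thing the example above doesn't show: the check after open() does not tell you whether the command itself ran successfully. For a pipe, close() waits for the command to finish and puts its status in $? (see perldoc -f close). Here is a sketch of how that might be used. pipe_exit_status() is a hypothetical helper, and running perl itself (via $^X) as the command is just a stand-in so the example works anywhere.

```perl
use strict;
use warnings;

# pipe_exit_status() is a hypothetical helper: run $command, discard its
# output, and return its exit status.
sub pipe_exit_status {
    my ($command) = @_;
    open(CMD, "$command |") or die "Can't start $command: $!";
    my @output = <CMD>;     # read (and here, ignore) everything it prints
    close CMD;              # waits for the command and sets $?
    return $? >> 8;         # the exit status lives in the high byte of $?
}

# $^X is the path of the running perl, used here as a stand-in command.
my $status = pipe_exit_status($^X . " -e 'exit 3'");
print "command exited with status $status\n";
```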

Conversely, we can output something through rot13:

#!/usr/bin/perl -w

use strict;

open (ROT13, "|rot13") || die "Can't open pipe: $!";

print "This is some rot13'd text:\n";
print ROT13 "This is some rot13'd text.\n";

close ROT13;

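If you reverse the two print lines above, the output will nevertheless appear in the same order as before, because output to the pipe is buffered and is not sent until the pipe is closed. To change this, set $| while the pipe's filehandle is selected, so that the pipe is flushed after every print. $| is on page 130 of your Camel, or in perldoc perlvar.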
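Since $| only affects the currently selected filehandle, the usual trick is to select() the handle, set $|, and then select() the old handle again. The sketch below wraps that in a hypothetical helper called unbuffer(). So that it runs even where no rot13 command is installed, it assumes a stand-in filter: perl itself (via $^X) with a tr/// that does the same job.

```perl
use strict;
use warnings;

# unbuffer() is a hypothetical helper: turn on autoflush for one
# filehandle, then restore whichever handle was selected before.
sub unbuffer {
    my ($fh) = @_;
    my $old = select($fh);
    $| = 1;
    select($old);
}

# Stand-in for the rot13 command (an assumption, see above).
my $filter = $^X . " -pe 'tr/A-Za-z/N-ZA-Mn-za-m/'";

open(ROT13, "| $filter") or die "Can't open pipe: $!";
unbuffer(\*ROT13);      # flush the pipe after every print
$| = 1;                 # and do the same for STDOUT

print ROT13 "This is some rot13'd text.\n";
print "This is some plain text.\n";

close ROT13;
```

Even with both handles unbuffered, the filter is a separate process, so the order in which the two lines reach your screen is not strictly guaranteed.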

Exercises

  1. Modify the second example above (provided for you as exercises/rot13.pl in your exercises directory) to accept user input and print out the rot13'd version.

  2. Change your script to accept input from a file using open() (Answer: exercises/answers/rot13.pl)

  3. Change your script to pipe its input through the strings command, so that if you get a file that's not a text file, it will only look at the parts of the file which are strings. (Answer: exercises/answers/strings.pl)