Delimited text files

A delimited text file is one in which each line of text is a record, and the fields are separated by a known character.

The character used to delimit the data varies according to the type of data. Common delimiters include the tab character (\t in Perl) or various punctuation characters. The delimiter should always be one which does not appear in the data.

Delimited text files are easily produced by most desktop spreadsheet and database applications (eg Microsoft Excel, Microsoft Access). You can usually choose "File" then "Save As" or "Export", then select the type of file you would like to save as.

Imagine a file which contains peoples' given names, surnames, and ages, delimited by the pipe (|) symbol:

Fred|Flintstone|40
Wilma|Flintstone|36
Barney|Rubble|38
Betty|Rubble|34
Homer|Simpson|45
Marge|Simpson|39
Bart|Simpson|11
Lisa|Simpson|9

The file above is available in your exercises directory as delimited.txt.

Reading delimited text files

To read from a delimited text file:

#!/usr/bin/perl -w

use strict;

open (INPUT, "delimited.txt") or die "Can't open data file: $!";

while (<INPUT>) {
        chomp;                  # remove newline
        my @fields = split(/\|/, $_);
        print "$fields[1], $fields[0]: $fields[2]\n";
}

close INPUT;

This should print out:

Flintstone, Fred: 40
Flintstone, Wilma: 36
...

And so on.

Searching for records

One of the common uses of databases is to search for specific records.

#!/usr/bin/perl -w

use strict;

# Find out what record the user wants:

print "Search for: ";
chomp (my $search_string = <STDIN>);

open (INPUT, "delimited.txt") or die "Can't open data file: $!";

while (<INPUT>) {
        chomp;                  # remove newline
        my @fields = split(/\|/, $_);

        # test whether the search string matches given or family name
        if ($fields[0] =~ /$search_string/ 
                or $fields[1] =~ /$search_string/) {
                print "$fields[1], $fields[0]: $fields[2]\n";
        }
}

close INPUT;

Sorting records

Sorting records from a flat text database can be quite difficult. Simply sorting the items line by line is one simplistic approach:

#!/usr/bin/perl -w

use strict;

open (INPUT, "delimited.txt") or die "Can't open data file: $!";

my @records = sort <INPUT>;

foreach (@records) {
        chomp;                  # remove newline
        my @fields = split(/\|/, $_);
        print "$fields[1], $fields[0]: $fields[2]\n";
}

close INPUT;

The above technique can only sort on the first field of the data (in the case of our example, that would be the given name) and may have difficulties when it encounters the delimiter.

To sort by any other field, we would first need to load the data into a list of lists (using references), then use the sort() function's optional first argument to specify a subroutine to use for sorting:

#!/usr/bin/perl -w

use strict;

open (INPUT, "delimited.txt") or die "Can't open data file: $!";

while (<INPUT>) {
        chomp;
        my @this_record = split(/\|/, $_);
        
        # build a list-of-lists containing references to each record
        push (@records, \@this_record);
}

# sort takes an optional argument of what subroutine to use to sort
# the data...

my @sorted = sort given_name_order @records;

foreach $record (@sorted) {
        # we have to print the items via a reference to the array...
        print "$record->[1], $record->[0]: $record->[2]\n";
}

# subroutine to implement sorting order
sub given_name_order {
        $a->[0] cmp $b->[0];
}

Obviously this can be quite tricky, especially if the programmer is not totally familiar with Perl references. It also requires loading the entire data set into memory, which would be very inefficient for large databases.

Writing to delimited text files

The most useful function for writing to delimited text files is join, which is the logical equivalent of split.

#!/usr/bin/perl -w

use strict;

open OUTPUT, ">>delimited.txt" or die "Can't open output file: $!";

my @record = qw(George Jetson 35);

print OUTPUT join("|", @record), "\n";