Often, you will want to read a file several lines at a time. Consider, for example, a typical Unix fortune cookie file, which is used to generate quotes for the fortune command:
```
%
Let's call it an accidental feature.
        -- Larry Wall
%
Linux: the choice of a GNU generation
%
When you say "I wrote a program that crashed Windows", people just stare
at you blankly and say "Hey, I got those with the system, *for free*".
        -- Linus Torvalds
%
I don't know why, but first C programs tend to look a lot worse than
first programs in any other language (maybe except for fortran, but then
I suspect all fortran programs look like `firsts')
        -- Olaf Kirch
%
All language designers are arrogant. Goes with the territory...
        -- Larry Wall
%
We all know Linux is great... it does infinite loops in 5 seconds.
        -- Linus Torvalds
%
Some people have told me they don't think a fat penguin really embodies
the grace of Linux, which just tells me they have never seen a angry
penguin charging at them in excess of 100mph. They'd be a lot more
careful about what they say if they had.
        -- Linus Torvalds, announcing Linux v2.0
%
```
The fortune cookies are separated by a line which contains nothing but a percent sign.
To read this file one fortune at a time, we need to set the input record separator to something other than the usual \n; in this case, we'd set it to "\n%\n".
To do this in Perl, we use the special variable $/.
```perl
$/ = "\n%\n";
```
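The loop below is a minimal sketch of reading a fortune file one cookie at a time. It assumes Perl 5.8 or later, where a filehandle can be opened on a string; the string $data is a hypothetical stand-in for the real file, so the example runs on its own. Using local confines the new separator to the enclosing block.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a fortune file; a real script would open the file by name.
my $data = "Let's call it an accidental feature.\n    -- Larry Wall\n%\n"
         . "Linux: the choice of a GNU generation\n%\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @fortunes;
{
    local $/ = "\n%\n";    # a record ends with a line holding only "%"
    while (my $fortune = <$fh>) {
        chomp $fortune;    # chomp strips the current $/, i.e. "\n%\n"
        push @fortunes, $fortune;
    }
}                          # $/ reverts to "\n" when the block exits
close $fh;

print "--- fortune ", $_ + 1, " ---\n$fortunes[$_]\n" for 0 .. $#fortunes;
```

Because chomp removes whatever $/ currently holds, each fortune comes back without its trailing separator. If the file opens with a lone % line, as the example file above does, the first record will also begin with "%\n"; one way to drop it is s/\A%\n//.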
Conveniently enough, setting $/ to "" will cause input to occur in "paragraph mode", in which two or more consecutive newlines will be treated as the delimiter. Undefining $/ will cause the entire file to be slurped in.
```perl
undef $/;
$_ = <FH>;    # whole file now here
```
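Because $/ is a global variable, undefining it as above changes how every later read in the program behaves. A common safer idiom, sketched here with a hypothetical in-memory file ($text), is to localise the change inside a do block so that $/ is restored as soon as the read finishes. The same idiom works for paragraph mode.

```perl
use strict;
use warnings;

# Hypothetical input: two paragraphs separated by more than one blank line.
my $text = "First paragraph,\nline two.\n\n\nSecond paragraph.\n";

# Paragraph mode: with $/ set to "", one or more blank lines end a record.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @paragraphs = do { local $/ = ""; <$fh> };
close $fh;

# Slurp mode: with $/ undefined, a single read returns everything left in the file.
open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;

printf "%d paragraphs; slurped %d characters\n",
    scalar @paragraphs, length $whole;
```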
Special variables are covered in Chapter 2 of the Camel book, from page 127 onwards. We're going to be looking at more special variables soon, so mark the page now. The information can also be found in perldoc perlvar.
Since $/ isn't the easiest name to remember, we can use a longer name by using the English module:
```perl
use English;

$INPUT_RECORD_SEPARATOR = "\n%\n";    # long name for $/
$RS = "\n%\n";                        # same thing, awk-like
```
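The long names are aliases for the punctuation variables, so they can be localised in exactly the same way. The sketch below is a minimal example with hypothetical in-memory data. It imports English with the -no_match_vars option, which perldoc English recommends because importing the long names for $&, $` and $' slowed down every regular expression match on older perls.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

my $data = "one\n%\ntwo\n%\n";    # hypothetical two-record input
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $INPUT_RECORD_SEPARATOR = "\n%\n";    # the same variable as $/
    while (my $record = <$fh>) {
        chomp $record;
        push @records, $record;
    }
}
close $fh;

print join(', ', @records), "\n";    # prints: one, two
```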
The English module is documented on page 403 of the Camel or in perldoc English.
In your directory is a file called exercises/linux.txt which is a set of Linux-related fortunes, formatted as in the above example. Use multiline regular expressions to find only those quotes which were uttered by Larry Wall. (Answer: exercises/answers/larry.pl)
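The /s and /m modifiers control how a string containing newlines is treated: /m treats it as multiple lines, and /s treats it as a single line. Only /m affects the anchors. Without /m, ^ matches only at the start of the entire string, and $ matches only at the end of the entire string (or just before a newline at the very end). In multiline mode, ^ and $ also match just after and just before each embedded newline. /s leaves ^ and $ alone; its effect is on the dot.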
```perl
my $string = qq(
This is some text
and some more text
spanning several lines
);

if ($string =~ /^and some/m) {    # this will match
    print "Matched in multiline mode\n";
}

if ($string =~ /^and some/s) {    # this won't match: /s does not change ^
    print "Matched in single line mode\n";
}
```
In single line mode, the dot metacharacter will match \n. Without /s, whether or not /m is also given, it won't.
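A small sketch of the dot's behaviour, using a hypothetical two-line string: only the /s version can match across the newline, and adding /m on its own makes no difference to the dot.

```perl
use strict;
use warnings;

my $string = "start\nend";    # hypothetical two-line string

# Without /s, . never matches "\n", so the match cannot cross the line break.
my $plain  = $string =~ /start.end/  ? 'matched' : 'no match';
# With /s, . matches any character, including "\n".
my $single = $string =~ /start.end/s ? 'matched' : 'no match';
# /m alone does not change what . matches, so this fails like the plain version.
my $multi  = $string =~ /start.end/m ? 'matched' : 'no match';

print "no modifier: $plain\n/s: $single\n/m: $multi\n";
```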
The differences between default, single line, and multiline mode are set out very succinctly by Jeffrey Friedl in Mastering Regular Expressions (see the Bibliography at the back of these notes for details). The following table is paraphrased from the one on page 236 of that book.
His term "clean multi-line mode" refers to using /m and /s together: ^ and $ behave as in multi-line mode, and the dot also matches a newline.
Table 3-2. Effects of single and multiline options
Mode | Specified with | ^ matches... | $ matches... | Dot matches newline? |
---|---|---|---|---|
default | neither /s nor /m | start of string | end of string, or before a final newline | No |
single-line | /s | start of string | end of string, or before a final newline | Yes |
multi-line | /m | start of line | end of line | No |
clean multi-line | both /m and /s | start of line | end of line | Yes |
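One way to check this table against your own perl is to run the same four patterns under each mode. The sketch below does that against a hypothetical string; the fourth pattern shows that, even without /m, $ can match just before a newline at the very end of the string.

```perl
use strict;
use warnings;

# Hypothetical test string: two lines, with a newline at the very end.
my $s = "abc\ndef\n";

# Each mode runs the same four tests, compiled with that mode's modifiers:
#   ^def     - can ^ match just after an embedded newline?
#   abc$     - can $ match just before an embedded newline?
#   abc.def  - can . match a newline?
#   def$     - can $ match just before the final newline?
my @modes = (
    [ 'default',          qr/^def/,   qr/abc$/,   qr/abc.def/,   qr/def$/   ],
    [ 'single-line',      qr/^def/s,  qr/abc$/s,  qr/abc.def/s,  qr/def$/s  ],
    [ 'multi-line',       qr/^def/m,  qr/abc$/m,  qr/abc.def/m,  qr/def$/m  ],
    [ 'clean multi-line', qr/^def/ms, qr/abc$/ms, qr/abc.def/ms, qr/def$/ms ],
);

my %results;
for my $mode (@modes) {
    my ($name, @tests) = @$mode;
    # The ternary forces scalar context, so each test yields exactly one word.
    $results{$name} = join ' ', map { $s =~ $_ ? 'yes' : 'no' } @tests;
    printf "%-17s %s\n", $name, $results{$name};
}
```

Each output line lists yes or no for the four tests in the order given in the comments.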