Backreferences

Special variables

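There are several special variables related to regular expressions: $& holds the text that was matched, $` holds the text before the match, $' holds the text after the match, and $1, $2, $3 and so on hold the text matched by the first, second and third sets of parentheses.

All these variables are set whenever a match succeeds (after a failed match they keep their previous values). They can be read like any other scalar variable, but they are read-only and cannot be assigned to.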

# this...
my ($match) = m/^(\d+)/;
print $match;

# is equivalent to this:
m/^\d+/;
print $&;

# match the first three words...
m/^(\w+) (\w+) (\w+)/;
print "$1 $2 $3\n";
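
As a minimal sketch of how these variables behave after a successful match, the following copies each of them into an ordinary variable and prints it. The sample text and the names $line, $before, $matched, $after, $number and $word are invented for illustration.

```perl
use strict;
use warnings;

# Sample text, invented for this illustration.
my $line = "The year 2024 was good";

my ($before, $matched, $after, $number, $word);
if ($line =~ m/(\d+) (\w+)/) {
    # Copy the special variables straight away: the next successful
    # match will overwrite them.
    ($before, $matched, $after) = ($`, $&, $');
    ($number, $word) = ($1, $2);

    print "before:  <$before>\n";     # text to the left of the match
    print "matched: <$matched>\n";    # text matched by the whole pattern
    print "after:   <$after>\n";      # text to the right of the match
    print "groups:  <$number> and <$word>\n";
}
```

Copying the values as soon as the match succeeds is a defensive habit rather than a requirement; it simply avoids surprises if another match runs before the values are used.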

You can also use $& and other special variables in substitutions:

$string = "It was a dark and stormy night.";
$string =~ s/dark|wet|cold/very $&/;
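
To see what that substitution produces, here is a small runnable sketch. It works on a copy (the name $changed is an assumption made for this example) so that the original string is left untouched.

```perl
use strict;
use warnings;

my $string = "It was a dark and stormy night.";

# In the replacement, $& stands for whatever the pattern matched ("dark" here).
(my $changed = $string) =~ s/dark|wet|cold/very $&/;

print "$changed\n";    # It was a very dark and stormy night.
```

Without the /g modifier only the first matching word is replaced.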

If you want to use parentheses simply for grouping, and don't want them to set a $1-style variable, you can use non-capturing parentheses, which look like (?: ... ).

# this only sets $1 - the first two sets of parentheses are non-capturing
m/^(?:\w+) (?:\w+) (\w+)/;
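
One way to check that only a single group captures is to run the match in list context, which returns the captured text. This is a sketch only; the sample text and the name @captured are invented for illustration.

```perl
use strict;
use warnings;

# Sample text, invented for this illustration.
my $line = "one two three four";

# In list context a match returns the text captured by each set of
# capturing parentheses; the (?: ... ) groups contribute nothing.
my @captured = $line =~ m/^(?:\w+) (?:\w+) (\w+)/;

print scalar(@captured), " capture: $captured[0]\n";    # 1 capture: three
```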

The special variables $1 and so on can be used in substitutions to include matched text in the replacement expression:

# swap first and second words
s/^(\w+) (\w+)/$2 $1/;
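
The same idea extends to more than two groups. As an illustrative sketch (the date string and the names $date and $reordered are assumptions, not part of the course material), this rearranges a year-month-day date into day.month.year order:

```perl
use strict;
use warnings;

# Sample date, invented for this illustration.
my $date = "2024-03-15";

# $1 is the year, $2 the month and $3 the day; the replacement
# puts them back in the opposite order, separated by dots.
(my $reordered = $date) =~ s/^(\d{4})-(\d\d)-(\d\d)$/$3.$2.$1/;

print "$reordered\n";    # 15.03.2024
```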

However, this is no use in a simple match pattern, because $1 and friends aren't set until after the match is complete. Something like:

my $word = "this";
print if m/($word) $1/;

... will not match "this this". Rather, it will match "this" followed by whatever $1 was set to by an earlier match.

In order to match "this this" we need to use the special regular expression metacharacters \1, \2, etc. These metacharacters refer to parenthesized parts of a match pattern, just as $1 does, but within the same match rather than referring back to the previous match.

my $word = "this";
print if m/($word) \1/;
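
The difference can be seen by running the \1 form against a couple of sample strings. This is a minimal sketch; the strings and the names @lines and @verdicts are invented for illustration.

```perl
use strict;
use warnings;

my $word  = "this";
my @lines = ("this this is doubled", "this that is not");

my @verdicts;
for my $text (@lines) {
    # \1 must match exactly what the first group matched
    # during this same attempt at the pattern.
    my $verdict = $text =~ m/($word) \1/ ? "doubled" : "not doubled";
    push @verdicts, $verdict;
    print "$verdict: $text\n";
}
```

The first string is reported as doubled and the second is not, because \1 only matches a repeat of the text that ($word) has just matched.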

Exercises

  1. Write a script which swaps the first and the last words on each line (Answer: exercises/answers/firstlast.pl)

  2. Write a script which looks for doubled terms such as "bang bang" or "quack quack" and prints out all occurrences. This script could be used for finding typographic errors in text. (Answer: exercises/answers/double.pl)

Advanced

  1. Modify the above script to work across line boundaries (Answer: exercises/answers/multiline_double.pl)

  2. What about case sensitivity?