There are several special variables related to regular expressions.
$& is the matched text
$` is the unmatched text to the left of the matched text
$' is the unmatched text to the right of the matched text
$1, $2, $3, etc. are the text matched by the 1st, 2nd, 3rd, etc. sets of parentheses.
All these variables are set whenever a match succeeds, and can be used in any way that other scalar variables can be used.
    # this...
    my ($match) = m/^(\d+)/;
    print $match;

    # is equivalent to this:
    m/^\d+/;
    print $&;

    # match the first three words...
    m/^(\w+) (\w+) (\w+)/;
    print "$1 $2 $3\n";
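To see all three match-related variables side by side, here is a minimal sketch. The sample string and the variable names $before, $matched and $after are made up for illustration.

```perl
use strict;
use warnings;

my $line = "order 66 shipped";   # hypothetical sample input

my ($before, $matched, $after);
if ($line =~ m/\d+/) {
    # copy the special variables straight away: the next successful
    # match would overwrite them
    ($before, $matched, $after) = ($`, $&, $');
}

print "before:  [$before]\n";    # before:  [order ]
print "matched: [$matched]\n";   # matched: [66]
print "after:   [$after]\n";     # after:   [ shipped]
```

Copying the values into ordinary variables inside the if block is a defensive habit: it keeps the results safe from later matches and makes sure they are only used when the match actually succeeded.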
You can also use $& and other special variables in substitutions:
    $string = "It was a dark and stormy night.";
    $string =~ s/dark|wet|cold/very $&/;
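The effect of that substitution can be checked with a short script. This is only a sketch: the second sample string and the /g variant are additions for illustration, showing that $& is re-evaluated for every replacement.

```perl
use strict;
use warnings;

my $string = "It was a dark and stormy night.";
$string =~ s/dark|wet|cold/very $&/;    # $& is the alternative that matched
print "$string\n";                      # It was a very dark and stormy night.

# hypothetical second sample: with /g, $& holds each match in turn
my $weather = "a cold, wet morning";
$weather =~ s/dark|wet|cold/very $&/g;
print "$weather\n";                     # a very cold, very wet morning
```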
If you want to use parentheses purely for grouping, and don't want them to set a $1 style variable, you can use non-capturing parentheses, which look like (?: ... ).
    # this only sets $1 - the first two sets of parentheses are non-capturing
    m/^(?:\w+) (?:\w+) (\w+)/;
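One way to see the difference is to run both kinds of pattern in list context, where a match returns the captured text. A minimal sketch with a made-up sample string:

```perl
use strict;
use warnings;

my $text = "one two three four";   # hypothetical sample input

# in list context a match returns the text captured by each set of parentheses
my @captured   = $text =~ m/^(\w+) (\w+) (\w+)/;       # three capturing groups
my @uncaptured = $text =~ m/^(?:\w+) (?:\w+) (\w+)/;   # only one capturing group

print scalar(@captured),   " captures: @captured\n";   # 3 captures: one two three
print scalar(@uncaptured), " capture: @uncaptured\n";  # 1 capture: three
```

In the second pattern the third word ends up in $1 rather than $3, because non-capturing groups are skipped when the capturing groups are numbered.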
The special variables $1 and so on can be used in substitutions to include matched text in the replacement expression:
    # swap first and second words
    s/^(\w+) (\w+)/$2 $1/;
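Applied to a few made-up lines, the swap behaves as follows. This is a sketch only; the @lines array is hypothetical sample data.

```perl
use strict;
use warnings;

# hypothetical sample lines
my @lines = ("hello world again", "single", "foo bar");

for my $line (@lines) {
    # $1 and $2 in the replacement refer to this substitution's own captures
    $line =~ s/^(\w+) (\w+)/$2 $1/;
    print "$line\n";
}
# prints:
#   world hello again
#   single
#   bar foo
```

The line containing a single word is left untouched, because the pattern needs two words separated by a space before it can match.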
However, this is no use in a simple match pattern, because $1 and friends aren't set until after the match is complete. Something like:
    my $word = "this";
    print if m/($word) $1/;
... will not match "this this". Rather, it will match "this" followed by whatever $1 was set to by an earlier match.
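The following sketch makes the pitfall visible by deliberately leaving an old value in $1 before trying the pattern. The strings and the $hit_ variable names are made up for illustration.

```perl
use strict;
use warnings;

my $word = "this";

# an earlier, unrelated match leaves $1 set to "that"
"that" =~ m/(\w+)/;

# $1 is interpolated before matching starts, so the pattern is really /(this) that/
my $hit_double = ("this this" =~ m/($word) $1/) ? 1 : 0;
my $hit_stale  = ("this that" =~ m/($word) $1/) ? 1 : 0;

print "matched 'this this': $hit_double\n";   # matched 'this this': 0
print "matched 'this that': $hit_stale\n";    # matched 'this that': 1
```

The failed first attempt leaves $1 alone, which is why the second attempt still sees the stale value "that".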
In order to match "this this" we need to use the special regular expression metacharacters \1, \2, etc., known as backreferences. These metacharacters refer to parenthesized parts of a match pattern, just as $1 does, but within the same match rather than referring back to the previous match.
    my $word = "this";
    print if m/($word) \1/;
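A short sketch of \1 at work on a handful of made-up lines (the @lines data and the @doubled array are hypothetical):

```perl
use strict;
use warnings;

my $word  = "this";
my @lines = ("this this", "this that", "is this this it?");   # sample input

my @doubled;
for my $line (@lines) {
    # \1 must match exactly the text the first set of parentheses just captured
    push @doubled, $line if $line =~ m/($word) \1/;
}

print "$_\n" for @doubled;
# prints:
#   this this
#   is this this it?
```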
Write a script which swaps the first and the last words on each line (Answer: exercises/answers/firstlast.pl)
Write a script which looks for doubled terms such as "bang bang" or "quack quack" and prints out all occurrences. This script could be used for finding typographic errors in text. (Answer: exercises/answers/double.pl)
Modify the above script to work across line boundaries (Answer: exercises/answers/multiline_double.pl)
What about case sensitivity?