Regular expressions, which you glimpsed in the previous section, let us specify complex patterns relatively simply, and can make complicated operations possible in two or three lines of code. Sometimes you need a more complicated pattern match than the examples in the previous section, such as finding the first 5 letters of a string, the first occurrence of an embedded numeric value, or even the first character of a string. These matches require regular expressions. The table below lists a few of the most common regular expression elements:

Regular Expressions

Regular Expression Description Example
Note that every if statement below evaluates to TRUE, except the one in the \b example, which is deliberately written so that it does not match.
. Matches an arbitrary character, but not a newline.
$string1 = "Hello World\n";
if ($string1 =~ m/...../) {
  print "$string1 has length >= 5\n";
}
( ) Groups a series of pattern elements to a single element. When you match a pattern within parentheses, you can use any of $1, $2, ... $9 later to refer to the previously matched pattern.
Program:
$string1 = "Hello World\n";
if ($string1 =~ m/(H..).(o..)/) {
  print "We matched '$1' and '$2'\n";
}
Output:
We matched 'Hel' and 'o W'
+ Matches the preceding pattern element one or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/l+/) {
  print "There are one or more consecutive l's in $string1";
}
? Matches zero or one times.
$string1 = "Hello World\n";
if ($string1 =~ m/H.?e/) {
  print "There is an 'H' and an 'e' separated by ";
  print "0 or 1 characters (here, 0).\n";
}
? When placed after *, +, or {M,N}, makes that quantifier match as few times as possible.
$string1 = "Hello World\n";
if ($string1 =~ m/(l.+?o)/) {
  print "The non-greedy match with 'l' followed by one or ";
  print "more characters is 'llo' rather than 'llo Wo'.\n";
}
* Matches zero or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/el*o/) {
  print "There is an 'e' followed by zero or more ";
  print "'l' followed by an 'o'\n";
}
{M,N} Denotes the minimum M and the maximum N match count.
$string1 = "Hello World\n";
if ($string1 =~ m/l{1,2}/) {
  print "There exists a substring with 1 ";
  print "or 2 l's in $string1";
}
[...] Denotes a set of possible matches.
$string1 = "Hello World\n";
if ($string1 =~ m/[aeiou]/) {
  print "$string1 contains a vowel\n";
}
| Matches either the left or the right operand.
$string1 = "Hello World\n";
if ($string1 =~ m/(Hello|Hi)/) {
  print "Hello or Hi is ";
  print "contained in $string1";
}
\b Matches a word boundary.
$string1 = "Hello World\n";
if ($string1 =~ m/\bllo\b/) {
  print "This will not match because ";
  print "the llo is not a word.";
}
\w Matches an alphanumeric character, including "_".
$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
  print "There is at least one ";
  print "alphanumeric char in $string1";
}
\W Matches a non-alphanumeric character.
$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
  print "The space between Hello and ";
  print "World is not alphanumeric\n";
}
\s Matches a whitespace character (space, tab, newline, formfeed)
$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
  print "There are TWO whitespace ";
  print "characters in $string1";
}
\S Matches anything BUT a whitespace.
$string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
  print "There are TWO non-whitespace ";
  print "characters in $string1";
}
\d Matches a digit, same as [0-9].
$string1 = "99 beers on the wall\n";
if ($string1 =~ m/\d.*\d/) {
  print "There are TWO digits in $string1";
}
\D Matches a non-digit.
$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
  print "There is >= 1 non-digit in $string1\n";
}
^ Matches the beginning of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
  print "$string1 starts with a He\n";
}
$ Matches the end of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/rld\n?$/) {
  print "$string1 is a line or string ";
  print "that ends with rld\n";
}
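
As a minimal sketch, the three tasks mentioned at the start of this section (the first 5 letters, the first embedded number, and the first character of a string) can be written with elements from the table. The sub names first_five_letters, first_number and first_char are hypothetical and chosen only for illustration. Each returns undef when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical helpers, named for illustration only.

# First five letters at the start of the string. [A-Za-z] is used
# instead of \w so that digits and "_" are not counted as letters.
# {5} is shorthand for {5,5}.
sub first_five_letters {
    my ($s) = @_;
    return $s =~ m/^([A-Za-z]{5})/ ? $1 : undef;
}

# First run of one or more digits found anywhere in the string.
sub first_number {
    my ($s) = @_;
    return $s =~ m/(\d+)/ ? $1 : undef;
}

# First character of the string (any character except a newline).
sub first_char {
    my ($s) = @_;
    return $s =~ m/^(.)/ ? $1 : undef;
}

my $string1 = "Hello World, 99 beers on the wall\n";
print "First five letters: ", first_five_letters($string1), "\n";
print "First number: ", first_number($string1), "\n";
print "First character: ", first_char($string1), "\n";
```

Anchoring with ^ is what restricts the first and third helpers to the start of the string. The second helper has no anchor, so Perl reports the leftmost run of digits.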

To allow for greater leeway in parsing, the slashes (/) that delimit these patterns can be replaced by another non-alphanumeric, non-whitespace character such as ~, !, @, #, or $. This is particularly useful when you don't want to escape slashes in a search pattern. For example,

$string1 = "Escaped slashes look like valleys \/\/\/.";
if ($string1 =~ m/Escaped slashes look like valleys ///./) {
  print "Whatchew talkin' 'bout, Willis?\n";
}

is illegal, whereas

$string1 = "Escaped slashes look like valleys \/\/\/.";
if ($string1 =~ m%Escaped slashes look like valleys ///.%) {
  print "Back to legal Perl\n";
}

is legal. Think for a second about why the first program is illegal while the second is valid Perl. Try running each of them to verify which works and which does not. For the one that fails, work out what causes the Perl interpreter to complain.
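
For comparison, here is a small sketch of the same three-slash pattern written once with the default delimiter and escaped slashes, and twice with alternative delimiters. The variable names are ours. The bracket form m{...} is standard Perl, although it is not shown above.

```perl
use strict;
use warnings;

# Inside a double-quoted string "\/" is simply "/", so this is the
# same string as in the examples above, written without backslashes.
my $string1 = "Escaped slashes look like valleys ///.";

# Default delimiter: every literal slash in the pattern must be escaped.
my $with_slashes = $string1 =~ m/valleys \/\/\/\./ ? 1 : 0;

# Alternative delimiters: the slashes need no escaping.
my $with_percent = $string1 =~ m%valleys ///\.% ? 1 : 0;
my $with_braces  = $string1 =~ m{valleys ///\.} ? 1 : 0;

print "slash: $with_slashes, percent: $with_percent, braces: $with_braces\n";
```

The period is escaped in each pattern so that it matches a literal "." rather than an arbitrary character.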

Regular expressions are one of the more powerful features of the Perl programming language. If you decide to master only one aspect of the language, make regular expressions that aspect. In fact, regular expressions appear in so many different places in the Unix environment that it would be a crime not to master them.