The Perl Scripting Language

Perl is somewhat different from the other programming languages that you may be familiar with. Its evolution and purpose are different; it is hailed as the language for "getting the job done". If the jobs are simple, Perl makes them very simple. If the jobs are difficult, Perl will still make implementing them possible.

The Perl course notes are separated into the following categories:

Perl History and Philosophy

In the beginning, Perl was a language that Larry Wall wrote to print formatted reports. Because of the language's strong report-formatting capabilities, Wall called it the Practical Extraction and Report Language. Today, Perl is also jokingly known as the Pathologically Eclectic Rubbish Lister.

Perl inherits its eclectic nature from languages such as AWK, SED, sh and C. The combination of these languages made Perl into its own unique language. You will find that many of your gripes about languages such as C or C++ have been eliminated in Perl. For example:

On the other hand, Perl does very little error-checking by default, and hence bugs can easily cause your program to misbehave without producing any error messages. The next section tells you how to make the Perl interpreter pay a bit more attention to what your code is doing, so it will detect many of these mistakes and produce an error message.

You may find Perl to be a different, and possibly more "natural" programming language than some of the other programming languages that you are familiar with. This is in part due to being designed by someone familiar with linguistics.

Perl can be used for small jobs (scripts), or for much larger applications. For instance, the original version of WebCT consisted of over 50,000 lines of Perl. But one of the nicest things about learning Perl and using it to make your life easier is that very useful programs can be squeezed into a single line. For example,

% perl -i -pe 's@search pattern@replace pattern@g' file
replaces all occurrences of search pattern with replace pattern in file, something that is often useful but difficult to do in UNIX without some scripting knowledge (do not worry about the syntax for now). As another example, just recently HotWired's Webmonkey published an entire search engine written in 4 lines of Perl. However, this programming practice often makes for unreadable "write-only" programs, and we strongly urge you not to attempt this kind of feat.

Why is it that Perl is so often associated with a camel? There are numerous explanations, ranging from Biblical references to a random selection. The camel is a funny and awkward looking animal at first glance, and not the most attractive animal in the animal kingdom. But when you find yourself in the extreme harshness of the desert, it won't take long for you to appreciate this extremely well adapted and versatile creature. At times the desert seems a much more tranquil and safe place than a UNIX environment. Here is a quote playing on the idea of a camel being designed by a committee:

No committee could ever come up with anything as revolutionary as a camel -- anything as practical and as perfectly designed to perform effectively under such difficult conditions.

           --Laurence J. Peter

How To Run and Debug a Perl Script

Perl programs are written in the familiar ASCII character set, with white space separating tokens that would otherwise be read as a single token. A comment begins with the # character and runs to the end of the line, exactly as in shell scripts (# works the same way as // in C++). You can also comment out a whole section of your file by starting it with a line that begins with = followed by a directive name (for example, =pod), and ending it with a line containing =cut. This is usually used for documentation, and the Perl distribution provides some utilities that will convert this documentation to LaTeX, HTML, or manpage documents.
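
As a small sketch of the two comment styles described above (the variable names are made up for this illustration):

```perl
use strict;
use warnings;

# A full-line comment: everything after the # is ignored.
my $greeting = "Hello";   # a comment may also follow code on the same line

=pod

This paragraph is documentation, not code. The interpreter skips
everything from the line starting with =pod down to the =cut line.

=cut

my $output = "$greeting World!\n";
print $output;
```

The documentation block is invisible to the interpreter, so the script behaves as if only the two assignments and the print were present.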

There are three ways to run a Perl script in the Unix environment.

  1. Entering the script directly at the shell prompt.
  2. Saving the Perl script in a file, then providing the filename as an argument to the Perl interpreter.
  3. Saving the Perl script in a file with #! ("shebang") notation, making the file executable, and typing the filename to execute the script.

Entering a script at the shell prompt

Perl scripts can be entered directly at the shell prompt with the following syntax:

% perl -e 'script'

This runs the Perl interpreter with the script as an argument. The -e option tells the interpreter that the next argument is the script itself, rather than the name of a file containing the script. Also, the single quotes surrounding the script are crucial, since otherwise the shell may try to interpret parts of your Perl script!

For instance, the following command prints the string "Hello World!" to the screen followed by a newline character:

% perl -e 'print "Hello World!\n";'

Although this method is very fast for executing short "run-once" type scripts, it can become extremely tedious when a script is used more frequently.

Providing a Filename Argument to the Perl Interpreter

Suppose that you are so fond of the script that prints "Hello World!" to the screen that you run it at least ten times a day. You will quickly find that you are tired of typing the 34 characters (including the carriage return) required to execute your script. Thankfully, the Perl interpreter is just as happy to execute a script saved in a file as it is to execute a script provided on the command line. For instance, consider the following example:

% cat > hi.pl
print "Hello World!\n";
% perl hi.pl
Hello World!

We stored our script in the file hi.pl (the .pl extension is a naming convention, similar to the .cpp extensions that you are probably familiar with for C++ program files). The Perl interpreter then read the script from the file, and executed it.

Making an Executable Interpreted File using "Shebang Notation"

Suppose now that we would like people to be able to run the script without knowing if it is a Perl script, a shell script, or an executable file obtained by compiling a C++ program. We need to avoid having to specify the name of the Perl interpreter on the command line. This can be done by including all of the necessary information inside the file. If we set the file's execute permissions, then all we will have to type to execute the script is the filename. The following example illustrates how this is done:

%  cat > hi.pl
#!/cs/local/bin/perl
print "Hello World!\n";
%  chmod +x hi.pl
%  ./hi.pl
Hello World!

The first line in file hi.pl is what is known as "shebang notation". This is because the # character is often referred to as sharp and the ! character is referred to as bang. Immediately following these two characters is the path of the Perl interpreter used to execute the script. Note that this path may be different outside of the Undergraduate servers. For this notation to work, this line must be the first line of the file. Recall that the command chmod +x hi.pl sets the execute permission bits of the hi.pl file. Now all that is left to do is type the name of the script at the command line to run the script. The process of executing a Perl script cannot be simplified further.

Debugging, and other Perl Interpreter Options

There are various interpreter options that can be specified when running a Perl script. These are analogous to the compiler options that can be specified when compiling a C++ program.

If something goes wrong, you may get error messages or you may get nothing. You should always run the program with all warnings enabled using the command:

% perl -w filename
at the prompt. This will display warnings about questionable constructs, both while the script is being compiled and while it runs. We also strongly recommend that you insert the command
use strict;
at the top of every Perl script you write (immediately after the #! line, if there is one). Perl will then check for potential mistakes, such as misspelled variable names, thereby helping you to avoid common Perl pitfalls.
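
To illustrate the kind of mistake that use strict; catches, here is a minimal sketch. The variable names are invented, and the misspelled code is placed in a string passed to eval only so that this demonstration can report the error instead of stopping; in an ordinary script the interpreter would simply refuse to run.

```perl
use strict;
use warnings;

my $total = 10;

# The code in the string misspells $total as $totl. Under "use strict"
# this is a compile-time error, which eval records in $@.
my $result = eval q{ $totl + 1 };
my $error  = $@;

print "strict caught: $error" if $error;
```

Without use strict;, the misspelled name would silently be treated as a new, empty variable.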

To run the program with a debugger you can use the command:

% perl -d filename
or, alternatively,
% ddd filename.pl
and as long as filename ends with .pl, ddd will figure out that it is a Perl script that you want to debug.

If you would like to print out information regarding the version of Perl that you are currently running, you can use the -v option. Hence the command:

% perl -v
will display all of the version information.

Perhaps the most important option is -h. This displays a summary of all of the Perl interpreter options. The command:

% perl -h
will display the options available. Type this command at the prompt now and browse through the various options.

For more information about the execution of Perl scripts, you can consult the "Execution" section of the Perl man pages by typing man perlrun at the prompt.

Variables

Just like every other programming or scripting language you have been exposed to, Perl allows you to create and use variables (we will call a variable an association between a name and a value). The languages that you are already familiar with fall into two categories:

Perl falls somewhere in between these two categories: it has several kinds of variables, and we can determine the kind of a variable by looking at the first character of its name. However, the exact type of the value associated with a given variable name may not be known until the program is run. Perl uses the following conventions for variable names:

Kinds of Variables

Scalars

Scalars are the most commonly used variables in Perl. A scalar variable (or, more precisely, the use of a variable in a scalar context) always starts with a $ sign and, by definition, contains a single value.

If you attempt to use a scalar variable that has not been assigned a value, you will get the undefined value, which behaves as 0 or "" depending on whether it is used as a number or as a string. The -w option, mentioned in the section How to Run and Debug a Perl Script, will warn about such attempts, and use strict; will reject variables that have not been declared at all.
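
The following sketch shows an unassigned scalar being used as a number and as a string. The names are hypothetical, and the warning is switched off inside the block only to keep the demonstration quiet.

```perl
use strict;
use warnings;

my $unset;                            # declared but never assigned

my $is_defined = defined($unset) ? "yes" : "no";

my ($as_number, $as_string);
{
    no warnings 'uninitialized';      # silence the warning for this demo only
    $as_number = $unset + 5;          # the undefined value acts as 0 here
    $as_string = "[" . $unset . "]";  # and as "" here
}

print "defined? $is_defined\n";
print "as number: $as_number\n";
print "as string: $as_string\n";
```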

Unless you know what you are doing, do not use numeric or special characters immediately after the $ in the name of one of your variables. For instance, the variables $1, $9, $$, $&, all have predefined meanings. Unlike normal scalars, these predefined scalars have a default value that is not necessarily 0 or "", and in most cases they should only be read, and not modified.

When looking at a Perl program, because of your C++ background, you will probably be asking yourself: are these scalars of type int, float, char or string? The answer is: all of the above. Think of scalars as magical strings. Their type will change according to the value that they contain. For example, the program

$string_or_int = "Hello";
print "$string_or_int World\n";

$string_or_int = 12;
print "Twelve * two equals: ";
print $string_or_int * 2;
print "\n";

would output

Hello World
Twelve * two equals: 24

Arrays

Arrays are similar to their C++ counterpart, and contain a group of scalars that are accessed by their position in the array. An array name must start with the @ character, and is usually followed by one or more alphabetic characters. We can create and initialize an array by putting the elements of this array in parentheses, separated by commas. For instance,

@letters = ('a', 'b', 'c', 'd');

creates an array with 4 elements. Similar to predefined scalars, there also exist predefined arrays such as @ARGV (the command line arguments to a Perl script) and @INC (the list of directories that Perl will search for modules). All predefined arrays are capitalized and hence you should name your arrays using lower case characters to avoid potential name conflicts.

As in most other programming languages, a specific element in a Perl array is referenced using the [] operator:

$arrayname[index]

Observe that because each array element is a scalar, we use $arrayname[index] and not @arrayname[index]. The first element of the array is $arrayname[0], and the last element is $arrayname[$#arrayname], where $#arrayname is a predefined scalar whose value is the position of the last element of array @arrayname.

To assign more than one element of an array to individual variables, one can use the [] operator and perform the assignments individually. An alternative option is to do the assignments in parallel using the following syntax:

(var1, var2, ..., vart) = @arrayname;
by which var1 gets assigned $arrayname[0], var2 gets assigned $arrayname[1], etc. If one of var1, var2, ..., vart is not a scalar (e.g. if it is an array), then it will be assigned all of the remaining elements of @arrayname. For instance, the statement
($x, $y, @a, $z) = (1, 2, 3, 4, 5);

would assign 1 to $x, 2 to $y, the array (3, 4, 5) to @a, and nothing (the undefined value) to $z.
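
A short sketch of parallel assignment, using made-up variable names, shows both the usual pattern (array last) and the pitfall of placing an array before a scalar:

```perl
use strict;
use warnings;

my @arrayname = (1, 2, 3, 4, 5);

# Scalars take one element each; the array takes everything that is left.
my ($x, $y, @rest) = @arrayname;

# An array in the middle swallows the remainder, so $z stays undefined.
my ($p, @greedy, $z) = @arrayname;

print "x=$x y=$y rest=@rest\n";
print "p=$p greedy=@greedy z is ", (defined $z ? "defined" : "undefined"), "\n";
```

For this reason an array, if present, is normally placed last in the list on the left-hand side.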

Note that if we use an array name, such as @letters, in a context where Perl expects a scalar, Perl will use the number of elements of the array as the value. So, in the following example, $size is assigned the value 4:

@numbers = (5, 6, 7, 8);
@letters = ('a', 'b', 'c', 'd');
@numbers_and_letters = (@numbers, @letters);

$size = @numbers."";

print "Look mom, I can count: @numbers\n";
print "I know some characters too.  See: @letters\n";
print "I'm on a roll now: @numbers_and_letters\n";
print "The first letter in the alphabet is $letters[0]\n";
print "The last element in my numbers array is $numbers[$#numbers]\n";
print "The number of elements stored in my numbers array is $size\n";

and the output produced is:

Look mom, I can count: 5 6 7 8
I know some characters too.  See: a b c d
I'm on a roll now: 5 6 7 8 a b c d
The first letter in the alphabet is a
The last element in my numbers array is 8
The number of elements stored in my numbers array is 4

As you can see from this example, the use of the array data type in Perl is similar to the use of arrays in any other programming language.

Hashes

The third and final type of Perl variable is referred to as a hash or associative array. An associative array is simply a convenient form of one-to-one hashing. Think of it as an array indexed by strings, instead of by integers. More generally, the key and the value associated with the key are both scalars.

The name of a hash begins with the % character, and is usually followed by one or more alphabetic characters. Like arrays, there are a few predefined hashes like %ENV (contains your current environment, $PATH, $DISPLAY, etc.) and %SIG. All predefined hashes are capitalized and hence you should name your associative arrays with lower case characters if you are unsure if there is a name conflict. Here is an example of how to use a hash. The program

%responsibilities = ( 
    "Ian" => "instructor", "Moyra" => "systems administrator", 
    "George" => "instructor", "Bob" => "department head"
);

print "Ian's job: $responsibilities{'Ian'} \n";
print "Bob's job: $responsibilities{'Bob'} \n";

will produce the following output:

Ian's job: instructor
Bob's job: department head

Keys must be unique. However, values do not have to satisfy this condition and hence we can have many keys with the same value (e.g. "instructor"). You can see how the hash data type has the potential to be very useful. A hash data type is to Perl what the alist (association list) data structure is to Scheme.
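
As a further sketch, the following adds, replaces and looks up entries in a hash like the one above. The new job title "professor" and the name "Alice" are invented for the example.

```perl
use strict;
use warnings;

my %responsibilities = (
    "Ian"    => "instructor",
    "George" => "instructor",
    "Bob"    => "department head",
);

# Assigning to an existing key replaces its value; keys stay unique.
$responsibilities{"Bob"} = "professor";

# Assigning to a new key adds an entry.
$responsibilities{"Moyra"} = "systems administrator";

# exists() tests for a key without creating it.
my $has_alice = exists $responsibilities{"Alice"} ? 1 : 0;

# keys() returns the keys in no particular order, so sort them for output.
my @names = sort keys %responsibilities;
for my $name (@names) {
    print "$name: $responsibilities{$name}\n";
}
```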



Helpful Tips

Hashing is the process of placing an item into a data structure using a key-to-address transformation. Hashing is used to increase efficiency for searching and retrieving items in the data structure. The data structure that is most often used is a table (often called a hash table). The key, which is used to index the hash table, is most often the item to be inserted or searched for. The address of the item in the hash table is generated by applying a "hashing function" to the key. One of the most common hashing functions is the modulo division function. Given a key (K) we can generate an index (I) into the hash table of size (H) with the division function:

I = K mod H

Suppose we have a hash table of size 10 that we will use to store integer items. First we wish to insert the integer 1. Using the modulo division hash function, we obtain the address 1 (since 1 mod 10 = 1). Next we want to insert the integer 11. Using the modulo division hash function, we obtain the address 1 (since 11 mod 10 = 1). These two insertion operations have resulted in a collision. That is, two keys have mapped to the same index in the hash table. What can we do to resolve the collision? The simplest collision resolution technique is called Linear Probing. Linear Probing works as follows. If you encounter a collision at address A, look at address A+1 to see if it is occupied. If it is occupied, look at address A+2 and so on. When an unoccupied address is identified, insert the item into it.

To retrieve items from a hash table that uses linear probing collision resolution, we apply the hash function to the requested item and obtain an address, A. We look up the address A in the hash table. If the address is not occupied, we conclude that the item is not in the hash table. If the address is occupied, we check to see that the correct item is the occupant. If this is not the case, we look at A+1, and so on, until we either find the item or reach an unoccupied address.
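
The insertion and retrieval steps described above can be sketched in Perl as follows. This is only an illustration under stated assumptions: the function names insert_key and find_key are invented, keys are non-negative integers, the probe wraps around to address 0 at the end of the table, and deletion is not handled. It is not how Perl's own hashes are implemented.

```perl
use strict;
use warnings;

my $H     = 10;                 # table size
my @table = (undef) x $H;       # undef marks an unoccupied address

# Insert a non-negative integer key; returns the address it landed on.
sub insert_key {
    my ($key) = @_;
    for my $step (0 .. $H - 1) {
        my $addr = ($key + $step) % $H;    # wrap around at the end
        if (!defined $table[$addr] || $table[$addr] == $key) {
            $table[$addr] = $key;
            return $addr;
        }
    }
    die "hash table is full\n";
}

# Return the address holding $key, or -1 if it is not in the table.
sub find_key {
    my ($key) = @_;
    for my $step (0 .. $H - 1) {
        my $addr = ($key + $step) % $H;
        return -1    unless defined $table[$addr];   # empty slot: absent
        return $addr if $table[$addr] == $key;
    }
    return -1;
}

my $addr_1  = insert_key(1);     # 1 mod 10 = 1
my $addr_11 = insert_key(11);    # collides at 1, probes to 2
my $found   = find_key(11);
my $missing = find_key(21);      # probes 1 and 2, then hits the empty 3

print "1 -> $addr_1, 11 -> $addr_11, find(11) = $found, find(21) = $missing\n";
```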

The advantage of using hashing for storing a large number of items instead of using linked-lists is that, as long as collisions are rare, the search and insert operations can be done in some constant amount of time (the amount of time used to compute the hash function). When using a linked-list, a traversal of the list is required to search for and insert items. This takes time proportional to the size of the linked list. Compared to standard arrays, hash tables make more efficient use of space for items taken from large data sets. Consider the case where I wish to store two numbers 1 and 1000. In an array, I would store the number 1 in index 1 and the number 1000 in index 1000. Thus, I must use an array with minimum size 1001. If I was using a hash table, I could use a table of size 5, store the number 1 at index 1 and the number 1000 at index 0 (using the modulo division hash function, since 1000 mod 5 = 0).

Context, and more on variables

The last section discussed Perl variables: scalars, arrays, and hashes. Variables are one of the important terms that you will need to understand in order to master Perl. In this section, you will also learn about literals, and about the context in which terms operate.

Variables

Recall from the last section that each of the 3 types of variable is labeled with a prefix character: $ for scalars, @ for arrays and % for hashes. It may help you to think of the $ character as being analogous to the "the" article in English:

Scalar Variables
Construct             Meaning
$books                A simple scalar variable.
$books[23]            The 24th book in the array @books.
$books{'The Pearl'}   A specific value from the hash %books.
$#books               The position of the last element of the array @books.
@books.""             The size of array @books.

The @ prefix can be interpreted as the "these" or "those" articles:

Arrays and Array Slices
Construct                            Meaning
@books                               ($books[0], $books[1], . . . , $books[n])
@books[2..4]                         ($books[2], $books[3], $books[4]) or @books[2,3,4]
@books{'The Pearl', 'Cannery Row'}   ($books{'The Pearl'}, $books{'Cannery Row'})

Similarly, the % prefix is used to refer to hashes:

Hashes
Construct   Meaning
%books      ('The Pearl' => 1947, 'Cannery Row' => 1945, . . . )

There are some important points to keep in mind when using variables.

Literals

Perl does not make any radical departures from the usual floating point, integer or string formats as they are used in UNIX. Since Perl borrows numerous conventions from other programming languages, including C++, you will not need to relearn them for Perl.

Numeric literals are the numbers used within your program, as opposed to strings or data read in from other files.

Numeric Literals
Format      Meaning
1234567     Integer
12345.67    Floating point
1.23E45     Scientific notation
0xffff      Hexadecimal
0377        Octal
1_234_567   Uses _ between groups of digits, as a delimiter

Note that, as in C++, a leading 0 denotes octal (base 8) and 0x denotes hexadecimal (base 16). You should keep this in mind when writing decimal numbers in your program: a numeric literal with a leading 0 will be interpreted in octal.
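
The sketch below contrasts numeric literals written in the program with digits that arrive as a string (the variable names are invented). Only literals in the source code get the octal and hexadecimal treatment; a string is converted as a decimal number unless you call the built-in oct() function yourself.

```perl
use strict;
use warnings;

my $octal_literal = 0377;          # leading 0 in source code: octal, i.e. 255
my $hex_literal   = 0xff;          # 255
my $grouped       = 1_234_567;     # underscores are ignored: 1234567

# A string holding digits is converted as decimal, leading zero or not.
my $from_string   = "0377" + 0;    # 377

# oct() converts a string written in octal (or, with a 0x prefix, in hex).
my $via_oct       = oct("0377");   # 255

print "$octal_literal $hex_literal $grouped $from_string $via_oct\n";
```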

String literals are similar to those used in shell scripts: double-quoted literals interpret special characters (denoted by a backslash) and interpolate variables, while single-quoted strings do neither. The only exceptions are (\') and (\\), which are always interpreted so that you can use the single quote and backslash characters in single-quoted strings. The following example illustrates the difference between single and double quotes:

$Pay = '$4';                               # not interpolated
print "They pay me $Pay per hour.\n";      # interpolated
print 'They pay me $Pay per hour.\n';      # not interpolated

It produces the following output (the second print is single-quoted, so its \n is printed literally rather than as a newline):

They pay me $4 per hour.
They pay me $Pay per hour.\n

However, the function of quotes goes beyond denoting literals. You can think of quotes as operators. As usual, TMTOWTDI ("there's more than one way to do it"), and you can choose your quotes from the table below, keeping in mind the special behavior associated with some of them.

Quotes
Customary   Generic   Meaning         Interpolates?
''          q//       Literal         No
""          qq//      Literal         Yes
``          qx//      Command         Yes
()          qw//      Word list       No
//          m//       Pattern match   Yes
s///        s///      Substitution    Yes
tr///       y///      Translation     No

The customary and generic forms can be used interchangeably. For instance, the statement $x = 'I am paid $4'; is the same as the statement $x = q/I am paid $4/;. Choose the style that suits you and be consistent. Quotes ', ", and ` behave much as they do in the C-Shell and the Bourne Shell. The last three entries in the table are discussed in the section on pattern matching.
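
Here is a brief sketch of the first, second and fourth rows of the Quotes table, with hypothetical variable names. It also uses the fact that the generic forms accept delimiters other than /, such as a pair of braces.

```perl
use strict;
use warnings;

my $pay = 4;

my $single  = 'I am paid $pay';       # customary form, no interpolation
my $generic = q/I am paid $pay/;      # generic form, same string
my $double  = "I am paid \$$pay";     # interpolates $pay: I am paid $4
my $braces  = qq{I am paid \$$pay};   # generic form with { } as delimiters
my @list    = ('camel', 'llama');     # customary word list
my @words   = qw/camel llama/;        # generic form, same two elements

print "$single\n$double\n@words\n";
```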

Context

Before you can master Perl terms, you must have an understanding of the context in which those terms are used. This means that the data type (and thus the value) of an expression may depend on what Perl will do with the result. We already saw that if Perl expects a scalar, then the value of @arrayname is the length of the array.

You can think of context in terms of operators being overloaded if you are familiar with that concept. In this case, it is the operator that determines the type/value of the operands, rather than the other way around. The notion of context is intuitive since it is so common in natural languages. For example, the meaning of the verb "finish" varies from the phrase "I'm going to finish this course" to the phrase "I'm going to finish him off".

It is especially important to recognize the distinction between the singular context of scalars and the plural context of lists. Scalar context extends to string values, numeric values, and the special don't care context, where no conversion is done. An example of an operator that invokes the don't care context is the assignment operator: if you write $foo = someExpression, the variable $foo will take the subtype (numeric or string) of the scalar someExpression.

When you are using list operations, list context will be specified in the documentation of those operations. However, you are still able to use scalar context on the list by using the scalar() function.

For example, let us consider the output from the following set of statements:

@myArray1 = (1,2,3,4,5);
@myArray2 = (6,7,8,9,10);
print @myArray1+@myArray2;

The scalar operator + will place the arrays into their respective scalar contexts. The scalar context for an array is its length. Thus, the output from the print statement will be the integer 10. If what we really wanted was to print the concatenation of the two arrays, we would use the statement:

print (@myArray1,@myArray2);

You should also be aware of boolean context. In this case, the null string (""), the number 0, the string "0", and the undefined value represent false, while any other value, including a reference, represents true. No conversions are made with boolean context, so it is analogous to the don't care context mentioned above.
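
The following sketch (variable names invented) puts the three contexts side by side: arrays in scalar context, arrays in list context, and a handful of values tested in boolean context.

```perl
use strict;
use warnings;

my @myArray1 = (1, 2, 3, 4, 5);
my @myArray2 = (6, 7, 8, 9, 10);

my $sum_of_lengths = @myArray1 + @myArray2;   # scalar context: 5 + 5
my @joined         = (@myArray1, @myArray2);  # list context: 10 elements
my $count          = scalar(@joined);         # scalar() forces scalar context

# The first four values are false in boolean context; the rest are true.
my @truth;
for my $value (0, "", "0", undef, "0.0", "00", " ", "a") {
    push @truth, ($value ? "true" : "false");
}

print "sum of lengths: $sum_of_lengths, joined count: $count\n";
print "@truth\n";
```

Note that strings such as "0.0" and "00" are true even though they are numerically zero; only the exact string "0" is false.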

If this seems confusing, don't worry. The main thing to remember is that using a list when Perl expects a scalar (or the other way around) will result in a possibly unwanted type conversion.

Operators

Now that we have learned about terms, we need to know more about how to manipulate them. Precedence decides the order in which operations are evaluated -- those of highest precedence are evaluated first. That is, the expression 2 + 3 * 4 would be evaluated as 2 + (3 * 4) instead of (2 + 3) * 4, because operator * has higher precedence than operator +.

Important: Perl allows you to omit parentheses around the arguments to built-in (and other) functions. This feature introduces several complications with respect to the evaluation of expressions in which it is used. We will thus ignore it in the discussion that follows, and strongly recommend that you avoid it.

The following table lists the Perl operators sorted from highest precedence to lowest. A table similar to the one we present here, but more complete, appears in "Programming Perl" (see references). The interested reader can consult this book for additional details.

Operators

Associativity    Operators               Meaning
Left             Terms                   Numbers, values in parentheses or quotes, function calls
Left             ->                      Infix dereference (see the section on references)
Nonassociative   ++ --                   Autoincrement, autodecrement
Right            **                      Exponentiation
Right            ! ~ \ - (unary)         Negation, reference (see the section on references)
Left             =~ !~                   Pattern matching operators (see the section on pattern matching)
Left             * / % x                 Multiplication, division, modulo, repetition
Left             + - .                   Addition, subtraction, string concatenation
Left             << >>                   Bitwise shift left, shift right
Nonassociative   < > <= >= lt gt le ge   Comparison operators
Nonassociative   == != <=> eq ne cmp     Equivalence operators
Left             &                       Bitwise AND
Left             | ^                     Bitwise OR, XOR
Left             &&                      Logical AND
Left             ||                      Logical OR
Right            ?:                      Alternation (if-then-else) operator
Right            = += -= *= /=           Assignment operators
Left             , =>                    Like C++'s comma operator, digraph operator
Right            not                     Logical NOT
Left             and                     Logical AND
Left             or xor                  Logical OR, XOR

Although this may seem like a lot to memorize, you will realize that it is quite intuitive when you are writing Perl. Since you are familiar with C++, remember that all of the common operators preserve their precedence. If you find yourself unable to recall some of the ordering, using parentheses will always force the precedence that you want.

We have separated the operators into seven groups based on the type of processing that they are used for. The groups are:

Term and List Operators

Terms, which you learned about in the last section, hold the highest precedence. These include variables, quote and quote-like operators, any expression in parentheses, and any function whose arguments are parenthesized.

Arithmetic Operators

Most arithmetic operators behave the same way in Perl as they do in C++. As such, the following table only contains those that do not exist in C++, or with which you may not be familiar.

Operator                    Description                       Result
$a ** $b                    Exponentiation                    $a raised to the $b power
$a % $b                     Modulo Division                   If $b is positive, $a minus the largest multiple of $b not greater than $a; if $b is negative, $a minus the smallest multiple of $b not less than $a
$a | $b, $a & $b, $a ^ $b   Bitwise OR, AND, XOR              Returns $a OR/AND/XORed bit by bit with $b
$a << $b, $a >> $b          Bitwise shift left, shift right   Shifts $a left or right by $b bits
~$a                         Bitwise complement                Returns the bitwise negation (1's complement) of $a
$a <=> $b                   Numeric compare                   Returns -1 if $a < $b, 0 if $a == $b, or 1 if $a > $b

Here are a few important points that should be kept in mind:

Here are a few simple examples:

1 << 3;                     # bitwise shift left, returns 8
16 >> 3;                    # bitwise shift right, returns 2
5**3;                       # exponentiation, returns 125
16 % 3;                     # modulo division, returns 1
0xffff & 0x0000;            # bitwise AND, returns 0x0000
0xff00 | 0x0011;            # bitwise OR, returns 0xff11
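
The rule for the sign of a modulo result is easier to see with concrete numbers. The following sketch, with made-up variable names, applies the definitions from the table to negative operands and also exercises the numeric compare operator.

```perl
use strict;
use warnings;

my $pos_right = -7 % 3;     # largest multiple of 3 not above -7 is -9, so 2
my $neg_right =  7 % -3;    # smallest multiple of -3 not below 7 is 9, so -2
my @compare   = (2 <=> 10, 10 <=> 10, 10 <=> 2);   # -1, 0, 1
my $power     = 2 ** 10;    # 1024
my $mask      = ~0 & 0xff;  # complement of 0 is all ones; keep the low 8 bits

print "$pos_right $neg_right @compare $power $mask\n";
```

In other words, the result of % takes the sign of its right operand, which differs from what many C++ implementations produce.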

Assignment Operators

Beyond recognizing all of the standard C++ assignment operators, Perl also provides additional compound operators. Each of them takes an lvalue (the value on the left-hand-side of the operator), which can be a variable or array element, and assigns to it the value of the expression on the right of the operator. The compound operators are used in the form:

$var OP= $value
which has the same semantics as
$var = $var OP $value

Most of the binary operators listed above (arithmetic, string, bitwise, and the logical && and ||) have a corresponding compound operator. Hence, for instance, there is an **= operator such that

$x = 3; 
$x **= 4;

sets the value of $x to 81.

Also, recall from the Operators Table that assignment operators bind right to left, so chained assignments are not a problem. The statement

$x = $y = $z = 3;
results in assigning 3 to $z, then to $y, and finally to $x.

Conditional and Logical Operators

Instead of using the if-then-else statement, you can also sometimes use the conditional operator (?:), which has the following form:

test_expression ? iftrue_expression : iffalse_expression

For example, you could write

($today eq "saturday")  ? print "whoohooo"  : print "doh!";

instead of

if ($today eq "saturday")
{
    print "whoohooo";
}
else
{
    print "doh!";
}

Perl also has the same logical operators as C++. They are evaluated in the same way, that is, the right operand is only evaluated if the value of the expression cannot be determined by looking at the left operand only. For instance, if the and operator is used and the first operand is false, then the second operand does not need to be evaluated.

Logical Operators

Operator                Name   Meaning
$x && $y or $x and $y   AND    Evaluates to $x if $x is false, and to $y otherwise
$x || $y or $x or $y    OR     Evaluates to $x if $x is true, and to $y otherwise

This allows you some interesting flexibility with control structures. For example:

open(SESAME, "filename") or die "go away";

will have the effect of first evaluating the open function. If that succeeds, the die function will not get evaluated. This makes for nicely readable code, since the syntax is similar to English. Note that older versions of Perl (before version 5) only support the symbolic operators && and ||, not the words and and or.
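
Because the logical operators return one of their operands rather than a plain true or false, they are often used to supply default values. The sketch below uses an invented %settings hash to show this, along with a die that is never reached because of short-circuit evaluation.

```perl
use strict;
use warnings;

my %settings = (editor => "vi");   # hypothetical configuration data

# || returns its left operand if that is true, otherwise its right operand.
my $editor = $settings{editor} || "emacs";   # "vi"
my $pager  = $settings{pager}  || "more";    # key is missing, so "more"

# && returns its left operand if that is false, otherwise its right operand.
my $zero   = 0;
my $both   = 1 && "second";                      # "second"
my $short  = $zero && die "never reached\n";     # 0; die is not evaluated

print "editor=$editor pager=$pager both=$both short=$short\n";
```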

String Operators

Perl includes some useful string operators:

String Operators

Operation                Description     Result
$a . $b                  Concatenation   Values of $a and $b joined as one long string
                                         For example: "Hello " . "World" yields "Hello World"
$a x $b                  Repeat          Value of $a strung together $b times
                                         For example: "hee" x 4 yields "heeheeheehee"
substr($s, $pos, $len)   Substring       Substring of length $len starting at position $pos
                                         For example: substr("Hello", 0, 2) yields "He"

Perl also includes a set of string comparators. Note that these operators are not the same as the corresponding numeric operators.

String Comparators

Operation   Description             Condition Returning True
eq          Equal to                "Hello" eq "Hello"
ne          Not equal to            "Hello" ne "World"
gt          Greater than            "a" gt "A"
ge          Greater than or equal   "abc" ge "A"
lt          Less than               "a" lt "z"
le          Less than or equal      "abc" le "z"

The (.) operator is used to concatenate two strings. However, the same result can be obtained by interpolating the variables within double quotes. That is, $date = $day . $month . $year; is equivalent to $date = "$day$month$year";

The repetition operator can also be used to initialize lists, arrays, or hash slices. For example:

@stars = ("*") x 50;          # initializes a list of 50 *'s
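
To see why the string comparators must not be confused with the numeric ones, consider the following sketch (variable names are invented). The same pair of operands can compare differently depending on which family of operators is used.

```perl
use strict;
use warnings;

my $joined = "Hello " . "World";        # "Hello World"
my $laugh  = "hee" x 4;                 # "heeheeheehee"
my $prefix = substr("Hello", 0, 2);     # "He"
my @stars  = ("*") x 3;                 # ("*", "*", "*")

# eq compares character by character; == compares numeric values.
my $same_string = ("1.0" eq "1") ? "yes" : "no";   # "no"
my $same_number = ("1.0" == "1") ? "yes" : "no";   # "yes"

# lt uses string order, in which "10" sorts before "9"; < uses numeric order.
my $string_less = ("10" lt "9") ? "yes" : "no";    # "yes"
my $number_less = (10 < 9)      ? "yes" : "no";    # "no"

print "$joined, $laugh, $prefix, @stars\n";
print "eq: $same_string, ==: $same_number, lt: $string_less, <: $number_less\n";
```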

File Test Operators

Perl's history is very closely associated with reading and writing files, and as a result, Perl has numerous built-in file test operators. These take either a filename or a filehandle (filehandles are discussed in another section), and test the file for the specified condition. A file test usually returns 1 when the condition is true, and the empty string when it is false. Several of these operators will be familiar to you from the notes on Unix and scripting.

File Test Operators

Operator  Description
-r        File is readable
-w        File is writable
-x        File is executable
-o        The person running the script owns the file

-e        File exists
-z        File has zero size
-s        File has non-zero size (returns size)

-f        File is a plain file
-d        File is a directory
-l        File is a symbolic link

-T        File is a text file
-B        File is a binary file (opposite of -T)

-M        Age of file (at startup) in days since last modification
-A        Age of file (at startup) in days since last access
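
As a rough illustration of how a few of these operators behave, the sketch below creates a small scratch file, tests it, and removes it again. The file name is arbitrary and chosen only for this example, and the open and close calls can be ignored until filehandles have been covered.

```perl
use strict;
use warnings;

# Create a small scratch file so that the tests have something to examine.
# The file name is an assumption made for this example only.
my $file = "filetest_demo_$$.txt";
open(my $out, '>', $file) or die "Cannot create $file: $!";
print $out "some text\n";
close($out);

my $exists   = -e $file ? 1 : 0;   # the file exists
my $is_plain = -f $file ? 1 : 0;   # it is a plain file
my $is_dir   = -d $file ? 1 : 0;   # it is not a directory
my $size     = -s $file;           # size in bytes (true because non-zero)

print "$file exists\n"             if $exists;
print "$file is a plain file\n"    if $is_plain;
print "$file is not a directory\n" unless $is_dir;
print "$file holds $size bytes\n"  if $size;

unlink $file;                      # clean up the scratch file
my $gone = -e $file ? 0 : 1;       # 1 once the file no longer exists
```

Because -s returns the size rather than just 1, it can be used both as a test and as a way to obtain the number of bytes in the file.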

Miscellaneous Operators

Another useful operator is the range (..) operator. It can be used, for instance, to denote the range for a for loop. If we wanted to print "Hello World" 10 times, we could use the range operator in the following way:

for $i (1..10) {
    print "$i Hello World\n";
}
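
The range operator is not limited to loop headers. As a small sketch (the variable names are illustrative only), it can also build lists directly, and it works on letters as well as numbers:

```perl
use strict;
use warnings;

my @digits  = (0 .. 9);        # ten numbers, 0 through 9
my @letters = ('a' .. 'e');    # ranges also work on letters

my $total = 0;
for my $i (1 .. 10) {
    $total += $i;              # sums 1 + 2 + ... + 10
}

print "Digits: @digits\n";
print "Letters: @letters\n";
print "Total: $total\n";       # Total: 55
```
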

Flow Control

In Perl, as in C++, all simple statements must end with a semicolon. A compound statement is a sequence of statements and is commonly referred to as a block. A block has its own scope, and is usually surrounded by curly braces {}. Perl uses control structures such as if ... else and while in a way that is very similar to other languages like C, C++ and Java. However, it gives you more flexibility with their syntax than these other languages.

Conditionals

If statements

An if statement in Perl looks almost identical to the if statements you are already familiar with. It has the general form:

if (condition) block { elsif (condition) block } [ else block ]

where { x } means that x can occur zero or more times, and [ x ] means that x can occur zero or one time. Note that, unlike in C, C++ and Java, the block must be enclosed in curly braces, even if it is a single statement. In other words,

if ($a > 5) print "TRUE\n";

is illegal in Perl, and instead we need to use

if ($a > 5) { print "TRUE\n"; }

Unless statements

There is also an unless statement that works in exactly the same way as the if statement, except that the first condition is negated. That is, the statement:

if ($a > 5) 
{ 
    print "TRUE\n"; 
} 
elsif ($a == 3) 
{ 
    print "MAYBE\n";
} 
else 
{
    print "FALSE\n";
}

is the same as the statement:

unless ($a <= 5) 
{ 
    print "TRUE\n"; 
} 
elsif ($a == 3) 
{ 
    print "MAYBE\n";
} 
else 
{
    print "FALSE\n";
}

Note that there is no elsunless keyword.

Switch statements

Perl has if and unless statements, but there is no official switch statement. Fortunately, due to the flexibility of Perl, there are many ways to create one. The easiest, but perhaps not the most attractive, way to mimic the behaviour of a switch statement is to use a series of if-elsif statements nested inside of a for loop. Another alternative, which resembles C/C++ switch statements more closely, is to use a label, a block, some if statements, and the last keyword. The label, which is SWITCH in the example below, is simply used to identify the block. This facilitates the use of control statements such as last (which we use here), next, and redo. The last keyword is like the C/C++ break keyword. It immediately terminates execution of the labeled block or, if no label is given, of the innermost enclosing loop; a bare block such as SWITCH counts as a loop that executes once, whereas the block of an if statement does not. Here is a sample program:

@factor = ("quadruple", "halve", "double", "triple");

for ($i = 0; $i < @factor; $i++)   # in a numeric comparison, @factor is its number of elements
{
    $amount = 1000;

    SWITCH: {
	if ($factor[$i] eq "double") 
	{ 
	    $amount *= 2; 
	    last SWITCH; 
	}
	if ($factor[$i] eq "triple") 
	{ 
	    $amount *= 3; 
	    last SWITCH; 
	}
	if ($factor[$i] eq "quadruple") 
	{ 
	    $amount *= 4; 
	    last SWITCH; 
	}
	
	print "I do not know how to $factor[$i].\n";
	$amount = 0;
    }

    print "You wanted to $factor[$i] the amount.\n";
    print "The new amount is $amount.\n\n";
}

The output of the program is:

You wanted to quadruple the amount.
The new amount is 4000.
 
I do not know how to halve.
You wanted to halve the amount.
The new amount is 0.
 
You wanted to double the amount.
The new amount is 2000.
 
You wanted to triple the amount.
The new amount is 3000.

For more examples of how to create a switch statement, please consult the following pages: http://www.perl.com/CPAN-local/doc/manual/html/pod/perlsyn.html#Basic_BLOCKs_and_Switch_Statemen, http://language.perl.com/misc/fmswitch, or http://www.perl.com/CPAN-local/doc/manual/html/pod/perlfaq7.html#How_do_I_create_a_switch_or_case.
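
For comparison, here is a minimal sketch of the first approach mentioned above, an if-elsif chain nested inside a for loop. The subroutine name apply_factor is hypothetical and is used only to keep the example self-contained; the for loop runs exactly once and serves to alias $_ to the value being tested.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one named factor to an amount.
sub apply_factor {
    my ($factor, $amount) = @_;

    # for aliases $_ to $factor and executes the block once
    for ($factor) {
        if    ($_ eq "double")    { $amount *= 2; }
        elsif ($_ eq "triple")    { $amount *= 3; }
        elsif ($_ eq "quadruple") { $amount *= 4; }
        else                      { $amount = 0;  }   # default case
    }
    return $amount;
}

my @results;
foreach my $factor ("quadruple", "halve", "double", "triple") {
    my $amount = apply_factor($factor, 1000);
    push @results, $amount;
    print "$factor: $amount\n";
}
```

With this form no label and no last statements are needed, because at most one branch of an if-elsif chain is ever executed.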

Loops

Perl has a while loop and a for loop that are almost identical to their C++ or Java counterparts, the only difference being that the body of the loop must be enclosed between curly braces (just like the blocks in the if statement). The other two looping constructs that it supports are the do...until loop, and the foreach loop.

do...until loops

The do...until loop is similar to the do...while loop in C++, except that the condition is negated. Its general form is:

do block until (condition);

which is equivalent to

do block while (!condition);

in C++.
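
A short sketch of a do...until loop follows; the counter and the array used to record its values are illustrative names only.

```perl
use strict;
use warnings;

my $count = 0;
my @seen;

# The body runs once before the condition is first tested,
# and the loop stops as soon as the condition becomes true.
do {
    $count++;
    push @seen, $count;
} until ($count >= 3);

print "Counted: @seen\n";      # Counted: 1 2 3
```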

foreach loops

A foreach loop has the form:

foreach $scalar (@array) block

and executes block as many times as there are elements in @array, with $scalar successively equal to the first, second, third, etc, element of the array. It is suggested that you use the foreach construct in favour of the regular for and while constructs whenever it is possible. In most cases, this makes for a smaller program and eliminates the notorious "off-by-one-error".

Note that the keywords for and foreach are, in fact, interchangeable. However, for readability reasons we recommend that you use them as described in this section.
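
To make the comparison concrete, the sketch below sums the same array twice, once with an index-based for loop and once with foreach. The array contents are invented for the example.

```perl
use strict;
use warnings;

my @prices = (5, 10, 20);

# Index-based loop: the bounds must be written by hand
my $total_for = 0;
for (my $i = 0; $i < @prices; $i++) {
    $total_for += $prices[$i];
}

# foreach loop: no index, so no chance of an off-by-one error
my $total_foreach = 0;
foreach my $price (@prices) {
    $total_foreach += $price;
}

print "for: $total_for, foreach: $total_foreach\n";   # both totals are 35
```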

Statement modifiers

Finally, Perl does give statements one unexpected option: they can have one of the following modifiers, placed after the statement, and not to be confused with the control structures described above (even though the two are closely related).

Statement Modifiers

Modifier      Meaning                                         Example
if expr       Simple conditional if                           talk_to('me') if $you_care;
unless expr   Akin to if-not                                  call('me') unless $you_want_to_die;
while expr    Evaluates repeatedly as long as expr is true    $counter++ while -e "$file$counter";
until expr    Evaluates repeatedly as long as expr is false   run('you') until $you_drop;
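
The following sketch exercises each of the four modifiers once. All variable names are made up for the example.

```perl
use strict;
use warnings;

my @log;
my $verbose = 1;
my $quiet   = 0;

push @log, "verbose is on" if $verbose;       # runs: the condition is true
push @log, "quiet is off"  unless $quiet;     # runs: the condition is false

my $n = 1;
$n *= 2 while $n < 100;                       # keeps doubling; stops at 128

my $countdown = 3;
$countdown-- until $countdown == 0;           # decrements down to 0

print join("\n", @log), "\n";
print "n = $n, countdown = $countdown\n";
```

Note that a modifier applies to a single statement only; as soon as several statements depend on the condition, the block forms described earlier are the better choice.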

Reference Data Types

A reference is a scalar that is like an arrow pointing to another Perl variable. In the real world, names are one kind of reference that you are already familiar with. For instance, consider the person who developed the theory of relativity. He was a complicated living organism (a human being), but if you want to talk about him, you would refer to him by his name, Albert Einstein. Unlike the name of a person (e.g. Peter Smith), a reference is unambiguous because it can only point to one distinct variable at a time.

Creating References

There are two ways to create references to variables. The first method is to precede the variable name with a \ symbol, such as:

$arrayRef = \@myArray;         # $arrayRef holds a reference to @myArray
$hashRef = \%myHash;           # $hashRef holds a reference to %myHash

The second method uses un-named references, or reference literals. This is analogous to using the number 55 or the string "Hi Bob\n" in a program without storing them in variables first. A statement of the form [ item, item, item, ...] creates a new array and returns a reference to the array. A statement of the form { item, item, item, ... } creates a new hash and returns a reference to the hash. For example:

$arrayRef = [2 , "hello", undef, 13 ]; # $arrayRef holds a reference to an array
$hashRef = { APR => 4, AUG => 8 };     # $hashRef holds a reference to a hash

The references created by the two methods are equivalent. For example, the statement: $arrayRef = [ 1, 2, 3 ]; is equivalent to:

@array = (1, 2, 3);
$arrayRef = \@array;

Using References

Since a reference is a scalar variable, you can store and assign the reference like any other scalar variable. For example, if $arrayRef is a reference to an array and $hashRef is a reference to a hash, then you can write:

$x = $arrayRef;               # $x holds a reference to the array
$arr[3] = $hashRef;           # $arr[3] holds a reference to the hash
$y = $arr[3];                 # $y holds a reference to the hash

One way to access the data that is referred to by the reference is to use the {} operator. If we store a reference to an array in the variable $myArrayRef, then we can access the array that is being referenced by writing @{$myArrayRef}. Compare the use of an array and the use of a reference to an array:

Statement using      Equivalent statement using a    Result of executing
a normal array       reference to the same array     either statement
@myArray             @{$myArrayRef}                  An array
sort @myArray        sort @{$myArrayRef}             Sort the array
$myArray[5]          ${$myArrayRef}[5]               The 6th element of the array
$myArray[5] = 55     ${$myArrayRef}[5] = 55          Assign the value 55 to the 6th element of the array

References to hashes can also use the {} operator:

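Statement using a normal hash:                      %myHash
Equivalent statement using a reference:             %{$myHashRef}
Result of executing either statement:               A hash

Statement using a normal hash:                      keys %myHash
Equivalent statement using a reference:             keys %{$myHashRef}
Result of executing either statement:               Obtain the keys in the hash

Statement using a normal hash:                      $myHash{'google'}
Equivalent statement using a reference:             ${$myHashRef}{'google'}
Result of executing either statement:               The element in the hash indexed by the string 'google'

Statement using a normal hash:                      $myHash{'google'} = "Search Engine"
Equivalent statement using a reference:             ${$myHashRef}{'google'} = "Search Engine"
Result of executing either statement:               Assign the string value "Search Engine" to the element in the hash indexed by the string 'google'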

You may find the syntax using the {} operator difficult to read in some cases. There is an alternative syntax using the -> operator. Here are a few examples:

This statement...           ...is equivalent to this statement
${$myArrayRef}[2]           $myArrayRef->[2]
${$myHashRef}{'google'}     $myHashRef->{'google'}

Be sure not to confuse the statements $myArrayRef->[2] and $myArrayRef[2]; they are totally different. The first returns the 3rd element of an array referenced by the variable $myArrayRef. The second returns the 3rd element of an array variable @myArrayRef. The same thing applies to the arrow syntax and hash references. Make sure you understand the difference between the statements $myHashRef->{'google'} and $myHashRef{'google'}.
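
The sketch below puts the two dereferencing syntaxes side by side on a small array and hash. The data and variable names are invented for the example.

```perl
use strict;
use warnings;

my @colours  = ("red", "green", "blue");
my $arrayRef = \@colours;                  # reference to a named array
my $hashRef  = { APR => 4, AUG => 8 };     # reference to an anonymous hash

# Two ways to read an element through the reference
my $first  = ${$arrayRef}[0];              # {} syntax
my $second = $arrayRef->[1];               # arrow syntax

# Writing through the reference changes the original array
$arrayRef->[0] = "crimson";

# Hash references work the same way
my $april = $hashRef->{APR};
${$hashRef}{DEC} = 12;
my @months = sort keys %{$hashRef};

print "$first $second\n";                  # red green
print "colours is now: @colours\n";        # crimson green blue
print "APR = $april; months: @months\n";   # APR = 4; months: APR AUG DEC
```

The assignment through $arrayRef shows that a reference does not hold a copy: @colours itself is modified.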

Examples

Now that you have been introduced to references and their syntax, let us look at some examples of what references can do. One use of references is to combine them with Perl's three built-in data types to create new data structures. Our two examples will demonstrate how to accomplish this.

In the first example, we will use references to create multi-dimensional arrays. Recall that the value of the expression ['foo', 'bar', 'baz'] is a reference to an anonymous array containing three elements (the strings 'foo', 'bar', and 'baz'). Now examine this:

@twoDArray = ( 
    [1],
    [2, 3],
    [4, 5, 6],
    [7, 8, 9, 10]
);

The array @twoDArray contains four elements, each of which is a reference to an array. For example, $twoDArray[2] is one of the references; it refers to the array (4, 5, 6). How would we access elements of the array (4, 5, 6)? We can do this using the arrow operator (->). For example:

$twoDArray[2]->[0]             # refers to the element 4
$twoDArray[2]->[2]             # refers to the element 6

You should now see that the variable @twoDArray behaves very similarly to a two dimensional array, with the additional benefit that not all the rows are required to have the same number of columns. Accessing elements in the data structure is as simple as the statement:

$twoDArray[row]->[column]

where row and column denote the index of the desired element.

Perl allows you to omit certain syntax when using arrays. For instance, when working with an array of references-to-arrays (such as the above example), Perl allows you to omit the arrow (->) operator altogether. The following two statements are thus equivalent:

$twoDArray[0][0]
$twoDArray[0]->[0]

and both refer to the element in row 0, column 0.

This new syntax gives the illusion that Perl supports multidimensional arrays. Let us compare the three different ways of accessing a three dimensional array (an array of references to arrays, which in turn hold references to arrays). The array will look like:

@threeDArray = (
    [ [ 1, 2 ], [3, 4], [5, 6] ],
    [ [ 6, 5 ], [4, 3], [2, 1] ],
    [ [ 9, 1 ], [8, 2], [7, 3] ],
    [ [ 4, 6 ], [5, 5], [6, 4] ]
);

Any of the following three expressions will retrieve the 8 from the array:

print $threeDArray[2][1][0];       # this...
print $threeDArray[2]->[1]->[0];   # or this...
print ${${$threeDArray[2]}[1]}[0]; # or even this (yuck!)
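
Because each row is an ordinary array behind a reference, rows of different lengths can be traversed with nested foreach loops. The following is a minimal sketch using the @twoDArray data from above; the names $row, $value and @rowSums are illustrative.

```perl
use strict;
use warnings;

my @twoDArray = (
    [1],
    [2, 3],
    [4, 5, 6],
    [7, 8, 9, 10],
);

my @rowSums;
foreach my $row (@twoDArray) {          # $row is a reference to one row
    my $sum = 0;
    foreach my $value (@{$row}) {       # dereference to walk along the row
        $sum += $value;
    }
    push @rowSums, $sum;
}

my $lastRowLength = scalar(@{$twoDArray[3]});

print "Row sums: @rowSums\n";                     # Row sums: 1 5 15 34
print "Row 3 has $lastRowLength columns\n";       # Row 3 has 4 columns
```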

In our second example, we are going to create a new data structure to store information about countries and their cities. Suppose that we are given a file that is a list of city-country comma separated pairs. Each pair resides on its own line in the file. For example:

Chicago, USA
Victoria, Canada
Frankfurt, Germany
St. Johns, Canada
Berlin, Germany
Vancouver, Canada
Washington, USA
Helsinki, Finland
New York, USA
Ottawa, Canada

would be a valid list. Our task is to list each country, with a sorted list of the cities it contains, in alphabetical order. The output should look like this:

Canada: Ottawa, St. Johns, Vancouver, Victoria.
Finland: Helsinki.
Germany: Berlin, Frankfurt.
USA: Chicago, New York, Washington.

Assuming that the list of cities is stored in the file city.txt, here is one possible solution. You can ignore the actual file operations: to understand the code, it suffices to know that by the time Perl gets to the chomp;, the variable $_ contains the next line in the file.

1     open (INFILE, 'city.txt') or die "Cannot open city.txt: $!";
2     while (<INFILE>) {
3         chomp; # remove the trailing \n from the line
4         my ($city, $country) = split /, /;
5         push @{$table{$country}}, $city;
6     }
7
8     foreach $country (sort keys %table) {
9        print "$country: ";
10       my @cities = @{$table{$country}};
11       print join ', ', sort @cities;
12       print ".\n";
13    }

First, let us get a feel for the design of this script. We are going to use a hash of references-to-arrays as our main data structure. The keys in the hash will be the country names, while the values in the hash will be references-to-arrays. The referenced arrays will hold the city names.

The while loop in the program parses the city.txt file and constructs the hash of references-to-arrays named %table (the country/city table). The line that does all of the work is line 5. The push function adds the city name ($city) at the end of the anonymous array accessed by looking in the hash table (%table) at index $country and following the reference. To make things clearer, look at the line:

push @array, $city;

Now replace the array name with the reference {$table{$country}} and you should begin to see this more clearly. Remember, we are indexing the hash %table with the key $country. The result is a reference to an (anonymous) array. Dereferencing this reference to an array with the {} gives us the array itself, to which we add the value held in variable $city.

The foreach loop visits the countries in alphabetical order by sorting the keys of %table. For each country, the @cities array is constructed by indexing the table with the country name and following the returned reference. The @cities array is then sorted and joined into a single string, with the elements separated by a comma and a space, that is printed to the screen.

Suppose that the program has just read the first line in its input. Execution is at line 5, $country is 'USA', and $city is 'Chicago'. Since this is the first city in the USA, there is no key named 'USA' in the table yet, and so the value $table{$country} is undefined. Perl sees that you are trying to push 'Chicago' onto an array that does not exist, and thus it creates a new array, adds the value 'Chicago' at the end of the array, and places a reference to the array into $table{$country}. This behaviour is known as autovivification.
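
The automatic creation of the array can be observed directly. The sketch below is not the program above: it uses a short in-memory list instead of city.txt, and the @pairs and @states variables are introduced only for this illustration.

```perl
use strict;
use warnings;

my %table;
my @pairs = ("Chicago, USA", "Victoria, Canada", "Ottawa, Canada");
my @states;

foreach my $pair (@pairs) {
    my ($city, $country) = split /, /, $pair;

    # Record whether the country already has an entry before the push
    my $state = exists $table{$country} ? "existing" : "new";
    push @states, $state;
    print "$country: $state key\n";

    push @{$table{$country}}, $city;    # creates the array on first use
}

my @canada = sort @{$table{Canada}};
print "Canada: @canada\n";              # Canada: Ottawa Victoria
```

Only the second Canadian city finds an existing key; for the first city of each country, the push itself creates both the hash entry and the array behind it.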

Loose Ends

Remember that you can create a reference to any data type in Perl. This includes scalars, arrays, hashes, functions, and even other references. You should also know that it is possible to omit the curly brackets when following a reference, for example:

@$ref        # This is the same as...
@{$ref}      # this.

$$ref[5]     # And this...
${$ref}[5]   # is the same as this.

You can check whether or not a variable is a reference using the ref function. For example:

$aRef = [1, 2, 3];
print "A reference!\n" if ref($aRef);

The ref function returns true if its argument is a reference, false otherwise. If the argument provided to the ref function is a reference to a hash data type, it will return the string "HASH". If the argument is a reference to an array data type, it will return the string "ARRAY".

What happens if you try to use a reference in the same manner as a string by attempting to print it out? You will display strings such as "ARRAY(0x80f5dec)" or "HASH(0x826afc0)". If you ever encounter output from a script (that you are not debugging) that is similar, you will know right away that you have displayed a reference by accident. It is possible to compare the string representations of references with the eq operator. However, if you wish to test whether or not two references are the same, use the == operator on the references directly as it is much faster.
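
A brief sketch of the ref function and of reference comparison follows. The variable names are invented for the example.

```perl
use strict;
use warnings;

my @list  = (1, 2, 3);
my $aRef  = \@list;
my $same  = \@list;            # a second reference to the same array
my $other = [1, 2, 3];         # a different array with equal contents
my $hRef  = { one => 1 };
my $plain = "just a string";

my $arrayKind = ref($aRef);    # "ARRAY"
my $hashKind  = ref($hRef);    # "HASH"
my $noKind    = ref($plain);   # "" (the empty string, which is false)

# == compares what the references point to, not the contents
my $sameTarget  = ($aRef == $same)  ? "yes" : "no";   # yes
my $otherTarget = ($aRef == $other) ? "yes" : "no";   # no

print "aRef refers to: $arrayKind, hRef refers to: $hashKind\n";
print "plain is not a reference\n" unless $noKind;
print "same target: $sameTarget, other target: $otherTarget\n";
```

Two references are equal only when they point to the very same variable; two separate arrays that happen to hold equal values are still different targets.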

Lastly, there is the concept of symbolic (also called soft) references whereby a string can be used as though it was a reference. For example, this:

@myArray = ("one\n", "two\n", "three\n", "four\n");
$aRef = \@myArray;

print ${$aRef}[0];
print ${$aRef}[1];
print ${$aRef}[2];
print ${$aRef}[3];

is the same as this (provided that @myArray is a global variable and that use strict 'refs' is not in effect):

@myArray = ("one\n", "two\n", "three\n", "four\n");
$aRef = "myArray";

print ${$aRef}[0];
print ${$aRef}[1];
print ${$aRef}[2];
print ${$aRef}[3];

Subroutines

A Perl subroutine (function) takes the form:

sub routine_name { body... return returnval; }

where returnval is either a scalar, an array, or a hash.

All subroutines must be defined with the sub keyword specified immediately before the routine name. To make a subroutine call, only the subroutine name is necessary:

routine_name($arg1, $arg2, ...);

The arguments $arg1, $arg2, ... are passed to the subroutine in a predefined array called @_. These arguments are usually assigned to local variables inside the subroutine using a statement such as

my ($parm1, $parm2, ..., $parmt) = @_;

where the my keyword indicates that the scope of the variables being assigned is restricted to the block that contains the my keyword (within the {}). The return keyword within a Perl subroutine is optional. If it is omitted, the return value is the value of the last expression evaluated. Here is an example. The program:

sub double 
{
    my($num) = @_;

    my $result = $num + $num;
    return $result;
}
$ans = double(10);
print double($ans);

would output the integer 40. Observe that the my is not only useful for the parameter list, but can also be used elsewhere to create local variables.
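
The optional return keyword and the @_ array can be seen in the following sketch. Both subroutines, larger and count_args, are hypothetical helpers written for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the larger of its two arguments.
# There is no return keyword; the value of the last expression
# evaluated becomes the return value.
sub larger {
    my ($x, $y) = @_;
    $x > $y ? $x : $y;
}

# @_ holds all of the arguments, however many are passed
sub count_args {
    return scalar(@_);
}

my $big   = larger(3, 7);
my $count = count_args("a", "b", "c");

print "larger: $big, arguments: $count\n";   # larger: 7, arguments: 3
```

Leaving out return is legal, but writing it explicitly usually makes the intent of a longer subroutine easier to follow.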

To use subroutines you created that exist in a different file, you need to add the line:

require "otherfile.pl";

to your current file, where otherfile.pl is the name of the file containing that subroutine.

Perl also contains a large number of predefined subroutines such as chop, each, eval, etc. In order to be able to do useful things in Perl, you must familiarize yourself with these functions. Since there are far too many to list here, you should take the time to browse through them using one of the numerous online resources available, such as the Perl manpages. The Perl man pages on the Undergraduate machines are also very helpful. To browse the documentation on the functions that are predefined in Perl, try man perlfunc. Note that the Perl man pages are split into sections. To see which sections are available for separate browsing, use man perl.

Passing Variables by Reference

Passing variables by reference is necessary if you wish to pass multiple array or hash variables to a function, or if you would like to have a single array (or hash) maintain its integrity over a function call. If you are passing multiple arrays (or hashes) to a function and you do not explicitly pass them by reference, then they will be concatenated into one large array, which is almost certainly not what you want to do. For instance, the code

sub notWhatIWanted
{
    my (@A, @B) = @_;

    print "A = @A\n";
    print "B = @B\n";
}

@X = (1, 2, 3);
@Y = (4, 5, 6);

notWhatIWanted(@X, @Y);

would output

A = 1 2 3 4 5 6
B = 

because the value of @_ was (1, 2, 3, 4, 5, 6), and Perl (not knowing how to split this between @A and @B) just stores all of @_ into @A. The array @B is then initialized with an empty array.

Now that you know why it is sometimes necessary to pass variables by reference, let us look at a few examples.

sub popmany {
    my $aref;
    my @retlist = ();
    foreach $aref ( @_ ) {
        push @retlist, pop @$aref;
    }
    return @retlist;
}
@tailings = popmany ( \@a, \@b, \@c, \@d );

In this example, the array @tailings contains all of the last elements of the arrays @a, @b, @c, @d. The last elements of each array have also been popped off. Printing the contents of the arrays @a, @b, @c, @d after the call to popmany() would show that the last elements had been removed from each array.
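
To see the effect on the caller's arrays, here is a self-contained sketch that repeats popmany (with its loop variable declared inline) and calls it on two small arrays. The array names and contents are invented for the example.

```perl
use strict;
use warnings;

sub popmany {
    my @retlist = ();
    foreach my $aref (@_) {
        push @retlist, pop @$aref;   # pop modifies the caller's array
    }
    return @retlist;
}

my @first  = (1, 2, 3);
my @second = ("x", "y");

my @tailings = popmany(\@first, \@second);

print "tailings: @tailings\n";   # tailings: 3 y
print "first: @first\n";         # first: 1 2
print "second: @second\n";       # second: x
```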

There is one more thing that must be mentioned. Just as calling a function with multiple arrays or hashes causes problems without passing variables by reference, so does returning multiple arrays or hashes. Returning several values at once is not something you can do directly in C++ (where you might wrap the arrays in an object and return that object), so you have likely not seen this before. We can write a return statement that returns multiple values like this:

return ($value1, $value2, ...);

That is, we are simply returning a list, whose individual elements can then be assigned to distinct variables as in:

($x, $y) = myFunction(...);

However, this does not work when returning multiple arrays and/or hashes: all of them are concatenated, and it is not possible for the caller to determine where one array ends and where the next one begins. For instance:

(@a, @b) = func(@c, @d);   # wrong, @a contains all elements, @b is empty
(%a, %b) = func(%c,%d);    # wrong, %a contains all keys/values, %b is empty

The solution to this problem is once again to use references. Consider the following example where references to two arrays are passed into a function, the array sizes are compared, and the two references are returned.

sub func {
    my ($cref, $dref) = @_;       # Local variables, two references to arrays
    if (@$cref > @$dref) {        # Compare the size of the two referenced arrays
        return ($cref, $dref);    # Return both references to arrays
    } else {                      
        return ($dref, $cref);    # with the larger of the two first
    }
}

($aref, $bref) = func(\@c, \@d); # Call the function with two array refs
print "@$aref has at least as many elements as @$bref\n";

Returning references to non-scalar variables is very important when returning multiple variables. When returning either a single array or hash, you can use the normal calling convention (i.e. return @array;).

Objects

To understand how Perl deals with objects, you first need to understand how references work in Perl. If you are still shaky with references, go back and re-read the section on references before continuing on. There are three main points that will be covered in this section:

  1. An object is a reference that knows which class it belongs to.
  2. A class is a package (a group of subroutines) that provides methods to deal with object references.
  3. A method is a subroutine that expects an object reference (or a package name, for class methods) as its first argument.

Creating a Class

Before creating a class, you first need to decide on the name of the class, since the name of the file that contains the class must be the same as the name of the class. For instance, we will look at an example consisting of a simple Person class. Thus, the name of the file must be Person.pm (the pm extension stands for Perl module). In Perl, the class module (.pm) file contains the definitions of all methods and data of the class. It is used in the same way as a .cpp (or .C) implementation file in C++. To use our Person class in a program, we will need to include the line

use Person;

This is similar to the use of the #include "file.h" preprocessor directive to include header files in C++.

Now for the actual creation of the class. The first thing that all classes need, as you already know, is a constructor. This method, once invoked, returns a new object of the class. All constructors make use of a Perl function called bless() to handle the creation of the object. The bless() function turns its argument into an object, or more accurately, it allows its argument to be referred to as an object. Only after you have bless()-ed an object will you be able to use its methods.

Unlike C++, the name of the constructor method is not restricted to be the same as the name of the class: it can be anything at all. The name new has become a popular name for constructor methods, possibly because of the popularity of C++ and the number of migrating programmers. Note that new is not a keyword in Perl. It is just a popular choice for the name of the constructor.

We will see the code for a constructor later when we implement the Person class. First we look at how Perl represents objects.

Representing Objects

Perl uses an anonymous hash to represent the objects of a class. While other representations could be used, hashes make data retrieval easiest and simplify the implementation immensely. Here is an example of how an object or record can be simulated by a hash:

$record = { name => "Jason", age => 23, peers => [ "Norbert", "Rhys"] };

Now you can access the record with the statement $record->{age} to get the value 23, or @{ $record->{peers} } to get access to the array whose elements are "Norbert" and "Rhys".

This works wonderfully for records, but as you know in Object-Oriented Programming, the user should not have direct access to the state (member variables) of the object. Instead, the object should be manipulated through calls to member functions. Here is an example of how we might create and use a Person object:

use Person;

$him = Person->new();
$him->name("Jason");
$him->age(23);
$him->peers( "Norbert", "Rhys", "Phineas" );

printf "%s is %d years old.\n", $him->name, $him->age;
print "His peers are: ", join(", ", $him->peers), "\n";

This example shows how the interface for the class should be defined so that the user does not need to know the underlying implementation details. All that one needs to do in order to use our Person class is to call our constructor and our other member functions. In the next section, we will show you how to implement the Person class.

Constructors and Other Methods

As stated above, using a hash to hold the data for an object is very convenient. We will use this idiom to implement our Person class. For now, the Person class will only contain a constructor (named "new"), and three accessor methods ("name", "age", and "peers"). Here is one possible implementation of the Person class.

package Person;

##################################################
## The Object Constructor (Simplistic Version)  ##
##################################################
sub new 
{
    my $self  = {};
    $self->{NAME} = undef;
    $self->{AGE} = undef;
    $self->{PEERS} = [];
    bless($self);
    return $self;
}

##############################################
## Methods to Access Per-Object Data        ##
##                                          ##
## With args, they set the value. Without   ##
## any, they only retrieve it/them.         ##
##############################################
sub name 
{
    my $self = shift;
    if (@_) { 
        $self->{NAME} = shift; 
    }
    return $self->{NAME};
}

sub age
{
    my $self = shift;
    if (@_) { 
        $self->{AGE} = shift; 
    }
    return $self->{AGE};
}

sub peers
{
    my $self = shift;
    if (@_) { 
        @{$self->{PEERS}} = @_; 
    }
    return @{ $self->{PEERS}};
}

1; # VERY IMPORTANT: used so the "require" or "use" statement succeeds

The first line of the file, package Person;, declares the file to be the implementation for a Person class. The first subroutine defined is the constructor, named new. The constructor takes no arguments, and initializes the name and age of the object to undef (the equivalent of NULL in C++). It also initializes the peers of the newly created Person to be an empty list. Next, the constructor makes use of the bless function to allow the new Person to be referred to as an object. Finally, the constructor returns the new object.

Now that we know what the constructor is doing, let us look at the other member functions. They are all very similar. In fact, you will almost always see the same first line for all member functions in Perl. This line, my $self = shift;, stores a reference to the object on which the member function was called in the variable $self.

The shift function returns the first element of an array, and at the same time deletes it from the array. For instance, the code:

@array = (1, 2, 3, 4);
$x = shift @array;
print "X = $x and Array = @array\n";

would output:

X = 1 and Array = 2 3 4

Once we are able to reference the object, we are able to manipulate its member variables. Each accessor function in our Person class obtains a reference to the object itself (in our case, a reference to a hash), and checks whether or not the method was called with any arguments. If the method was called with an argument, the accessor sets the appropriate member variable to the value of the argument, and returns that value. If no argument was provided, then the method returns the value of the appropriate member variable. Again, it must be noted that we are using an anonymous hash to represent the Person object. The variable $self is a reference to this anonymous hash. Thus, we must use the arrow notation to follow the reference to the hash, followed by the curly brackets {} to index into the hash.

Right now our constructor is very simple. What if we were creating a larger object? We might want to have a method named initialize() to do the initialization of the object. In our Person class, we could move all of the initialization into a helper method. The code for the constructor might now be similar to:

sub new
{
    my $self = {};
    bless $self;
    $self->initialize();
    return $self;
}
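
The notes do not show initialize() itself, so the following is only one possible sketch of it. To keep the example runnable as a single file, the class is declared inline with package Person; rather than in Person.pm, and only the name accessor is repeated. The direct look into $him->{PEERS} at the end breaks encapsulation and is there purely to show what initialize() did.

```perl
use strict;
use warnings;

package Person;

sub new {
    my $self = {};
    bless $self;
    $self->initialize();
    return $self;
}

# One possible initialize(): sets the same defaults as the earlier constructor
sub initialize {
    my $self = shift;
    $self->{NAME}  = undef;
    $self->{AGE}   = undef;
    $self->{PEERS} = [];
}

sub name {
    my $self = shift;
    if (@_) { $self->{NAME} = shift; }
    return $self->{NAME};
}

package main;

my $him = Person->new();
my $peerCount = scalar @{ $him->{PEERS} };   # 0: the list exists but is empty
$him->name("Jason");
my $name  = $him->name;
my $class = ref($him);                       # a blessed reference reports its class

print "$name is a $class with $peerCount peers\n";   # Jason is a Person with 0 peers
```

Note that initialize() can only be called as a method after bless has been applied, which is why the constructor blesses $self first.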

Let us now discuss other member functions in a bit more detail. As you have already seen from looking at our basic Person class, Perl does not place any restrictions on how a method is defined (unlike C++). In Perl, a method is just a subroutine like any other. However, a Perl member function expects an object reference as its first argument. This object is usually shifted into a variable with a name such as $this or $self in the first one or two lines of the subroutine. This reference variable provides a direct link to the object for the rest of the method's body. For example:

sub greeting
{
    my $self = shift;
    my $myName = $self->{NAME};
    print "Hello, my name is $myName.\n";
}

This is a member function of our Person class for a formal greeting. Notice how the object is shifted into a reference variable; the variable is dereferenced to access the object's private data.

Invoking Methods

So now we know how to define a method, but what about calling conventions? There are two ways to call a method in Perl. First we will show you the most intuitive way for C++ programmers, using the arrow operator (->). We will then present the indirect object syntax, which is an equally valid way to call member functions. Here is an example that uses the arrow notation to invoke member functions:

$fred = Person->new();
$fred->name("Fredrick");
$fred->greeting();

This example creates a new Person object, sets the name of the person to "Fredrick", and has the Person give a greeting. Here is the equivalent code using the indirect object syntax.

$fred = new Person;
name $fred "Fredrick";
greeting $fred;

Both of these calling conventions produce the same result here. However, you should pick one and stick to it throughout your code, in order to avoid confusing future programmers (including yourself) who might want to modify your code. Be aware that the Perl documentation itself discourages the indirect object syntax, because the interpreter can misparse it in some situations; the arrow notation is the safer choice. The indirect object syntax can also be confusing to C++ programmers because you must specify the method first and the object second. C++ works the other way around. Also, C++ programmers tend to think of the first name in a statement as being a keyword. This can cause them to confuse regular object constructors called new with a keyword of the language.

Object Destructors

Just as objects are created, they must also be destroyed. Destructors are called automatically when an object is destroyed. Perl uses a technique called reference counting to determine whether or not an object is being used, or may be used again. Perl keeps track of the number of references to every object in a program. When the reference count is a positive integer, it means that the object can still be accessed through a reference. When the reference count for an object becomes 0, there is no longer any way to access the object, and so the object is destroyed.

For example, when a subroutine exits, all variables local to that subroutine are destroyed, including any objects that are no longer referenced from elsewhere. Likewise, all objects are destroyed when your program exits. In C++ you might write a destructor to take care of deallocating memory that was used by the deceased object. However, in Perl, when an object is created or destroyed, Perl takes care of memory allocation and deallocation for you. Why then would we want an object destructor? To take care of other book-keeping and clean-up operations before the memory for the object is deallocated.

For example, suppose we have a global integer variable named populationCount in our program. (A better way to do this would be to define a class variable named populationCount; class variables are shared by all instances of the class.) This variable is incremented every time a Person object is created, and decremented every time a Person object is destroyed. We can modify our constructor to increment populationCount after it creates a new Person, but what about decrementing the count when someone dies? To do this, we will use a destructor. Here is an example of a destructor that decrements the populationCount every time an object is destroyed:

sub DESTROY
{
    $populationCount--;
}

There are only two more things worth mentioning about destructors. First, all object destructors must be named DESTROY; this is how Perl can tell the difference between a destructor member function and a normal method. Second, all object destructors receive a single, read-only argument, a reference to the object being destroyed (in variable $_[0]). Since the argument is read-only, it is not possible to modify the argument and start accessing another object instead. Only the reference is read-only; the object being referenced can still be modified. We did not have to use the reference to the Person object being destroyed in our example.
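The class-variable approach suggested above could be sketched as follows. This is only one way to do it: the population() method and the variables $during and $after are hypothetical names introduced for this example, and the constructor is reduced to the bare minimum.

```perl
use strict;
use warnings;

package Person;

my $populationCount = 0;    # class variable: one copy shared by all Persons

sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    $populationCount++;     # a Person is created
    return $self;
}

# Hypothetical class method that reports the current count.
sub population { return $populationCount; }

sub DESTROY {
    $populationCount--;     # a Person is destroyed
}

package main;

my $fred = Person->new();
my ($during, $after);
{
    my $wilma = Person->new();
    $during = Person->population();   # $fred and $wilma both exist
}                                     # last reference to $wilma is gone: DESTROY runs
$after = Person->population();        # only $fred is left

print "during: $during, after: $after\n";   # prints "during: 2, after: 1"
```

Notice that DESTROY is never called explicitly; it runs as soon as the reference count of the object held in $wilma drops to 0 at the end of the inner block.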

This is only the beginning of Object-Oriented scripting in Perl. There are many details that we have left out. You can find more details in our list of references.

Filehandles

Streams

As far as Unix and Windows are concerned, a file is nothing more than a sequence of characters. The Macintosh operating system, on the other hand, attaches additional information (in the so-called resource fork) such as the type of the file, the program that created it, etc. In both cases though, programs deal with the file in a similar way, which can be understood by looking at a related problem. Suppose that you want to ask your friend for the solutions to a CPSC 219 lab (of course, we know you would never do that). Before talking to your friend, you must first pick up the phone and dial the appropriate phone number. Once you are connected, you can start chatting.

Similarly, before a program can access the data contained in a file, it must set up an object (the phone) that will act as an intermediary between the file (your friend) and your program (you). This is called opening the file, and the object that allows your program to access the data in the file is called a stream, or handle.

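Thus, a program, whether it is written in C++, Perl, Java, Pascal, or pretty much any other language, accesses a file using the following three-step process: it opens the file, it reads data from or writes data to the file through the stream, and it closes the file once it is done.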

The C++ library provides two basic classes for operations on streams: istream (for input, where we read the data from the file), and ostream (for output, where we write data to the file). For instance, cin is an object of type istream, while cout is an object of type ostream. Similar types of objects exist in most programming languages: the FILE structure in C, the InputStream and OutputStream classes in Java, the File type in Pascal, etc.

Filehandles

In Perl, a stream is represented by a filehandle. A filehandle can refer to a file, pipe, socket or device, but it is used in the same way in all cases. Here is how you would create a filehandle that reads from an existing file (where SESAME is the name of the filehandle):

open(SESAME, "filename");

It is customary to pick an uppercase name for a filehandle. Once you have opened a file for input, you can read lines from the file using the line reading (angle) operator <>. Simply place the filehandle (or STDIN if you want to use standard input) between the angle brackets. For example, the following program will prompt you for your name, and then greet you:

print STDOUT "What is your name?\n";
$name = <STDIN>;
print STDOUT "Hello ";
print STDOUT $name;

We now know how to open files for input, but what about opening files for output, or appending output to the end of already existing files? These tasks are easy to accomplish in Perl, as it suffices to insert the appropriate character(s) at the beginning of the file name. To open a file for output we insert the symbol >, and to open it for appending we insert the symbol >>. For example:

open(INFILE, $file);          # Open $file for input
open(INFILE, "<$file");       # Also open $file for input
open(OUTFILE, ">$file");      # Open $file for output
open(APPENDFILE, ">>$file");  # Open $file for appending

Once we have opened a file for output, we write to the file using a print statement. To accomplish this task, the print statement takes two arguments: a filehandle and the string to be written to the file. Note that there is no comma between the filehandle and the string. Here is an example:

print OUTFILE "This line will be written to the file.\n";

If the first argument is omitted, then the string is printed onto the standard output. Of course, once you have finished working with a file that has been opened, you should close the file. In Perl, this is accomplished with the close() function, which takes a filehandle as an argument. For example:

close(INFILE);    # Close INFILE

The following example ties everything together. Although the actual processing performed on the file is minimal, the example illustrates the basics of opening and closing files for both input and output.

open(S_OUT, '>-');                    # Open stdout and
print S_OUT "Starting Execution...";  # Write a message

$file = '/etc/shells';
open(INFILE, "<$file");               # Open /etc/shells to read input

$file = "copy_shells.txt";
open(OUTFILE, ">$file");              # Open 'copy_shells.txt' for output

@lines = <INFILE>;                    # Read in the '/etc/shells' file and
print OUTFILE @lines;                 # Write a copy of it to the output file

close(INFILE);                        # Close both input and
close(OUTFILE);                       # Output files

print S_OUT "DONE!\n";                # Write a 'done' message and
close(S_OUT);                         # Close stdout
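The program above assumes that every call to open() succeeds. A more defensive variant is sketched below; it uses the three-argument form of open with lexical filehandles and stops with an error message (via die and the $! variable) if a file cannot be opened. The copy_file() subroutine and the file names are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $source to $target line by line
# and return the number of lines copied.
sub copy_file {
    my ($source, $target) = @_;

    open(my $in,  '<', $source) or die "Cannot read $source: $!";
    open(my $out, '>', $target) or die "Cannot write $target: $!";

    my $count = 0;
    while (my $line = <$in>) {     # read one line at a time
        print $out $line;          # no comma after the filehandle
        $count++;
    }

    close($in);
    close($out) or die "Cannot close $target: $!";
    return $count;
}

# Example use (the paths are placeholders):
# my $copied = copy_file('/etc/shells', 'copy_shells.txt');
```

Reading line by line, rather than slurping the whole file into an array, keeps memory use small when the input file is large.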

Perl filehandles make opening and processing files clean and very easy. Most of the unpleasantness associated with processing text files in C++ has been hidden by Perl.

Pattern Matching

Pattern matching and regular expressions, which we discuss in the next section, are two of Perl's greatest strengths. For instance, they can be used to easily replace some keyword in a large number of files (e.g. to change the background color of every page on your web site). Here is a short example, which displays all of the lines in a file that contain an http link:

while ($line = <FILE>) { 
  if ($line =~ /http:/) { 
    print $line;
  }
}

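The =~ operator tells Perl to check whether or not the line contains the string "http:" (the right operand of =~ is actually treated as a regular expression; these are covered in the next section). There is also a much shorter way to write this code by using the following two tricks: when no variable is named, the line reading operator in a while condition stores each line in the default variable $_, which print and the pattern matching operator also use by default; and a simple if can be written after the statement that it controls.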

This results in the following equivalent, but much shorter, loop:

while (<FILE>) {  
  print if /http:/;
}

This is a standard idiom of Perl programming, and as such, you are likely to see it often if you need to read code written by others.
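To check that the long and the short forms really do the same thing, the sketch below runs both over the same input and compares the results. To keep the example self-contained it reads from an in-memory string instead of a real file (open accepts a reference to a scalar for this purpose); the sample text and the names @long and @short are made up for the illustration.

```perl
use strict;
use warnings;

my $text = "see http://example.com/\n"
         . "plain line\n"
         . "ftp://example.org/\n"
         . "http://example.net/\n";

# Long form: explicit variable and explicit match target.
my @long;
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    if ($line =~ /http:/) {
        push @long, $line;
    }
}
close($fh);

# Short form: <$fh> stores each line in $_, and the match tests $_.
my @short;
open($fh, '<', \$text) or die "Cannot open in-memory file: $!";
while (<$fh>) {
    push @short, $_ if /http:/;
}
close($fh);

print "Both loops found the same ", scalar(@long), " lines\n"
    if "@long" eq "@short";
```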

Perhaps the heart and soul of Perl lies within its ability to parse text. Parsing text consists of matching and manipulating text (a string). The three matching operators provided by Perl are:

Matching Operators

Operator                             Description
=~ m/pattern/                        Pattern matching
=~ s/search string/replace string/   Substitution
=~ tr/search list/replace list/      Translate characters

All of these operators begin with the =~ construct. This construct roughly means: apply the matching operation given as the right argument to the string on the left of the =~. Its general form is:

string =~ operator

The Pattern Matching Operator

The first operator, the pattern matching operator, does no manipulation on the string to the left of =~ whatsoever. Its most basic use is to return a boolean indicating whether or not the string to the left of =~ matches the pattern within the slashes. There are a few options to this operator, but only one is really worth mentioning: the i option. If i is included at the end of this operator, case sensitivity will be ignored.

Here is an example. The program

$string1 = "Hello World\n";
$pattern = "ello";
if ($string1 =~ m/$pattern/) { 
  print "$pattern is contained in $string1";
}
if ($string1 =~ m/EllO/i) {
  print "A case permutation of EllO is contained in $string1";
}

will print out

ello is contained in Hello World
A case permutation of EllO is contained in Hello World

Because the m/pattern/ operator is so frequently used, the m can be omitted if and only if the pattern is delimited with slashes. Hence the following is legal:

$string1 = "Hello World\n";
if ($string1 =~ /Hello/)
{
    print "Back to legal Perl\n";
}

whereas the following is not.

$string1 =   "Hello World\n";  
if ($string1 =~ %Hello%)
{
    print  "Whatchew talkin' 'bout, Willis?\n";
}

The Substitution Operator

Unlike the pattern matching operator, the substitution operator (the s/search string/replace string/ operator) may change the string to the left of =~. It replaces the first occurrence of search string with replace string. The most popular option for this operator is the g option. It will cause every occurrence of search string in the string to the left of =~ to be replaced with replace string, instead of just the first one. For instance:

$string1 = "Hello World\n";
$searchpattern = "Hello";
$replacepattern = "Good Morning";
$string1 =~ s/$searchpattern/$replacepattern/;
print $string1;
$string1 =~ s/o/l/;
$string1 =~ s/o/u/g;
print $string1;

would produce the output:

Good Morning World
Glud Murning Wurld

The Translation Operator

This third operator also mutates the string to the left of the =~ operator. Each character in the search list will be replaced by the character at the same position in the replace list, assuming that the size of the replace list is the same as that of the search list. This operator is most commonly used to convert strings from lowercase to uppercase or vice versa. Note that for historical reasons, the tr can be replaced with a y. So the program

$string1 = "Hello World\n";
$string1 =~ tr/a-m/A-M/;     # Note a-m ==> abc...m
$string1 =~ y/N-Z/n-z/;
print $string1;

would produce

HELLo worLD

Remembering Patterns

It is often useful to remember patterns that have been matched so that they can be used again. In a Perl program, anything matched in parentheses gets remembered in the variables $1,...,$9. These variables can only be used once the match has succeeded, which includes the replacement string of a substitution. To refer to a matched group from within the same regular expression, we use the special expressions \1,...,\9 instead. (Perl also accepts \1 in the replacement string, as sed does, but $1 is the preferred form there and avoids a warning when warnings are enabled.) For example, the script

$_ = "King Garfield of Lasagne";
s/([A-Z])/:$1:/g;
print "$_\n";

will replace each upper case letter in the string $_ by the same letter surrounded by colons. It will thus print:

:K:ing :G:arfield of :L:asagne

The variables $1,...,$9 are read-only: you cannot alter them yourself. As another example, the test

if (/(\b.+\b) \1/) {
    print "Found $1 repeated\n";
}

will identify any repeated words. Each \b represents a word boundary and the .+ matches any non-empty string, so \b.+\b matches anything between two word boundaries. This is then remembered by the parentheses and stored as \1 while we are evaluating the (implicit) =~, and as the variable $1 for the rest of the program.
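The two ways of referring to a remembered pattern can be seen side by side in the sketch below. It uses a slightly stricter pattern than the one above (\w+ instead of .+, so that only a run of word characters counts as a word), which is a choice made for this illustration; the sample strings and variable names are likewise invented.

```perl
use strict;
use warnings;

# \1 inside the pattern, $1 after the match has succeeded.
my $sentence = "Paris in the the spring";
my $repeated;
if ($sentence =~ /\b(\w+) \1\b/) {
    $repeated = $1;
    print "Found $repeated repeated\n";
}

# Two pairs of parentheses fill $1 and $2, numbered from left to right.
my $setting = "color=blue";
my ($key, $value);
if ($setting =~ /^(\w+)=(\w+)$/) {
    ($key, $value) = ($1, $2);
    print "$key is set to $value\n";
}
```

Copying $1 and $2 into named variables straight away is a useful habit, since the next successful match will overwrite them.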

Regular Expressions

Regular expressions, which you have had a glimpse of in the previous section, allow us to specify complex patterns relatively simply. They can make complicated operations possible in two or three lines of code. There are times when you want to perform a more complicated pattern match than those we used as examples in the previous section, such as finding the first 5 letters of a string, the first occurrence of an embedded numeric value or even the first character of a string. These matches will require the use of regular expressions. Below is a table containing a few of the most common regular expressions:

Regular Expressions

Each entry below gives the regular expression, its description, and an example. Note that every if statement evaluates to TRUE, except the one in the \b example, which is deliberately written so that it does not match.

.      Matches an arbitrary character, but not a newline.

$string1 = "Hello World\n";
if ($string1 =~ m/...../) {
  print "$string1 has length >= 5\n";
}

( )    Groups a series of pattern elements to a single element. When you match a pattern within parentheses, you can use any of $1, $2, ... $9 later to refer to the previously matched pattern.

Program:

$string1 = "Hello World\n";
if ($string1 =~ m/(H..).(o..)/) {
  print "We matched '$1' and '$2'\n";
}

Output:

We matched 'Hel' and 'o W'

+      Matches the preceding pattern element one or more times.

$string1 = "Hello World\n";
if ($string1 =~ m/l+/) {
  print "There are one or more consecutive l's in $string1";
}

?      Matches the preceding pattern element zero or one times.

$string1 = "Hello World\n";
if ($string1 =~ m/H.?e/) {
  print "There is an 'H' and an 'e' separated by ";
  print "0 or 1 characters.\n";
}

?      Placed after *, +, or {M,N}, makes that quantifier match as few times as possible.

$string1 = "Hello World\n";
if ($string1 =~ m/(l.+?o)/) {
  print "The non-greedy match with 'l' followed by ";
  print "one or more characters is 'llo', not 'llo Wo'.\n";
}

*      Matches the preceding pattern element zero or more times.

$string1 = "Hello World\n";
if ($string1 =~ m/el*o/) {
  print "There is an 'e' followed by some ";
  print "'l' (maybe) followed by 'o'\n";
}

{M,N}  Denotes the minimum M and the maximum N match count.

$string1 = "Hello World\n";
if ($string1 =~ m/l{1,2}/) {
  print "There exists a substring with 1 ";
  print "or 2 l's in $string1";
}

[...]  Denotes a set of possible matches.

$string1 = "Hello World\n";
if ($string1 =~ m/[aeiou]/) {
  print "$string1 contains a vowel\n";
}

|      Matches either the left or the right operand.

$string1 = "Hello World\n";
if ($string1 =~ m/(Hello|Hi)/) {
  print "Hello or Hi is ";
  print "contained in $string1";
}

\b     Matches a word boundary.

$string1 = "Hello World\n";
if ($string1 =~ m/\bllo\b/) {
  print "This will not match because ";
  print "the llo is not a word.";
}

\w     Matches an alphanumeric character, including "_".

$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
  print "There is at least one alpha-";
  print "numeric char in $string1";
}

\W     Matches a non-alphanumeric character.

$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
  print "The space between Hello and ";
  print "World is not alphanumeric\n";
}

\s     Matches a whitespace character (space, tab, newline, formfeed).

$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
  print "There are TWO whitespace ";
  print "characters in $string1";
}

\S     Matches anything BUT a whitespace character.

$string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
  print "There are TWO non-whitespace ";
  print "characters in $string1";
}

\d     Matches a digit, same as [0-9].

$string1 = "99 beers on the wall\n";
if ($string1 =~ m/\d.*\d/) {
  print "There are TWO digits in $string1";
}

\D     Matches a non-digit.

$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
  print "There is >= 1 non-digit in $string1\n";
}

^      Matches the beginning of a line or string.

$string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
  print "$string1 starts with a He\n";
}

$      Matches the end of a line or string.

$string1 = "Hello World\n";
if ($string1 =~ m/rld\n?$/) {
  print "$string1 is a line or string ";
  print "that ends with rld\n";
}
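The matches mentioned at the start of this section (the first character of a string, its first five characters, and the first embedded numeric value) can be sketched with the elements from the table. The sample string and the variable names below are invented for the example, and {5} is used as shorthand for {5,5}.

```perl
use strict;
use warnings;

my $string = "Order 66 shipped in 3 boxes";
my ($first_char, $first_five, $first_number);

$first_char   = $1 if $string =~ m/^(.)/;      # ^ anchors the match at the start
$first_five   = $1 if $string =~ m/^(.{5})/;   # exactly five characters
$first_number = $1 if $string =~ m/(\d+)/;     # leftmost run of digits

print "$first_char | $first_five | $first_number\n";   # prints "O | Order | 66"
```

Because a match always starts at the leftmost possible position, (\d+) captures 66 rather than 3, and the greedy + takes both digits.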

To allow for greater leeway in parsing, the slashes (/) in these operators can be replaced by another non-alphanumeric character such as ~, !, @, # or %. This is particularly useful when you don't want to escape the slashes in a search pattern. For example,

$string1 = "Escaped slashes look like valleys \/\/\/.";
if ($string1 =~ m/Escaped slashes look like valleys ///./) {
  print "Whatchew talkin' 'bout, Willis?\n";
}

is illegal, whereas

$string1 = "Escaped slashes look like valleys \/\/\/.";
if ($string1 =~ m%Escaped slashes look like valleys ///.%) {
  print "Back to legal Perl\n";
}

is legal. Think for a second about why the first program is illegal, while the second is valid Perl. Try running each of them to verify which works and which does not. In the latter case, discover what it is that causes the Perl interpreter to complain.
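For comparison, here is a short sketch of two legal ways to match a string that contains slashes: escaping each slash with a backslash, or choosing a different delimiter. The path and the variable names are invented for the example.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# Slashes as delimiters: every slash inside the pattern must be escaped.
my $escaped   = ($path =~ m/\/usr\/local\/bin/) ? 1 : 0;

# % as the delimiter: the slashes need no escaping at all.
my $alternate = ($path =~ m%/usr/local/bin%) ? 1 : 0;

print "Both forms match\n" if $escaped && $alternate;
```

The second form is usually easier to read, which is why alternate delimiters are common in patterns that deal with file names and URLs.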

Regular expressions are one of the more powerful features of the Perl programming language. If you decide to master only one aspect of the language, make regular expressions that aspect. In fact, regular expressions appear in so many different places in the Unix environment that it would be a crime not to master them.

Perl Gotchas

While Perl makes common jobs easy, its syntax can cause some frustrations for programmers who are new to the language. These problems arise as a result of Perl borrowing syntax and structure from a number of other programming/scripting languages. For example, some of Perl's syntax is taken from C, some control structures are taken from Unix shell scripting, and some of the pattern matching syntax is taken from the Unix utility awk. The purpose of this section of the notes is to present you with a few mistakes that are made frequently by programmers who have experience in other languages, but are new to Perl.

Example Perl Scripts

You will probably be surprised at how powerful your scripting ability already is. This is perhaps best demonstrated by Perl one-liners, a traditional challenge for Perl programmers to accomplish everything in one line. Below is a list of one-line (less than 80 chars) Perl commands that have been used to perform common tasks on the UNIX platform. Note: some of them are a bit obscure, so don't worry if you are having a hard time figuring out how they work.


Copyright (C) 2000 - 2002, The University of British Columbia