Just like every other programming or scripting language you have been exposed to, Perl allows you to create and use variables (by a variable we mean an association between a name and a value). The languages that you are already familiar with fall into two categories:

Perl falls somewhere in between these two categories: it has several kinds of variables, and we can determine the kind of a variable by looking at the first character of its name. However, the exact type of the value associated with a given variable name may not be known until the program is run. Perl uses the following conventions for variable names:

Kinds of Variables

Scalars

Scalars are the most commonly used variables in Perl. A scalar variable (or, more precisely, the use of a variable in a scalar context) always starts with a $ sign and, by definition, contains a single value.

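If you attempt to use a scalar variable that has not been assigned a value, you will get the undefined value, which behaves as 0 or "", depending on whether it is used as a number or as a string. The statement use strict;, mentioned in the section How to Run and Debug a Perl Script, requires every variable to be declared before it is used, and the statement use warnings; (or the -w switch) makes Perl warn whenever an uninitialized value is used.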
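
The behaviour of an unassigned scalar can be seen in the following minimal sketch. The variable names are chosen for illustration only, and the uninitialized warning is switched off inside the block so that the demonstration runs quietly.

```perl
use strict;
use warnings;

my $unset;                        # declared but never assigned: holds undef
my ($as_number, $as_string);

{
    no warnings 'uninitialized';  # silence the warning for this demonstration only
    $as_number = $unset + 5;          # undef behaves as 0 in a numeric context
    $as_string = "[" . $unset . "]";  # undef behaves as "" in a string context
}

print "As a number: $as_number\n";    # prints "As a number: 5"
print "As a string: $as_string\n";    # prints "As a string: []"
print "The variable is still undefined\n" unless defined $unset;
```

The defined function is the reliable way to tell an unassigned scalar apart from one that really contains 0 or "".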

Unless you know what you are doing, do not use digits or special characters immediately after the $ in the name of one of your variables. For instance, the variables $1, $9, $$ and $& all have predefined meanings. Unlike normal scalars, these predefined scalars have a default value that is not necessarily 0 or "", and in most cases they should only be read, and not modified.
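
As a brief illustration of why these names are reserved, here is a small sketch that reads three of them: $1 and $2 hold the text captured by a successful pattern match, $& holds the whole matched text, and $$ holds the process number of the running script. The sample string and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "user=alice";          # hypothetical sample string

my ($name, $value, $matched);
if ($text =~ /(\w+)=(\w+)/) {
    $name    = $1;   # text captured by the first pair of parentheses
    $value   = $2;   # text captured by the second pair of parentheses
    $matched = $&;   # the entire text matched by the pattern
}

print "name=$name value=$value matched=$matched\n";
print "This script runs as process ", $$, "\n";   # the number differs on every run
```

Note that the predefined scalars are only read here and are copied into ordinary variables right away, since the next pattern match would overwrite them.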

When looking at a Perl program, because of your C++ background, you will probably be asking yourself: are these scalars of type int, float, char or string? The answer is: all of the above. Think of scalars as magical strings. Their type will change according to the value that they contain. For example, the program

$string_or_int = "Hello";
print "$string_or_int World\n";

$string_or_int = 12;
print "Twelve * two equals: ";
print $string_or_int * 2;
print "\n";

would output

Hello World
Twelve * two equals: 24
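
To see the "magical string" idea from another angle, the sketch below applies different operators to the same scalar. It is the operator, not a declared type, that decides whether the value is treated as a number or as a string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $value = "12";              # assigned as a string

my $sum      = $value + 3;     # + is numeric: the string "12" is used as the number 12
my $joined   = $value . 3;     # . is concatenation: the number 3 is used as the string "3"
my $repeated = $value x 2;     # x repeats a string

print "sum=$sum joined=$joined repeated=$repeated\n";
# prints "sum=15 joined=123 repeated=1212"
```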

Arrays

Arrays are similar to their C++ counterpart, and contain a group of scalars that are accessed by their position in the array. An array name must start with the @ character, and is usually followed by one or more alphabetic characters. We can create and initialize an array by putting the elements of this array in parentheses, separated by commas. For instance,

@letters = ('a', 'b', 'c', 'd');

creates an array with 4 elements. Similar to predefined scalars, there also exist predefined arrays such as @ARGV (the command line arguments to a Perl script) and @INC (the list of directories that Perl will search for modules). The predefined arrays with alphabetic names are capitalized, and hence you should name your arrays using lower case characters to avoid potential name conflicts.

As in most other programming languages, a specific element in a Perl array is referenced using the [] operator:

$arrayname[index]

Observe that because each array element is a scalar, we use $arrayname[index] and not @arrayname[index]. The first element of the array is $arrayname[0], and the last element is $arrayname[$#arrayname], where $#arrayname is a predefined scalar whose value is the position of the last element of array @arrayname.
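
The indexing rules above can be tried out with the following short sketch, which reuses the @letters array. The negative index on the fourth assignment is not discussed in the text; it is a standard Perl shorthand that counts positions from the end of the array.

```perl
use strict;
use warnings;

my @letters = ('a', 'b', 'c', 'd');

my $first      = $letters[0];           # first element
my $last       = $letters[$#letters];   # last element
my $last_index = $#letters;             # position of the last element
my $also_last  = $letters[-1];          # negative indices count from the end

$letters[1] = 'B';                      # elements can be assigned individually

print "first=$first last=$last last_index=$last_index also_last=$also_last\n";
# prints "first=a last=d last_index=3 also_last=d"
print "letters now: @letters\n";
# prints "letters now: a B c d"
```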

To assign more than one element of an array to individual variables, one can use the [] operator and perform the assignments individually. An alternative option is to do the assignments in parallel using the following syntax:

(var1, var2, ..., vart) = @arrayname;

by which var1 gets assigned $arrayname[0], var2 gets assigned $arrayname[1], etc. If one of var1, var2, ..., vart is not a scalar (e.g. if it is an array), then it will be assigned all the remaining elements of @arrayname, leaving nothing for the variables that follow it. For instance, the statement

($x, $y, @a, $z) = (1, 2, 3, 4, 5);

would assign 1 to $x, 2 to $y, the array (3, 4, 5) to @a, and nothing (the undefined value) to $z.
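
The parallel assignment can be checked with the sketch below. The first statement is the one from the text (with my added so that it runs under use strict;). The second statement, with the hypothetical names $head and @tail, shows the usual way to avoid the problem: put the array last.

```perl
use strict;
use warnings;

# The array @a swallows every remaining value, so $z is left undefined.
my ($x, $y, @a, $z) = (1, 2, 3, 4, 5);
my $z_state = defined $z ? "defined" : "undefined";
print "x=$x y=$y a=(@a) z is $z_state\n";
# prints "x=1 y=2 a=(3 4 5) z is undefined"

# With the array in the last position, every variable receives something.
my ($head, @tail) = @a;
print "head=$head tail=(@tail)\n";
# prints "head=3 tail=(4 5)"
```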

Note that if we use an array name, such as @letters, in a context where Perl expects a scalar, Perl will use the number of elements of the array as the value. So, in the following example, $size is assigned the value 4:

@numbers = (5, 6, 7, 8);
@letters = ('a', 'b', 'c', 'd');
@numbers_and_letters = (@numbers, @letters);

$size = @numbers;   # scalar context: the number of elements in @numbers

print "Look mom, I can count: @numbers\n";
print "I know some characters too.  See: @letters\n";
print "I'm on a roll now: @numbers_and_letters\n";
print "The first letter in the alphabet is $letters[0]\n";
print "The last element in my numbers array is $numbers[$#numbers]\n";
print "The number of elements stored in my numbers array is $size\n";

and the output produced is:

Look mom, I can count: 5 6 7 8
I know some characters too.  See: a b c d
I'm on a roll now: 5 6 7 8 a b c d
The first letter in the alphabet is a
The last element in my numbers array is 8
The number of elements stored in my numbers array is 4

As you can see from this example, the use of the array data type in Perl is similar to the use of arrays in most other programming languages.

Hashes

The third and final type of Perl variable is referred to as a hash or associative array. An associative array is simply a convenient form of one-to-one hashing. Think of it as an array indexed by strings, instead of by integers. More precisely, each key is a string (any scalar used as a key is converted to a string), and the value associated with the key is a scalar.

The name of a hash begins with the % character, and is usually followed by one or more alphabetic characters. As with arrays, there are a few predefined hashes such as %ENV (which contains your current environment, so that the shell variables $PATH, $DISPLAY, etc. are available as $ENV{'PATH'}, $ENV{'DISPLAY'}, etc.) and %SIG (the signal handlers). The predefined hashes with alphabetic names are capitalized, and hence you should name your associative arrays with lower case characters to avoid potential name conflicts. Here is an example of how to use a hash. The program

%responsibilities = ( 
    "Ian" => "instructor", "Moyra" => "systems administrator", 
    "George" => "instructor", "Bob" => "department head"
);

print "Ian's job: $responsibilities{'Ian'} \n";
print "Bob's job: $responsibilities{'Bob'} \n";

will produce the following output:

Ian's job: instructor
Bob's job: department head

Keys must be unique. However, values do not have to satisfy this condition and hence we can have many keys with the same value (e.g. "instructor"). You can see how the hash data type has the potential to be very useful. A hash data type is to Perl what the alist (association list) data structure is to Scheme.
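
A few more everyday hash operations are sketched below on the same data: counting the keys, testing whether a key exists, collecting all keys that share a value, and replacing the value of an existing key. The helper variable names and the replacement value "dean" are hypothetical and serve only as illustration.

```perl
use strict;
use warnings;

my %responsibilities = (
    "Ian"    => "instructor",
    "Moyra"  => "systems administrator",
    "George" => "instructor",
    "Bob"    => "department head",
);

# In a scalar context, keys gives the number of keys in the hash.
my $count = keys %responsibilities;

# exists tests for a key without creating it.
my $has_ian   = exists $responsibilities{"Ian"}   ? "yes" : "no";
my $has_alice = exists $responsibilities{"Alice"} ? "yes" : "no";

# Several keys may share one value: collect every instructor.
# The keys are sorted because a hash has no guaranteed order.
my @instructors = grep { $responsibilities{$_} eq "instructor" }
                  sort keys %responsibilities;

# Assigning to an existing key replaces its value; the key stays unique.
$responsibilities{"Bob"} = "dean";    # hypothetical new value
my $count_after = keys %responsibilities;

print "count=$count Ian=$has_ian Alice=$has_alice\n";
# prints "count=4 Ian=yes Alice=no"
print "instructors: @instructors\n";
# prints "instructors: George Ian"
print "Bob is now: $responsibilities{'Bob'} (still $count_after keys)\n";
# prints "Bob is now: dean (still 4 keys)"
```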



Helpful Tips

Hashing is the process of placing an item into a data structure using a key-to-address transformation. Hashing is used to make searching for and retrieving items in the data structure more efficient. The data structure that is most often used is a table (often called a hash table). The key, which is used to index the hash table, is most often the item to be inserted or searched for. The address of the item in the hash table is generated by applying a "hashing function" to the key. One of the most common hashing functions is the modulo division function. Given a key (K) we can generate an index (I) into the hash table of size (H) with the division function:

I = K mod H
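
In Perl the modulo operation is written with the % operator, so the division function can be sketched as a one-line subroutine. The name hash_index is hypothetical, and the sketch assumes non-negative integer keys.

```perl
use strict;
use warnings;

# Modulo division hash function: I = K mod H.
# Assumes $key is a non-negative integer and $table_size is positive.
sub hash_index {
    my ($key, $table_size) = @_;
    return $key % $table_size;
}

my $table_size = 10;
for my $key (1, 11, 27) {
    my $index = hash_index($key, $table_size);
    print "key $key -> index $index\n";
}
# prints:
# key 1 -> index 1
# key 11 -> index 1
# key 27 -> index 7
```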

Suppose we have a hash table of size 10 that we will use to store integer items. First we wish to insert the integer 1. Using the modulo division hash function, we obtain the address 1 (since 1 mod 10 = 1). Next we want to insert the integer 11. Using the modulo division hash function, we again obtain the address 1 (since 11 mod 10 = 1). These two insertion operations have resulted in a collision. That is, two keys have mapped to the same index in the hash table.

What can we do to resolve the collision? The simplest collision resolution technique is called Linear Probing. Linear Probing works as follows. If you encounter a collision at address A, look at address A+1 to see if it is occupied. If it is occupied, look at address A+2 and so on (wrapping around to address 0 after the last address of the table). When an unoccupied address is identified, insert the item into it. In our example, the integer 11 would therefore be stored at address 2.

To retrieve items from a hash table that uses linear probing collision resolution, we apply the hash function to the requested item and obtain an address, A. We look up the address A in the hash table. If the address is not occupied, we conclude that the item is not in the hash table. If the address is occupied, we check to see that the correct item is the occupant. If this is not the case, we look at A+1, and so on, until we either find the item or reach an unoccupied address, in which case the item is not in the hash table.
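
The insertion and retrieval procedures can be sketched in Perl as follows. This is only an illustration of the idea, not how Perl implements its own hashes. The names insert_item and find_item are hypothetical. The sketch stores non-negative integers, wraps around to address 0 after the last address, and gives up once every address has been examined.

```perl
use strict;
use warnings;

my $H     = 10;               # size of the hash table
my @table = (undef) x $H;     # undef marks an unoccupied address

# Insert an item using linear probing; returns the address used,
# or -1 if the table is full.
sub insert_item {
    my ($item) = @_;
    my $start = $item % $H;                   # modulo division hash function
    for my $step (0 .. $H - 1) {
        my $addr = ($start + $step) % $H;     # A, A+1, A+2, ... with wrap-around
        if (!defined $table[$addr]) {
            $table[$addr] = $item;            # unoccupied: store the item here
            return $addr;
        }
        return $addr if $table[$addr] == $item;   # item is already stored
    }
    return -1;
}

# Look an item up; returns its address, or -1 if it is not in the table.
sub find_item {
    my ($item) = @_;
    my $start = $item % $H;
    for my $step (0 .. $H - 1) {
        my $addr = ($start + $step) % $H;
        return -1    unless defined $table[$addr];   # unoccupied: not present
        return $addr if $table[$addr] == $item;      # the correct occupant
    }
    return -1;
}

my $addr_1  = insert_item(1);     # 1 mod 10 = 1, address 1 is free
my $addr_11 = insert_item(11);    # collides at 1, placed at 2
my $addr_21 = insert_item(21);    # collides at 1 and 2, placed at 3

print "1 stored at $addr_1, 11 at $addr_11, 21 at $addr_21\n";
# prints "1 stored at 1, 11 at 2, 21 at 3"
print "find 11: ", find_item(11), "\n";   # prints "find 11: 2"
print "find 31: ", find_item(31), "\n";   # prints "find 31: -1"
```

The search for 31 examines addresses 1, 2 and 3, finds other occupants, and stops at the unoccupied address 4. A real implementation would also have to deal with deletions, which this sketch leaves out.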

The advantage of using hashing for storing a large number of items instead of using linked lists is that, as long as collisions are rare, the search and insert operations take a roughly constant amount of time (the time needed to compute the hash function, plus a few probes). When using a linked list, a traversal of the list is required to search for an item (and to insert one, if the list is kept sorted or free of duplicates). This takes time proportional to the size of the linked list.

Compared to standard arrays, hash tables make more efficient use of space for items taken from large data sets. Consider the case where I wish to store the two numbers 1 and 1000. In an array, I would store the number 1 at index 1 and the number 1000 at index 1000. Thus, I must use an array of size at least 1001. If I were using a hash table, I could use a table of size 5 and store the number 1 at index 1 and the number 1000 at index 0 (using the modulo division hash function, since 1000 mod 5 = 0).
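
The space argument can be observed directly in Perl, as in the sketch below. Perl's built-in hashes choose their own table size and hash function internally, so the sketch only compares the number of array positions with the number of hash entries, and separately evaluates the modulo division function for a table of size 5. All variable names are illustrative.

```perl
use strict;
use warnings;

# Array: storing at index 1000 forces positions 0 .. 1000 to exist.
my @array;
$array[1]    = 1;
$array[1000] = 1000;
my $array_positions = @array;         # scalar context: number of positions

# Hash: only the two entries that were stored are present.
my %table = (1 => 1, 1000 => 1000);
my $hash_entries = keys %table;       # scalar context: number of keys

# Addresses given by the modulo division function for a table of size 5.
my $addr_1    = 1 % 5;
my $addr_1000 = 1000 % 5;

print "array positions: $array_positions, hash entries: $hash_entries\n";
# prints "array positions: 1001, hash entries: 2"
print "1 -> $addr_1, 1000 -> $addr_1000\n";
# prints "1 -> 1, 1000 -> 0"
```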