A Perl subroutine (function) takes the form:

sub routine_name { body... return returnval; }

where returnval is a scalar, an array, or a hash (also known as an associative array).

Every subroutine is defined with the sub keyword placed immediately before the routine name. To call a subroutine, write its name followed by its arguments:

routine_name($arg1, $arg2, ...);

The arguments $arg1, $arg2, ... are passed to the subroutine in a predefined array called @_. These arguments are usually assigned to local variables inside the subroutine using a statement such as

my ($parm1, $parm2, ..., $parmt) = @_;

where the my keyword restricts the scope of the variables being assigned to the enclosing block (the code within the surrounding {}). The return keyword within a Perl subroutine is optional. If it is omitted, the subroutine returns the value of the last expression evaluated. Here is an example. The program:

sub double 
{
    my($num) = @_;

    my $result = $num + $num;
    return $result;
}
$ans = double(10);
print double($ans);

would output the integer 40. Observe that my is useful not only for the parameter list: it can be used anywhere in the routine to create local variables, as with $result here.
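
Since return is optional, the same routine can also be written without it. The following is a minimal sketch; the name double_implicit is made up for this illustration and is not part of the program above.

```perl
use strict;
use warnings;

# Same computation as double(), but without an explicit return:
# the value of the last expression evaluated becomes the return value.
sub double_implicit {
    my ($num) = @_;
    $num + $num;
}

my $ans = double_implicit(10);
print double_implicit($ans), "\n";    # prints 40
```

An explicit return is still worth writing in longer routines, where it is less obvious which expression is evaluated last.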

To use subroutines you created that exist in a different file, you need to add the line:

require "otherfile.pl";

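to your current file, where otherfile.pl is the name of the file containing that subroutine. The required file must return a true value when it is loaded, which is why such files conventionally end with the line 1;.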
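
As a rough illustration of require, the sketch below writes a small library file into a temporary directory and then loads it. The temporary directory exists only to keep the example self-contained, and the subroutine name triple is hypothetical; in practice otherfile.pl would already exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a throwaway library file so the example is self-contained.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/otherfile.pl";

open(my $fh, '>', $path) or die "Cannot write $path: $!";
print $fh <<'END_LIB';
sub triple {
    my ($num) = @_;
    return 3 * $num;
}
1;   # a required file must return a true value
END_LIB
close($fh) or die "Cannot close $path: $!";

require $path;            # load the file at run time
print triple(7), "\n";    # prints 21
```

Note that Perl 5.26 and later no longer search the current directory by default, so a file sitting next to your program may need to be named as "./otherfile.pl".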

Perl also provides a large number of built-in functions such as chop, each, and eval. To do useful things in Perl, you must become familiar with them. There are far too many to list here, so take the time to browse through them using one of the many online resources, such as the Perl man pages. The Perl man pages on the Undergraduate machines are also very helpful. To browse the documentation on Perl's built-in functions, try man perlfunc. Note that the Perl man pages are split into sections. To see which sections are available for separate browsing, use man perl.
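
To give a flavour of the three functions named above, here is a brief sketch. The variable names and values are invented for illustration; man perlfunc remains the authoritative description of each function.

```perl
use strict;
use warnings;

# chop removes the last character of a string and returns it.
my $word    = "Perl!";
my $removed = chop $word;
print "$word (removed '$removed')\n";

# each returns the next key/value pair of a hash on every call.
my %age   = (alice => 30, bob => 25);
my $total = 0;
while (my ($name, $years) = each %age) {
    $total += $years;
}
print "total age: $total\n";

# eval traps a run-time error; the error message is left in $@.
my $ok = eval { die "something failed\n"; 1 };
print "caught: $@" unless $ok;
```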

Passing Variables by Reference

Passing variables by reference is necessary if you wish to pass multiple array or hash variables to a function, or if you would like to have a single array (or hash) maintain its integrity over a function call. If you are passing multiple arrays (or hashes) to a function and you do not explicitly pass them by reference, then they will be flattened into one long list, which is almost certainly not what you want. For instance, the code

sub notWhatIWanted
{
    my (@A, @B) = @_;

    print "A = @A\n";
    print "B = @B\n";
}

@X = (1, 2, 3);
@Y = (4, 5, 6);

notWhatIWanted(@X, @Y);

would output

A = 1 2 3 4 5 6
B = 

because the value of @_ was (1, 2, 3, 4, 5, 6), and Perl, having no way to know where to split this list between @A and @B, stores all of @_ in @A. The array @B is left empty.

Now that you know why it is sometimes necessary to pass variables by reference, let us look at a few examples.

sub popmany {
    my $aref;
    my @retlist = ();
    foreach $aref ( @_ ) {
        push @retlist, pop @$aref;
    }
    return @retlist;
}
@tailings = popmany ( \@a, \@b, \@c, \@d );

In this example, the array @tailings contains the last element of each of the arrays @a, @b, @c, and @d. Because popmany() works through references, those elements are also removed from the original arrays, as printing @a, @b, @c, and @d after the call would show.
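
The same technique repairs the earlier notWhatIWanted example. The following is a sketch only: whatIWanted is a hypothetical name, and it returns its two lines instead of printing them directly so that the result is easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical counterpart to notWhatIWanted: the caller passes
# references, so the two arrays stay separate inside the routine.
sub whatIWanted {
    my ($aref, $bref) = @_;
    my @A = @$aref;    # copy of the first array
    my @B = @$bref;    # copy of the second array
    return ("A = @A", "B = @B");
}

my @X = (1, 2, 3);
my @Y = (4, 5, 6);

print "$_\n" for whatIWanted(\@X, \@Y);
```

This prints A = 1 2 3 followed by B = 4 5 6, because @_ now holds exactly two scalars (the references) and each one is dereferenced separately.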

There is one more thing that must be mentioned. Just as passing multiple arrays or hashes to a function causes problems unless they are passed by reference, so does returning multiple arrays or hashes. Returning several values from a single function is something you have likely not seen in C++ (where you might wrap the arrays in an object and return that object). We can write a return statement that returns multiple values like this:

return ($value1, $value2, ...);

That is, we are simply returning a list, whose individual elements can then be assigned to distinct variables as in:

($x, $y) = myFunction(...);
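
A complete, runnable instance of this pattern might look like the sketch below. The routine min_max is hypothetical and stands in for myFunction.

```perl
use strict;
use warnings;

# Hypothetical routine: returns the smallest and largest of its arguments.
sub min_max {
    my @nums = @_;
    my ($min, $max) = ($nums[0], $nums[0]);
    foreach my $n (@nums) {
        $min = $n if $n < $min;
        $max = $n if $n > $max;
    }
    return ($min, $max);
}

my ($x, $y) = min_max(7, 2, 9, 4);
print "min = $x, max = $y\n";    # prints: min = 2, max = 9
```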

However, this does not work when returning multiple arrays and/or hashes: all of them are concatenated, and it is not possible for the caller to determine where one array ends and where the next one begins. For instance:

(@a, @b) = func(@c, @d);   # wrong, @a contains all elements, @b is empty
(%a, %b) = func(%c, %d);   # wrong, %a contains all keys/values, %b is empty

The solution to this problem is once again to use references. Consider the following example where references to two arrays are passed into a function, the array sizes are compared, and the two references are returned.

sub func {
    my ($cref, $dref) = @_;       # Local variables, two references to arrays
    if (@$cref > @$dref) {        # Compare the size of the two referenced arrays
        return ($cref, $dref);    # Return both references to arrays
    } else {                      
        return ($dref, $cref);    # with the larger of the two first
    }
}

($aref, $bref) = func(\@c, \@d); # Call the function with two array refs
print "@$aref has at least as many elements as @$bref\n";

Returning references is essential when a function must return more than one array or hash. When returning a single array or hash, you can use the normal convention (e.g. return @array;).
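
As a further sketch, a function can also build two new arrays and hand back a reference to each. The routine split_parity and its variable names are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical routine: splits its arguments into even and odd numbers
# and returns a reference to each result array.
sub split_parity {
    my (@even, @odd);
    foreach my $n (@_) {
        if ($n % 2 == 0) { push @even, $n } else { push @odd, $n }
    }
    return (\@even, \@odd);
}

my ($even_ref, $odd_ref) = split_parity(1 .. 6);
print "even: @$even_ref\n";    # prints: even: 2 4 6
print "odd:  @$odd_ref\n";     # prints: odd:  1 3 5
```

Because the caller receives two separate references, it can always tell where one array ends and the other begins.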