Ok-Guru: ADVANCED PERL PROGRAMMING

Data References and Anonymous Storages

Referring to Existing Variables

To create a reference to an existing variable, prefix it with a backslash. For example,

$a = "mama mia";

@array = (10, 20);

%hash = ("laurel" => "hardy", "nick" => "nora");

$ra = \$a;

$rarray = \@array;

$rhash = \%hash;

References to constant scalars can be created in a similar fashion. For example,

$ra = \10;

$rs = \"hello world";

Since arrays and hashes are collections of scalars, it is possible to take a reference to an individual element the same way.

$r_array_element = \$array[1];

$r_hash_element = \$hash{"laurel"};

Dereferencing

Dereferencing means getting at the value that a reference points to. If $r is a reference, then $$r, @$r, or %$r retrieves the value being referred to, depending on whether $r is pointing to a scalar, an array, or a hash.

References to Scalars

$a = 10;

$a += 2;

print $a;

The program can be written using references like

$a = 10;

$ra = \$a;

$$ra += 2;

print $$ra;

References to Arrays

$rarray = \@array;

The following operations are equivalent

push (@array , "a", 1, 2);

push (@$rarray, "a", 1, 2);

print $array[$i] ;

print $$rarray[1];

@sl = @array[1,2,3];

@sl = @$rarray[1,2,3];

Another example,

$s = (\'a', \'b', \'c');

Here $s contains a reference to the constant string c.

References to Hashes

$rhash = \%hash;

print $hash{"key1"};

print $$rhash{"key1"};

Hash slices work the same way too:

@slice = @$rhash{'key1', 'key2'};

$$rarray[1] can also be written as ${$rarray}[1].

This is because Perl follows two rules,

(1) Key or index lookups are done at the end.

(2) The prefix closest to a variable name binds most closely.

When Perl sees something like $$rarray[1] or $$rhash{"browns"}, it leaves index lookups ([1] and {"browns"}) to the very end. That leaves $$rarray and $$rhash. It gives preference to the `$' closest to the variable name. So the precedence works out like this: ${$rarray} and ${$rhash}.

Perl provides an easier-to-read syntax for accessing array elements: the ->[ ] notation. For example,

$rarray = \@array;

print $rarray->[1] ;

#instead of

print $$rarray[1];

# or

print ${$rarray}[1];

Similarly, to access an element of a hash table the ->{ } notation can be used. For example,

$rhash = \%hash;

print $rhash->{"k1"};

#instead of

print $$rhash{"k1"};

# or

print ${$rhash}{"k1"};

Note: This notation works only for single indices, not for slices. For example,

print $rarray->[0,2];

Perl treats the stuff within the brackets as a comma-separated expression that yields the last term in the array: 2. Hence, this expression is equivalent to $rarray->[2], which is an index lookup, not a slice.

Passing Arrays and Hashes to Subroutines

When more than one array or hash is passed to a subroutine, Perl merges all of them into the @_ array available within the subroutine. The only way to avoid this merger is to pass references to the input arrays or hashes. For example,

@array1 = (1, 2, 3);

@array2 = (4, 5, 6, 7);

AddArrays (\@array1, \@array2); # Passing the arrays by reference.

print "@array1 \n";

sub AddArrays

{

my ($rarray1, $rarray2) = @_;

$len2 = @$rarray2; # Length of array2

for ($i = 0 ; $i < $len2 ; $i++) {

$rarray1->[$i] += $rarray2->[$i];

}

References to Anonymous Storage

To create an anonymous array, use square brackets instead of parentheses. For example,

$ra = [ ]; # Creates an empty, anonymous array and returns a reference to it

$ra = [1,"hello"]; # Creates an initialized anonymous array and returns a reference to it.

What happens if parentheses are used instead of square brackets? Perl evaluates the right side as a comma-separated expression and returns the value of the last element; $ra contains the value "hello".

To create an anonymous hash, use braces instead of square brackets. For example,

$rh = { }; # Creates an empty hash and returns a reference to it.

$rh = {"k1", "v1", "k2", "v2"}; # A populated anonymous hash.

Note: An ordinary hash uses the % prefix and is initialized with a list within parentheses like

%hash = ("flock" => "birds", "pride" => "lions");

An anonymous hash is a list contained within curly braces like

$rhash = {"flock" => "birds", "pride" => "lions"};

Perl doesn't have any notation for dynamically allocated scalars.

{

my $a = "hello world";

$ra = \$a;

}

print "$$ra \n";

In the above code a reference to an existing variable is created and then the variable is allowed to pass out of scope. The my operator tags a variable as local to the block.

Dereferencing Multiple Levels of Indirection

$a = 10;

$ra = \$a;

$rra = \$ra;

$rrra = \$rra;

print $a;

print $$ra;

print $$$rra;

print $$$$rrra;

$$rarray[1] is same as ${$rarray}[1]. It isn't entirely by accident that braces were chosen to denote the grouping.

There is a more general rule which specifies that the braces imply a block of code, and Perl doesn't care what is put in there as long as it yields a reference of the required type. For example,

sub test {

return \$a;

}

$a = 10;

$b = ${test()};

print $b;

Note: Normally, when Perl sees a string such as "$a", it does variable interpolation. But "a" can be replaced by a block as long as it returns a reference to a scalar, so something like this is completely acceptable, even within a string:

print "${foo()}";

Replace foo() by system ('/bin/rm *') and an unpleasant Trojan horse is created.

print "${system('/bin/rm *')}"

The parameters given to system do their damage before Perl has a chance to figure out that system doesn't return a scalar reference.

Nested Data Structure

Arrays and hashes contain scalars. They cannot contain arrays or hashes. But references are scalars which refer to an array or a hash. Hence references can be used in an array or hash to point to more arrays or hashes. This technique is used to create complicated data structures. For example,

%Mamma = ("name" => "Big Mamma", "Age" => 56);

%kid1 = ("name" => "Kid1", "Age" => 26);

%kid2 = ("name" => "Kid2", "Age" => 22);

$Mamma{"Children"} = [\%kid1, \%kid2];

Then kid2's age can be known as

print $Mamma{"Children"}->[1]->{"Age"};

If the first line of a PERL program is

$Mamma{"Children"}->[1]->{"Age"} = 34;

then PERL automatically creates a hash %Mamma with a key "Children" which points to a newly allocated array. The second element of the array points to a new allocated hash. This hash is given the key "Age" with a value of 34.

The symbol "->" can be excluded if and only if it is between subscripts. The following two examples are identical.

$Mamma{"Children"}->[1]->{"Age"} = 34;

$Mamma{"Children"}[1]{"Age"} = 34;

The following example shows the advantage of using anonymous arrays and hashes.

%Mamma = (

"name" => "Big Mamma",

"Age" => 56,

"Children" => [ { "name" => "Kid1",

"Age" => 26

{ "name" => "Kid2",

"Age" => 22

}

]

);

The above code contains only one named variable. The "Children" attribute is a reference to an anonymous array which itself contains references to anonymous hashes containing the children's details. None of these arrays or hashes embeds the next level of hash or array because anonymous arrays or hashes are references. In other words, such a nesting does not reflect a containment hierarchy.

PERL automatically deletes all the nested structures as soon as the top-level structure (%Mamma) is deleted or reassigned to something else.

The ref function

The ref function checks a scalar to see if it contains a reference and if so the data type it is pointing to. ref returns false if its argument contains a number or string. If the argument is a reference, ref returns one of these strings to describe the data being referred to: "SCALAR", "HASH", "ARRAY", "REF" (referring to another variable), "CODE" (referring to a subroutine), "GLOB" (referring to a typeglob). For example,

$a = 10;

@b = (1,2,3,4,5,6);

%c = (1,2,3,4,5,6);

$r_a = \$a;

$r_b = \@b;

$r_c = \%c;

$r_d = \$r_c;

print ref($a),"\n",ref($r_a),"\n",ref($r_b),"\n",ref($r_c),"\n",ref($r_d),"\n",ref($r_e),"\n";

Symbolic References

A construct such as $$var indicates that $var is a reference variable. What if $var is not a reference variable at all? Then Perl checks if $var contains a strings. If so, it uses that string as a regular variable name and assigns value to it. Such type of reference is called symbolic reference. For example,

$x = 34;

$var = "x";

$$var = 45; # Modifies $x to 45.

$var = "x";

@$var = (1,2,3);

The symbol used before $var dictates the type of variable to access. $$var is equivalent to $x and @$var is equivalent to @x.

Note: Symbolic references work only for global variables, not for those marked private using my.

Symbolic references in Perl can be disabled. This is done as given below.

use strict 'refs';

$var = "x";

$$var = 30;

The above program causes a runtime error.

The strict pragma remains in effect until the end of the block. It can be turned off by

no strict 'refs';

For example, the program given below executes properly.

use strict 'refs';

$var = "x";

no strict 'refs';

$$var = 30;

A view of the Internals

A variable logically represents a binding between a name and a value. This is true whether the variable is global, dynamically scoped (using local()) or lexically scoped (using my()).

An array or a hash is a collection of scalar values. An array has one value that represents this collection of scalar values.

Note that while a name always points to a value, a value doesn’t always have to be pointed by a name as in the case of anonymous arrays or hashes.

Reference Counts

Perl maintains a reference count for every value, whether it is directly pointed to by a name or not.

As shown in the figure above, the reference count represents the number of arrows pointing to the value part of a variable. Because there is always an arrow from the name to its value, the variable's reference count is at least 1. When a reference to a variable exists, the corresponding value's reference count is incremented.

The value of the reference variable is the address of another scalar value, which does not change even if $a's value changes.

Perl automatically deletes a value when its reference count drops to zero. When variables go out of scope, the binding between the name and the value is removed, resulting in the value's reference count being decremented. In the typical case in which this count is 1, the value is deleted.

The problem of reference counting technique is that reference counts take up space, which adds up if every piece of data in the application has an extra integer associated with it.

Although symbolic references access variables in an indirect way, no actual reference variables are created. Hence, the reference count of a symbolically accessed variable is not modified. Hence symbolic references are also called soft references, in contrast to hard references, which actually allocate storage to keep track of the indirection.

Then there is also the problem of circular references. For example,

$a = \$a;

$a's reference count indicates that something is pointing to it, so it will never get freed. A more practical case of circular references is that of network graphs where each node keeps track of each of its neighbors or ring buffers where the last element points to the first one.

The array's value maintains its own reference count, and each of its elements has its own. When

a reference to an array is created, its own reference count is incremented without its elements getting affected. This is shown in the figure below

When a reference to an element of an array (or a hash) is created, Perl increments that scalar value's reference count. This is shown in the figure below.

If this element is now popped from the array, its reference count goes down by 1 because the

array is no longer interested in the scalar value. But since there is an outstanding reference to the array element (and its reference count is still 1), it is not destroyed. This is shown in the figure below.

Typeglobs and Symbol Tables

Perl has a feature not usually seen in other languages. The same variable name can be used across various data types. For example, scalar $alpha, array @alpha, hash %alpha, subroutine &alpha, file handle alpha, format name alpha are all valid and completely independent of each other.

Perl uses a symbol table (implemented internally as a hash table) to map identifier names (here alpha) to the appropriate values. But duplicate keys are not allowed in a hash table. Therefore, Perl creates a structure called a typeglob between the symbol table entry and the other data types. Typeglob is just a bunch of pointers to values that can be accessed by the same name, with one pointer for each value type. In the typical case, where identifier names are unique, all but one of these pointers is null.

A typeglob has the prefix "*"; which can be considered as a wildcard representing all values sharing the identifier name.

Lexical & Dynamically scoped variables

Lexical variables (those tagged with my) aren't listed in the symbol table at all. Every block and subroutine gets an array of variables called scratchpads. Each lexical variable is assigned one slot from a scratchpad; different types of variables with the same name - $alpha and %alpha occupy different slots. Since a subroutine's lexical variables are independent of any other's, we get truly local variables.

The local operator which operates on global variables only saves their values and arranges to have them restored at the end of the block. For example,

$a = 20; # global variable

{

local ($a); # save $a's old value; new value is undef

$a = 10;

print $a; # prints "10"

}

print $a; # prints "20", the old value

When a local operator is applied to global variables, it squirrels their values away and restores them at the end of the block. Because the variables themselves are global, their new value is available not only to the block in which the local operator is used, but also to all called subroutines. For example,

$x = 10;

first();

second();

sub first {

local ($x) = "zen"; # $x is still global, and has a new value

second();

}

sub second {

print $x."\n";

}

This process is called dynamic scoping, because the value of $x seen by second() depends on the particular call stack.

Typeglobs

Typeglobs can be localized (with local only) and assigned to one another. For example,

$alpha = "Wow!";

@alpha = ("idaho", "russet");

*beta = *alpha; # Alias beta to alpha using typeglob assignment

print "$beta\n"; # prints "Wow!"

print @beta, "\n"; # prints "idaho russet"

Once the typeglob assignment is made, all entities that were called "alpha" can now also be referred to as "beta" - the names are freely interchangeable. The alias holds true until the typeglob is reassigned or removed. In the above example, there is no subroutine called alpha, but if it is defined after the typeglobs have been assigned, that subroutine can also be invoked as beta. It turns out that the alias works the other way too. @beta will also be automatically accessible as @alpha.

Temporary Aliases

For now, there is no way to get rid of an alias created by a typeglob assignment. However temporary aliases can be created using local, because it restores the typeglob's values at the end of the block. For example,

$b = 10;

{

local *b; # Save *b's values

*b = *a; # Alias b to a

$b = 20; # Same as modifying $a instead

} # *b restored at end of block

print $a; # prints "20"

print $b; # prints "10"

local *b puts all of *b's value pointers into safekeeping and substitutes an UNDEF value for all of them. This lasts until the end of the block. Now, because of the alias (*b = *a), the assignment $b = 20 has the effect of modifying both $a and $b. But at the end of the block, only $b is restored, and $a is left with the new value.

Lexical variables and the symbol table have nothing to do with each other. Hence, localizing a typeglob with my is a compile-time error.

my(*F);

Using Typeglob Aliases

Aliases happen to be quite a bit faster than references, because they don't need to do any dereferencing. For example,

$a = 10;

*b = *a ; $b++ ; # 1. Increment $a indirectly through the typeglob

$r = \$a; $$r++; # 2. Increment $a indirectly through the reference

Case 1 is faster than case 2.

@array = (10,20);

DoubleEachEntry(*array);

print "@array \n"; # prints 20 40

sub DoubleEachEntry {

local *copy = shift; # $_[0] contains *array

foreach $element (@copy) {

$element *= 2;

}

The typeglob *copy springs into existence when it is first encountered, but because it didn't exist prior to the local statement, it and the corresponding entry in the symbol table are removed at the end of the block.

A lexically scoped array cannot be used as a parameter to DoubleEachEntry, because lexical variables don't have typeglobs associated with them. The restriction is easily circumvented, because Typeglobs and references are strangely equivalent. An ordinary reference can be passed to a subroutine expecting a typeglob, and it'll work well.

my @array = (1, 2, 3);

DoubleEachEntry(\@array); # Instead of *array, which wouldn't work

Perl has a number of cryptic built-in variables such as $!, $/, and $@ , and it is better to work with longer descriptive names. The module English.pm in the standard Perl library provides nice big, long aliased names, such as $ERRNO, $INPUT_RECORD_SEPARATOR, and $EVAL_ERROR (respectively). For example,

use English; # Import the module file called English.pm

unlink ('/tmp/foo');

if ($ERRNO) { # Use $ERRNO instead of $!

print $ERRNO, "\n";

}

Aliasing Problem

Aliases don't create new variables. This often leads to weird values of variables. For example,

$x = 10;

foo(*x);

sub foo {

local(*y) = @_; # Save global *y's values. Alias it to *x

print "Before value of y : $y \n"; # y = 10

local($x) = 100; # local saves $x's value 10 and substitutes 100.

print "After value of y : $y \n"; # Since *y is still aliased to *x, $y now contains 100

}

The example below throws up an error.

foreach $f (10, 20, 30) {

foo (*f);

}

sub foo {

local (*g) = @_;

$g++;

}

For efficiency, the foreach operator aliases $f to the next element of the list in every iteration, each of which is a constant. The subroutine foo aliases *g to *f, which means that $g is aliased to a constant. For this reason, the operation $g++ causes a problem.

Typeglobs and References

A variable $a can be seen simply as a dereference of a typeglob ${*a}. For this reason, Perl makes the two expressions ${\$a} and ${*a} refer to the same scalar value. This equivalence of typeglobs and ordinary references has some interesting properties described below

Selective Aliasing

There is a way to create selective aliases, using the reference syntax:

*b = \$a;

In this case $b and $a are aliases, but @b and @a (or &b and &a, and so on) are not.

Constants

Read-only variables can be got by creating references to constants, like this:

*PI = \3.1415927;

$PI = 10; # Perl complains: "Modification of a read-only value attempted at try.pl line 3."

Naming Anonymous Subroutines

sub generate_greeting {

my ($greeting) = @_;

sub { print "$greeting world\n";}

}

$rs = generate_greeting("hello");

Instead of invoking it as $&rs(), give it your own name.

*greet = $rs;

greet();

This is equivalent to calling $&rs(). Prints "hello world\n"

References to Typeglobs

References to typeglobs can also be created by prefixing it with a backslash. For example,

$ra = \*a;

References to typeglobs are not used much in practice, because it is very efficient to pass typeglobs around directly.

Filehandles & Directory Handles

The built-in functions open and opendir initialize a filehandle and a directory handle, respectively:

open(F, "/home/calvin");

opendir (D, "/usr");

These handles don't have some basic facilities such as assigning handles or creating local handles. For example, the following statements are invalid.

local (G);

G = F;

Why is it so important to be able to assign handles and create local filehandles? Without assignment, filehandles cannot be passed as parameters to subroutines or maintained in data structures. This can be done using typeglob. For example,

*G = *F;

local (*F);

The following example shows how I/O redirection can be achieved.

open(F, '>/tmp/x') || die;

*STDOUT = *F;

print "hello world\n";

To keep this redirection temporary, localize *STDOUT.

The following piece of code passes a filehandle to a subroutine:

open (F, "/tmp/sesame") || die $!;

read_and_print(*F);

sub read_and_print {

local (*G) = @_;

while (<G>) { print; }

}

Strings as Handles

All Perl I/O functions that accept a handle also happen to accept a string instead. For example,

$fh = "foo";

open ($fh, "< /home/snoopy") ;

read ($fh, $buf, 1000);

When open examines its parameters, it finds a string where a typeglob should have been. In this case, it automatically creates a typeglob of that name and then proceeds as before.

Similarly, when read gets a string instead of a typeglob, it looks up the corresponding typeglob from the symbol table, and then the internal filehandle, and proceeds to read the appropriate file.

Subroutine References and Closures

As with ordinary variables, subroutines can be named or anonymous. Perl has a syntax for taking a reference to either type. Such references can be used to create sophisticated structures. For example,

1. Dispatch tables. Or data structures that map events to subroutine references. When an event comes in, a dispatch table is used to look up the corresponding subroutine. This is useful in creating large and efficient switch statements, finite state machines, signal handlers, and GUI toolkits.

2. Higher-order procedures. A higher-order procedure takes other procedures as arguments or returns new procedures.

3. Closures. A closure is a subroutine that, when created, packages its containing subroutine's environment (all the variables it requires and that are not local to itself).

References to Named Subroutines

To take a reference to an existing sub routine prefix it with a backslash. For example,

\&mysub is a reference to &mysub.

Example:

sub greet {

print "hello \n";

}

$rs = \&greet; # Create a reference to subroutine greet

Contrast this to the following code, which uses parentheses:

$rs = \&greet();

It calls greet and produces a reference to its return value, which is the value of the last expression evaluated inside that subroutine. Since print executed last and returned a 1 or a 0 the result of this expression is a reference to a scalar containing 1 or 0.

In short, do not use parentheses when taking a subroutine reference.

References to Anonymous Subroutines

An anonymous subroutine can be simply created by omitting the name in a subroutine declaration. For example,

$rs = sub {

print "hello \n";

};

This expression returns a reference to the newly declared subroutine. Notice that because it is an expression, it requires the semicolon at the end, unlike the declaration of a named subroutine.

Dereferencing Subroutine References

Dereferencing a subroutine reference calls the subroutine indirectly. Prepending the sub-routine reference variable with the appropriate prefix - "&" dereferences it. Perl does not care whether the sub-routine reference variable is pointing to a named or an anonymous subroutine; dereferencing works the same way in either case. For example,

&$rs(10, 20); # Call the subroutine indirectly

The subroutines can be indirectly called through references using the -> syntax. For example,

$rsub->(10);

Subroutine calls can be chained if the intermediate calls return subroutine references. For example:

sub test1 {

my $arg = shift;

print "$arg";

return \&test2;

}

sub test2 {

my $arg = shift;

print " and $arg\n";

}

$rs = \&test1;

$rs->("Batman")->("Robin"); # Prints "Batman and Robin"

Symbolic References

Symbolic references contain names (strings), not real references. There is no difference in syntax between real and symbolic references. For example,

sub foo { print "foo called\n" }

$rs = "foo";

&$rs(); # prints "foo called"

Using symbolic references is a much slower than using real references.

Dispatch Table

A dispatch table is an array of subroutine references. For example:

%options = ( # For each option, call appropriate subroutine.

"-h" => \&help,

"-f" => sub {$askNoQuestions = 1},

"-r" => sub {$recursive = 1},

"_default_" => \&default,

);

sub ProcessArgs {

my ($rlArgs, $rhOptions) = @_;

foreach $arg (@$rlArgs) {

if (exists $rhOptions->{$arg}) {

$rsub = $rhOptions->{$arg};

&$rsub(); # Call it.

} else { #option does not exist.

if (exists $rhOptions->{"_default_"}) {

&{$rhOptions{"_default_"}};

}

ProcessArgs (\@ARGV, \%options);

One step can be omitted by using the block form of dereferencing.

if (exists $rhOptions->{$arg}) {

&{$rhOptions->{$arg}}(); # Dereference and call sub in one shot

}

Signal Handlers

Usually, a program works by calling functions implemented by the operating system, not vice versa. An exception to this rule is when the operating system has an urgent message to deliver to the program. In many operating systems, the delivery is accomplished by means of signals. For example, a signal might be issued when a user presses Ctrl-C. A function can be specified to be called whenever a signal is delivered to the program. This allows the program to take appropriate action.

In Perl there is a special hash variable called %SIG whose keys are the names of signals, and its values correspond to subroutine names or references, which are called for the corresponding signal. For example,

sub ctrl_c_handler {

print "Ctrl C pressed \n";

}

$SIG {"INT"} = \&ctrl_c_handler; # "INT" indicates "Interrupt" signal.

Here, the word INT is a reserved string and signifies keyboard interrupts with Ctrl-C.

When assigning values to %SIG, Perl also allows the programmer to give the name of the subroutine, so the user doesn't have to give it a subroutine reference. For example

$SIG {"INT"} = 'ctrl_c_handler'; # Name of the subroutine passed.

Expression plotting

Suppose we want to plot a variety of functions, of the general type:

y = f(x)

where f(x) is a function that takes a number as an argument and returns another number. For example,

y = sin(2x) + cos2(x);

It is easy to develop a subroutine plot that can plot this expression in the range 0 to 2 [pi] :

$PI = 3.1415927;

$rs = sub { # Anonymous subroutine

my($x) = @_;

return sin (2*$x) + cos($x) ** 2; # Function to be plotted

};

plot ($rs, 0, 2 * $PI, 0.01);

This is an example of a higher-order procedure that takes (a reference to) another user-defined subroutine as an input parameter and calls it one or more times.

Callback Function

A callback function is a reference to a function that is usually passed into another function for use. It is a way of generalizing behavior. For example,

my $add = sub {return ($_[0] + $_[1]);};

my $sub = sub {return ($_[0] - $_[1]);};

my $mul = sub {return ($_[0] * $_[1]);};

my $div = sub { if ($_[1] != 0) {

return ($_[0] / $_[1]);

}

};

sub operation {

($action, $var1, $var2) = @_;

my($result) = $action->($var1, $var2);

print "$result\n";

}

&operation($add,10,20);

&operation($sub,10,20);

&operation($mul,10,20);

&operation($div,10,20);

Closures

Instead of returning data, a Perl subroutine can return a reference to a subroutine. For example,

$greeting = "hello world";

$rs = sub { print $greeting; };

&$rs(); #prints "hello world"

Another example:

sub generate_greeting {

my($greeting) = "hello world";

return sub {print $greeting};

}

$rs = generate_greeting();

&$rs(); # Prints "hello world"

The generate_greeting subroutine returns the reference to an anonymous subroutine, which in turn prints $greeting. The curious thing is that $greeting is a my variable that belongs to generate_greeting. Once generate_greeting finishes executing, it would be expected that all its local variables to be destroyed. A subroutine block is a package of code to be invoked at a later time, so it keeps track of all the variables it is going to need later on. When this subroutine is called subsequently and invokes print "$greeting", the subroutine remembers the value that $greeting had when that subroutine was created.

sub generate_greeting {

my($greeting) = @_;

return sub {

my($subject)= @_;

print "$greeting $subject \n";

};

}

$rs1 = generate_greeting("hello");

$rs2 = generate_greeting("my fair");

# $rs1 and $rs2 are two subroutines holding on to different $greeting's

&$rs1 ("world") ; # prints "hello world"

&$rs2 ("lady") ; # prints "my fair lady"

Such subroutines are known as a closure. Hence a closure is the special case of an anonymous subroutine holding onto data that used to belong to its scope at the time of its creation. Perl creates closures only over lexical (my) variables and not over global or localized (tagged with local) variables.

The name of a variable and its value are separate entities. When it first sees $greeting, Perl binds the name "greeting" to a freshly allocated scalar value, setting the value's reference count to 1. At the end of the block, Perl disassociates the name from the scalar value and decrements the value's reference count. In a typical block the value would be deallocated, since the reference count comes down to zero. In the above example, however, the anonymous subroutine happens to use $greeting, so it increments that scalar value's reference count, thus preventing its automatic deallocation when generate_greeting finishes. When generate_greeting is called a second time, the name "greeting" is bound to a whole new scalar value, and so the second closure gets to hang on to its own scalar value.

Why don't closures work with local variables? Variables marked local are dynamically scoped. A local variable's value depends on the call stack at the moment at which it is used. For this reason, if $greeting were declared local, Perl would look up its value when the anonymous subroutine is called not when it is defined. This can be verified with a simple example,

sub generate_greeting {

local ($greeting) = @_;

return sub { print "$greeting \n" ; };

}

$rs = generate_greeting("hello");

$greeting = "Goodbye";

&$rs(); # Prints "Goodbye", not "hello"

The anonymous subroutine is not a closure in this case, because it doesn't hang onto the local value of $greeting ("hello") at the time of its creation.

Using Closures

An iterator keeps track of where it currently is in a "stream" of entities and returns the next logical entity every time it is called. It is like a database cursor, which returns the next record from a stream of records. A stream can be bounded (a set of records from a database) or unbounded (a stream of even numbers). For example,

sub even_number_printer_gen {

# This function returns a reference to an anonymous subroutine. This anonymous

# subroutine prints even numbers starting from $input.

my($input) = @_;

if ($input % 2) { $input++}; # Next even number, if the given number is odd

$rs = sub {

print "$input ";

# Using $input, which is a my variable declared in an outside scope

$input += 2;

};

return $rs;

}

$iterator = even_number_printer_gen(30); # $iterator now points to a closure.

for ($i = 0; $i < 10; $i++) {

&$iterator();

}

# This prints 30 32 34 36 38 40 42 44 46 48

$iterator1 = even_number_print_gen (102);

$iterator2 = even_number_print_gen (22);

&$iterator1(); # Prints 102

&$iterator2(); # Prints 22

&$iterator1(); # Prints 104

&$iterator2(); # Prints 24

Notice how each subroutine reference is using its own private value for $input.

Two closures can share the same variables as long as the two closures are created in the same environment. For example,

sub even_odd_print_gen {

my ($rs1, $rs2);

my ($last) = shift; # Shared by the two closures below

$rs1 = sub { # Even number printer

if ($last % 2) {

$last ++;

} else {

$last += 2

}

print "$last \n";

};

$rs2 = sub { # Odd number printer

if ($last % 2) {

$last += 2;

} else {

$last++;

}

print "$last \n";

};

return ($rs1, $rs2); # Returning two anonymous sub references

}

($even_iter,$odd_iter) = even_odd_print_gen(10);

&$even_iter (); # prints 12

&$odd_iter (); # prints 13

&$odd_iter (); # prints 15

&$even_iter (); # prints 16

&$odd_iter (); # prints 17

Random number generation

The rand() function represents an iterator primed with a seed (using srand function). If the objective is to write a simulation program that depends on two independent sources of random number generation, using rand in both these sources does not make them independent. The reason is that rand calculates a new random number based on the last number it generated and calling rand for one stream affects the next number retrieved by the other stream.

Closures provide a nice solution. Instead of using srand, the user defined function my_srand will be used which returns a random-number-generating subroutine, seeded with an appropriate initial value. The definition of my_srand is given below

sub my_srand {

my ($seed) = @_;

# Returns a random number generator function

my $rand = $seed;

return sub {

# Compute a new pseudo-random number based on its old value. This number is #constrained between 0 and 1000.

$rand = ($rand*21+1)%1000;

};

}

$random_iter1 = my_srand (100);

$random_iter2 = my_srand (1099);

for ($i = 0; $i < 100; $i++) {

print $random_iter1(), " ", $random_iter2(), "\n";

}

Eval

Perl's eval function works in two somewhat distinct ways, depending on the type of its argument. If given a string, eval treats the string as a little program and compiles and executes it. This is called dynamic expression evaluation. The contents of the string may or may not be known at compile time. Alternatively, if given a block of code - that is, the code is known at compile time - eval traps run-time exceptions.

The String Form - Expression Evaluation

When Perl is given a file to execute or a string as a command line option (using -e), it needs to parse the contents, check it for syntax errors, and, if all is fine, execute it. Perl makes this feature available to the programmer through the eval string form. For example,

$str = '$c = $a + $b';

$a = 10; $b = 20;

eval $str; # Treat $str as code, and execute it.

print $c; # prints 30

In the above snippet, eval thinks of $str as a program and executes it. The important point is that it doesn't think of it as a separate program, but as if it belonged right there in the original code instead of the eval statement. For this reason, the string that is given to eval can use variables and subroutines available to it at that point, including my and local variables, and optionally produce new ones in the same environment. In the preceding example, the string given to eval adds two initialized variables ($a and $b) and produces a new variable, $c.

If there are more than one statement inside the string, eval evaluates all of them and returns the result of the last evaluation. For example,

$str = '$a++; $a + $b'; # Contains two expressions

$a = 10; $b = 20;

$c = eval $str; # $c gets 31

It's quite pointless to eval a piece of code that is known at compile time, as in the example above. The real use lies when $str comes from elsewhere - standard input, a file, or over the network.

Note: The string form of eval is a security risk. If the string argument comes from an un trusted source and contains, say, system('rm *'), then it might cause problems. This can be avoided by using the taint-checking option provided by Perl, which prevents the program from using data derived from outside the program to affect files or things outside the program.

If $str doesn't contain a valid Perl expression, Perl puts an error message in a special variable called $@. Since eval compiles the string before actually executing it, this can be either a compilation or a run-time error. The error strings in $@ are exactly those that are seen on the standard error output when processing a flawed script.

The Block Form - Exception Handling

In this form, eval is followed by a block of code, not a scalar containing a string. It is used for handling run-time errors, or exceptions. Errors can be internal built-in ones (out-of-memory, divide-by-zero) or user-defined ones produced by die. For example,

eval {

$a = 10; $b = 0;

$c = $a / $b; # Causes a run-time error, which is trapped by eval

};

print $@; # Prints "Illegal division by 0 at try.pl line 3

When the script is compiled, Perl syntax-checks the block of code and generates code. If it encounters a run-time error, Perl skips the rest of the eval block and sets $@ to the corresponding error text.

The die keyword can be used to throw your own errors. Perl knows whether a piece of code is currently executing inside an eval, and so, when die is called, Perl simply gives the error string - die's argument - to the global $@, and jumps to the statement following the eval block.

sub open_file {

open (F, $_[0]) || die "Could not open file: $!";

}

$f = 'test.dat';

while (1) {

eval {

open_file($f); # if open_file dies, the program doesn't quit

};

last unless $@; # no error: break out of the loop.

print "$f is not present. Please enter new file name $f";

chomp($f = <STDIN>);

}

In Perl a function can re-throw an exception if it doesn't want to handle it itself by calling die without arguments. For example,

eval {

...

};

if ($@ =~ /sorry, bucko/) {

....

} else {

# .. don't know what to do with it.

die; # Identical to die $@

}

If there is an enclosing eval block, this exception will be caught; otherwise, the program terminates.

Using Eval for Expression Evaluation

A useful application of eval is to check in a quoted string, if the quotes are balanced. For example,

'He said, "come on over"'

'There are times when "Peter" doesn\'t work at all'

A string is a correct Perl expression too. If the above strings are given as input to eval and if Perl puts an error in $@, then the input is faulty.

while (defined($s = <>))

{

$result = eval $s; # Evaluate that line

if ($@) {

print "Invalid string:\n $s";

} else {

print $result, "\n";

}

The above code can be used as a calculator also because $s can be any valid Perl statement, with arithmetic operators, loops, variable assignments, subroutines, and so on. For example, $s can be given values as:

2 * log (10);

$a = 10; $a += $a ** 2;

for (1..10) {print $_ , " " }

Expression Evaluation in Substitutions

If in a Perl substitution operator, a /e flag is added, it tells the substitution operator that the second part is a Perl expression, not an ordinary replacement string; the result of the expression is used as the replacement instead. For example,

$line = 'Expression Evaluation';

$line =~ s/(\w+)/ scalar (reverse($1)) /eg;

print $line; # prints "noisserpxE noitaulavE"

The /e flag stands for expression, and not for eval. This expression is checked for syntax at compile-time, so to watch for run-time errors put the entire statement within an eval block. For example,

$l = 'His chances of winning are between 2/5 and 1/3';

eval {

$l =~ s|(\d+)/(\d+)| $1 / $2 |eg;

};

print $l unless $@;

Using Eval for Efficiency

A Fast Multipattern grep

Consider a grep-like Perl script that can search for any number of patterns and print out only those lines that match all the given patterns. The code is

while ($s = <>) {

$all_matched = 1; # start by assuming all patterns match $s

foreach $pat (@patterns) {

if ($s !~ /$pat/) {

$all_matched = 0; # No, our assumption was wrong

last;

}

print $s if $all_matched;

}

The problem with this code is that the regular expression (/$pat/) is compiled afresh for every line and for every pattern. That is, if 10,000 lines in the text needs to be searched, and three patterns to search for, a.*b, [0-9], and [^def], the patterns will be compiled 30,000 times.

The fastest approach would be to hardcode the patterns as shown below. Unfortunately, it is also the least reusable approach.

while ($s = <> ) {

if ( ($s =~ /a.*b/) && ($s =~ /[0-9]$/) && ($s =~ /[^def]/)) {

print $s;

}

It is possible to get this level of efficiency without losing generality. The idea is to hard-wire the regular expression at run-time and then to eval it. For example,

$code = 'while (<>) {';

$code .= 'if (/';

$code .= join ('/ && /', @patterns);

$code .= '/) {print $_;}}';

print $code, "\n";

eval $code;

die "Error ---: $@\n Code:\n$code\n" if ($@);

# Check if faulty regular expressions given as input patterns

Using Eval for Time-Outs

Let's 10 seconds be the time set for the user to type something. A statement like

$buf = <>;

blocks the program until the user hits a carriage return.

The built-in function alarm() can be used to generate an ALRM signal after a given number of seconds.

$SIG{ALRM} = \&timed_out;

eval {

alarm (10);

$buf = <>;

alarm(0); # Cancel the pending alarm if user responds.

};

if ($@ =~ /GOT TIRED OF WAITING/) {

print "Timed out. Proceeding with default\n";

....

}

sub timed_out {

die "GOT TIRED OF WAITING";

}

The procedure timed_out is called after 10 seconds regardless of what Perl happens to be executing at that time, be it a blocked read or an infinite loop. If the user doesn't hit a return within 10 seconds, timed_out is called from the signal handler, which calls die, which internally jumps over to the statement following the eval. If the user does hit a return within the allotted time, alarm(0) is called to reset the alarm.

Modules

The package keyword signifies the beginning of a new namespace. All global identifiers such as names of variables, subroutines, filehandles, formats, and directory handles mentioned after this statement "belong" to that package. For example,

package A;

$a = 23;

sub first {

print "This is a subroutine of the first package";

}

The user-defined global identifiers $a and subroutine first belong to the package A.

The scope of a package lasts until the end of the innermost enclosing block if it is declared inside that block or until another package statement is encountered. In the absence of an explicit package declaration, Perl assumes a package name called main.

To access an identifier in a different namespace, put the package name before the variable name. If an identifier is not fully qualified, Perl looks for it in the currently active package.

Example,

package first;

$first = "Nishant";

sub first {

print "This is a subroutine in the first package.";

}

package B;

$first = "Sinha";

print "$first\n"; # prints "Sinha"

print "$first::first \n"; # prints "Nishant"

first::first(); # prints "This is a subroutine in the first package."

Since the package statement simply dictates the effective namespace, different namespaces can be switched at will. For example,

package first;

$first = "Nishant";

sub first {

print "This is a subroutine in the first package.";

}

package B;

$first = "Sinha";

print "$first\n"; # prints "Sinha"

print "$first::first \n"; # prints "Nishant"

first::first(); # prints "This is a subroutine in the first package."

package first;

print "$first\n" # prints "Nishant";

first(); # prints "This is a subroutine in the first package."

User-defined identifiers in package main can also refer to a variable, say $x, in that package as "$::x" or "$main::x". For example,

$first = "Nishant";

sub first {

print "This is a subroutine in the default package.\n";

}

package B;

$first = "Sinha";

sub first {

print "This is a subroutine in package B.\n";

}

print "$first\n"; # prints "Sinha"

first();

print "$::first \n"; # prints "Nishant"

::first(); # can also be written as main::first()

Each package actually gets its own symbol table.

The built-in variables such as $|, $_, @ARGV, and %ENV always belong to package main, and Perl refers to these variables in any package without having to prefix them with main::. These are the only truly global variables in Perl.

The lexical (my) variables are not associated with symbols and typeglobs and therefore have nothing to do with packages. It is a compile-time error to say something like

my $BankAccount::total; # Error

Symbolic References

Examples:

package A;

$x = 10;

package B;

print ${"A::x"}; # Access $A::x symbolically

$pkg = "A";

$var_name = "x";

print ${"${pkg}::$var_name"};

&{"A::foo"}(10, 20); # Identical to A::foo(10,20);

The same package declaration can be present in multiple files. Or multiple packages can be declared in one file. By convention, a package is usually assigned its own file and named package.pm or package.pl. Files with the suffix .pm are called Perl modules, and packages inside files with the suffix .pl are usually referred to as libraries.

The require keyword simply loads a file into the program. If the suffix and the quotes are omitted, a .pm suffix is assumed.

The use statement is more restrictive in that it accepts only module names, not filenames.

The big difference between use and require is that the use statement is executed as soon as it is parsed. For this reason, the following attempt to load a module dynamically won't work, because the assignment statement is executed only after everything has been parsed and compiled:

$pkg_name = "Account"; # executes at run-time

use $pkg_name; # executes at compile-time

The advantage of use is that when a program starts executing, there's a guarantee that all required modules have been successfully loaded, and there won't be any surprises at run-time.

When a file is require'd or use'd, it is expected to return a Boolean success value. That is, the last executing statement at global scope must be a statement such as "return 1;" or just "1;". Note that this is not necessarily the last statement in the file; it is just the last executing statement.

Perl first looks for the file given to use or require in the current directory and then looks up the @INC built-in array to search the include paths. By default, @INC contains a few standard directory names specified when the interpreter was installed and built. The paths given in @INC can be seen as:

perl -e 'print "@INC \n";'

To specify additional directories the following techniques can be used:

1. Set the PERL5LIB environment variable as a set of paths, separated by colons.

2. Modify @INC before calling require. For example,

unshift (@INC, "/usr/perl/include");

3. Use the -I command-line option. For example,

perl -I/home/nsh/perl -I/local/mylib script.pl

Perl can execute a code while it is still in the compilation stage. Normally, while parsing a file, Perl compiles the entire code, and when this process is successfully completed, it starts executing from the first global statement onward. However, if it encounters a subroutine or a block called BEGIN while parsing, it not only compiles it, but also executes it right away, before resuming the compilation of the rest of the file. For example,

sub BEGIN { # the word "sub" is optional

print "Executing before compiling.\n";

}

foo*** ; # Intentional error

This prints the following:

Executing before compiling.

syntax error at x.pl line 4, near "** ;"

Because a BEGIN block gets executed even before the compilation phase is over, it can be used to hardcode an include path in the program. For example,

BEGIN {

unshift (@INC, "../include");

}

use Foo; # Looks for Foo.pm in "../include" first

There are times when there is need to do some clean-up after all the code has executed. The END block is called just before the program is due to exit, independent of whether it was a successful exit or not. That is, even if the program dies because of, say, an exception, the END block is called anyway.

Perl supports multiple BEGIN and END statements. BEGIN statements are executed in the order in which they are seen, while END statements are executed in reverse order of appearance. If there are multiple packages with many BEGIN or END blocks, the order in which the packages were loaded is taken into account.

Privacy

A package can easily create new names in another package's namespace. For example,

package Test;

$main::foo = 10;

sub main::myFunc {

print "Hello \n";

}

package main;

myFunc(); # prints "Hello"

Perl does not enforce private and public parts of its modules. To make variables private, the my operator can be used at file scope.

A subroutine can be made private by using anonymous subroutines and holding references to them in lexical variables. For example,

my $rs_func = sub {

....

};

This function can be called as

&$rs_func

Importing Symbols

Symbols can be selectively imported into the current namespace by using the "use" statement which specifies an optional list of function names to be imported. For example,

use BankAccount ('withdraw', 'deposit');

withdraw(); # Can now call function without fully qualifying it.

For its part, the module should be ready to export these names. It should also have a policy for what it should do if the user does not specify a list at all. Both these tasks are handled by a standard module called Exporter. For example,

package BankAccount;

use Exporter;

@ISA = ('Exporter'); # Inherit from Exporter

@EXPORT_OK = ('withdraw', 'deposit');

sub deposit { .... }

sub withdraw { .... }

The above code loads the Exporter module and arranges to inherit from that module, using the @ISA array. The @EXPORT_OK array states which symbols are fine to export. If the user says,

use BankAccount ('deposit');

the deposit function can be called without fully qualifying the name, in contrast to withdraw(). To tell the Exporter module not to export any symbols into your namespace, leave the list blank.

If the module uses @EXPORT instead of @EXPORT_OK, the user gets all the exported symbols, regardless of whether they were mentioned in the import list or not.

Nesting Packages

Since all packages are global in scope, nesting of packages is not supported. However, it is possible to use statements like

use patient::encounter;

In this case Perl looks for a file called patient/encounter.pm (directory patient, file encounter.pm). The double colon gets translated to a filename separator.

Autoloading

If a function called Test::func() is invoked, for example, and if func() has not been defined in module Test, Perl automatically looks for a subroutine called Test::AUTOLOAD(). If such a subroutine exists, Perl calls it with the arguments that were passed to func(). In addition, a variable called $AUTOLOAD is set to the full name of the function that was called ("Test::func").

Accessing the Symbol Table

Perl allows users to get information about the contents of the symbol table. This property is sometimes called reflection or introspection.

Each package gets its own symbol table also called stash (short for "symbol table hash"). Perl makes these stashes available as regular associative arrays. The stash for a package named Foo can be accessed by using the hash called %Foo::. The main package is available as %main::, or simply as %::. In fact, all other packages' hash tables are available from the main stash %main::

The symbol names inside a package can be iterated as

foreach $name (keys %main::) {

print "$name, \n";

}

Each of these symbolic names maps to a typeglob, which itself points to one or more values (one or more of each type: scalar, array, hash, subroutine, filehandle, format name, or directory handle). Unfortunately, there's no direct way to find out which values actually exist. This has to be found programmatically. For example,

package DUMPVAR;

sub dumpvar {

my ($packageName) = @_;

local (*alias); # a local typeglob

*stash = *{"${packageName}::"}; # Now %stash is the symbol table

$, = " "; # Output separator for print

while (($varName, $globValue) = each %stash) {

print "$varName ============================= \n";

*alias = $globValue;

if (defined ($alias)) {

print "\t \$$varName $alias \n";

}

if (defined (@alias)) {

print "\t \@$varName @alias \n";

}

if (defined (%alias)) {

print "\t \%$varName ",%alias," \n";

}

package XX;

$x = 10;

@y = (1,3,4);

%z = (1,2,3,4, 5, 6);

$z = 300;

DUMPVAR::dumpvar("XX");

Object-Oriented Programming

Objects in Perl:

An object (also called an instance) has the following:

1. Attributes or properties

2. Identity (one object is different from another)

3. Behavior

Objects of a certain type are said to belong to a class. All objects of a class have the same functionality.

Attributes

An object is a collection of attributes. An array or a hash can be used to represents this set. For example,

%employee = ("name" => "John Doe", "age" => 32, "position" => "Software Engineer");

print "Name: ", $employee{name};

Unique Identity

Clearly, one %employee won't suffice. Each employee requires a unique identity and his or her own collection of attributes. This can be achieved by either allocating this structure dynamically or returning a reference to a local data structure.

# Using an anonymous hash

sub new_employee {

my ($name, $age, $starting_position) = @_;

my $r_employee = { # Create a unique object

"name" => $name, # using an anonymous hash

"age" => $age,

"position" => $starting_position

};

return $r_employee; # Return "object"

}

# OR, returning a reference to a local variable

sub new_employee {

my ($name, $age, $starting_position) = @_;

my %employee = (

"name" => $name,

"age" => $age,

"position" => $starting_position

);

return \%employee; # return a reference to a local object

}

$emp1 = new_employee("John Doe", 32, "Software Engineer");

$emp2 = new_employee("Norma Jean", 25, "Vice President");

new_employee() returns a reference to a unique data structure in both cases.

In the above example, the hash table is the object, and the reference to the hash table is termed the object reference.

Behavior

All functions that access or update one or more attributes of the object constitute the behavior of the object.

Example:

sub promote_employee {

my $r_employee = shift;

$r_employee->{"position"} = lookup_next_position($r_employee->{"position"});

}

promote_employee($emp1);

Such functions are also called instance methods because they require a specific instance of an object.

Asking a class to do something for you is calling a class method.

Asking an object to do something for you is calling an object method.

Asking either a class (usually) or an object (sometimes) to give you back an object is calling a constructor, which is just a kind of method.

To avoid having to suffix every method with the suffix "_employee," put all these functions in a package of their own, called Employee:

package Employee;

sub new { # No need for the suffix.

my ($name, $age, $starting_position) = @_;

my $r_employee = { # Create a unique object

"name" => $name, # using an anonymous hash

"age" => $age,

"position" => $starting_position

};

return $r_employee; # Return "object"

}

sub promote {

my $r_employee = shift;

$r_employee->{"position"} = lookup_next_position($r_employee->{"position"});

}

Use this module as:

$emp = Employee::new("John Doe", 32, "Software Engineer");

Employee::promote($emp);

The user of this code invokes only the interface functions "new" and "promote" and does not know or care about the type of data structure used to store employee details. Complications arise when the problem gets more involved. Suppose information needs to be added about hourly and regular employees. Hourly employees get paid by the hour and are eligible for overtime pay, while regular employees get a monthly salary. One way to approach it is to create a new function per type of employee. For example,

package Employee;

# Creating Regular Employees

sub new_regular {

my ($name, $age, $starting_position, $monthly_salary) = @_;

my $employee = { "name" => $name,

"age" => $age,

"position" => $starting_position,

"monthly_salary" => $monthly_salary

};

return $employee; # return the object reference

}

# Hourly Employees

sub new_hourly {

my ($name, $age, $starting_position,$hourly_rate, $overtime_rate) = @_;

my $employee = { "name" => $name,

"age" => $age,

"position" => $starting_position,

"hourly_rate" => $hourly_rate,

"overtime_rate" => $overtime_rate

};

return $employee; # return the object reference

}

To get an employee's year-to-date salary, a distinction has to be made between the two types of employees. Two subroutines compute_hourly_ytd_income() and compute_regular_ytd_income() could be created. But other differences between hourly and regular employees (such as allowed vacation, medical benefits, and so on) or the introduction of other types of employees (such as temporary employees) results in a combinatorial explosion of functions. Worse, the interface requires the user of this package to make a distinction between types of employees to be able to call the right function.

To get us out of this situation, different types of employees are put in different packages. Then the "bless" keyword is used to tag objects internally with a pointer to the packages they belong to. For example,

package RegularEmployee;

sub new {

my ($name, $age, $starting_position, $monthly_salary) = @_;

my $r_employee = { "name" => $name,

"age" => $age,

"position" => $starting_position,

"monthly_salary" => $monthly_salary,

"months_worked" => 0

};

bless $r_employee, 'RegularEmployee'; # Tag object with pkg name

return $r_employee; # Return object

}

sub promote {

my $r_employee = shift;

$r_employee->{"position"} = lookup_next_position($r_employee->{"position"});

}

sub compute_ytd_income{

my $r_emp = shift;

# Assume the months_worked attribute got modified at some point

return $r_emp->{'monthly_salary'} * $r_emp->{'months_worked'};

}

package HourlyEmployee;

sub new {

my ($name, $age, $starting_position, $hourly_rate, $overtime_rate) = @_;

my $r_employee = { "name" => $name,

"age" => $age,

"position" => $starting_position,

"hourly_rate" => $hourly_rate,

"overtime_rate" => $overtime_rate};

bless $r_employee, 'HourlyEmployee';

return $r_employee;

}

sub promote {

my $r_employee = shift;

$r_employee->{"position"} = lookup_next_position($r_employee->{"position"});

}

sub compute_ytd_income {

my ($r_emp) = $_[0];

return $r_emp->{'hourly_rate'} * $r_emp->{'hours_worked'} + $r_emp->{'overtime_rate'} * $r_emp->{'overtime_hours_worked'};

}

bless is given an ordinary reference to a data structure. It tags that data structure (note: not the reference) as belonging to a specific package and thus bestows on it some more powers. For example,

$emp1 = RegularEmployee::new('John Doe', 32, 'Software Engineer', 5000);

$emp2 = HourlyEmployee::new('Jane Smith', 35, 'Auditor', 65, 90);

In order to manufacture objects, a class needs to have a constructor method. A constructor gives back a brand-new object in that class. This is taken care of by the bless() function, whose sole purpose is to enable its referent to be used as an object.

Note: Being an object really means nothing more than that methods may now be called against it.

While a constructor can be named anything, most Perl programmers seem to like to call theirs new(). However, new() is not a reserved word, and a class is under no obligation to supply such. Some programmers have also been known to use a function with the same name as the class as the constructor.

How is an object different from any other Perl data type?

An object is different from any other data type in Perl in one and only one way: It can be dereferenced using not merely string or numeric subscripts as with simple arrays and hashes, but with named subroutine calls. In a word, with methods.

What's its fundamental type?

It's a reference, but one whose referent has been bless()ed into a particular class (read: package). What kind of reference? It could be a scalar, an array, or a hash reference. It could even be a code reference. But because of its inherent flexibility, an object is usually a hash reference.

The class (package) should provide one or more ways to generate objects (using constructor). Finally, it should provide mechanisms to allow users of its objects to indirectly manipulate these objects from a distance (using getter and setter methods).

The arrow notation is used to directly invoke instance methods or invoke methods on the object. For example,

$emp1->promote();

$emp2->compute_ytd_income();

When Perl sees $emp1->promote(), it determines the class to which $emp1 belongs (the one under which it has been blessed). In this case, it is the Regular-Employee. Perl then calls this function as follows: RegularEmployee::promote($emp1).

In other words, the object on the left side of the arrow is simply given as the first parameter of the appropriate subroutine.

Both the :: and -> notations are in fact permissible. The first one is more flexible because Perl figures out the class at run time, while the latter is faster because the function to be called is known at compile time.

An instance method in Perl is an ordinary subroutine whose first parameter simply happens to be an object reference.

The advantage of using bless is that the module user doesn't have to discriminate between types of objects using an if statement but instead lets Perl take care of routing a call to the appropriate function. That is, instead of saying something like

if (ref($emp) eq "HourlyEmployee") {

$income = HourlyEmployee::compute_ytd_income($emp);

} else {

$income = RegularEmployee::compute_ytd_income($emp);

}

it suffices if,

$income = $emp->compute_ytd_income();

This ability of Perl to call the appropriate module's function is called run-time binding.

The ref function returns a string indicating the type of the entity pointed to by the reference. In the case of a blessed object reference, it returns the name of the corresponding class.

Class Methods and Attributes

Class attributes are properties that pertain to all instances of a class, but don't vary on a per-employee basis. For example, one insurance company might provide health coverage for all employees, so it doesn't make sense to store the name of this company in each and every employee.

Class methods (also known as static methods) are functions that are relevant to that class but don't need a specific object instance to work with. For example, a subroutine called get_employee_names() doesn't require an employee object as input to figure out what it has to do.

Perl has no specific syntax for class attributes and methods. Class attributes are simply package global variables, and class methods are ordinary subroutines that don't work on any specific instance.

Perl supports polymorphism and run-time binding for these ordinary subroutines and not just instance methods. For example,

$record = <STDIN>; # Tab delimited record containing employee details

($type, $name, $age, $position) = split(/\t/, $record);

$emp = $type->new($name, $age, $position);

$emp->compute_ytd_income(); # Now use the object as before

In the above example, $type can contain either of these two strings: "HourlyEmployee" or "RegularEmployee." This approach improves on the example in the previous section by avoiding having to hardcode the name of the package. If this facility was not available, the following code would be needed

if ($type eq "HourlyEmployee") {

$emp = HourlyEmployee->new(....);

} else {

$emp = RegularEmployee->new(....);

}

Since in the case of an instance method, the object to the left of the arrow is passed as the first parameter to the subroutine, the procedure HourlyEmployee::new must be rewritten as:

package HourlyEmployee;

sub new {

my ($pkg, $name, $age, $starting_position, $hourly_rate, $overtime_rate) = @_;

...

}

The indirect notation

PERL supports an alternative to the arrow notation, called the indirect notation, in which the function name precedes the object or class name. For example

$emp = new Employee ("John Doe", 20, "Vice President");

promote $emp "Chairman", 100000;

Notice that there is no comma between $emp and the first argument ("Chairman"). This is how Perl knows that a method is called using the indirect notation and not calling a subroutine in the current package.

The Need for Inheritance

Perl has no special syntax for specifying the class (or classes) to inherit from. Each package has a variable called @ISA, which governs (method) inheritance. If a method on an object or class is called and that method is not found in that object's package, Perl then looks to @ISA for other packages to go looking through in search of the missing method.

Most classes have just one item in their @ISA array. In this case it is called "single inheritance"

For example,

package Man;

@ISA = qw(Mammal Social_Animal);

All methods common to mammals are supported in the Mammal class and don't have to be implemented in Man too. For example,

package Employee;

use Person;

@ISA = ("Person");

1; # so that require or use succeeds

In the above example, all it's doing so far is creating a Employee class and stating that this one will inherit methods from Person class if need be. The Employee class will behave just like a Person class. Setting up an empty class like this is called the "empty subclass test", that is, making a derived class that does nothing but inherit from a base class. If the original base class has been designed properly, then the new derived class can be used as a drop-in replacement for the old one.

use Employee;

my $empl = Employee->new();

$empl->name("Jason");

$empl->age(23);

printf "%s is age %d.\n", $empl->name, $empl->age;.

Proper design implies always

1. Using the two-argument form of bless(),

2. Avoiding direct access of global data

3. Not exporting anything.

In Perl, a method is just a function that expects as its first argument a class name (package) or object (blessed reference). While a method call looks a lot like a function call, they aren't really quite the same. First, the actual underlying calling conventions are different: method calls get an extra argument. Second, function calls don't do inheritance, but methods do. For example,

Method Call	Resulting Function Call
Person->new()	Person::new("Person")
Employee->new()	Person::new("Employee")

New method can be added to the Employee class as

sub salary {

my $self = shift;

if (@_) { $self->{SALARY} = shift }

return $self->{SALARY};

}

sub id_number {

my $self = shift;

if (@_) { $self->{ID} = shift }

return $self->{ID};

}

sub start_date {

my $self = shift;

if (@_) { $self->{START_DATE} = shift }

return $self->{START_DATE};

}

Overriding base classes

When both a derived class and its base class have the same method defined, the method of the derived class is invoked by the object of the derived class.

Examples:

# The contents of package BASE.pm is

package BASE;

sub new {

my ($class) = shift;

my ($val) = {};

bless ($val, $class);

return $val;

}

sub display {

my ($obj) = shift;

print "This is base class display.\n";

}

1; # so that require or use succeeds

# The contents of package CHILD.pm is

use BASE;

package CHILD;

@ISA = qw(BASE);

sub new {

my($class) = shift;

my($obj);

$obj = $class->SUPER::new();

return $obj;

}

sub display {

my ($obj) = shift;

print "This is child class display.\n";

}

1; # so that require or use succeeds

# The Perl program is

use CHILD;

$ref = CHILD->new();

$ref->display(); # The derived class method display() is invoked.

To invoke the base class method from program, modify the Perl program as

use CHILD;

@ISA = qw(CHILD);

$ref = CHILD->new();

$ref->BASE::display(); # The base class method display() is invoked.

This syntax tells Perl to start the search for display() in the @ISA hierarchy, starting from BASE. A small problem here is that by hardcoding the name of a class (BASE). To avoid this, Perl provides a pseudoclass called SUPER. It is used as

$ref->SUPER::display();

This searches the @ISA hierarchy for the appropriate promote subroutine.

Another Example:

# The contents of package BASE.pm is

package BASE;

sub new {

($class, $name, $empId, $company) = @_;

print "class is $class\n";

my ($val) = { "Name" => $name,

"EmpId" => $empId};

bless ($val,$class);

return $val;

}

sub display {

$obj = shift;

print "Name is $obj->{\"Name\"}\n";

print "Employee Id is $obj->{\"EmpId\"}\n";

}

1; # so that require or use succeeds

# The contents of package CHILD.pm is

use BASE;

package CHILD;

@ISA = qw(BASE);

sub new {

my($class, $name, $empId, $company) = @_;

my($obj);

$obj = $class->SUPER::new($name, $empId);

$obj->{"Company"} = $company;

return $obj;

}

sub display {

$obj = shift;

$obj->SUPER::display();

print "Company is $obj->{\"Company\"}\n";

}

1; # so that require or use succeeds

# The Perl program is

use CHILD;

$ref = CHILD->new("Nishant","6155","Keane");

$ref->display();

In the above example, the constructor of the derived class CHILD ,i.e., the new method calls the constructor of the base class BASE. Similarly, when the method display() is called by the derived class CHILD, the method is turn calls its base class counterpart.

Arrays as Objects:

Here's another implementation of Person class that conforms to the same interface specification. This time an array reference is used instead of a hash reference to represent the object.

package Person;

use strict;

my($NAME, $AGE, $PEERS) = ( 0 .. 2 );

sub new {

my $self = [];

$self->[$NAME] = undef; # this is unnecessary

$self->[$AGE] = undef; # as is this

$self->[$PEERS] = []; # but this isn't, really

bless($self);

return $self;

}

sub name {

my $self = shift;

if (@_) { $self->[$NAME] = shift; }

return $self->[$NAME];

}

sub age {

my $self = shift;

if (@_) { $self->[$AGE] = shift; }

return $self->[$AGE];

}

sub peers {

my $self = shift;

if (@_) { @{ $self->[$PEERS] } = @_; }

return @{ $self->[$PEERS] };

}

1; # so the require or use succeeds

The array access would be a little bit faster, but not more than ten or fifteen percent than the hash access, even when the variables above like $AGE are replaced with literal numbers, like 1.

Closures as Objects:

Closures have many uses. In Perl, one of them is protecting the data of objects from outside interference. The key observation is that the my variables of a closure created inside a subroutine are the locals of that subroutine, and these are out of scope, and therefore truly inaccessible, anywhere else.

The strategy is to store the object's data in local variables of the constructor, and use closures, created inside the constructor and called from the methods, to perform the actions of the class. To make the closures available for subsequent use, store them in a hash, and bless a reference to that hash to return as the object. For example,

package Person;

use strict;

sub new {

my $class = shift;

my ($name, $age, @peers) = ();

my (%person) = ();

my ($ref_person) = "";

if (@_) {

$name = shift;

}

if (@_) {

$age = shift;

}

if (@_) {

@peers = @_;

}

$person{"NAME"} = sub {

my $self = shift;

if (@_) { $name = shift }

return $name;

};

$person{"AGE"} = sub {

my $self = shift;

if (@_) { $age = shift }

return $age;

};

$person{"PEERS"} = sub {

my $self = shift;

if (@_) { @peers = @_ }

return @peers;

};

$ref_person = \%person;

bless ($ref_person, $class);

return $ref_person;

}

1; # so the require or use succeeds

my($ref_person) = Person->new("Sasuke",20,"Sakura","Naruto");

my ($n) = &{$ref_person->{"NAME"}}();

my ($a) = &{$ref_person->{"AGE"}}();

my (@p) = &{$ref_person->{"PEERS"}}();

print "Name is : $n\n";

print "Name is : $a\n";

print "Peers are @p";

Object Destruction

Perl automatically garbage collects a data structure when its reference count drops to zero. If a data structure has been blessed into a module, Perl allows that module to perform some clean-up before it destroys the object, by calling a special procedure in that module called DESTROY and passing it the reference to the object to be destroyed:

package Employee;

sub DESTROY {

my ($emp) = @_;

print "Alas, ", $emp->{"name"}, " is now no longer with us \n";

}

Perl's garbage collection is deterministic; DESTROY is called as soon as the object is not being referred to any more.

Note that it is not necessary to declare this subroutine. This subroutine needs to be declared only if some clean-up work has to be done.

The subroutine AUTOLOAD is called if a function is not found. In case AUTOLOAD subroutine is defined but not the DESTROY method, it might be useful to ensure that AUTOLOAD checks for this possibility:

sub AUTOLOAD {

my $obj = $_[0];

return if $AUTOLOAD =~ /::DESTROY$/;

# ....

}

Accessor Methods

To discourage direct access to an object's attributes, "accessor methods" are provided. These two methods read and update the "position" attribute of an employee. For example,

$pos = $emp->get_position(); # read attribute

$emp->set_position("Software Engineer"); # write attribute

The more popular convention is to have a single method to handle both read and write access:

$pos = $emp->position(); # read attribute

$emp->position("Software Engineer"); # write attribute

The method $position() might be implemented as

package Employee;

sub position {

my $obj = shift;

if (@_) {

$obj->{position} = shift # modify attribute

}

return $obj->{position}; # retrieve attribute

}

The accessor methods have the following advantages:

· Side effects: Accessor methods are sometimes used for triggering actions in addition to retrieving or updating the attribute.

· Access checking: Accessor methods can be made to disallow updates. For example, primary key attributes such as an employee's name should not be updatable once created; an accessor can easily enforce this.

UNIVERSAL

All modules implicitly inherit from a built-in module called UNIVERSAL and inherit the following three methods:

1. isa (package name)

For example, Rectangle->isa('Shape') returns true if the Rectangle module inherits (however indirectly) from the Shape module.

2. can (function name)

Rectangle->can('draw') returns true if the Rectangle or any of its base packages contain a function called draw.

3. VERSION (need version)

For example,

package Bank;

$VERSION = 5.1;

and the user of this module says,

use Bank 5.2;

Perl automatically calls Bank->VERSION(5.2), which can, for instance, make sure that all libraries required for version 5.2 are loaded. The default VERSION method provided by UNIVERSAL simply dies if the Bank's $VERSION variable has a lower value than that needed by the user of the module.

Searching for Methods

The two places that Perl searches when it cannot find a method in the target module are the inheritance hierarchy (@ISA) and AUTOLOAD. The following example shows the precise order in which all these subroutines are searched:

package Man;

@ISA = qw(Mammal Social_Animal);

When a call to Man->schmooze is made, the subroutine schmooze is searched in the following sequence

Man::schmooze

Mammal::schmooze

(Mammal's base classes, recursively)::schmooze

Social_Animal::schmooze

(Social_Animal's base classes, recursively)::schmooze

UNIVERSAL::schmooze (because UNIVERSAL is implicitly at the end of every module's @ISA array)

Then AUTOLOAD is looked up in the same order:

Man::AUTOLOAD

Mammal::AUTOLOAD

(Mammal's base classes, recursively)::AUTOLOAD

Social_Animal::AUTOLOAD

(Social_Animal's base classes, recursively)::AUTOLOAD

UNIVERSAL::AUTOLOAD

The first available subroutine is given the control and the search is stopped. If all fails, Perl throws a run-time exception.

Conventions

The following conventions must be followed while handling Perl modules.

· A module must be present in its own file called <module>.pm. Ensure that the last executing global statement must return 1 to signify successful loading.

· All subroutines in a module should be designed as methods. That is, they should expect either the name of a class or an object reference as their first parameter. For added convenience, they should be able to deal with either.

· Package names should never be hardcoded. The package name obtained as the first argument must always be used to supply to bless. This enables a constructor to be inherited.

· Always provide accessor methods for class and instance attributes.

Ok-Guru

Search This Blog

Saturday, June 13, 2020

ADVANCED PERL PROGRAMMING

Data References and Anonymous Storages

No comments:

Post a Comment

About Me