Chapter 9:

Odd Corners

The place through which he made his way at leisure
was one of those receptacles for old and curious things
which seem to crouch in odd corners of this town...

Charles Dickens, The Old Curiosity Shop

Perl is a powerful and complex language. As with many languages, the
expressions you write may be simple or complicated, verbose or terse. You
may write scripts that are elegant and carefully crafted, or scripts that
"just get the job done". There really is More Than One Way To Do It.

So far, we've presented a subset of Perl - a subset we think you can use to
create many useful programs. We hope that we've also provided you with a
solid base on which you can build. You should be well equipped to learn
more about Perl as you find new areas to explore and new problems to solve.

Along the way, we've touched on a number of special constructs that make
Perl (and MacPerl) powerful, interesting, and often unique. Perl is full of
interesting idioms, programming paradigms, curious constructions, and odd
corners. In this chapter, we will focus on these special constructs, pulling
them together into one place and discussing them in more detail.

We don't expect everyone to use all of the features we discuss in this chapter
(we even recommend against one or two of them!). Everything here is
part of the language, however, and sooner or later you're liable to run across
some of these "odd corners"; we think it's best if you're properly prepared.



$_

The global scalar variable, $_, represents the default input and pattern
searching space. There are many places in Perl where, if you don't specify a
variable to use, Perl will assume $_.[1] These include:

  • all of the file test operators (except for -t, which defaults to STDIN)
  • various functions, such as chomp(), print(), unlink(), and int()
  • pattern matching operations: m//, s///, and tr/// (if you leave out
    the =~)
  • foreach loops, if no other iteration variable is specified
  • while loops, when the sole condition of the while is to read input
    records from a <> or <FILEHANDLE>
For example:

while (<>) {          # read input line into $_
    chomp;            # chomp($_)
    if (/^#/) {       # if ($_ =~ m/^#/)
        print;        # print "$_"
    }
}

When you see Perl statements that don't appear to be acting upon a variable,
they are most likely acting upon $_.
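
The same defaults apply to foreach loops. Here is a minimal sketch, using a made-up list of lines (@lines and @comments are our own names, not part of any earlier example), in which every statement but one relies on $_:

```perl
# Sample data standing in for lines read from a file.
my @lines = ("# a comment\n", "print 'code';\n", "# another comment\n");
my @comments;

foreach (@lines) {           # no loop variable named, so Perl uses $_
    chomp;                   # chomp($_)
    next unless /^#/;        # next unless $_ =~ m/^#/
    push(@comments, $_);     # push() has no default; $_ must be named
}

foreach (@comments) {
    print;                   # print($_)
    print "\n";
}
```

Note that $_ is an alias for the current element, so the chomp here also trims the strings stored in @lines.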

use English

If the symbol soup of the special variables begins to get too confusing, an
alternative is available. Simply begin your program with the directive

use English;

and you will be able to use longer (and theoretically more mnemonic) names.
Some variables have both a medium and a long name. The medium-length
name is usually reminiscent of a similar variable in another language, such
as awk or C. For example, here are some of the global special variables and
their longer "English" names:[2]

Variable    Medium name    Long name

$_          $ARG
$.          $NR            $INPUT_LINE_NUMBER
$/          $RS            $INPUT_RECORD_SEPARATOR
$\          $ORS           $OUTPUT_RECORD_SEPARATOR
$0                         $PROGRAM_NAME
$]                         $PERL_VERSION
$^O                        $OSNAME

[1] If you like, you can always specify $_. But Perl will understand if you don't.
[2] See Chapter 21, Special Variables, for caveats and a definitive list.
MacPerl Oddities - Portability Issues

In many ways, MacPerl is no different from "regular" (that is, Unix) Perl. In
some ways, however, it is vastly different. Recognizing those differences
will help you to understand why scripts you might pick up from the CPAN
or other sites may not work without some effort on your part. Understanding
the differences may also help you to write programs that are more portable
to other platforms.

#!perl

We've said that MacPerl programs can (and should) begin with the string

#!perl

optionally followed by switches, such as -w. On Unix systems, the #! line is
magical - it provides a directive to the Unix kernel, giving it the path to
the program that should be used to run the script, as if that command were
given on the command line. On a Macintosh, the #! line is emulated. MacPerl
itself reads the #!perl line and any switches that may be included on
it. Because of the way #! is implemented, many Perl switches cannot be
included in it when using MacPerl. If you try, you'll get an error message:

# Can't emulate -x on #! line.

\n and \r

On Unix systems, the line-ending character (record separator) is the line
feed (ASCII \012). On Mac OS, it's the carriage return (\015). On MS-DOS
and Windows systems, it's the sequence "carriage return/line feed"
(\015\012). Perl uses \n to represent a newline, which Unix systems interpret as
a line feed character. For portability, the newline character in MacPerl
outputs a carriage return (the Mac OS newline character).

This can cause some difficulties if you share files (or scripts!) between Mac
OS and Unix systems. You'll need to be sure to translate the newline characters
to avoid problems. Smart text editors such as BBEdit or Alpha, and
many file transfer programs such as Fetch, will make the proper translations
for you. The debate over whether MacPerl should recognize and translate
various line-endings frequently rages hot and heavy in some circles.
For now, however, accept the fact that there is a difference to be handled.

You should also note that, although \n produces a carriage return under
MacPerl, \r (which produces a return without a newline under Unix Perl)
produces a line feed under MacPerl. Programs that use \r to back up to the
beginning of a line (without moving to the next line) will not exhibit the
expected behavior under MacPerl. Beware.
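
If a script has to cope with text from several platforms, one possible approach is to avoid \n and \r altogether and name the characters by their octal values, which mean the same thing everywhere. The sketch below assumes the whole text is already in a string; split_lines is a hypothetical helper of our own.

```perl
sub split_lines {
    my ($text) = @_;
    # Try CR LF first, so a DOS line ending is not split into two breaks.
    return split(/\015\012|\015|\012/, $text);
}

my $mixed = "unix line\012mac line\015dos line\015\012last line";
my @lines = split_lines($mixed);
print scalar(@lines), " lines\n";
```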

Running commands: system() and backquoted commands

Unless you are using ToolServer and MPW, the system() command is not
available under MacPerl. A few backquoted commands, such as `pwd`, are
emulated, but the number available is small. Even with ToolServer, the
only commands you will be able to use will be MPW commands.

Unfortunately, if you start picking up Perl scripts from the Net, you will
find that many of them make use of system() and backquoted commands.
The good news is that many of these scripts can be modified to run under
MacPerl (by rewriting these calls in native Perl code). You may have to do
a little research, and you may need to get creative, but give it a try before
you give up.

Here are some common Unix commands that may be found in system()
statements, along with their Perl equivalents:

system("mv...")       rename(...)
system("rm...")       unlink(...)
system("date...")     localtime(...) # must be parsed
system("cp...")       use File::Copy; copy(...)
system("find...")     use File::Find; find(...)
system("echo...")     open(...); print...
system("mkdir...")    mkdir(...)
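
To show what "must be parsed" means for the date case, here is a rough sketch of a replacement for a call to the Unix date command. The subroutine name and the output format are our own choices; pick whatever layout the original script expected.

```perl
sub date_string {
    my ($time) = @_;
    # In list context, localtime() returns the broken-out fields.
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime($time);
    # $mon counts from 0 and $year counts from 1900.
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

print date_string(time()), "\n";
```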

Processes: fork, exec, wait...

Unfortunately, the Mac OS is not designed to fork and execute processes in
the same way that a Unix system does. These commands are not implemented,
and there are no good workarounds available. If you pick up a Perl
script that forks subprocesses, you're probably in for a difficult porting
effort. You may want to try to find some other way to solve the problem ...

Time

Both Unix and the Mac OS view time as the number of seconds since the
epoch. However, the two operating systems have vastly different ideas of
when the epoch began! Unix, and Unix Perl, count time from January 1, 1970.
In contrast, Mac OS and MacPerl count time from January 1, 1904.

The various time functions, such as localtime(), have been modified to
take this difference into account. However, if you work with raw time values,
such as those returned by time(), or port your code to another operating
system, be wary.
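
If you do need to move raw time values between the two systems, the arithmetic is simple: the two epochs are 24,107 days apart (66 years, 17 of them leap years). The sketch below uses hypothetical helper names of our own and assumes both clocks are set to the same time zone, which you should verify for your own data.

```perl
# Seconds from January 1, 1904 to January 1, 1970.
my $EPOCH_OFFSET = 24107 * 86400;      # 2,082,844,800

# Convert a Mac OS time value to the Unix epoch.
sub mac2unix_time {
    my ($t) = @_;
    return $t - $EPOCH_OFFSET;
}

# Convert a Unix time value to the Mac OS epoch.
sub unix2mac_time {
    my ($t) = @_;
    return $t + $EPOCH_OFFSET;
}

print "Unix time 0 is Mac OS time ", unix2mac_time(0), "\n";
```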

Filename Globbing

Under Unix, the glob() function (and the shorthand globbing operator, <>)
invokes the C shell to handle filename expansions. Under Mac OS, the C
shell is not available, so MacPerl implements filename globbing itself.
Unix Perl has a few extra metacharacters available for glob patterns, such
as character classes (e.g., [a-f]*). MacPerl currently supports only two
metacharacters: * matches any number of characters; ? matches a single
character.

Pathnames

If you pick up Perl scripts from the net, read other books on Perl, or move a
script from Mac OS to a Unix machine, sooner or later you will run into differences
in pathnames. On Unix systems, pathnames are relatively common;
it's difficult to spend very much time entering commands from a command
line without using pathnames! However, because of the Macintosh's point
and click, drag and drop interface, Mac OS users can go for a long time without
knowing about pathnames, much less getting used to employing them.

Not only are pathnames unfamiliar to many Mac OS users, but Mac OS
pathnames and Unix pathnames are only superficially similar. Users of
both operating systems can specify either absolute (full) or relative paths,
where the last element of the path can be a file or a directory (folder) and
the elements leading to it are directories. Now for the differences...

Unix pathname elements are separated by a slash, /. An absolute pathname
always starts with a /. A relative pathname always starts with the
name of a directory, except, of course, in the minimal case of naming a single
file. The current directory is specified by . ("dot"); the parent directory is
specified by .. ("dot dot"). The file tree can be traversed upwards by
adding additional parents, .., always separating path elements by /.

/usr/local/bin/perl    # absolute pathname
../local/bin/perl      # relative pathname
../../bin/perl         # relative pathname
bin/perl               # relative pathname
perl                   # relative pathname (file)

Mac OS pathname elements, in contrast, are separated by a colon, :. An
absolute pathname always starts with the name of a volume. A relative
pathname starts with a colon, :, followed by the name of either a folder or
a file (except in the minimal case of naming a single file, in which case the
colon is optional). The current directory is specified by :; the parent directory
by ::. The file tree can be traversed upwards by adding additional
parents, :, but no additional separators.

HD:MacPerl ƒ:MacPerl    # absolute pathname
::MacPerl ƒ:MacPerl     # relative pathname
:MacPerl ƒ:MacPerl      # relative pathname
:MacPerl                # relative pathname (file)
MacPerl                 # relative pathname (file)
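
The rules above can be turned into a rough translation routine for relative paths. This is only a sketch under simplifying assumptions: unix2mac_path is a hypothetical helper of our own, it gives up on absolute paths (there is no way to guess the volume name), and it does nothing about characters that are legal in one system's filenames but not the other's.

```perl
sub unix2mac_path {
    my ($unix) = @_;
    return undef if $unix =~ m{^/};            # absolute: volume name unknown
    my $mac = ':';                             # relative Mac paths start with a colon
    foreach my $part (split(m{/}, $unix)) {
        next if $part eq '' or $part eq '.';   # nothing to add
        $mac .= ':' unless $mac =~ /:$/;       # separator after a name
        if ($part eq '..') {
            $mac .= ':';                       # one more colon per parent
        } else {
            $mac .= $part;
        }
    }
    return $mac;
}

foreach my $path ('perl', 'bin/perl', '../local/bin/perl', '../../bin/perl') {
    print "$path  becomes  ", unix2mac_path($path), "\n";
}
```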

To Parenthesize Or Not To Parenthesize

Throughout Part II of this book, we have made very liberal use of parentheses.
As we have pointed out, "if it looks like a function, it is a function",
and in Part II our examples have cast most of the Perl named operators and
list operators as functions.

In the "real world" of Perl code, however, this is not always the norm. In
particular, print, die, and warn statements are often seen without parentheses.
This is especially useful in cases where you want to mix strings to
print with expressions to be evaluated.

print '10**3 is ', 10**3, ".\n";

die "Something went wrong: $!";



In Part III, you will see many more statements without parentheses. Keep in
mind, however, that you can always parenthesize if it makes the meaning
clearer, or helps determine precedence. (Just be careful to get it right!)

Taking Shortcuts

According to Larry Wall, the three great virtues of a programmer are laziness,
impatience, and hubris. Perl gives you ample opportunity to develop
these virtues. One of the ways it does this is by providing ways to create
shortcuts in your code; why write several statements where one will serve?

We saw a little of this back in Chapter 5, Building Blocks, with the
shortcut assignment operators:

$x = $x + 1;
$x += 1;
$x++;

But there are many other (and sometimes very interesting) ways to create
shortcuts. For example, here's a popular idiom; instead of

$a = $b;
$a =~ s/^/#/;

why not make the assignment and the substitution at the same time?

($a = $b) =~ s/^/#/;

Note that $b is unchanged.
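
The idiom also works when the new variable is declared on the spot, and with tr/// as well as s///. Here is a small sketch using variable names of our own:

```perl
my $original = "print 'hello';\n";

# Copy first, then edit the copy; $original is left alone.
(my $commented = $original) =~ s/^/#/;
(my $shouted   = $original) =~ tr/a-z/A-Z/;

print $original;
print $commented;
print $shouted;
```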

The ?: Conditional Operator

The ?: conditional operator can provide a shortcut for the simple if ...
else ... statement. Recall this code from Building Blocks:

if ($count == 1) {
    print("My Mac has 1 mouse.\n");
} else {
    print("My Mac has $count mice.\n");
}

Using the ?: operator, we can rewrite this as

print "My Mac has $count ",
      ($count == 1) ? "mouse.\n" : "mice.\n";


As shown here, the second line handles the conditional. The part before the
? is the if part (the condition). If the condition is true, the expression following
the ? is evaluated. Otherwise (else), the expression following the :
is evaluated.
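
Because ?: is an expression rather than a statement, it can also sit in the middle of a larger expression. The following sketch (mouse_report is a name we made up) builds the whole sentence as a return value; the parentheses around the conditional are needed because the . operator binds more tightly than ?:.

```perl
sub mouse_report {
    my ($count) = @_;
    return "My Mac has $count " . ($count == 1 ? 'mouse' : 'mice') . ".\n";
}

foreach my $n (0, 1, 2) {
    print mouse_report($n);
}
```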

Shortcut Subroutine Returns

Recall the min2sec function from Chapter 8, Curious Constructions:

sub min2sec {
    $minutes = shift(@_);
    return($minutes * 60);
}

If we take advantage of the fact that, by default, shift() will shift the
@_ array in a subroutine, we can write a very short subroutine indeed:

sub min2sec { return(shift() * 60); }

Here's part of the code from the hms subroutine from Curious Constructions:

if (wantarray()) {
    return($hr, $min, $sec);
} else {
    return(sprintf("%02d:%02d:%02d",
                   $hr, $min, $sec));
}

We could rewrite this as

@list = ($hr, $min, $sec);
$scalar = sprintf("%02d:%02d:%02d", $hr, $min, $sec);
return (wantarray() ? @list : $scalar);

We use the ?: conditional operator to determine which value to return. Or,
we could get even more clever and not set any variables at all!

return (wantarray()
    ? ($hr, $min, $sec)
    : sprintf("%02d:%02d:%02d", $hr, $min, $sec)
);
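
To see the shortened return in action, here is a complete subroutine built around it. The return statement is the one shown above; the lines that compute $hr, $min, and $sec from a number of seconds are our own assumption about the rest of the routine, not a copy of the earlier chapter's code.

```perl
sub hms {
    my ($seconds) = @_;
    my $hr  = int($seconds / 3600);
    my $min = int(($seconds % 3600) / 60);
    my $sec = $seconds % 60;
    return (wantarray()
        ? ($hr, $min, $sec)
        : sprintf("%02d:%02d:%02d", $hr, $min, $sec)
    );
}

my @parts = hms(3725);     # list context: three numbers
my $clock = hms(3725);     # scalar context: one formatted string
print "@parts\n";
print "$clock\n";
```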

A Forever Loop

Here's a quick shortcut for setting up an infinite loop (a forever loop), that
is, a loop that never exits:

for (;;) {
    # body of the loop goes here
}

The initial expression, the condition, and the followup expression are all
null. While you might have thought this would make the loop exit immediately
(considering that the condition seems to be met already), instead,
this loop will repeat forever.[3] Just remember to include a last statement
somewhere within the loop or your program will never exit!
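
A small sketch of a forever loop with its exit in place (the variable names are ours):

```perl
my $power = 1;
my $steps = 0;

for (;;) {
    $power *= 2;
    $steps++;
    last if $power > 1000;     # the only way out of the loop
}

print "2 doubled $steps times is $power\n";
```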

Changing Defaults

Perl has many defaults, and you may not like all of them, or the current
default may be inappropriate for your program and data. So, you may be
pleased to learn that many of these defaults can be changed. We suggest
that you exercise caution, however; some of the defaults are there for what
Perl's authors considered good reasons.

If you get tired of printing newlines all of the time, consider changing the
output record separator. By default, when you use print(), Perl just prints
the list you specify, with no newline or other record separator attached.
However, if you prefer, you can specify that Perl include a newline (or just
about anything else) as the output record separator.

The $\ variable contains the current output record separator. By setting
this to "\n", you can force Perl to print a newline at the end of every
print() statement.
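
One cautious way to try this out is to make the change with local inside a block, so that the default comes back by itself at the closing brace. A minimal sketch:

```perl
{
    local $\ = "\n";        # restored automatically at the closing brace
    print "first line";
    print "second line";
}

print "no separator is added here";
print "\n";
```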

You can also use the -l switch on the #!perl line (or on the perl command
line, under MPW) to enable automatic line-end processing. You may (optionally)
specify an octal number following the -l; if you do, it represents the
ASCII value of the character you want to use as the record separator. If you
specify the -l switch without an octal number, Perl will use the current
value of the input record separator; the default is newline.

As you may be guessing, you can also change the input record separator,
either by setting the global special variable, $/, or by using the -0 switch
on the #!perl line. If you change the input record separator after you
enable line-end processing with -l, the output record separator will be
unaffected (it acquired its value before you made the change to the input
separator).

[3] The for and while constructs regard an empty test clause as true; if, unless, and
until do not.

You can change other defaults. See the chapters on Operators and Special
Variables in Part IV, or the online help under Predefined Variables.

When you change a default, take care to save the original value, in case
you need to restore it later. For example, let's say that you are reading several
input files. One file is to be read in "paragraph" mode, while the rest
are to be read a line at a time.

Before reading in the "paragraph" mode file, save the current value of the
input record separator, then set it to null for paragraph mode.

$slash = $/;
$/ = '';

After you have processed the file, you can restore the previous value of the
input record separator

$/ = $slash;
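
An alternative to saving and restoring by hand is to let Perl do it: a local assignment inside a block lasts only until the block ends. The sketch below leaves out the actual file reading, and $before and $restored are our own names, used only to show that the value comes back.

```perl
my $before = $/;

{
    local $/ = '';          # paragraph mode, inside this block only
    # ... read the "paragraph" mode file here ...
}

my $restored = ($/ eq $before) ? 'yes' : 'no';
print "restored: $restored\n";
```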

Quoting

We've said many times that in Perl, There Is More Than One Way To Do It.
This applies as much to quoting techniques as to anything else. Perl provides
the customary (single, double, and back-) quotes we've mentioned
above, and there are also quite a few alternative quoting choices.

In Perl, the difference between quotes and operators is somewhat fuzzy (in
fact, we've included quoting in our Operators chapter). The parentheses in a
list and the slashes in a pattern match or substitution are, in fact, forms of
quoting. (That is, they affect the interpretation of the enclosed characters).

Perl also provides a set of operators (with names starting with the letter
q) that replace many of the customary quote characters:

Customary          "Generic"    Description

'single quotes'    q/.../       literal; no interpolation
"double quotes"    qq/.../      literal; variable interpolation
`backquotes`       qx/.../      command; variable interpolation
(list)             qw/.../      word list; no interpolation


Use the generic quoting operators if you need to enclose quote characters in
the quoted string and prefer not to use backslash ("escape") characters:

$message = q/Don't type that!/;
$error   = qq/The file "myfile" could not be found./;
@days    = qw/Monday Wednesday Friday/;

Wait, There's More!

If this doesn't seem flexible enough, the delimiter can be (just about) anything;[4]
the slash, /, is only a suggestion. Again, the reason is to let you
avoid including so many backslashes (and simply because TMTOWTDI :-).
As mentioned in a previous chapter, the delimiter character can also be
changed in pattern matches, substitutions, and translations.

$the_date = q#Saturday, 3/21/98#;
$line =~ s,^,#,;

If the opening delimiter is a character that is considered part of a bracketing
pair, such as (, {, [, or <, the closing delimiter is the matching character;
embedded delimiters must match in pairs.

$message = q[Don't type that!];

For the substitution and translation operators, which normally have three
delimiters, if you use a bracketing pair for the first two delimiters, the last
delimiter gets its own starting quote character as well. The two pairs of
bracketing delimiters do not need to be identical. For example:

$var =~ tr[a-z][A-Z];
$line =~ s(^)<#>;
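
Here is a short sketch that pulls these quoting forms together. The variable names and the sample strings are our own.

```perl
my $file    = 'myfile';
my $literal = q{The file "$file" isn't here.};     # no interpolation
my $message = qq{The file "$file" isn't here.};    # $file is interpolated
my @days    = qw(Monday Wednesday Friday);         # three words, no quotes or commas

# With braces as delimiters, the slash needs no backslash.
(my $path = 'HD/MacPerl/lib') =~ s{/}{:}g;

print "$literal\n$message\n@days\n$path\n";
```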

Implicit Quoting

To prevent variable interpolation from acting upon the wrong string of
characters, you can wrap the "variable" part of the string within braces.

$var = 'butter';
print "${var}fly\n";

This causes Perl to separate the variable identifier string from any alphanumeric
(or underscore!) characters that follow. Also, the identifier within
the braces is forced to be a string (as long as it is a single identifier, and not
an expression); thus, the braces provide some implicit quoting of their own.

$days{'Monday'}

could be written (though we don't recommend it!) as

$days{Monday}

[4] Any non-alphanumeric, non-whitespace character, that is.

Extended Regular Expressions

Regular expression patterns can become complex very rapidly. For example,
here is a code fragment that validates its input data by matching the input
fields to expected patterns:

($record =~ m/^g\d+\d\t\w+\.?\w+\t[\w\s]+[35]\'/) or
    die "bad input data";

We could make this seem less complex by breaking up the pattern and commenting
it, which we can do if we use the /x modifier for extended regular
expressions. By including an x after the final pattern delimiter, we can put
any desired whitespace and comments into our pattern.

Note: If the pattern itself contains whitespace, be sure to specify it
with the appropriate escape character (e.g., \s), rather than with
literal blanks and tabs, because the blanks and tabs are interpreted as
"organizational" spacing rather than literal parts of the pattern!

($record =~ m{             # check for a "clean" record
    (^g\d+\d\t)            # g1281914
    (\w+\.?\w+\t)          # mb45a01.r1
    ([\w\s]+[35]\')        # prime value 5'.
}x) or die 'bad input data';

Note how each piece of the pattern can be commented with a sample of
what an actual valid field might contain. We've changed the delimiters to
braces because the pattern looks like a block of code (although it's not what
Perl thinks of as a block).
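
The warning about literal whitespace is easy to check for yourself. In this sketch (the sample string and variable names are ours), the first pattern fails because /x throws away the blank between the two words; the second succeeds because the blank is spelled \s.

```perl
my $text = 'MacPerl rocks';

# Under /x the blanks in the pattern are ignored, so this looks for "MacPerlrocks".
my $ignored = ($text =~ m{ MacPerl rocks }x)    ? 'match' : 'no match';

# Here the space is written as \s, so it is part of the pattern.
my $escaped = ($text =~ m{ MacPerl \s rocks }x) ? 'match' : 'no match';

print "literal blank: $ignored\n";
print "escaped blank: $escaped\n";
```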

Here's an example we have adapted from Chapter 2 of Programming Perl,
second edition. In this example, the authors used comments to explain the
higher level algorithm used (in this case, for finding duplicate words in
paragraphs):


$/ = '';                        # paragraph mode
while (<>) {
    while (m{
               \b               # start at a word boundary
               (\w\S+)          # find a wordish chunk
               (
                   \s+          # separated by some whitespace
                   \1           # and that chunk again
               ) +              # repeat ad lib
               \b               # until another word boundary
            }xig
          )
    {
        print "dup word '$1' at paragraph $..\n";
    }
}

Again, you should note the difference between the {} delimiters in the pattern
and the {} that surround the code block of the while statements. Similarly,
you should recognize the difference between the required parentheses
around the while conditions and the parentheses used to group parts of the
pattern space.

As explained above, the construct

$/ = '';

is used to put Perl into "paragraph mode". The special global variable, $/,
defines the input record separator. By setting it to the null string, '', you
cause Perl's input operator, <>, to read in paragraphs (terminated by an
empty line) rather than individual lines (terminated by a newline).

A Word Of Warning

With all the idioms, shortcuts, and multiple ways to do things, Perl gives
you plenty of opportunity to get into trouble. Don't despair, though; Perl
also tries to reduce your chances of doing something you might regret.

If you include the -w switch on the #!perl line (or the MPW command
line), or check the Compiler Warnings item in the Script menu, Perl will warn
you about certain risky kinds of behavior. These include variables that are
used before they are set, identifiers mentioned only once (possible misspellings),
redefined subroutines, attempts to write to a filehandle that was
opened as read-only, and many other possibilities for error. For a complete
list, see the online help for Diagnostic Messages under Troubleshooting.

If you plan to use the -w switch, you should take care when writing certain
statements, to avoid warnings. For example, when using the <> loop to read
in your input data, check explicitly to ensure that the value you read is not
undefined:

while (defined($line = <IN>)) {
    # code goes here...
}

Also, be sure to initialize (write to) your variables before you read from
them. Perl guarantees that all variables which have not been specifically
initialized to a value will have an initial value of null, but you will get a
warning if you take advantage of this and attempt to use a variable which
has not previously been set.

Many authors recommend that you use -w while you are learning Perl, and
we have tried to ensure that all of the examples in this book will run correctly
with Compiler Warnings turned on. After you have been programming
in Perl for a while, it will be your decision whether to leave -w set for
"production" code.

For even more assistance, you may want to use the strict pragma, which
restricts certain constructs deemed unsafe. You can include the line

use strict;

at the top of your script, to apply all restrictions, or specify only a few. At
present, there are three areas of restrictions defined as part of the strict
pragma.

use strict 'vars';
use strict 'refs';
use strict 'subs';

These impose the following restrictions:

  • 'vars' generates an error if you access a variable that wasn't fully
    qualified, imported from a module, or declared with my().
  • 'refs' generates an error if you use any symbolic references.
  • 'subs' generates an error if you try to use any bareword (unquoted)
    strings.
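
As a rough illustration of code that is happy under all three restrictions, consider the sketch below. Every variable is declared with my(), and a hash lookup stands in for what might otherwise have been a symbolic reference. The data and names are invented for the example.

```perl
use strict;

my %days  = (Monday => 1, Wednesday => 3, Friday => 5);   # declared with my()
my $total = 0;
foreach my $day (sort keys %days) {
    $total += $days{$day};
}
print "total: $total\n";

# Under 'refs', look names up in a hash instead of treating
# a string as the name of a variable.
my %setting = (color => 'blue');
my $name    = 'color';
print "$name is $setting{$name}\n";
```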

Grab Bag

This section covers a few idioms and odd corners that we didn't feel fit well
anywhere else in the chapter.

Using a List in Scalar Context

If you use an array variable in a scalar context, you get a scalar result:
specifically, the number of elements in the array. This can be very convenient. For
example, the following code fragment evaluates the input argument array,
@ARGV, in a scalar context and stores the result (the number of input arguments)
in a scalar variable.

$argc = @ARGV;
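
The context is decided by what surrounds the array, which is easy to get wrong. This sketch uses a stand-in array of our own (@args) to contrast a few common cases:

```perl
my @args = ('-w', 'input.txt', 'output.txt');

my $count   = @args;        # scalar context: the number of elements
my ($first) = @args;        # list context (note the parentheses): the first element
my $last    = $#args;       # index of the last element, one less than the count

print "too few arguments\n" if @args < 2;     # a numeric comparison is scalar context
print "count: ", scalar(@args), "\n";         # scalar() forces scalar context
```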

or or ||

The operators or and and were created to give Perl programmers more
choice when writing statements such as

open(IN, 'myfile') or die 'Oops!';

In previous versions of Perl, only the || and && operators were available.
The or and and operators were created because they are more readable and
because they have very low precedence (allowing programmers to choose to
use fewer parentheses).
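
The difference in precedence matters most next to an assignment. In this sketch (all names are ours), || is evaluated before the assignment, while or waits until the assignment is finished:

```perl
my $tight = 0 || 'fallback';     # parsed as $tight = (0 || 'fallback')

my ($loose, $ran) = (undef, 0);
$loose = 0 or $ran = 1;          # parsed as ($loose = 0) or ($ran = 1)

print "tight: $tight\n";
print "loose: $loose, ran: $ran\n";
```

The same reasoning explains why open IN, 'myfile' || die ... (without parentheses) never dies: the || attaches to the filename, not to the result of open.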

"Funny Characters" vs. Octal Notation

When you're writing a script, there may be times when you need to work
with characters that are not part of the "standard" set of alphanumeric or
punctuation characters, that is, the Mac OS "option" characters. For example:

$line =~ tr/’/'/;       # replace curly single quote
$line =~ tr/“/"/;       # replace curly double quote

if ($line =~ /^•/) {    # match on bullet character

You may decide to specify these characters literally or to use their numeric
(octal, decimal, or hexadecimal) ASCII equivalents. The latter may be
more portable (some systems do not support 8-bit character sets!). Recall
from Chapter 6 that, in pattern specification, a backslashed two- or three-digit
octal number matches the character with the specified value.

$line =~ tr/\325/'/;    # replace curly single quote

Similarly, a backslashed x followed by one or two hexadecimal digits
matches the character with that (hexadecimal) value.

$line =~ tr/\xD2/"/;    # replace curly double quote
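
Putting the two together, here is a sketch that straightens all four curly quotes in a string. It assumes the standard Mac OS character set, in which \322 and \323 are the opening and closing double quotes and \324 and \325 the opening and closing single quotes; check the values with a font utility if your text uses a different encoding.

```perl
my $line = "It\325s \322quoted\323";

# When the replacement list is shorter than the search list,
# tr/// reuses its last character for the rest.
$line =~ tr/\324\325/'/;     # both curly single quotes become '
$line =~ tr/\322\323/"/;     # both curly double quotes become "

print "$line\n";
```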

Several font utilities are available to help you determine the ASCII
encoding of a given character, as well as various other useful information.
We have included one such, FontView, on the MacPerl CD-ROM.[5]

1;

Perl modules are required to return a true status. To guarantee this, a common
idiom is for the last line of the module to be written as:

1;    # return(1);

Simple, but effective.

[5] FontView is shareware. If you keep using it, please respect the shareware license.

Copyright © 1997-1998 by Prime Time Freeware. All Rights Reserved.