Minimal Perl For UNIX and Linux People 6 pptx

210 CHAPTER 7 BUILT-IN FUNCTIONS

The double quotes around the argument are processed first, forming a string from the

space-separated list elements; then, the list context provided by the function is applied

to that result. But a quoted string is a scalar, and list context doesn’t affect scalars, so

the existing string is left unmodified as print’s argument.

The join function listed in table 7.1 provides the same service as the combination

of ‘$"’ and double quotes and is provided as a convenience for those who prefer to

pass arguments to a function rather than to set a variable and double quote a string.

We’ll discuss this function later in this chapter.
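As a minimal sketch of that equivalence, the following builds the same string both ways. The array @parts and the colon separator are illustrative assumptions, not taken from the text.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @parts = ('usr', 'local', 'bin');

# Route 1: pass the separator and the list to join.
my $via_join = join ':', @parts;

# Route 2: set the list separator variable $" and double quote the array.
# "local" restores the previous value of $" when the block ends.
my $via_quotes;
{
    local $" = ':';
    $via_quotes = "@parts";
}

print "$via_join\n";    # usr:local:bin
print "$via_quotes\n";  # usr:local:bin
```

Either way, the result is a single scalar string, which is why a list context applied afterward leaves it unchanged.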

Now you understand the basic principles of evaluation context and the tools

used for converting data types. With this background in mind, we’ll examine some

important Perl functions that deal with scalar data next, such as split. Then, in

section 7.3 we’ll discuss functions that deal with list data, such as join.

7.2 PROGRAMMING WITH FUNCTIONS THAT GENERATE OR PROCESS SCALARS

Table 7.2 describes some especially useful built-in functions that generate or process

scalar values, which weren’t already discussed in part 1.

Table 7.2 Useful Perl functions for scalars, and their nearest relatives in Unix

split
    Unix relative(s): The cut command; AWK’s split function; the Shell’s IFS variable
    Purpose: Converting scalars to lists
    Effects: Takes a string and optionally a set of delimiters, and extracts and returns the delimited substrings. The default delimiter is any sequence of whitespace characters.

localtime
    Unix relative(s): The date command
    Purpose: Accessing current date and time
    Effects: Returns a string that resembles the output of the Unix date command.

stat, lstat
    Unix relative(s): The ls -lL command (for stat); the ls -l command (for lstat)
    Purpose: Accessing file information
    Effects: Provides information about the file referred to by stat’s argument, or the symbolic link presented as lstat’s argument.

chomp
    Unix relative(s): N/A
    Purpose: Removing newlines in data
    Effects: Removes trailing input record separators from strings, using newline as the default. (With Unix utilities and Shell built-in commands, newlines are always removed automatically.)

rand
    Unix relative(s): The Shell’s RANDOM variable; AWK’s rand function
    Purpose: Generating random numbers
    Effects: Generates random numbers that can be used for decision-making in simulations, games, etc.
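The following is a brief sketch of one call to most of the functions in table 7.2, intended only as a quick orientation (lstat is called the same way as stat). The sample strings, the variable names, and the use of the current directory as stat’s argument are assumptions made for illustration.

```perl
use strict;
use warnings;

# split: convert a scalar into a list of whitespace-separated words.
my @words = split ' ', "  three  little words ";

# localtime: in scalar context, returns a date-like string.
my $now = localtime;

# stat: returns a 13-element list of file information. The current
# directory is used here only because it always exists.
my @info = stat '.';

# chomp: remove a trailing newline (the default input record separator).
my $line = "some input\n";
chomp $line;

# rand: a random fractional number that is >= 0 and < 10.
my $roll = rand 10;

print scalar(@words), " words\n";   # 3 words
print "Time now: $now\n";
print "Size of '.': $info[7] bytes\n";
print "Chomped line: '$line'\n";    # Chomped line: 'some input'
print "Random number: $roll\n";
```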

The counterparts to those functions found in Unix or the Shell are also indicated in

the table. These provide related services, but in ways that are generally not as convenient or useful as their Perl alternatives.6

For example, although split looks at A<TAB><TAB>B as you do, seeing the

fields A and B, the Unix cut command sees three fields there by default—including

an imaginary empty one between the tabs! As you might guess, this discrepancy has

caused many people to have difficulty using cut properly. As another example, the

default behavior of Perl’s split is to return a list of whitespace-separated words, but

obtaining that result by manipulating the Shell’s IFS variable requires advanced

skills—and courage.7
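The difference between the two views of A<TAB><TAB>B can be sketched in Perl itself. The cut-like behavior below is imitated with a regex that treats each tab as a separate delimiter; it is an approximation for illustration, not an invocation of cut.

```perl
use strict;
use warnings;

my $record = "A\t\tB";   # A<TAB><TAB>B

# Whitespace splitting, as split does by default:
# a run of tabs counts as one delimiter.
my @words = split ' ', $record;

# Treating every single tab as a delimiter mimics the view taken by cut.
my @cut_like = split /\t/, $record;

print scalar(@words), "\n";     # 2
print scalar(@cut_like), "\n";  # 3 (the middle field is an empty string)
```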

We’ll now turn to detailed consideration of each of the functions listed in table 7.2

and demonstrate how they can be effectively used in typical applications.

7.2.1 Using split

split is typically used to extract a list of fields from a string, using the coding techniques shown in table 7.3.

split’s optional first argument is a matching operator whose regex specifies the

delimiter(s) to be used in extracting fields from the string. The optional second argument overrides the default of $_ by specifying a different string to be split.

6 Perl has the advantage of being a modern descendant of the ancient Unix tradition, so Larry was able

to address and correct many of its deficiencies while creating Perl.

7 Why courage? Because if the programmer neglects to reinstate the IFS variable’s original contents after

modifying it, a mild-mannered Shell script can easily mutate into its evil twin from another dimension

and wreak all kinds of havoc.

Table 7.3 The split function

Typical invocation formats (see note a):

    @fields=split;
    @fields=split /RE/;
    @fields=split /RE/, string;

Examples and explanations:

    @fields=split;
        Splits $_ into whitespace-delimited “words,” and assigns the resulting list to @fields (as do the examples that follow).

    @fields=split /,/;
        Splits $_ using individual commas as delimiters.

    @fields=split /\s+/, $line;
        Splits $line using whitespace sequences as delimiters.

    @fields=split /[^\040\t_]+/, $line;
        Splits $line using sequences of one or more non-“space, tab, or underscore characters” as delimiters.

a. Matching modifiers (e.g., i for case insensitivity) can be appended after the closing delimiter of the matching operator, and a custom regex delimiter can be specified after m (e.g., split m:/:;).
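The invocation formats in table 7.3 can be tried out with a short script such as the following sketch. The sample strings and array names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Format 1: no arguments; splits $_ on whitespace.
$_ = "alpha beta\tgamma";
my @by_default = split;

# Format 2: regex only; splits $_ on each individual comma.
$_ = 'a,b,,c';
my @by_comma = split /,/;

# Format 3: regex and string; splits $line on whitespace sequences.
my $line = "x  y\t z";
my @by_space = split /\s+/, $line;

# Note a: a custom regex delimiter after m, and the i modifier.
my @by_slash = split m:/:, 'usr/local/bin';
my @by_x     = split /x/i, 'aXbxc';

print join('|', @by_default), "\n";  # alpha|beta|gamma
print join('|', @by_comma),   "\n";  # a|b||c
print join('|', @by_space),   "\n";  # x|y|z
print join('|', @by_slash),   "\n";  # usr|local|bin
print join('|', @by_x),       "\n";  # a|b|c
```

The empty field in the second result shows that individual commas, unlike whitespace sequences, are not merged into a single delimiter.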

In the simplest case, shown in the table’s first invocation format, split can be

invoked without any arguments to split $_ using whitespace delimiters. However,

when input records need to be split into fields, it’s more convenient to use the n

and a invocation options to automatically load fields into @F, as discussed in part 1.

For this reason, split is primarily used in Minimal Perl for secondary splitting. For

instance, input lines could first be split into fields using whitespace delimiters via

the -wnla standard option cluster, and then one of those fields could be split further using another delimiter to extract its subfields.

Here’s a demonstration of a script that uses this technique to show the time in a

custom format:

$ mytime # reformats date-style output
The time is 7:32 PM.

$ cat mytime
#! /bin/sh
# Sample output from date: Thu Apr 6 16:12:05 PST 2006
# Index numbers for @F:    0   1   2 3        4   5
date |
perl -wnla -e '$hms=$F[3]; # copy time field into named variable
($hour, $minute)=split /:/, $hms; # no $seconds
$am_pm="AM";
$hour >= 12 and $am_pm="PM"; # noon and later are PM
$hour > 12 and $hour=$hour-12; # convert 13-23 to 1-11
$hour == 0 and $hour=12; # the midnight hour is 12 AM
print "The time is $hour:$minute $am_pm.";
'

mytime is implemented as a Shell script, to simplify the delivery of date’s output

as input to the Perl command.8 Perl’s automatic field splitting option is used (via

-wnla) to load date’s output into the elements of @F, and then the array element9

containing the hour:minutes:seconds field ($F[3]) is copied into the $hms variable (for readability). $hms is then split on the “:” delimiter, and its hour and

minute fields are assigned to variables. What about the seconds? The programmer

didn’t consider them to be of interest, so despite the fact that split returns a

three-element list here, the third subfield’s value isn’t used in the program. Next,

the script adds an AM/PM field, and prints the reworked date output in the custom format.
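The splitting and AM/PM logic can also be sketched as a standalone Perl subroutine, which makes it easy to check against several sample times. The subroutine name to_12_hour and the sample inputs are hypothetical; this version also treats noon as PM and the midnight hour as 12 AM.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "HH:MM:SS" (24-hour clock) to "H:MM AM/PM".
sub to_12_hour {
    my ($hms) = @_;

    # split returns three subfields; the seconds are simply not used.
    my ($hour, $minute) = split /:/, $hms;

    my $am_pm = $hour >= 12 ? 'PM' : 'AM';
    $hour %= 12;                # 13-23 become 1-11; 12 and 00 become 0
    $hour = 12 if $hour == 0;   # noon and the midnight hour display as 12

    return "$hour:$minute $am_pm";
}

print to_12_hour('16:12:05'), "\n";  # 4:12 PM
print to_12_hour('19:32:10'), "\n";  # 7:32 PM
print to_12_hour('12:00:00'), "\n";  # 12:00 PM
print to_12_hour('00:05:09'), "\n";  # 12:05 AM
```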

8 An alternative technique based on command interpolation (like the Shell's command substitution) is shown in section 8.5.

9 The expression $F[3] uses array indexing (introduced in table 5.9) to access the fourth field. The named-variable approach could be used instead, with some additional typing:

(undef, undef, undef, $hms)=@F;

In addition to splitting out subfields from time fields, you can use split in many other applications. For example, you could carve up IP addresses into their individual numeric components using “.” as the delimiter, but remember that you need to backslash that character to make it literal:

@IPa_parts=split /\./, $IPa; # 216.239.57.99 --> 216, 239, 57, 99
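To see why the backslash matters, consider this small sketch comparing the escaped and unescaped forms (the array names are assumptions). Without the backslash, “.” matches any character, so every character of the address is treated as a delimiter, and split returns nothing once the empty fields are discarded.

```perl
use strict;
use warnings;

my $IPa = '216.239.57.99';   # sample address from the text

my @escaped   = split /\./, $IPa;  # literal dot as the delimiter
my @unescaped = split /./,  $IPa;  # any character as the delimiter

print scalar(@escaped),   "\n";  # 4
print scalar(@unescaped), "\n";  # 0
```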

You can also use split to extract schemes (such as http) and domains from URLs,

using “://” as the delimiter:

$URL='http://a.b.org';

($scheme, $domain)=split m|://|, $URL; # 'http', 'a.b.org'

Notice the use of the m syntax of the matching operator to specify a non-slash delimiter, to avoid conflicts with the slashes in the regex field.

Tips on using split

One common mistake with split is forgetting the proper order of the arguments:

@words=split $data, /:/; # string, RE: WRONG!

@words=split /:/, $data; # RE, string: Right!

Another typical mistake is the incorrect specification of split’s field delimiters, usually by accidentally describing a particular sequence of delimiters rather than any

sequence of them.

For example, this invocation of split says that each occurrence of the indicated

character sequence is a single delimiter:

$data='Hoboken::NJ,:Exit 14c';

@fields=split /,:/, $data; # Extracts two fields

The result is that “Hoboken::NJ” and “Exit 14c” are assigned to the array.

This alternative says that any sequence of one or more of the specified characters

counts as a single delimiter, which results in “NJ” being extracted as a separate field:

$data='Hoboken::NJ,:Exit 14c';

@fields=split /[,:]+/, $data; # Extracts three fields

This second type of delimiter specification is more commonly used than the first

kind, but of course what’s correct in a specific case depends on the format of the data

being examined.
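The two delimiter styles can be compared side by side with a sketch like the following, which prints the fields each one extracts from the same sample string. The array names are assumptions.

```perl
use strict;
use warnings;

my $data = 'Hoboken::NJ,:Exit 14c';

# One particular sequence: a comma immediately followed by a colon.
my @particular = split /,:/, $data;

# Any sequence of one or more commas or colons.
my @any = split /[,:]+/, $data;

print join(' | ', @particular), "\n";  # Hoboken::NJ | Exit 14c
print join(' | ', @any),        "\n";  # Hoboken | NJ | Exit 14c
```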

Although split is a valuable tool, it’s not indispensable. That’s because its functionality can generally be duplicated through use of a matching operator in list context, which can also extract substrings from a string. But there’s an important

difference—with split, you define the data delimiters in the regex, whereas with a

matching operator, you define the delimited data there.

How do you decide whether to use split or the matching operator when parsing

fields? It’s simple—split is preferred for cases where it’s easier to describe the delimiters than to describe the delimited data, whereas a matching operator using capturing

parentheses (see table 3.8) is preferred for the cases where it’s easier to describe the data

than the delimiters.
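As a rough illustration of that trade-off, the sketch below extracts the same time subfields both ways, and then handles a case where the data is easier to describe than the delimiters. The sample strings and variable names are assumptions.

```perl
use strict;
use warnings;

my $hms = '16:12:05';

# split: the regex describes the delimiter (a colon).
my @by_split = split /:/, $hms;

# Matching operator in list context: the regex describes the data
# (digit sequences), and capturing parentheses return the substrings.
my @by_match = $hms =~ /(\d+):(\d+):(\d+)/;

print "@by_split\n";  # 16 12 05
print "@by_match\n";  # 16 12 05

# Here the delimiters vary ("temp=", "F, humidity=", "%"), but the
# wanted data is simply "digit sequences", so matching is easier.
my $report  = 'temp=72F, humidity=40%';
my @numbers = $report =~ /(\d+)/g;

print "@numbers\n";   # 72 40
```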
