Minimal Perl for UNIX and Linux People
CHAPTER 7 BUILT-IN FUNCTIONS
The double quotes around the argument are processed first, forming a string from the
space-separated list elements; then, the list context provided by the function is applied
to that result. But a quoted string is a scalar, and list context doesn’t affect scalars, so
the existing string is left unmodified as print’s argument.
The join function listed in table 7.1 provides the same service as the combination
of ‘$"’ and double quotes and is provided as a convenience for those who prefer to
pass arguments to a function rather than to set a variable and double quote a string.
We’ll discuss this function later in this chapter.
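The equivalence of the two approaches can be checked with a minimal sketch. The names @words, $by_quotes, and $by_join are illustrative only and do not come from the text:

```perl
use strict;
use warnings;

my @words = ('alpha', 'beta', 'gamma');   # hypothetical sample list

# Approach 1: set the list-separator variable, then double quote the array
my $by_quotes;
{
    local $" = '-';          # the change is undone at the end of this block
    $by_quotes = "@words";
}

# Approach 2: pass the separator and the list to join
my $by_join = join '-', @words;

print "$by_quotes\n";
print "$by_join\n";
```

Both print statements show the same hyphen-separated string. Using local confines the change to ‘$"’ to the enclosing block, which is one way to avoid affecting later double-quoted strings.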
Now you understand the basic principles of evaluation context and the tools
used for converting data types. With this background in mind, we’ll examine some
important Perl functions that deal with scalar data next, such as split. Then, in
section 7.3 we’ll discuss functions that deal with list data, such as join.
7.2 PROGRAMMING WITH FUNCTIONS THAT GENERATE OR PROCESS SCALARS
Table 7.2 describes some especially useful built-in functions that generate or process
scalar values, which weren’t already discussed in part 1.
Table 7.2 Useful Perl functions for scalars, and their nearest relatives in Unix

| Perl built-in function | Unix relative(s) | Purpose | Effects |
|---|---|---|---|
| split | The cut command; AWK’s split function; the Shell’s IFS variable | Converting scalars to lists | Takes a string and optionally a set of delimiters, and extracts and returns the delimited substrings. The default delimiter is any sequence of whitespace characters. |
| localtime | The date command | Accessing current date and time | Returns a string that resembles the output of the Unix date command. |
| stat, lstat | The ls -lL command (stat); the ls -l command (lstat) | Accessing file information | Provides information about the file referred to by stat’s argument, or the symbolic link presented as lstat’s argument. |
| chomp | N/A | Removing newlines in data | Removes trailing input record separators from strings, using newline as the default. (With Unix utilities and Shell built-in commands, newlines are always removed automatically.) |
| rand | The Shell’s RANDOM variable; AWK’s rand function | Generating random numbers | Generates random numbers that can be used for decision-making in simulations, games, etc. |
The counterparts to those functions found in Unix or the Shell are also indicated in
the table. These provide related services, but in ways that are generally not as convenient or useful as their Perl alternatives. [6]
For example, although split looks at A<TAB><TAB>B as you do, seeing the
fields A and B, the Unix cut command sees three fields there by default—including
an imaginary empty one between the tabs! As you might guess, this discrepancy has
caused many people to have difficulty using cut properly. As another example, the
default behavior of Perl’s split is to return a list of whitespace-separated words, but
obtaining that result by manipulating the Shell’s IFS variable requires advanced
skills, and courage. [7]
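The difference in field counting can be reproduced within Perl itself. The following is a minimal sketch, assuming a hypothetical $record that holds the string A<TAB><TAB>B. The single-tab regex only imitates the way cut counts fields; it is not a statement about how cut is implemented:

```perl
use strict;
use warnings;

my $record = "A\t\tB";               # A<TAB><TAB>B

# A delimiter of ' ' selects the default whitespace splitting,
# the same behavior that split with no arguments applies to $_
my @words = split ' ', $record;

# Treating each individual tab as a delimiter yields an empty middle field
my @by_tab = split /\t/, $record;

print scalar(@words),  " fields with the default delimiter\n";
print scalar(@by_tab), " fields with a single-tab delimiter\n";
```

The first line of output reports 2 fields and the second reports 3, because the empty string between the two tabs counts as a field when each tab is its own delimiter.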
We’ll now turn to detailed consideration of each of the functions listed in table 7.2
and demonstrate how they can be effectively used in typical applications.
7.2.1 Using split
split is typically used to extract a list of fields from a string, using the coding techniques shown in table 7.3.
split’s optional first argument is a matching operator whose regex specifies the
delimiter(s) to be used in extracting fields from the string. The optional second argument overrides the default of $_ by specifying a different string to be split.
[6] Perl has the advantage of being a modern descendant of the ancient Unix tradition, so Larry was able
to address and correct many of its deficiencies while creating Perl.
[7] Why courage? Because if the programmer neglects to reinstate the IFS variable’s original contents after
modifying it, a mild-mannered Shell script can easily mutate into its evil twin from another dimension
and wreak all kinds of havoc.
Table 7.3 The split function

Typical invocation formats [a]
@fields=split;
@fields=split /RE/;
@fields=split /RE/, string;

| Example | Explanation |
|---|---|
| @fields=split; | Splits $_ into whitespace-delimited “words,” and assigns the resulting list to @fields (as do the examples that follow). |
| @fields=split /,/; | Splits $_ using individual commas as delimiters. |
| @fields=split /\s+/, $line; | Splits $line using whitespace sequences as delimiters. |
| @fields=split /[^\040\t_]+/, $line; | Splits $line using sequences of one or more non-“space, tab, or underscore characters” as delimiters. |

[a] Matching modifiers (e.g., i for case insensitivity) can be appended after the closing delimiter of the matching
operator, and a custom regex delimiter can be specified after m (e.g., split m:/:;).
In the simplest case, shown in the table’s first invocation format, split can be
invoked without any arguments to split $_ using whitespace delimiters. However,
when input records need to be split into fields, it’s more convenient to use the -n
and -a invocation options to automatically load fields into @F, as discussed in part 1.
For this reason, split is primarily used in Minimal Perl for secondary splitting. For
instance, input lines could first be split into fields using whitespace delimiters via
the -wnla standard option cluster, and then one of those fields could be split further using another delimiter to extract its subfields.
Here’s a demonstration of a script that uses this technique to show the time in a
custom format:
$ mytime # reformats date-style output
The time is 7:32 PM.
$ cat mytime
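#! /bin/sh
# Sample output from date: Thu Apr 6 16:12:05 PST 2006
# Index numbers for @F:    0   1   2 3        4   5
date |
perl -wnla -e '$hms=$F[3]; # copy time field into named variable
($hour, $minute)=split /:/, $hms; # no $seconds
$am_pm="AM"; # double quotes here; a single quote would end the Shell quoting
$hour >= 12 and $am_pm="PM"; # noon and later are PM
$hour > 12 and $hour=$hour-12; # convert 13-23 to 1-11
$hour == 0 and $hour=12; # midnight hour is shown as 12 AM
print "The time is $hour:$minute $am_pm.";
'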
mytime is implemented as a Shell script, to simplify the delivery of date’s output
as input to the Perl command. [8] Perl’s automatic field splitting option is used (via
-wnla) to load date’s output into the elements of @F, and then the array element [9]
containing the hour:minutes:seconds field ($F[3]) is copied into the $hms variable (for readability). $hms is then split on the “:” delimiter, and its hour and
minute fields are assigned to variables. What about the seconds? The programmer
didn’t consider them to be of interest, so despite the fact that split returns a
three-element list here, the third subfield’s value isn’t used in the program. Next,
the script adds an AM/PM field, and prints the reworked date output in the custom format.
[8] An alternative technique based on command interpolation (like the Shell's command substitution) is
shown in section 8.5.
[9] The expression $F[3] uses array indexing (introduced in table 5.9) to access the fourth field. The
named-variable approach could be used instead, with some additional typing:
(undef, undef, undef, $hms)=@F;
In addition to splitting out subfields from time fields, you can use split in many
other applications. For example, you could carve up IP addresses into their individual
numeric components using “.” as the delimiter, but remember that you need to backslash that character to make it literal:
@IPa_parts=split /\./, $IPa; # 216.239.57.99 --> 216, 239, 57, 99
You can also use split to extract schemes (such as http) and domains from URLs,
using “://” as the delimiter:
$URL='http://a.b.org';
($scheme, $domain)=split m|://|, $URL; # 'http', 'a.b.org'
Notice the use of the m syntax of the matching operator to specify a non-slash delimiter, to avoid conflicts with the slashes in the regex field.
Tips on using split
One common mistake with split is forgetting the proper order of the arguments:
@words=split $data, /:/; # string, RE: WRONG!
@words=split /:/, $data; # RE, string: Right!
Another typical mistake is the incorrect specification of split’s field delimiters, usually by accidentally describing a particular sequence of delimiters rather than any
sequence of them.
For example, this invocation of split says that each occurrence of the indicated
character sequence is a single delimiter:
$data='Hoboken::NJ,:Exit 14c';
@fields=split /,:/, $data; # Extracts two fields
The result is that “Hoboken::NJ” and “Exit 14c” are assigned to the array.
This alternative says that any sequence of one or more of the specified characters
counts as a single delimiter, which results in “NJ” being extracted as a separate field:
$data='Hoboken::NJ,:Exit 14c';
@fields=split /[,:]+/, $data; # Extracts three fields
This second type of delimiter specification is more commonly used than the first
kind, but of course what’s correct in a specific case depends on the format of the data
being examined.
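The two delimiter specifications can be compared side by side in a short, self-contained sketch. The array names @exact and @any are illustrative, not from the text:

```perl
use strict;
use warnings;

my $data = 'Hoboken::NJ,:Exit 14c';

# The two-character sequence ",:" is the only delimiter
my @exact = split /,:/, $data;

# Any run of one or more commas and/or colons is a delimiter
my @any = split /[,:]+/, $data;

print scalar(@exact), " fields: ", join(' | ', @exact), "\n";
print scalar(@any),   " fields: ", join(' | ', @any),   "\n";
```

The first line of output shows two fields and the second shows three, with “NJ” as a field of its own.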
Although split is a valuable tool, it’s not indispensable. That’s because its functionality can generally be duplicated through use of a matching operator in list context, which can also extract substrings from a string. But there’s an important
difference—with split, you define the data delimiters in the regex, whereas with a
matching operator, you define the delimited data there.
How do you decide whether to use split or the matching operator when parsing
fields? It’s simple—split is preferred for cases where it’s easier to describe the delimiters than to describe the delimited data, whereas a matching operator using capturing
parentheses (see table 3.8) is preferred for the cases where it’s easier to describe the data
than the delimiters.
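As a rough illustration of that choice, the sketch below extracts the same fields both ways from the IP address used earlier. The g modifier, which makes the matching operator return every match in list context, and the array names are additions for this example rather than part of the text above:

```perl
use strict;
use warnings;

my $IPa = '216.239.57.99';

# split: the regex describes the delimiter (a literal period)
my @by_split = split /\./, $IPa;

# Matching operator in list context: the regex describes the data
# (runs of digits), and the capturing parentheses return each run
my @by_match = $IPa =~ /(\d+)/g;

print "@by_split\n";
print "@by_match\n";
```

Both lines of output list the same four numbers. Here the delimiter and the data are equally easy to describe, so either approach would serve; with messier input, one description is usually much simpler than the other.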