Minimal Perl For UNIX and Linux People 4 ppt
106 CHAPTER 4 PERL AS A (BETTER) sed COMMAND
***************************************************************************
! URGENT !
NEW CORPORATE DECREE ON TERMINOLOGY (CDT)
***************************************************************************
Headquarters (HQ) has just informed us that, as of today, all company
documents must henceforth use the word “trousers” instead of the (newly
politically incorrect) “pants.” All IT employees should immediately make this
Document Conversion Operation (DCO) their top priority (TP).
The Office of Corporate Decree Enforcement (OCDE) will be scanning all
computer files for compliance starting tomorrow, and for each document that’s
found to be in violation, the responsible parties will be forced to forfeit their Free
Cookie Privileges (FCPs) for one day.
So please comply with HQ’s CDT on the TP DCO, ASAP, before the OCDE
snarfs your FCPs.
***************************************************************************
What’s that thundering sound?
Oh, it’s just the sed users stampeding toward the snack room to load up on free
cookies while they still can. It’s prudent of them to do so, because most versions of
sed have historically lacked a provision for saving its output in the original file! In consequence, some extra I/O wrangling is required, which should generally be scripted—
which means fumbling with an editor, removing the inevitable bugs from the script,
accidentally introducing new bugs, and so forth.
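The "extra I/O wrangling" usually means sending sed's output to a temporary file and then renaming that file over the original. The following is a minimal sketch of that workaround, assuming a POSIX shell with `mktemp` available; the scratch directory, the file name `demo.txt`, and the variable names are illustrative and not taken from the text.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
printf 'WORLDWIDE PANTS\n' > "$workdir/demo.txt"

# sed writes to standard output, so capture the result in a temporary
# file, and replace the original only if sed succeeded.
sed 's/PANTS/TROUSERS/g' "$workdir/demo.txt" > "$workdir/demo.txt.tmp" &&
    mv "$workdir/demo.txt.tmp" "$workdir/demo.txt"

edited=$(cat "$workdir/demo.txt")
printf '%s\n' "$edited"

rm -rf "$workdir"
```

Every file to be edited needs the same redirect-and-rename dance, which is why this tends to end up in a script.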
Meanwhile, back at your workstation, you, as a Perl aficionado, can Lazily compose a test case using the file in which you have wisely been accumulating pant-related phrases, in preparation for this day:
$ cat pantaloony
WORLDWIDE PANTS
SPONGEBOB SQUAREPANTS
Now for the semi-magical Perl incantation that’s made to order for this pants-to-trousers upgrade:
$ perl -i.bak -wpl -e 's/\bPANTS\b/TROUSERS/ig;' pantaloony
$ cat pantaloony
WORLDWIDE TROUSERS
SPONGEBOB SQUAREPANTS
It worked. Your Free Cookie Privileges might be safe after all!
Why did the changes appear in the file, rather than only on the screen? Because the
i invocation option, which enables in-place editing, causes each input file (in this case,
pantaloony) to become the destination for its own filtered output. That means it’s
critical when you use the n option not to forget to print, or else the input file will
end up empty! So I recommend the use of the p option in this kind of program, to
make absolutely sure the vital print gets executed automatically for each record.
EDITING FILES 107
But what’s that .bak after the i option all about? That’s the (arbitrary) filename
extension that will be applied to the backup copy of each input file. Believe me, that
safeguard comes in handy when you accidentally use the n option (rather than p)
and forget to print.
Note also the use of the i match modifier on the substitution (introduced in
table 3.6), which allows PANTS in the regex to match “pants” in the input (which
is another thing most seds can’t do11).
Now that you have a test case that works, all it takes is a slight alteration to the
original command to handle lots of files rather than a single one:
$ perl -i.bak -wpl -e 's/\bPANTS\b/TROUSERS/ig;' *
$ # all done!
Do you see the difference? It’s the use of “*”, the filename-generation metacharacter,
instead of the specific filename pantaloony. This change causes all (non-hidden)
files in the current directory to be presented as arguments to the command.
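To see which names “*” hands to the command, it can help to expand it with echo first. The sketch below assumes a POSIX shell with default globbing settings; the file names are invented for the demonstration.

```shell
workdir=$(mktemp -d)
touch "$workdir/report.txt" "$workdir/notes.txt" "$workdir/.hidden"

# The shell expands "*" before the command runs; names that begin
# with a dot are left out of the expansion.
matched=$(cd "$workdir" && echo *)
printf '%s\n' "$matched"

rm -rf "$workdir"
```

Running echo with the same pattern before the real perl command is a cheap way to check the argument list.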
Mission accomplished! Too bad the snack room is out of cookies right now, but
don’t despair, you’ll be enjoying cookies for the rest of the week—at least, the ones
you don’t sell to the newly snack-deprived sed users at exorbitant prices.12
Before we leave this topic, I should point out that there aren’t many IT shops
whose primary business activities center around the PC-ification of corporate text
files. At least, not yet. Here’s a more representative example of the kind of mass editing activity that’s happening all over the world on a regular basis:
$ cd HTML # 1,362 files here!
$ perl -i.bak -wpl -e 's/pomalus\.com/potamus.com/g;' *.html
$ # all done!
It’s certainly a lot easier to let Perl search through all the web server’s *.html files to
change the old domain name to the new one, than it is to figure out which files need
changing and edit each of them by hand.
Even so, this command isn’t as easy as it could be, so you'll learn next how to
write a generic file-editing script in Perl.
4.7.2 Editing with scripts
It’s tedious to remember and retype commands frequently, even if they’re one-liners, so soon you’ll see a scriptified version of a generic file-changing program.
But first, let’s look at some sample runs so you can appreciate the program’s user
interface, which lets you specify the search string and its replacement with a convenient -old='old' and -new='new' syntax:
11 The exception is, of course, GNU sed, which has appropriated several useful features from Perl in recent years.
12 This rosy scenario assumes you remembered to delete the *.bak files after confirming that they were
no longer needed and before the OCDE could spot any “pants” within them!
$ change_file -old='\bALE\b' -new='LONDON-STYLE ALE' items
$ change_file -old='\bHEMP\b' -new='TUFF FIBER' items
You can’t see the results, because they went back into the items file. Note the use of
the \b metacharacters in the old strings to require word boundaries at the appropriate points in the input. This prevents undesirable results, such as changing “WHITER
SHADE OF PALE” into “WHITER SHADE OF PLONDON-STYLE ALE”.
The change_file script is very simple:
#! /usr/bin/perl -s -i.bak -wpl
# Usage: change_file -old='old' -new='new' [f1 f2 ...]
s/$old/$new/g;
The s option on the shebang line requests the automatic switch processing that handles
the command-line specifications of the old and new strings and loads the associated
$old and $new variables with their contents. The omission of the our declarations
for those variables (as detailed in table 2.5) marks both switches as mandatory.
In part 2 you’ll see more elaborate scripts of this type, which provide the additional benefits of allowing case insensitivity, paragraph mode, and in-place editing to
be controlled through command line switches.
Next, we’ll examine a script that would make a handy addition to any programmer’s toolkit.
The insert_contact_info script
Scripts written on the job that serve a useful purpose tend to become popular, which
means somewhere down the line somebody will have an idea for a useful extension, or
find a bug. Accordingly, to facilitate contact between users and authors, it’s considered
a good practice for each script to provide its author’s contact information.
Willy has written a program that inserts this information into scripts that don’t
already have it, so let’s watch as he demonstrates its usage:
$ cd ~/bin # go to personal bin directory
$ insert_contact_info -author='Willy Nilly, [email protected]' change_file
$ cat change_file # 2nd line just added by above command
#! /usr/bin/perl -s -i.bak -wpl
# Author: Willy Nilly, [email protected]
# Usage: change_file -old='old' -new='new' [f1 f2 ...]
s/$old/$new/g;
For added user friendliness, Willy has arranged for the script to generate a helpful
“Usage” message when it’s invoked without the required -author switch:
$ insert_contact_info some_script
Usage: insert_contact_info -author='Author info' f1 [f2 ...]
The script tests the $author variable for emptiness in a BEGIN block, rather than in
the body of the program, so that improper invocation can be detected before input
processing (via the implicit loop) begins:
#! /usr/bin/perl -s -i.bak -wpl
# Inserts contact info for script author after shebang line
BEGIN {
    $author or
        warn "Usage: $0 -author='Author info' f1 [f2 ...]\n" and
            exit 255;
}
# Append contact-info line to shebang line
$. == 1 and
    s|^#!.*/bin/.+$|$&\n# Author: $author|g;
Willy made the substitution conditional on the current line being the first and having a shebang sequence, because he doesn’t want to modify files that aren’t scripts. If
that test yields a True result, a substitution operator is attempted on the line.
Because the pathname he’s searching for (/bin/) contains slashes, using the customary slash also as the field-delimiter would require those interior slashes to be backslashed. So, Willy wisely chose to avoid that complication by using the vertical bar as
the delimiter instead.
The regex looks for the shebang sequence (#!) at the beginning of the line, followed by the longest sequence of anything (.*; see table 3.10) leading up to /bin/.
Willy wrote it that way because on most systems, whitespace is optional after the “!”
character, and all command interpreters reside in a bin directory. This regex will
match a variety of paths—including the commonplace /bin/, /local/bin/, and
/usr/local/bin/—as desired.
After matching /bin/ (and whatever’s before it), the regex grabs the longest
sequence of something (.+; see table 3.10) leading up to the line’s end ($). The “+”
quantifier is used here rather than the earlier “*” because there must be at least one
additional character after /bin/ to represent the filename of the interpreter.
If the entire first line of the script has been successfully matched by the regex,
it’s replaced by itself (through use of $&; see table 3.4) followed by a newline and
then a comment incorporating the contents of the $author switch variable. The
result is that the author’s information is inserted on a new line after the script’s shebang line.
Apart from performing the substitution properly, it’s also important that all the
lines of the original file are sent out to the new version, whether modified or not.
Willy handles this chore by using the p option to automate that process. He also uses
the -i.bak option cluster to ensure that the original version is saved in a file having
a .bak extension, as a precautionary measure.
We’ll look next at a way to make regexes more readable.
Adding commentary to a regex
The insert_contact_info script is a valuable tool, and it shows one way to make
practical use of Perl’s editing capabilities. But I wouldn’t blame you for thinking that
the regex we just scrutinized was a bit hard on the eyes! Fortunately, Perl programmers
can alleviate this condition through judicious use of the x modifier (see table 4.3),
which allows arbitrary whitespace and comments to be included in the search field to
make the regex more understandable.
As a case in point, insert_contact_info2 rephrases the substitution operator
of the original version, illustrating the benefits of embedding commentary within the
regex field. Because the substitution operator is spread over several lines in this new
version, the delimiters are shown in bold, to help you spot them:
# Rewrite shebang line to append contact info
$. == 1 and
# The expanded version of this substitution operator follows below:
#   s|^#!.*/bin/.+$|$&\n# Author: $author|g;
    s|
        ^        # start match at beginning of line
        \#!      # shebang characters
        .*       # optionally followed by anything; including nothing
        /bin/    # followed by a component of the interpreter path
        .+       # followed by the rest of the interpreter path
        $        # up to the end of line
    |$&\n\# Author: $author|gx; # replace by match, \n, author stuff
Note that the “#” in the “#!” shebang sequence needs to be backslashed to remove its
x-modifier-endowed meaning as a comment character, as does the “#” symbol before
the word “Author” in the replacement field.
It’s important to understand that the x modifier relaxes the syntax rules for the
search field only of the substitution operator—the one where the regex resides. That
means you must take care to avoid the mistake of inserting whitespace or comments
in the replacement field in an effort to enhance its readability, because they’ll be taken
as literal characters there.13
Before we leave the insert_contact_info script, we should consider
whether sed could do its job. The answer is yes, but sed would need help from
the Shell, and the result wouldn’t be as straightforward as the Perl solution. Why?
Because you’d have to work around sed’s lack of the following features: the “+”
metacharacter, automatic switch processing, in-place editing, and the enhanced
regex format.
As useful as the -i.bak option is, there’s a human foible that can undermine the
integrity of its backup files. You’ll learn how to compensate for it next.
13 An exception is discussed in section 4.9—when the e modifier is used, the replacement field contains
Perl statements, whose readability can be enhanced through arbitrary use of whitespace.