Minimal Perl For UNIX and Linux People 4 ppt
106 CHAPTER 4 PERL AS A (BETTER) sed COMMAND

***************************************************************************

! URGENT !

NEW CORPORATE DECREE ON TERMINOLOGY (CDT)

***************************************************************************

Headquarters (HQ) has just informed us that, as of today, all company

documents must henceforth use the word “trousers” instead of the (newly

politically incorrect) “pants.” All IT employees should immediately make this

Document Conversion Operation (DCO) their top priority (TP).

The Office of Corporate Decree Enforcement (OCDE) will be scanning all

computer files for compliance starting tomorrow, and for each document that’s

found to be in violation, the responsible parties will be forced to forfeit their Free

Cookie Privileges (FCPs) for one day.

So please comply with HQ’s CDT on the TP DCO, ASAP, before the OCDE

snarfs your FCPs.

***************************************************************************

What’s that thundering sound?

Oh, it’s just the sed users stampeding toward the snack room to load up on free

cookies while they still can. It’s prudent of them to do so, because most versions of sed have historically lacked a provision for saving their output in the original file! In consequence, some extra I/O wrangling is required, which should generally be scripted. That means fumbling with an editor, removing the inevitable bugs from the script, accidentally introducing new bugs, and so forth.
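For comparison, the I/O wrangling in question usually amounts to sending sed's output to a temporary file and then renaming that file over the original. Here is a minimal sketch, assuming a sed that can only write to standard output; the scratch directory, the file name memo, and the variable name are made up for illustration:

```shell
#!/bin/sh
# Sketch: emulate in-place editing with a sed that only writes to stdout.
set -e
cd "$(mktemp -d)"                      # scratch directory for the demo
printf 'WORLDWIDE PANTS\n' > memo      # hypothetical sample file

# Filter into a temporary file, and replace the original only if sed succeeded.
sed 's/PANTS/TROUSERS/g' memo > memo.tmp && mv memo.tmp memo

sed_result=$(cat memo)
echo "$sed_result"
```

Note that this sketch keeps no backup copy and has no notion of word boundaries, so it would need more work before being trusted with real files.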

Meanwhile, back at your workstation, you, as a Perl aficionado, can Lazily compose a test-case using the file in which you have wisely been accumulating pant-related phrases, in preparation for this day:

$ cat pantaloony

WORLDWIDE PANTS

SPONGEBOB SQUAREPANTS

Now for the semi-magical Perl incantation that’s made to order for this pants-to-trousers upgrade:

$ perl -i.bak -wpl -e 's/\bPANTS\b/TROUSERS/ig;' pantaloony

$ cat pantaloony

WORLDWIDE TROUSERS

SPONGEBOB SQUAREPANTS

It worked. Your Free Cookie Privileges might be safe after all!

Why did the changes appear in the file, rather than only on the screen? Because the

i invocation option, which enables in-place editing, causes each input file (in this case,

pantaloony) to become the destination for its own filtered output. That means it’s

critical when you use the n option not to forget to print, or else the input file will

end up empty! So I recommend the use of the p option in this kind of program, to

make absolutely sure the vital print gets executed automatically for each record.

EDITING FILES 107

But what’s that .bak after the i option all about? That’s the (arbitrary) filename

extension that will be applied to the backup copy of each input file. Believe me, that

safeguard comes in handy when you accidentally use the n option (rather than p)

and forget to print.

Note also the use of the i match modifier on the substitution (introduced in

table 3.6), which allows PANTS in the regex to match “pants” in the input (which

is another thing most seds can’t do [11]).

Now that you have a test case that works, all it takes is a slight alteration to the

original command to handle lots of files rather than a single one:

$ perl -i.bak -wpl -e 's/\bPANTS\b/TROUSERS/ig;' *

$ # all done!

Do you see the difference? It’s the use of “*”, the filename-generation metacharacter,

instead of the specific filename pantaloony. This change causes all (non-hidden)

files in the current directory to be presented as arguments to the command.

Mission accomplished! Too bad the snack room is out of cookies right now, but

don’t despair, you’ll be enjoying cookies for the rest of the week—at least, the ones

you don’t sell to the newly snack-deprived sed users at exorbitant prices. [12]

Before we leave this topic, I should point out that there aren’t many IT shops

whose primary business activities center around the PC-ification of corporate text

files. At least, not yet. Here’s a more representative example of the kind of mass editing activity that’s happening all over the world on a regular basis:

$ cd HTML # 1,362 files here!

$ perl -i.bak -wpl -e 's/pomalus\.com/potamus.com/g;' *.html

$ # all done!

It’s certainly a lot easier to let Perl search through all the web server’s *.html files to

change the old domain name to the new one, than it is to figure out which files need

changing and edit each of them by hand.

Even so, this command isn’t as easy as it could be, so you'll learn next how to

write a generic file-editing script in Perl.

4.7.2 Editing with scripts

It’s tedious to remember and retype commands frequently, even if they’re one-liners, so soon you’ll see a scriptified version of a generic file-changing program.

But first, let’s look at some sample runs so you can appreciate the program’s user

interface, which lets you specify the search string and its replacement with a convenient -old='old' and -new='new' syntax:

[11] The exception is, of course, GNU sed, which has appropriated several useful features from Perl in recent years.

[12] This rosy scenario assumes you remembered to delete the *.bak files after confirming that they were no longer needed and before the OCDE could spot any “pants” within them!

$ change_file -old='\bALE\b' -new='LONDON-STYLE ALE' items

$ change_file -old='\bHEMP\b' -new='TUFF FIBER' items

You can’t see the results, because they went back into the items file. Note the use of

the \b metacharacters in the old strings to require word boundaries at the appropriate points in the input. This prevents undesirable results, such as changing “WHITER

SHADE OF PALE” into “WHITER SHADE OF PLONDON-STYLE ALE”.

The change_file script is very simple:

#! /usr/bin/perl -s -i.bak -wpl

# Usage: change_file -old='old' -new='new' [f1 f2 ...]

s/$old/$new/g;

The s option on the shebang line requests the automatic switch processing that handles

the command-line specifications of the old and new strings and loads the associated

$old and $new variables with their contents. The omission of the our declarations

for those variables (as detailed in table 2.5) marks both switches as mandatory.

In part 2 you’ll see more elaborate scripts of this type, which provide the additional benefits of allowing case insensitivity, paragraph mode, and in-place editing to

be controlled through command line switches.

Next, we’ll examine a script that would make a handy addition to any programmer’s toolkit.

The insert_contact_info script

Scripts written on the job that serve a useful purpose tend to become popular, which

means somewhere down the line somebody will have an idea for a useful extension, or

find a bug. Accordingly, to facilitate contact between users and authors, it’s considered

a good practice for each script to provide its author’s contact information.

Willy has written a program that inserts this information into scripts that don’t

already have it, so let’s watch as he demonstrates its usage:

$ cd ~/bin # go to personal bin directory

$ insert_contact_info -author='Willy Nilly, [email protected]' change_file

$ cat change_file # 2nd line just added by above command

#! /usr/bin/perl -s -i.bak -wpl

# Author: Willy Nilly, [email protected]

# Usage: change_file -old='old' -new='new' [f1 f2...]

s/$old/$new/g;

For added user friendliness, Willy has arranged for the script to generate a helpful

“Usage” message when it’s invoked without the required -author switch:

$ insert_contact_info some_script

Usage: insert_contact_info -author='Author info' f1 [f2...]


The script tests the $author variable for emptiness in a BEGIN block, rather than in

the body of the program, so that improper invocation can be detected before input

processing (via the implicit loop) begins:

#! /usr/bin/perl -s -i.bak -wpl
# Inserts contact info for script author after shebang line

BEGIN {
    $author or
        warn "Usage: $0 -author='Author info' f1 [f2 ...]\n" and
            exit 255;
}

# Append contact-info line to shebang line
$. == 1 and
    s|^#!.*/bin/.+$|$&\n# Author: $author|g;

Willy made the substitution conditional on the current line being the first and having a shebang sequence, because he doesn’t want to modify files that aren’t scripts. If

that test yields a True result, a substitution operator is attempted on the line.

Because the pathname he’s searching for (/bin/) contains slashes, using the customary slash also as the field-delimiter would require those interior slashes to be backslashed. So, Willy wisely chose to avoid that complication by using the vertical bar as

the delimiter instead.

The regex looks for the shebang sequence (#!) at the beginning of the line, followed by the longest sequence of anything (.*; see table 3.10) leading up to /bin/.

Willy wrote it that way because on most systems, whitespace is optional after the “!”

character, and all command interpreters reside in a bin directory. This regex will

match a variety of paths—including the commonplace /bin/, /local/bin/, and

/usr/local/bin/—as desired.

After matching /bin/ (and whatever’s before it), the regex grabs the longest

sequence of something (.+; see table 3.10) leading up to the line’s end ($). The “+”

quantifier is used here rather than the earlier “*” because there must be at least one

additional character after /bin/ to represent the filename of the interpreter.

If the entire first line of the script has been successfully matched by the regex,

it’s replaced by itself (through use of $&; see table 3.4) followed by a newline and

then a comment incorporating the contents of the $author switch variable. The

result is that the author’s information is inserted on a new line after the script’s shebang line.

Apart from performing the substitution properly, it’s also important that all the

lines of the original file are sent out to the new version, whether modified or not.

Willy handles this chore by using the p option to automate that process. He also uses

the -i.bak option cluster to ensure that the original version is saved in a file having

a .bak extension, as a precautionary measure.

We’ll look next at a way to make regexes more readable.


Adding commentary to a regex

The insert_contact_info script is a valuable tool, and it shows one way to make

practical use of Perl’s editing capabilities. But I wouldn’t blame you for thinking that

the regex we just scrutinized was a bit hard on the eyes! Fortunately, Perl programmers

can alleviate this condition through judicious use of the x modifier (see table 4.3),

which allows arbitrary whitespace and comments to be included in the search field to

make the regex more understandable.

As a case in point, insert_contact_info2 rephrases the substitution operator

of the original version, illustrating the benefits of embedding commentary within the

regex field. Because the substitution operator is spread over several lines in this new version, it helps to locate its three “|” delimiters first, which mark where the search and replacement fields begin and end:

# Rewrite shebang line to append contact info
$. == 1 and
    # The expanded version of this substitution operator follows below:
    #   s|^#!.*/bin/.+$|$&\n# Author: $author|g;
    s|
        ^       # start match at beginning of line
        \#!     # shebang characters
        .*      # optionally followed by anything; including nothing
        /bin/   # followed by a component of the interpreter path
        .+      # followed by the rest of the interpreter path
        $       # up to the end of line
    |$&\n\# Author: $author|gx; # replace by match, \n, author stuff

Note that the “#” in the “#!” shebang sequence needs to be backslashed to remove its

x-modifier-endowed meaning as a comment character, as does the “#” symbol before

the word “Author” in the replacement field.

It’s important to understand that the x modifier relaxes the syntax rules for the

search field only of the substitution operator—the one where the regex resides. That

means you must take care to avoid the mistake of inserting whitespace or comments

in the replacement field in an effort to enhance its readability, because they’ll be taken

as literal characters there. [13]

Before we leave the insert_contact_info script, we should consider

whether sed could do its job. The answer is yes, but sed would need help from

the Shell, and the result wouldn’t be as straightforward as the Perl solution. Why?

Because you’d have to work around sed’s lack of the following features: the “+”

metacharacter, automatic switch processing, in-place editing, and the enhanced

regex format.

As useful as the -i.bak option is, there’s a human foible that can undermine the

integrity of its backup files. You’ll learn how to compensate for it next.

[13] An exception is discussed in section 4.9: when the e modifier is used, the replacement field contains Perl statements, whose readability can be enhanced through arbitrary use of whitespace.
