Programming C# 4.0, Part 5

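    // Skip past every patient at the front of the queue who is at imminent risk.
    while (current != null)
    {
        if (current.Value.AtImminentRiskOfDeath)
        {
            current = current.Next;
        }
        else
        {
            break;
        }
    }

    // Insert at the end if we ran off the list, otherwise ahead of the node we stopped at.
    if (current == null)
    {
        waitingPatients.AddLast(newPatient);
    }
    else
    {
        waitingPatients.AddBefore(current, newPatient);
    }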

This code adds the new patient after all those patients in the queue whose lives appear

to be at immediate risk, but ahead of all other patients—the patient is presumably either

quite unwell or a generous hospital benefactor. (Real triage is a little more complex, of

course, but you still insert items into the list in the same way, no matter how you go

about choosing the insertion point.)

Note the use of LinkedListNode<T>—this is how LinkedList<T> presents the queue’s

contents. It allows us not only to see the item in the queue, but also to navigate back

and forth through the queue with the Next and Previous properties.
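As a rough, self-contained sketch of the insertion logic above: the Patient class, its Name property, and the TriageDemo and InsertByTriage names are assumptions made for illustration. Only LinkedList<T>, LinkedListNode<T>, and the AtImminentRiskOfDeath property come from the text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical patient type; only AtImminentRiskOfDeath appears in the text.
class Patient
{
    public string Name { get; set; }
    public bool AtImminentRiskOfDeath { get; set; }
}

static class TriageDemo
{
    // Inserts newPatient after the leading run of at-risk patients.
    public static void InsertByTriage(LinkedList<Patient> waitingPatients, Patient newPatient)
    {
        LinkedListNode<Patient> current = waitingPatients.First;
        while (current != null && current.Value.AtImminentRiskOfDeath)
        {
            current = current.Next;
        }

        if (current == null)
        {
            waitingPatients.AddLast(newPatient);
        }
        else
        {
            waitingPatients.AddBefore(current, newPatient);
        }
    }

    static void Main()
    {
        var queue = new LinkedList<Patient>();
        queue.AddLast(new Patient { Name = "Ann", AtImminentRiskOfDeath = true });
        queue.AddLast(new Patient { Name = "Bob", AtImminentRiskOfDeath = false });

        InsertByTriage(queue, new Patient { Name = "Cy", AtImminentRiskOfDeath = false });

        // Walk the list forward with Next; prints Ann, Cy, Bob on separate lines.
        for (LinkedListNode<Patient> node = queue.First; node != null; node = node.Next)
        {
            Console.WriteLine(node.Value.Name);
        }
    }
}
```

The loop here folds the original if/else/break into the while condition, which behaves the same way; walking backward would use Last and Previous instead of First and Next.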

Stacks

Whereas Queue<T> operates a FIFO order, Stack<T> operates a last in, first out (LIFO)

order. Looking at this from a queuing perspective, it seems like the height of

unfairness—latecomers get priority over those who arrived early. However, there are

some situations in which this topsy-turvy ordering can make sense.

A performance characteristic of most computers is that they tend to be able to work

faster with data they’ve processed recently than with data they’ve not touched lately.

CPUs have caches that provide faster access to data than a computer’s main memory

can support, and these caches typically operate a policy where recently used data is

more likely to stay in the cache than data that has not been touched recently.

If you’re writing a server-side application, you may consider throughput to be more

important than fairness—the total rate at which you process work may matter more

than how long any individual work item takes to complete. In this case, a LIFO order

may make the most sense—work items that were only just put into a queue are much

more likely to still live in the CPU’s cache than those that were queued up ages ago,

and so you’ll get better throughput during high loads if you process newly arrived items

first. Items that have sat in the queue for longer will just have to wait for a lull.

Like Queue<T>, Stack<T> offers a method to add an item, and one to remove it. It calls

these Push and Pop, respectively. They are very similar to the queue’s Enqueue and

Dequeue, except they both work off the same end of the list. (You could get the same

effect using a LinkedList, and always calling AddFirst and RemoveFirst.)
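A small sketch of Push and Pop may help here. The StackDemo and PushThenPopAll names are made up for this illustration; Stack<T>, Push, Pop, and Count are the real library members.

```csharp
using System;
using System.Collections.Generic;

static class StackDemo
{
    // Pushes the items in order, then pops them all, returning the pop order.
    public static List<string> PushThenPopAll(IEnumerable<string> items)
    {
        var stack = new Stack<string>();
        foreach (string item in items)
        {
            stack.Push(item);
        }

        var popped = new List<string>();
        while (stack.Count > 0)
        {
            popped.Add(stack.Pop());
        }
        return popped;
    }

    static void Main()
    {
        List<string> order = PushThenPopAll(new[] { "first", "second", "third" });
        Console.WriteLine(string.Join(", ", order)); // third, second, first
    }
}
```

The items come back in the reverse of the order in which they went in, which is the LIFO behavior described above.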

A stack could also be useful for managing navigation history. The Back button in a

browser works in LIFO order—the first page it shows you is the last one you visited.

(And if you want a Forward button, you could define a second stack—each time the

user goes Back, Push the current page onto the Forward stack. Then if the user clicks

Forward, Pop a page from the Forward stack, and Push the current page onto the Back

stack.)
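The Back and Forward idea could be sketched roughly as follows. NavigationHistory and its members are hypothetical names, pages are plain strings, and clearing the Forward stack when a new page is visited is an assumption modeled on typical browser behavior rather than something the text specifies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical history tracker; pages are represented as plain strings.
class NavigationHistory
{
    private readonly Stack<string> back = new Stack<string>();
    private readonly Stack<string> forward = new Stack<string>();

    public string Current { get; private set; }

    public NavigationHistory(string startPage)
    {
        Current = startPage;
    }

    // Assumption: visiting a new page discards anything on the Forward stack.
    public void Visit(string page)
    {
        back.Push(Current);
        forward.Clear();
        Current = page;
    }

    // Returns false if there is nowhere to go back to.
    public bool GoBack()
    {
        if (back.Count == 0) { return false; }
        forward.Push(Current);
        Current = back.Pop();
        return true;
    }

    // Returns false if the user has not gone back since the last visit.
    public bool GoForward()
    {
        if (forward.Count == 0) { return false; }
        back.Push(Current);
        Current = forward.Pop();
        return true;
    }
}

static class NavigationDemo
{
    static void Main()
    {
        var history = new NavigationHistory("home");
        history.Visit("news");
        history.Visit("sport");

        history.GoBack();
        Console.WriteLine(history.Current); // news
        history.GoBack();
        Console.WriteLine(history.Current); // home
        history.GoForward();
        Console.WriteLine(history.Current); // news
    }
}
```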

Summary

The .NET Framework class library provides various useful collection classes. We saw

List<T> in an earlier chapter, which provides a simple resizable linear list of items.

Dictionaries store entries by associating them with keys, providing fast key-based

lookup. HashSet<T> and SortedSet<T> manage sets of unique items, with optional ordering. Queues, linked lists, and stacks each manage a queue of items, offering various

strategies for how the order of addition relates to the order in which items come out of

the queue.

CHAPTER 10

Strings

Chapter 10 is all about strings. A bit late, you might think: we’ve had about nine chapters of string-based action already! Well, yes, you’d be right. That’s not terribly surprising, though: text is probably the single most important means an application has

of communicating with its users. That is especially true as we haven’t introduced any

graphical frameworks yet. I suppose we could have beeped the system speaker in Morse,

although even that can be considered a text-based operation.

Even with a graphical UI framework where we have pictures and buttons and graphs

and sounds, they almost always have textual labels, descriptions, comments, or tool

tips.

Users who have difficulty reading (perhaps because they have a low-vision condition)

may have that text transformed into sound by accessibility tools, but the application is

still processing text strings under the covers.

Even when we are dealing with integers or doubles internally within an algorithm, there

comes a time when we need to represent them to humans, and preferably in a way that

is meaningful to us. We usually do that (at least in part) by converting them into strings

of one form or another.

Strings are surprisingly complex and sophisticated entities, so we’re going to take some

time to explore their properties in this chapter.

First, we’ll look at what we’re really doing when we initialize a literal string. Then, we’ll

see a couple of techniques which let us convert from other types to a string representation and how we can control the formatting of that conversion.

Next, we’ll look at various different techniques we can use to process a string. This will

include composition, splitting, searching and replacing content, and what it means to

compare strings of various kinds.

Finally, we will look at how .NET represents strings internally, how that differs from

other representations in popular use in the world, and how we can convert between

those representations by using an Encoding.

What Is a String?

A string is an ordered sequence of characters:

We could consider this sentence to be a string.

We start with the first character, which is W. Then we continue on in order from left to

right:

'W', 'e', ' ', 'c', 'o', 'u', 'l', 'd'

And so on.

A string doesn’t have to be a whole sentence, of course, or even anything meaningful.

Any ordered sequence of characters is a string. Notice that each character might be an

uppercase letter, lowercase letter, space, punctuation mark, number (or, in fact, any

other textual symbol). It doesn’t even have to be an English letter. It could be Arabic,

for example:

العربية

Here we have the following characters:

'ا', 'ل', 'ع', 'ر', 'ب', 'ي', 'ة'

If you look carefully, you’ll notice that the string is ordered the other way round—the

first character is the rightmost one, and the last character is the leftmost one. This is

because Arabic scripts read right to left and not left to right; but the string is still ordered,

character by character.
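To see that ordering in code, here is a minimal sketch. It assumes the example word is the Arabic word for "Arabic" (alef, lam, ain, ra, ba, ya, ta marbuta), and it spells the word with \u escapes, a notation covered later in this chapter, so that the source reads the same regardless of an editor's text direction. The StringOrderDemo name is made up for the illustration.

```csharp
using System;

static class StringOrderDemo
{
    // Assumed example word, written as numeric escapes in reading order.
    public const string Arabic = "\u0627\u0644\u0639\u0631\u0628\u064A\u0629";

    static void Main()
    {
        // Index 0 is the first character in reading order (alef, U+0627),
        // even though it is drawn at the right-hand end of the word.
        Console.WriteLine((int)Arabic[0] == 0x0627);                 // True
        Console.WriteLine((int)Arabic[Arabic.Length - 1] == 0x0629); // True
        Console.WriteLine(Arabic.Length);                            // 7
    }
}
```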

A quick reminder: a font is a particular visual design for an entire set of

characters. Historically, it was a box containing a set of moveable type

in a specific design at a certain size, but we’ve come to blur the meanings

of font family, typeface, and font in popular usage, and people tend to

use these terms interchangeably now.

I think it is interesting to note that only a few years ago, fonts were the

sole purview of designers and printers; but they’ve now become commonplace, thanks to the ubiquity of the word processor.

Just in case you have been on the moon since 1968, here are three examples taken from different fonts:

You’ll also notice that the “joined up” cursive form of the characters is visually quite

different from their form when separated out individually. This is normal; the ultimate

visual representation of the character in the string is entirely separate from the string

itself. We’re just so used to the characters of our own language that we don’t tend to

think of them as abstract symbols, and tend to discount any visual differences down to

the choice of font or other typographical niceties when we are interpreting them.

We could happily design a font where the character e looks like Q and the character

f looks like A. All our text processing would continue as normal: searching and sorting

would be just fine (words starting with f wouldn’t start appearing in the dictionary

before words starting with e), because the data in the string is unchanged; but when

we drew it on the screen, it would look more than a bit confusing.*

The take-home point is that there are a bunch of layers between the .NET runtime’s

representation of a string as data in memory, and its final visual appearance on a screen,

in a file, or in another application (such as notepad.exe, for example). As we go through

this chapter, we’ll unpick those layers as we come across them, and point out some of

the common pitfalls.

Let’s get on and see how the .NET Framework presents a string to us.

The String and Char Types

It will come as no surprise that the .NET Framework provides us with two types that

correspond with strings and characters: String and Char. In fact, as we’ve seen before,

these are such important types that C# even provides us with keywords that correspond

to the underlying types: string and char.

String needs to provide us with that “ordered sequence of characters” behavior. It does

so by implementing IEnumerable<char>, as Example 10-1 illustrates.

Example 10-1. Iterating through the characters in a string

    string myString = "I've gone all vertical.";
    foreach (char theCharacter in myString)
    {
        Console.WriteLine(theCharacter);
    }

* In fact, I don’t think that this particular typeface would catch on.

If you create a console application for this code, you’ll see each character of the string on its own line when it runs, starting like this:

    I
    '
    v
    e

What exactly does that code do? First, it initializes a variable called myString which we

will use to hold the reference to our string object (because String is a reference type).

We then enumerate the string, yielding every Char in turn, and we output each Char to

the console on its own separate line. Char is a value type, so we’re actually getting a

copy of the character from the string itself.

The string object is created using a literal string—a sequence of characters enclosed in

double quotes:

"I've gone all vertical."

We’re already quite familiar with initializing a string with a literal—we probably do it

without a second thought; but let’s have a look at these literals in a little more detail.

Literal Strings and Chars

The simplest literal string is a set of characters enclosed in double quotes, shown in the

first line of Example 10-2.

Example 10-2. A string literal

    string myString = "Literal string";
    Console.WriteLine(myString);

This produces the output:

Literal string

You can also initialize a string from a char[], using the appropriate constructor. One

way to obtain a char array is by using char literals. A char literal is a single character,

wrapped in single quotes. Example 10-3 constructs a string this way.

Example 10-3. Initializing a string from char literals

    string myString = new string(new []
        { 'H', 'e', 'l', 'l', 'o', ' ', '"', 'w', 'o', 'r', 'l', 'd', '"' });
    Console.WriteLine(myString);

If you compile and run this, you’ll see the following output:

Hello "world"

Notice that we’ve got double-quote marks in our output. That was easy to achieve with

this char[], because the delimiter for an individual character is the single quote; but

how could we include double quotes in the string, without resorting to a literal char

array? Equally, how could we specify the single-quote character as a literal char?

Escaping Special Characters

The way to deal with troublesome characters in string and char literals is to escape them

with the backslash character. That means that you precede the quote with a \, and the compiler

interprets the quote as part of the string, rather than the end of it. Like this:†

"Literal \"string\""

If you build and run with this change, you’ll see the output, with quotes in place:

Literal "string"

There are several other special characters that you can escape in this way. You can find

some common ones listed in Table 10-1.

Table 10-1. Common escaped characters for string literals

Escaped character   Purpose
\"                  Include a double quote in a string literal.
\'                  Include a single quote in a char literal.
\\                  Insert a backslash.
\n                  New line.
\r                  Carriage return.
\t                  Tab.
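The escapes in Table 10-1 can be tried out with a short sketch like the one below. The EscapeDemo name and the file path are invented for the example; the point is that each escape sequence produces exactly one character in the resulting string.

```csharp
using System;

static class EscapeDemo
{
    public const string Quoted = "Literal \"string\"";
    public const char SingleQuote = '\'';
    public const string FilePath = "C:\\temp\\notes.txt"; // hypothetical path
    public const string TwoLines = "line one\nline two";
    public const string Tabbed = "name\tvalue";

    static void Main()
    {
        Console.WriteLine(Quoted);        // Literal "string"
        Console.WriteLine(SingleQuote);   // '
        Console.WriteLine(FilePath);      // C:\temp\notes.txt
        Console.WriteLine(Quoted.Length); // 16: each escape counts as one character
        Console.WriteLine(TwoLines);      // printed on two lines
        Console.WriteLine(Tabbed);        // name and value separated by a tab
    }
}
```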

There are also some rather uncommon ones, listed in Table 10-2. In general, you don’t

need to worry about them, but they are quite interesting.

† We’ll just show the string literal from here on, rather than repeating the boilerplate code each time. Just

replace the string initializer with the example.

Table 10-2. Less common escape characters for string literals

Escaped character   Purpose

\0                  The character represented by the char with value zero (not the character '0').

\a                  Alert or “Bell”. Back in the dim and distant past, terminals didn’t really have sound, so you couldn’t play a great big .wav file beautifully designed by Robert Fripp every time you wanted to alert the user to the fact that he had done something a bit wrong. Instead, you sent this character to the console, and it beeped at you, or even dinged a real bell (like the line-end on a manual typewriter). It still works today, and on some PCs there’s still a separate speaker just for making this old-school beep. Try it, but be prepared for unexpected retro-side effects like growing enormous sideburns and developing an obsession with disco.

\b                  Backspace. Yes, you can include backspaces in your string. Write:

                        "Hello world\b\b\b\b\bdolly"

                    to the console, and you’ll see:

                        Hello dolly

                    Not all rendering engines support this character, though. You can see the same string rendered in a WPF application in Figure 10-1. Notice how the backspace characters have been ignored. Remember: output mechanisms can interpret individual characters differently, even though they’re the same character, in the same string.

\f                  Form feed. Another special character from yesteryear. This used to push a whole page worth of paper through the printer. This is somewhat less than useful now, though. Even the console doesn’t do what you’d expect. If you write:

                        "Hello\fworld"

                    to the console, you’ll see something like:

                        Hello♀world

                    Yes, that is the symbol for “female” in the middle there. That’s because the original IBM PC defined a special character mapping so that it could use some of these characters to produce graphical symbols (like male, female, heart, club, diamond, and spade) that weren’t part of the regular character set. These mappings are sometimes called code pages, and the default code page for the console (at least for U.S. English systems) incorporates those original IBM definitions. We’ll talk more about code pages and encodings later.

\v                  Vertical tab. This one looks like a “male” symbol (♂) in the console’s IBM-emulating code page.

The first character in Table 10-2 is worth a little attention: character value 0, sometimes

also referred to as the null character, although it’s not the same as a null reference—

char is a value type, so it’s more like the char equivalent of the number 0. In a lot of

programming systems, this character is used to mark the end of a string—C and C++

use this convention, as do many Windows APIs. However, in .NET, and therefore in

C#, string objects contain the length as a separate field, and so you’re free to put null

characters in your strings if you want. However, you may need to be careful—if those

strings end up being passed to Windows APIs, it’s possible that Windows will ignore

everything after the first null.
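A quick way to convince yourself that a .NET string can carry embedded null characters is a sketch along these lines. NullCharDemo is a made-up name; Length, IndexOf, and Substring are standard string members.

```csharp
using System;

static class NullCharDemo
{
    public const string WithNull = "abc\0def";

    static void Main()
    {
        // The stored length counts every char, including the embedded null.
        Console.WriteLine(WithNull.Length);        // 7
        Console.WriteLine(WithNull.IndexOf('\0')); // 3
        Console.WriteLine((int)WithNull[3]);       // 0

        // The characters after the null are still part of the string.
        Console.WriteLine(WithNull.Substring(4));  // def
    }
}
```

Code that hands such a string to a null-terminated API would typically see only the "abc" part, which is the hazard described above.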

There’s one more escape form that’s a little different from all the others, because you

can use it to escape any character. This escape sequence begins with \u and is then

followed by four hexadecimal digits, letting you specify the exact numeric value for a

character. How can a textual character have a numeric value? Well, we’ll get into that

in detail in the “Encoding Characters” section later in this chapter, but roughly speaking, each

possible character can be identified by number. For example, the uppercase letter A has

the number 65, B is 66, and so on. In hexadecimal, those are 41 and 42, respectively.

So we can write this string:

"\u0041\u0042\u0043"

which is equivalent to:

"ABC"

Of course, if that’s the string you want, you’d normally just write that second form.

The \u escape sequence is more useful when you need a particular character that’s not easy to type directly from your keyboard.
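The numeric values mentioned above are easy to check with a short sketch. UnicodeEscapeDemo is a hypothetical name, and the copyright sign (U+00A9) is simply one example of a character that many keyboards have no key for.

```csharp
using System;

static class UnicodeEscapeDemo
{
    public const string Escaped = "\u0041\u0042\u0043";

    // U+00A9 is the copyright sign; the year is arbitrary filler text.
    public const string Copyright = "\u00A9 2010";

    static void Main()
    {
        Console.WriteLine(Escaped == "ABC");         // True
        Console.WriteLine((int)'A');                 // 65
        Console.WriteLine(((int)'A').ToString("X")); // 41
        Console.WriteLine((int)Copyright[0]);        // 169
    }
}
```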

Sometimes you’ll have a block of text that includes a lot of these special characters (like

carriage returns, for instance) and you want to just paste it out of some other application

straight into your code as a literal string without having to add lots of backslashes.

While it can be done, you might question the wisdom of large quantities

of text in your C# source files. You might want to store the text in a

separate resource file, and load it up on demand.

If you prefix the opening double-quote mark with the @ symbol, the compiler will then

interpret every subsequent character (including any whitespace such as newlines, and

tabs) as part of the string, until it sees a matching double-quote mark to close the string.

Example 10-4 exploits this to embed new lines and indentation in a string literal.
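As a rough illustration of the @ prefix (this is a sketch written for this passage, not the book's Example 10-4), the VerbatimDemo name, the file path, and the two lines of verse below are all invented:

```csharp
using System;

static class VerbatimDemo
{
    // Backslashes are taken literally inside an @-quoted string.
    public const string FilePath = @"C:\temp\notes.txt"; // hypothetical path

    // The line break and the indentation between the quotes become part of the string.
    public const string Poem = @"Roses are red,
    violets are blue";

    static void Main()
    {
        Console.WriteLine(FilePath == "C:\\temp\\notes.txt"); // True
        Console.WriteLine(Poem);
        Console.WriteLine(Poem.Contains("\n"));                // True
    }
}
```

Note that the leading spaces on the second line of the poem end up inside the string, so indenting a verbatim literal to match the surrounding code changes its contents.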

Figure 10-1. WPF ignoring control characters
