Programming C# 4.0, part 5
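// Skip past every patient whose life is at immediate risk.
while (current != null)
{
    if (current.Value.AtImminentRiskOfDeath)
    {
        current = current.Next;
    }
    else
    {
        break;
    }
}
// current is now the first patient not at immediate risk, or null if there is none.
if (current == null)
{
    waitingPatients.AddLast(newPatient);
}
else
{
    waitingPatients.AddBefore(current, newPatient);
}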
This code adds the new patient after all those patients in the queue whose lives appear
to be at immediate risk, but ahead of all other patients—the patient is presumably either
quite unwell or a generous hospital benefactor. (Real triage is a little more complex, of
course, but you still insert items into the list in the same way, no matter how you go
about choosing the insertion point.)
Note the use of LinkedListNode<T>—this is how LinkedList<T> presents the queue’s
contents. It allows us not only to see the item in the queue, but also to navigate back
and forth through the queue with the Next and Previous properties.
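To see the insertion logic run end to end, here is a minimal sketch. The Patient class, the sample names, and the AddAfterAtRisk and NamesInOrder helpers are assumptions made for illustration (this excerpt does not show the original declarations), and the sketch assumes the search starts at waitingPatients.First.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical patient type, invented for this sketch.
class Patient
{
    public string Name { get; set; }
    public bool AtImminentRiskOfDeath { get; set; }
}

static class TriageDemo
{
    // Inserts newPatient after the leading run of at-risk patients.
    public static void AddAfterAtRisk(
        LinkedList<Patient> waitingPatients, Patient newPatient)
    {
        // First is null when the list is empty.
        LinkedListNode<Patient> current = waitingPatients.First;
        while (current != null && current.Value.AtImminentRiskOfDeath)
        {
            current = current.Next;
        }

        if (current == null)
        {
            waitingPatients.AddLast(newPatient);
        }
        else
        {
            waitingPatients.AddBefore(current, newPatient);
        }
    }

    // Joins the patients' names in queue order, for easy inspection.
    public static string NamesInOrder(LinkedList<Patient> waitingPatients)
    {
        var names = new List<string>();
        foreach (Patient patient in waitingPatients)
        {
            names.Add(patient.Name);
        }
        return string.Join(",", names.ToArray());
    }

    static void Main()
    {
        var queue = new LinkedList<Patient>();
        queue.AddLast(new Patient { Name = "Ann", AtImminentRiskOfDeath = true });
        queue.AddLast(new Patient { Name = "Bob", AtImminentRiskOfDeath = false });

        AddAfterAtRisk(queue, new Patient { Name = "Cat" });
        Console.WriteLine(NamesInOrder(queue)); // Ann,Cat,Bob
    }
}
```

The loop here folds the if/else/break of the original into the while condition; the behavior should be the same.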
Stacks
Whereas Queue<T> operates a FIFO order, Stack<T> operates a last in, first out (LIFO)
order. Looking at this from a queuing perspective, it seems like the height of
unfairness—latecomers get priority over those who arrived early. However, there are
some situations in which this topsy-turvy ordering can make sense.
A performance characteristic of most computers is that they tend to be able to work
faster with data they’ve processed recently than with data they’ve not touched lately.
CPUs have caches that provide faster access to data than a computer’s main memory
can support, and these caches typically operate a policy where recently used data is
more likely to stay in the cache than data that has not been touched recently.
If you’re writing a server-side application, you may consider throughput to be more
important than fairness—the total rate at which you process work may matter more
than how long any individual work item takes to complete. In this case, a LIFO order
may make the most sense—work items that were only just put into a queue are much
more likely to still live in the CPU’s cache than those that were queued up ages ago,
and so you’ll get better throughput during high loads if you process newly arrived items
first. Items that have sat in the queue for longer will just have to wait for a lull.
Like Queue<T>, Stack<T> offers a method to add an item, and one to remove it. It calls
these Push and Pop, respectively. They are very similar to the queue’s Enqueue and
Dequeue, except they both work off the same end of the list. (You could get the same
effect using a LinkedList<T>, and always calling AddFirst and RemoveFirst.)
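As a quick sketch of that equivalence, the following pushes the same three items onto a Stack<T> and onto the front of a LinkedList<T>, then empties both. The StackDemo class and its helper methods are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class StackDemo
{
    // Pops everything off the stack, recording the order items come out.
    public static string PopAll(Stack<string> stack)
    {
        var popped = new List<string>();
        while (stack.Count > 0)
        {
            popped.Add(stack.Pop());
        }
        return string.Join(",", popped.ToArray());
    }

    // Does the same job with a linked list, always working at the front.
    public static string RemoveAllFromFront(LinkedList<string> list)
    {
        var removed = new List<string>();
        while (list.Count > 0)
        {
            removed.Add(list.First.Value);
            list.RemoveFirst();
        }
        return string.Join(",", removed.ToArray());
    }

    static void Main()
    {
        var stack = new Stack<string>();
        var list = new LinkedList<string>();
        foreach (string item in new[] { "first", "second", "third" })
        {
            stack.Push(item);
            list.AddFirst(item);
        }

        Console.WriteLine(PopAll(stack));            // third,second,first
        Console.WriteLine(RemoveAllFromFront(list)); // third,second,first
    }
}
```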
A stack could also be useful for managing navigation history. The Back button in a
browser works in LIFO order—the first page it shows you is the last one you visited.
(And if you want a Forward button, you could define a second stack—each time the
user goes Back, Push the current page onto the Forward stack. Then if the user clicks
Forward, Pop a page from the Forward stack, and Push the current page onto the Back
stack.)
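The two-stack arrangement might look something like the sketch below. NavigationHistory and its members are hypothetical names, pages are represented as plain strings, and clearing the Forward stack on a fresh navigation is our own assumption (it is what browsers typically do, but the description above doesn't cover it).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical history tracker, invented for this sketch.
class NavigationHistory
{
    private readonly Stack<string> back = new Stack<string>();
    private readonly Stack<string> forward = new Stack<string>();

    public string Current { get; private set; }

    public NavigationHistory(string startPage)
    {
        Current = startPage;
    }

    // Following a link: remember where we were.
    public void Visit(string page)
    {
        back.Push(Current);
        forward.Clear(); // Assumption: a new visit discards Forward history.
        Current = page;
    }

    // Returns false if there is nowhere to go back to.
    public bool GoBack()
    {
        if (back.Count == 0) { return false; }
        forward.Push(Current);
        Current = back.Pop();
        return true;
    }

    // Returns false if there is nowhere to go forward to.
    public bool GoForward()
    {
        if (forward.Count == 0) { return false; }
        back.Push(Current);
        Current = forward.Pop();
        return true;
    }
}

static class NavigationDemo
{
    static void Main()
    {
        var history = new NavigationHistory("A");
        history.Visit("B");
        history.Visit("C");

        history.GoBack();
        Console.WriteLine(history.Current); // B
        history.GoBack();
        Console.WriteLine(history.Current); // A
        history.GoForward();
        Console.WriteLine(history.Current); // B
    }
}
```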
Summary
The .NET Framework class library provides various useful collection classes. We saw
List<T> in an earlier chapter, which provides a simple resizable linear list of items.
Dictionaries store entries by associating them with keys, providing fast key-based
lookup. HashSet<T> and SortedSet<T> manage sets of unique items, with optional ordering. Queues, linked lists, and stacks each manage a queue of items, offering various
strategies for how the order of addition relates to the order in which items come out of
the queue.
314 | Chapter 9: Collection Classes
CHAPTER 10
Strings
Chapter 10 is all about strings. A bit late, you might think: we’ve had about nine chapters of string-based action already! Well, yes, you’d be right. That’s not terribly surprising, though: text is probably the single most important means an application has
of communicating with its users. That is especially true as we haven’t introduced any
graphical frameworks yet. I suppose we could have beeped the system speaker in Morse,
although even that can be considered a text-based operation.
Even with a graphical UI framework where we have pictures and buttons and graphs
and sounds, they almost always have textual labels, descriptions, comments, or tool
tips.
Users who have difficulty reading (perhaps because they have a low-vision condition)
may have that text transformed into sound by accessibility tools, but the application is
still processing text strings under the covers.
Even when we are dealing with integers or doubles internally within an algorithm, there
comes a time when we need to represent them to humans, and preferably in a way that
is meaningful to us. We usually do that (at least in part) by converting them into strings
of one form or another.
Strings are surprisingly complex and sophisticated entities, so we’re going to take some
time to explore their properties in this chapter.
First, we’ll look at what we’re really doing when we initialize a literal string. Then, we’ll
see a couple of techniques which let us convert from other types to a string representation and how we can control the formatting of that conversion.
Next, we’ll look at various different techniques we can use to process a string. This will
include composition, splitting, searching and replacing content, and what it means to
compare strings of various kinds.
Finally, we will look at how .NET represents strings internally, how that differs from
other representations in popular use in the world, and how we can convert between
those representations by using an Encoding.
What Is a String?
A string is an ordered sequence of characters:
We could consider this sentence to be a string.
We start with the first character, which is W. Then we continue on in order from left to
right:
'W', 'e', ' ', 'c', 'o', 'u', 'l', 'd'
And so on.
A string doesn’t have to be a whole sentence, of course, or even anything meaningful.
Any ordered sequence of characters is a string. Notice that each character might be an
uppercase letter, lowercase letter, space, punctuation mark, number (or, in fact, any
other textual symbol). It doesn’t even have to be an English letter. It could be Arabic,
for example:
العربية
Here we have the following characters:
'ا', 'ل', 'ع', 'ر', 'ب', 'ي', 'ة'
If you look carefully, you’ll notice that the string is ordered the other way round—the
first character is the rightmost one, and the last character is the leftmost one. This is
because Arabic scripts read right to left and not left to right; but the string is still ordered,
character by character.
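We can check that ordering in code. This sketch assumes the example string is the Arabic word for "Arabic", and it spells the characters out by their numeric values using the \u escape form so that the source file contains no right-to-left text. LogicalOrderDemo is a name invented for the illustration.

```csharp
using System;

static class LogicalOrderDemo
{
    // Alef, lam, ain, reh, beh, yeh, teh marbuta, in reading order.
    public const string Arabic = "\u0627\u0644\u0639\u0631\u0628\u064A\u0629";

    static void Main()
    {
        Console.WriteLine(Arabic.Length); // 7

        // Index 0 is the first character in reading order (alef),
        // even though it is drawn at the right-hand end of the word.
        Console.WriteLine(Arabic[0] == '\u0627'); // True

        // The last index holds the character drawn at the left-hand end.
        Console.WriteLine(Arabic[Arabic.Length - 1] == '\u0629'); // True
    }
}
```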
A quick reminder: a font is a particular visual design for an entire set of
characters. Historically, it was a box containing a set of moveable type
in a specific design at a certain size, but we’ve come to blur the meanings
of font family, typeface, and font in popular usage, and people tend to
use these terms interchangeably now.
I think it is interesting to note that only a few years ago, fonts were the
sole purview of designers and printers; but they’ve now become commonplace, thanks to the ubiquity of the word processor.
Just in case you have been on the moon since 1968, here are three examples taken from different fonts:
You’ll also notice that the “joined up” cursive form of the characters is visually quite
different from their form when separated out individually. This is normal; the ultimate
visual representation of the character in the string is entirely separate from the string
itself. We’re just so used to the characters of our own language that we don’t tend to
think of them as abstract symbols, and tend to discount any visual differences down to
the choice of font or other typographical niceties when we are interpreting them.
We could happily design a font where the character e looks like Q and the character
f looks like A. All our text processing would continue as normal: searching and sorting
would be just fine (words starting with f wouldn’t start appearing in the dictionary
before words starting with e), because the data in the string is unchanged; but when
we drew it on the screen, it would look more than a bit confusing.*
The take-home point is that there are a bunch of layers between the .NET runtime’s
representation of a string as data in memory, and its final visual appearance on a screen,
in a file, or in another application (such as notepad.exe, for example). As we go through
this chapter, we’ll unpick those layers as we come across them, and point out some of
the common pitfalls.
Let’s get on and see how the .NET Framework presents a string to us.
The String and Char Types
It will come as no surprise that the .NET Framework provides us with two types that
correspond with strings and characters: String and Char. In fact, as we’ve seen before,
these are such important types that C# even provides us with keywords that correspond
to the underlying types: string and char.
String needs to provide us with that “ordered sequence of characters” behavior. It does
so by implementing IEnumerable<char>, as Example 10-1 illustrates.
Example 10-1. Iterating through the characters in a string
string myString = "I've gone all vertical.";
foreach (char theCharacter in myString)
{
    Console.WriteLine(theCharacter);
}
* In fact, I don’t think that this particular typeface would catch on.
If you create a console application for this code, you’ll see output like this when it runs:
I
'
v
e
 
g
o
n
e
 
a
l
l
 
v
e
r
t
i
c
a
l
.
What exactly does that code do? First, it initializes a variable called myString which we
will use to hold the reference to our string object (because String is a reference type).
We then enumerate the string, yielding every Char in turn, and we output each Char to
the console on its own separate line. Char is a value type, so we’re actually getting a
copy of the character from the string itself.
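Here is a small sketch of those two points: a string can be treated as a sequence of char, and a char taken from it is an independent copy. CharCopyDemo and its methods are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class CharCopyDemo
{
    // Accepts any sequence of characters; a string qualifies because
    // String implements IEnumerable<char>.
    public static int CountCharacters(IEnumerable<char> characters)
    {
        int count = 0;
        foreach (char c in characters)
        {
            count++;
        }
        return count;
    }

    // Copies the first character out, changes the copy, and reports both.
    public static string CopyThenChange(string text)
    {
        char first = text[0]; // Copies the char value out of the string.
        first = 'X';          // Changes only our local copy.
        return first + "/" + text;
    }

    static void Main()
    {
        Console.WriteLine(CountCharacters("I've gone all vertical.")); // 23
        Console.WriteLine(CopyThenChange("abc"));                      // X/abc
    }
}
```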
The string object is created using a literal string—a sequence of characters enclosed in
double quotes:
"I've gone all vertical."
We’re already quite familiar with initializing a string with a literal—we probably do it
without a second thought; but let’s have a look at these literals in a little more detail.
Literal Strings and Chars
The simplest literal string is a set of characters enclosed in double quotes, shown in the
first line of Example 10-2.
Example 10-2. A string literal
string myString = "Literal string";
Console.WriteLine(myString);
This produces the output:
Literal string
You can also initialize a string from a char[], using the appropriate constructor. One
way to obtain a char array is by using char literals. A char literal is a single character,
wrapped in single quotes. Example 10-3 constructs a string this way.
Example 10-3. Initializing a string from char literals
string myString = new string(new []
    { 'H', 'e', 'l', 'l', 'o', ' ', '"', 'w', 'o', 'r', 'l', 'd', '"' });
Console.WriteLine(myString);
If you compile and run this, you’ll see the following output:
Hello "world"
Notice that we’ve got double-quote marks in our output. That was easy to achieve with
this char[], because the delimiter for an individual character is the single quote; but
how could we include double quotes in the string, without resorting to a literal char
array? Equally, how could we specify the single-quote character as a literal char?
Escaping Special Characters
The way to deal with troublesome characters in string and char literals is to escape them
with the backslash character. That means that you precede the quote with a \, and the
compiler treats the quote as part of the string, rather than the end of it. Like this:†
"Literal \"string\""
If you build and run with this change, you’ll see the output, with quotes in place:
Literal "string"
There are several other special characters that you can escape in this way. You can find
some common ones listed in Table 10-1.
Table 10-1. Common escaped characters for string literals
Escaped character    Purpose
\"                   Include a double quote in a string literal.
\'                   Include a single quote in a char literal.
\\                   Insert a backslash.
\n                   New line.
\r                   Carriage return.
\t                   Tab.
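One way to convince yourself that each escape sequence produces a single character is to look at lengths and numeric values, as in this sketch. EscapeDemo and its constants are names invented for the illustration.

```csharp
using System;

static class EscapeDemo
{
    public const string Quoted = "Literal \"string\"";
    public const char SingleQuote = '\'';
    public const string Backslash = "\\";
    public const string Tabbed = "a\tb";

    static void Main()
    {
        Console.WriteLine(Quoted);           // Literal "string"
        Console.WriteLine(Quoted.Length);    // 16
        Console.WriteLine(SingleQuote);      // '
        Console.WriteLine(Backslash.Length); // 1
        Console.WriteLine(Tabbed.Length);    // 3

        // The numeric values behind the whitespace escapes.
        Console.WriteLine((int)'\n'); // 10
        Console.WriteLine((int)'\r'); // 13
        Console.WriteLine((int)'\t'); // 9
    }
}
```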
There are also some rather uncommon ones, listed in Table 10-2. In general, you don’t
need to worry about them, but they are quite interesting.
† We’ll just show the string literal from here on, rather than repeating the boilerplate code each time. Just
replace the string initializer with the example.
Table 10-2. Less common escape characters for string literals
Escaped character Purpose
\0 The character represented by the char with value zero (not the character '0').
\a Alert or “Bell”. Back in the dim and distant past, terminals didn’t really have sound, so you couldn’t play
a great big .wav file beautifully designed by Robert Fripp every time you wanted to alert the user to the
fact that he had done something a bit wrong. Instead, you sent this character to the console, and it beeped
at you, or even dinged a real bell (like the line-end on a manual typewriter). It still works today, and on
some PCs there’s still a separate speaker just for making this old-school beep. Try it, but be prepared for
unexpected retro-side effects like growing enormous sideburns and developing an obsession with disco.
\b Backspace. Yes, you can include backspaces in your string.
Write:
"Hello world\b\b\b\b\bdolly"
to the console, and you’ll see:
Hello dolly
Not all rendering engines support this character, though. You can see the same string rendered in a WPF
application in Figure 10-1. Notice how the backspace characters have been ignored.
Remember: output mechanisms can interpret individual characters differently, even though they’re the
same character, in the same string.
\f Form feed. Another special character from yesteryear. This used to push a whole page worth of paper
through the printer. This is somewhat less than useful now, though. Even the console doesn’t do what
you’d expect.
If you write:
"Hello\fworld"
to the console, you’ll see something like:
Hello♀world
Yes, that is the symbol for “female” in the middle there. That’s because the original IBM PC defined a
special character mapping so that it could use some of these characters to produce graphical symbols
(like male, female, heart, club, diamond, and spade) that weren’t part of the regular character set. These
mappings are sometimes called code pages, and the default code page for the console (at least for U.S.
English systems) incorporates those original IBM definitions. We’ll talk more about code pages and
encodings later.
\v Vertical tab. This one looks like a “male” symbol (♂) in the console’s IBM-emulating code page.
The first character in Table 10-2 is worth a little attention: character value 0, sometimes
also referred to as the null character, although it’s not the same as a null reference—
char is a value type, so it’s more like the char equivalent of the number 0. In a lot of
programming systems, this character is used to mark the end of a string—C and C++
use this convention, as do many Windows APIs. However, in .NET, and therefore in
C#, string objects contain the length as a separate field, and so you’re free to put null
characters in your strings if you want. However, you may need to be careful—if those
strings end up being passed to Windows APIs, it’s possible that Windows will ignore
everything after the first null.
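The sketch below shows a null character sitting happily in the middle of a .NET string. The AsNullTerminated helper is hypothetical: it only imitates how an API that treats the null character as a terminator would read the text, and does not call any real Windows API.

```csharp
using System;

static class NullCharDemo
{
    public const string WithNull = "ab\0cd";

    // Imitates a null-terminated reader: stops at the first '\0'.
    public static string AsNullTerminated(string text)
    {
        int end = text.IndexOf('\0');
        return end < 0 ? text : text.Substring(0, end);
    }

    static void Main()
    {
        // .NET counts the null character like any other.
        Console.WriteLine(WithNull.Length);            // 5
        Console.WriteLine(WithNull.IndexOf('\0'));     // 2

        // A null-terminated reader would see only the first two characters.
        Console.WriteLine(AsNullTerminated(WithNull)); // ab
    }
}
```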
There’s one more escape form that’s a little different from all the others, because you
can use it to escape any character. This escape sequence begins with \u and is then
followed by four hexadecimal digits, letting you specify the exact numeric value for a
character. How can a textual character have a numeric value? Well, we’ll get into that
in detail in the “Encoding Characters” section later in this chapter, but roughly speaking, each
possible character can be identified by number. For example, the uppercase letter A has
the number 65, B is 66, and so on. In hexadecimal, those are 41 and 42, respectively.
So we can write this string:
"\u0041\u0042\u0043"
which is equivalent to:
"ABC"
Of course, if that’s the string you want, you’d normally just write that second form.
The \u escape sequence is more useful when you need a particular character that’s not
on your keyboard. For example, \u00A9 is the copyright symbol: ©.
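A short sketch makes the link between characters and numbers concrete. UnicodeEscapeDemo and CodeOf are names invented for the illustration; casting a char to int is the standard way to get at its numeric value.

```csharp
using System;

static class UnicodeEscapeDemo
{
    public const string Escaped = "\u0041\u0042\u0043";

    // Returns the numeric value of a character.
    public static int CodeOf(char c)
    {
        return (int)c;
    }

    static void Main()
    {
        Console.WriteLine(Escaped == "ABC");          // True
        Console.WriteLine(CodeOf('A'));               // 65
        Console.WriteLine(CodeOf('B').ToString("X")); // 42
        Console.WriteLine(CodeOf('\u00A9'));          // 169
    }
}
```

Printing the copyright symbol itself is left out here, because whether it shows up correctly depends on the console's code page.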
Sometimes you’ll have a block of text that includes a lot of these special characters (like
carriage returns, for instance) and you want to just paste it out of some other application
straight into your code as a literal string without having to add lots of backslashes.
While it can be done, you might question the wisdom of large quantities
of text in your C# source files. You might want to store the text in a
separate resource file, and load it up on demand.
If you prefix the opening double-quote mark with the @ symbol, the compiler will then
interpret every subsequent character (including any whitespace such as newlines, and
tabs) as part of the string, until it sees a matching double-quote mark to close the string.
Example 10-4 exploits this to embed new lines and indentation in a string literal.
Figure 10-1. WPF ignoring control characters