Programming C# 4.0, part 6

I warmly recommend that you crank UAC up to the maximum (and put up with the

occasional security dialog), run Visual Studio as a nonadministrator (as far as is possible), and think at every stage about the least possible privileges you can grant to your

users that will still let them get their work done. Making your app more secure benefits

everyone: not just your own users, but everyone who doesn’t receive a spam email or

a hack attempt because the bad guys couldn’t exploit your application.

We’ve now handled the exception nicely—but is stopping really the best thing we could

have done? Would it not be better to log the fact that we were unable to access particular

directories, and carry on? Similarly, if we get a DirectoryNotFoundException or

FileNotFoundException, wouldn’t we want to just carry on in this case? The fact that someone

has deleted the directory from underneath us shouldn’t matter to us.

If we look again at our sample, it might be better to catch the

DirectoryNotFoundException and FileNotFoundException inside the InspectDirectories method to provide a

more fine-grained response to errors. Also, if we look at the documentation for

FileInfo, we’ll see that it may actually throw a base IOException under some circumstances, so we should catch that here, too. And in all cases, we need to catch the security

exceptions.

We’re relying on LINQ to iterate through the files and folders, which means it’s not

entirely obvious where to put the exception handling. Example 11-28 shows the code

from InspectDirectories that iterates through the folders, to get a list of files. We can’t

put exception handling code into the middle of that query.

Example 11-28. Iterating through the directories

var allFilePaths = from directory in directoriesToSearch
                   from file in Directory.GetFiles(directory, "*.*",
                                                   searchOption)
                   select file;

However, we don’t have to. The simplest way to solve this is to put the code that gets

the files from a directory into a separate method, so we can add exception handling, as Example 11-29 shows.

Example 11-29. Putting exception handling in a helper method

private static IEnumerable<string> GetDirectoryFiles(
    string directory, SearchOption searchOption)
{
    try
    {
        return Directory.GetFiles(directory, "*.*", searchOption);
    }
    catch (DirectoryNotFoundException dnfx)
    {
        Console.WriteLine("Warning: The specified directory was not found");
        Console.WriteLine(dnfx.Message);
    }
    catch (UnauthorizedAccessException uax)
    {
        Console.WriteLine(
            "Warning: You do not have permission to access this directory.");
        Console.WriteLine(uax.Message);
    }

    // Reached only after one of the expected errors was reported.
    return Enumerable.Empty<string>();
}

This method defers to Directory.GetFiles, but in the event of one of the expected

errors, it displays a warning, and then just returns an empty collection.

There’s a problem here when we ask GetFiles to search recursively: if it encounters a problem with even just one directory, the whole operation throws, and you’ll end up not looking in any directories. So Example 11-29 helps when the user passes multiple directories on the command line (a failure in one no longer stops the search of the others), but it’s not all that useful when using the /sub option. If you wanted to make your error handling more fine-grained still, you could write your own recursive directory search. The GetAllFilesInDirectory example in Chapter 7 shows how to do that.
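As a rough illustration of that idea (this is not the Chapter 7 code), the sketch below walks the tree one directory at a time, so a failure costs only the directory that caused it. The class and method names, TolerantSearch and GetFilesTolerantly, are hypothetical, and the sketch handles only the two exception types already discussed; other errors, such as a plain IOException, would still propagate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

internal static class TolerantSearch
{
    // Hypothetical helper (not from the chapter): visits each directory
    // separately so that an error skips one directory, not the whole search.
    public static IEnumerable<string> GetFilesTolerantly(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files = new string[0];
            string[] subdirectories = new string[0];

            try
            {
                files = Directory.GetFiles(current);
                subdirectories = Directory.GetDirectories(current);
            }
            catch (DirectoryNotFoundException dnfx)
            {
                Console.WriteLine("Warning: " + dnfx.Message);
            }
            catch (UnauthorizedAccessException uax)
            {
                Console.WriteLine("Warning: " + uax.Message);
            }

            // yield return is not allowed inside a try block that has a
            // catch clause, so the results are returned out here.
            foreach (string file in files)
            {
                yield return file;
            }

            foreach (string subdirectory in subdirectories)
            {
                pending.Push(subdirectory);
            }
        }
    }
}
```

A caller could swap this in for the recursive case, for example by using TolerantSearch.GetFilesTolerantly(directory) as the inner source of the query when the /sub option is set.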

If we modify the LINQ query to use this, as shown in Example 11-30, the overall progress will be undisturbed by the error handling.

Example 11-30. Iterating in the face of errors

var allFilePaths = from directory in directoriesToSearch
                   from file in GetDirectoryFiles(directory,
                                                  searchOption)
                   select file;

And we can use a similar technique for the LINQ query that populates the

fileNameGroups—it uses FileInfo, and we need to handle exceptions for that. Example 11-31 iterates through a list of paths, and returns details for each file that it was

able to access successfully, displaying errors otherwise.

Example 11-31. Handling exceptions from FileInfo

private static IEnumerable<FileDetails> GetDetails(IEnumerable<string> paths)
{
    foreach (string filePath in paths)
    {
        FileDetails details = null;
        try
        {
            FileInfo info = new FileInfo(filePath);
            details = new FileDetails
            {
                FilePath = filePath,
                FileSize = info.Length
            };
        }
        catch (FileNotFoundException fnfx)
        {
            Console.WriteLine("Warning: The specified file was not found");
            Console.WriteLine(fnfx.Message);
        }
        catch (IOException iox)
        {
            Console.Write("Warning: ");
            Console.WriteLine(iox.Message);
        }
        catch (UnauthorizedAccessException uax)
        {
            Console.WriteLine(
                "Warning: You do not have permission to access this file.");
            Console.WriteLine(uax.Message);
        }

        // Only return details for files we were able to inspect.
        if (details != null)
        {
            yield return details;
        }
    }
}

We can use this from the final LINQ query in InspectDirectories. Example 11-32

shows the modified query.

Example 11-32. Getting details while tolerating errors

var fileNameGroups = from filePath in allFilePaths
                     let fileNameWithoutPath = Path.GetFileName(filePath)
                     group filePath by fileNameWithoutPath into nameGroup
                     select new FileNameGroup
                     {
                         FileNameWithoutPath = nameGroup.Key,
                         FilesWithThisName = GetDetails(nameGroup).ToList()
                     };

Again, this enables the query to process all accessible items, while reporting errors for

any problematic files without having to stop completely. If we compile and run again,

we see the following output:

C:\Users\mwa\AppData\Local\dcyx0fv1.hv3

C:\Users\mwa\AppData\Local\0nf2wqwr.y3s

C:\Users\mwa\AppData\Local\kfilxte4.exy

Warning: You do not have permission to access this directory.

Access to the path 'C:\Users\mwa\AppData\Local\r2gl4q1a.ycp\' is denied.

SameNameAndContent.txt

----------------------

C:\Users\mwa\AppData\Local\dcyx0fv1.hv3

C:\Users\mwa\AppData\Local\0nf2wqwr.y3s

C:\Users\mwa\AppData\Local\kfilxte4.exy

408 | Chapter 11: Files and Streams

We’ve dealt cleanly with the directory to which we did not have access, and have continued with the job to a successful conclusion.

Now that we’ve found a few candidate files that may (or may not) be the same, can we

actually check to see that they are, in fact, identical, rather than just coincidentally

having the same name and length?

Reading Files into Memory

To compare the candidate files, we could load them into memory. The File class offers

three likely looking static methods: File.ReadAllBytes, which treats the file as binary, and

loads it into a byte array; File.ReadAllText, which treats it as text, and reads it all into

a string; and File.ReadAllLines, which again treats it as text, but loads each line into its

own string, and returns an array of all the lines. We could even call File.OpenText to

obtain a StreamReader (equivalent to the StreamWriter, but for reading data; we’ll see

this again later in the chapter).
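To make the difference between the whole-file read methods concrete, here is a minimal sketch that applies File.ReadAllBytes, File.ReadAllText, and File.ReadAllLines (the array-returning method) to the same file. The class ReadWholeFileDemo and its Describe method are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

internal static class ReadWholeFileDemo
{
    // Hypothetical helper: summarizes one file using each of the
    // whole-file read methods on the File class.
    public static string Describe(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);    // raw bytes, no decoding
        string text = File.ReadAllText(path);      // whole file as one string
        string[] lines = File.ReadAllLines(path);  // one string per line

        return bytes.Length + " bytes, " + text.Length + " chars, " +
               lines.Length + " lines";
    }
}
```

The byte count and the character count can differ for text that is not plain ASCII, which is one reason a binary comparison needs the byte-based method.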

Because we’re looking at all file types, not just text, we need to use one of the binary-based methods. File.ReadAllBytes returns a byte[] containing the entire contents of

the file. We could then compare the files byte for byte, to see if they are the same. Here’s

some code to do that.

First, let’s update our DisplayMatches function to do the load and compare, as shown

by the calls to LoadFiles and CompareFiles in the inner loop of Example 11-33.

Example 11-33. Updating DisplayMatches for content comparison

private static void DisplayMatches(
    IEnumerable<FileNameGroup> filesGroupedByName)
{
    var groupsWithMoreThanOneFile = from nameGroup in filesGroupedByName
                                    where nameGroup.FilesWithThisName.Count > 1
                                    select nameGroup;

    foreach (var fileNameGroup in groupsWithMoreThanOneFile)
    {
        // Group the matches by the file size, then select those
        // with more than 1 file of that size.
        var matchesBySize = from match in fileNameGroup.FilesWithThisName
                            group match by match.FileSize into sizeGroup
                            where sizeGroup.Count() > 1
                            select sizeGroup;

        foreach (var matchedBySize in matchesBySize)
        {
            List<FileContents> content = LoadFiles(matchedBySize);
            CompareFiles(content);
        }
    }
}


Notice that we want our LoadFiles function to return a List of FileContents objects.

Example 11-34 shows the FileContents class.

Example 11-34. File content information class

internal class FileContents
{
    public string FilePath { get; set; }
    public byte[] Content { get; set; }
}

It just lets us associate the filename with the contents so that we can use it later to

display the results. Example 11-35 shows the implementation of LoadFiles, which uses

ReadAllBytes to load in the file content.

Example 11-35. Loading binary file content

private static List<FileContents> LoadFiles(IEnumerable<FileDetails> fileList)
{
    var content = new List<FileContents>();
    foreach (FileDetails item in fileList)
    {
        // Load the whole file into memory as raw bytes.
        byte[] contents = File.ReadAllBytes(item.FilePath);
        content.Add(new FileContents
        {
            FilePath = item.FilePath,
            Content = contents
        });
    }
    return content;
}

We now need an implementation for CompareFiles, which is shown in Example 11-36.

Example 11-36. CompareFiles method

private static void CompareFiles(List<FileContents> files)
{
    Dictionary<FileContents, List<FileContents>> potentiallyMatched =
        BuildPotentialMatches(files);

    // Now, we're going to look at every byte in each file
    CompareBytes(files, potentiallyMatched);

    DisplayResults(files, potentiallyMatched);
}

This isn’t exactly the most elegant way of comparing several files. We’re building a big

dictionary of all of the potential matching combinations, and then weeding out the

ones that don’t actually match. For large numbers of potential matches of the same size

this could get quite inefficient, but we’ll not worry about that right now! Example 11-37 shows the function that builds those potential matches.


Example 11-37. Building possible match combinations

private static Dictionary<FileContents, List<FileContents>>
    BuildPotentialMatches(List<FileContents> files)
{
    // Builds a dictionary where the entries look like:
    // { 0, { 1, 2, 3, 4, ... N } }
    // { 1, { 2, 3, 4, ... N } }
    // ...
    // { N - 1, { N } }
    // where N is one less than the number of files.
    var allCombinations = Enumerable.Range(0, files.Count - 1).ToDictionary(
        x => files[x],
        x => files.Skip(x + 1).ToList());

    return allCombinations;
}
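The shape of that dictionary is easier to see with simple values. The sketch below is a generic variant of the same Range/ToDictionary/Skip expression; CombinationDemo and BuildPairs are hypothetical names introduced here, not part of the chapter's sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal static class CombinationDemo
{
    // Same expression as BuildPotentialMatches, made generic so it can be
    // tried with strings or numbers instead of FileContents objects.
    // Assumes at least one item and no duplicate items: Enumerable.Range
    // rejects a negative count, and ToDictionary rejects duplicate keys.
    public static Dictionary<T, List<T>> BuildPairs<T>(List<T> items)
    {
        return Enumerable.Range(0, items.Count - 1).ToDictionary(
            x => items[x],                      // key: the item at index x
            x => items.Skip(x + 1).ToList());   // value: every later item
    }
}
```

For a list containing "a", "b", "c", and "d", this produces three entries: "a" maps to b, c, d; "b" maps to c, d; and "c" maps to d. The last item never appears as a key, because there is nothing after it to compare against.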

This set of potential matches will be whittled down to the files that really are the same

by CompareBytes, which we’ll get to momentarily. The DisplayResults method, shown

in Example 11-38, runs through the matches and displays their names and locations.

Example 11-38. Displaying matches

private static void DisplayResults(
    List<FileContents> files,
    Dictionary<FileContents, List<FileContents>> currentlyMatched)
{
    if (currentlyMatched.Count == 0) { return; }

    var alreadyMatched = new List<FileContents>();

    Console.WriteLine("Matches");

    foreach (var matched in currentlyMatched)
    {
        // Don't do it if we've already matched it previously
        if (alreadyMatched.Contains(matched.Key))
        {
            continue;
        }
        else
        {
            alreadyMatched.Add(matched.Key);
        }

        Console.WriteLine("-------");
        Console.WriteLine(matched.Key.FilePath);
        foreach (var file in matched.Value)
        {
            Console.WriteLine(file.FilePath);
            alreadyMatched.Add(file);
        }
        Console.WriteLine("-------");
    }
}


This leaves the method shown in Example 11-39 that does the bulk of the work, comparing the potentially matching files, byte for byte.

Example 11-39. Byte-for-byte comparison of all potential matches

private static void CompareBytes(
    List<FileContents> files,
    Dictionary<FileContents, List<FileContents>> potentiallyMatched)
{
    // Remember, this only ever gets called with files of equal length.
    int fileLength = files[0].Content.Length;
    var sourceFilesWithNoMatches = new List<FileContents>();
    for (int fileByteOffset = 0; fileByteOffset < fileLength; ++fileByteOffset)
    {
        foreach (var sourceFileEntry in potentiallyMatched)
        {
            byte[] sourceContent = sourceFileEntry.Key.Content;
            for (int otherIndex = 0; otherIndex < sourceFileEntry.Value.Count;
                ++otherIndex)
            {
                // Check the byte at fileByteOffset in each of the two files;
                // if they don't match, then we remove them from the collection
                byte[] otherContent =
                    sourceFileEntry.Value[otherIndex].Content;
                if (sourceContent[fileByteOffset] != otherContent[fileByteOffset])
                {
                    sourceFileEntry.Value.RemoveAt(otherIndex);
                    otherIndex -= 1;
                    if (sourceFileEntry.Value.Count == 0)
                    {
                        sourceFilesWithNoMatches.Add(sourceFileEntry.Key);
                    }
                }
            }
        }

        // The dictionary can't be modified while it is being enumerated,
        // so entries with no remaining matches are removed afterward.
        foreach (FileContents fileWithNoMatches in sourceFilesWithNoMatches)
        {
            potentiallyMatched.Remove(fileWithNoMatches);
        }

        // Don't bother with the rest of the file if
        // there are no further potential matches
        if (potentiallyMatched.Count == 0)
        {
            break;
        }
        sourceFilesWithNoMatches.Clear();
    }
}

We’re going to need to add a test file that differs only in the content. In

CreateTestFiles, add another filename that doesn’t change as we go round the loop:

string fileSameSizeInAllButDifferentContent =
    "SameNameAndSizeDifferentContent.txt";


Then, inside the loop (at the bottom), we’ll create a test file that will be the same length,

but varying by only a single byte:

// And now one that is the same length, but with different content
fullPath = Path.Combine(directory, fileSameSizeInAllButDifferentContent);
builder = new StringBuilder();
builder.Append("Now with ");
builder.Append(directoryIndex);
builder.AppendLine(" extra");
CreateFile(fullPath, builder.ToString());

If you build and run, you should see some output like this, showing the one identical

file we have in each file location:

C:\Users\mwa\AppData\Local\e33yz4hg.mjp

C:\Users\mwa\AppData\Local\ung2xdgo.k1c

C:\Users\mwa\AppData\Local\jcpagntt.ynd

Warning: You do not have permission to access this directory.

Access to the path 'C:\Users\mwa\AppData\Local\cmoof2kj.ekd\' is denied.

Matches

-------

C:\Users\mwa\AppData\Local\e33yz4hg.mjp\SameNameAndContent.txt

C:\Users\mwa\AppData\Local\ung2xdgo.k1c\SameNameAndContent.txt

C:\Users\mwa\AppData\Local\jcpagntt.ynd\SameNameAndContent.txt

-------

Needless to say, this isn’t exactly very efficient; and it is unlikely to work so well when

you get to those DVD rips and massive media repositories. Even your 64-bit machine

probably doesn’t have quite that much memory available to it.* There’s a way to make

this more memory-efficient. Instead of loading the file completely into memory, we can

take a streaming approach.

Streams

You can think of a stream like one of those old-fashioned news ticker tapes. To write

data onto the tape, the bytes (or characters) in the file are typed out, one at a time, on

the continuous stream of tape.

We can then wind the tape back to the beginning, and start reading it back, character

by character, until either we stop or we run off the end of the tape. Or we could give

the tape to someone else, and she could do the same. Or we could read, say, 1,000

characters off the tape, and copy them onto another tape which we give to someone to

work on, then read the next 1,000, and so on, until we run out of characters.
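As a sketch of what reading the tape in fixed-size pieces might look like in code, the following compares two streams chunk by chunk instead of loading either one completely. The names StreamCompare, FillBuffer, and HaveSameContent are hypothetical, and the chunk size is left to the caller; this is one possible approach under those assumptions, not necessarily the one the chapter goes on to develop.

```csharp
using System;
using System.IO;

internal static class StreamCompare
{
    // Hypothetical helper: keeps reading until the buffer is full or the
    // stream ends, because a single Stream.Read call is allowed to return
    // fewer bytes than were requested.
    private static int FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) { break; }   // end of stream
            total += read;
        }
        return total;
    }

    // Compares two streams one chunk at a time, so memory use is bounded
    // by the chunk size rather than by the size of the data.
    public static bool HaveSameContent(Stream first, Stream second, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException("chunkSize");
        }

        byte[] firstBuffer = new byte[chunkSize];
        byte[] secondBuffer = new byte[chunkSize];

        while (true)
        {
            int firstCount = FillBuffer(first, firstBuffer);
            int secondCount = FillBuffer(second, secondBuffer);

            if (firstCount != secondCount) { return false; }  // lengths differ
            if (firstCount == 0) { return true; }             // both ended together

            for (int i = 0; i < firstCount; ++i)
            {
                if (firstBuffer[i] != secondBuffer[i]) { return false; }
            }
        }
    }
}
```

With files, the two streams could come from File.OpenRead, which returns a FileStream; each should be wrapped in a using statement so that the file handles are released.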

* In fact, it is slightly more constrained than that. The .NET Framework limits arrays to 2 GB, and will throw

an exception if you try to load a larger file into memory all at once.
