Programming C# 4.0, part 6
I warmly recommend that you crank UAC up to the maximum (and put up with the
occasional security dialog), run Visual Studio as a nonadministrator (as far as is possible), and think at every stage about the least possible privileges you can grant to your
users that will still let them get their work done. Making your app more secure benefits
everyone: not just your own users, but everyone who doesn’t receive a spam email or
a hack attempt because the bad guys couldn’t exploit your application.
We’ve now handled the exception nicely—but is stopping really the best thing we could
have done? Would it not be better to log the fact that we were unable to access particular
directories, and carry on? Similarly, if we get a DirectoryNotFoundException or
FileNotFoundException, wouldn’t we want to just carry on in this case? The fact that someone
has deleted the directory from underneath us shouldn’t matter to us.
If we look again at our sample, it might be better to catch the DirectoryNotFoundException
and FileNotFoundException inside the InspectDirectories method to provide a
more fine-grained response to errors. Also, if we look at the documentation for
FileInfo, we’ll see that it may actually throw a base IOException under some circumstances, so we should catch that here, too. And in all cases, we need to catch the security
exceptions.
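The exception types mentioned here are related: DirectoryNotFoundException and FileNotFoundException both derive from IOException, whereas UnauthorizedAccessException does not. A catch block for IOException therefore has to come after the more specific ones. The following is a minimal sketch of that ordering; the Describe helper and its return strings are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;

static class CatchOrderSketch
{
    // Hypothetical helper: runs an action and reports which handler caught the failure.
    public static string Describe(Action action)
    {
        try
        {
            action();
            return "ok";
        }
        catch (DirectoryNotFoundException)
        {
            return "directory not found";
        }
        catch (FileNotFoundException)
        {
            return "file not found";
        }
        catch (IOException)
        {
            // Must come after the two derived types above; placing it first
            // would make those handlers unreachable, which the compiler rejects.
            return "other I/O error";
        }
        catch (UnauthorizedAccessException)
        {
            // Not an IOException, so it needs its own handler.
            return "access denied";
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe(() => { throw new FileNotFoundException(); }));
        Console.WriteLine(Describe(() => { throw new EndOfStreamException(); }));
        Console.WriteLine(Describe(() => { throw new UnauthorizedAccessException(); }));
    }
}
```

EndOfStreamException is used in Main simply as an example of another IOException subtype that falls through to the general handler.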
We’re relying on LINQ to iterate through the files and folders, which means it’s not
entirely obvious where to put the exception handling. Example 11-28 shows the code
from InspectDirectories that iterates through the folders, to get a list of files. We can’t
put exception handling code into the middle of that query.
Example 11-28. Iterating through the directories
var allFilePaths = from directory in directoriesToSearch
                   from file in Directory.GetFiles(directory, "*.*",
                                                   searchOption)
                   select file;
However, we don’t have to. The simplest way to solve this is to put the code that gets
the directories into a separate method, so we can add exception handling, as Example 11-29 shows.
Example 11-29. Putting exception handling in a helper method
private static IEnumerable<string> GetDirectoryFiles(
    string directory, SearchOption searchOption)
{
    try
    {
        return Directory.GetFiles(directory, "*.*", searchOption);
    }
    catch (DirectoryNotFoundException dnfx)
    {
        Console.WriteLine("Warning: The specified directory was not found");
        Console.WriteLine(dnfx.Message);
    }
    catch (UnauthorizedAccessException uax)
    {
        Console.WriteLine(
            "Warning: You do not have permission to access this directory.");
        Console.WriteLine(uax.Message);
    }
    return Enumerable.Empty<string>();
}
406 | Chapter 11: Files and Streams
This method defers to Directory.GetFiles, but in the event of one of the expected
errors, it displays a warning, and then just returns an empty collection.
There’s a problem here when we ask GetFiles to search recursively: if
it encounters a problem with even just one directory, the whole operation throws, and you’ll end up not looking in any directories. So although
Example 11-29 makes a difference when the user passes multiple
directories on the command line, it’s not all that useful when using
the /sub option. If you wanted to make your error handling more fine-grained still, you could write your own recursive directory search. The
GetAllFilesInDirectory example in Chapter 7 shows how to do that.
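As a rough illustration of what such a hand-written search might look like, here is a minimal sketch. It is not the GetAllFilesInDirectory example from Chapter 7; the GetFilesTolerantly name and the warning text are assumptions made for this sketch. It visits one directory at a time, so a failure in one directory skips only that directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TolerantSearchSketch
{
    // Hypothetical helper: walks the tree one directory at a time so that
    // an error in one directory does not abort the whole search.
    public static IEnumerable<string> GetFilesTolerantly(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files = new string[0];
            string[] subdirectories = new string[0];
            try
            {
                files = Directory.GetFiles(current);
                subdirectories = Directory.GetDirectories(current);
            }
            catch (DirectoryNotFoundException)
            {
                Console.WriteLine("Warning: directory not found: " + current);
            }
            catch (UnauthorizedAccessException)
            {
                Console.WriteLine("Warning: access denied: " + current);
            }
            // The yields sit outside the try block, because C# does not allow
            // yield return inside a try block that has a catch clause.
            foreach (string file in files) { yield return file; }
            foreach (string subdirectory in subdirectories) { pending.Push(subdirectory); }
        }
    }

    static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        foreach (string file in GetFilesTolerantly(root))
        {
            Console.WriteLine(file);
        }
    }
}
```

An explicit stack is used instead of recursion so that the iterator stays a single method; a recursive version would work just as well.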
If we modify the LINQ query to use the GetDirectoryFiles helper, as shown in Example 11-30, an error in one directory no longer interrupts the overall progress.
Example 11-30. Iterating in the face of errors
var allFilePaths = from directory in directoriesToSearch
                   from file in GetDirectoryFiles(directory,
                                                  searchOption)
                   select file;
And we can use a similar technique for the LINQ query that populates the
fileNameGroups—it uses FileInfo, and we need to handle exceptions for that. Example 11-31 iterates through a list of paths, and returns details for each file that it was
able to access successfully, displaying errors otherwise.
Example 11-31. Handling exceptions from FileInfo
private static IEnumerable<FileDetails> GetDetails(IEnumerable<string> paths)
{
    foreach (string filePath in paths)
    {
        FileDetails details = null;
        try
        {
            FileInfo info = new FileInfo(filePath);
            details = new FileDetails
            {
                FilePath = filePath,
                FileSize = info.Length
            };
        }
        catch (FileNotFoundException fnfx)
        {
            Console.WriteLine("Warning: The specified file was not found");
            Console.WriteLine(fnfx.Message);
        }
        catch (IOException iox)
        {
            Console.Write("Warning: ");
            Console.WriteLine(iox.Message);
        }
        catch (UnauthorizedAccessException uax)
        {
            Console.WriteLine(
                "Warning: You do not have permission to access this file.");
            Console.WriteLine(uax.Message);
        }
        if (details != null)
        {
            yield return details;
        }
    }
}
When Files Go Bad: Dealing with Exceptions | 407
We can use this from the final LINQ query in InspectDirectories. Example 11-32
shows the modified query.
Example 11-32. Getting details while tolerating errors
var fileNameGroups = from filePath in allFilePaths
                     let fileNameWithoutPath = Path.GetFileName(filePath)
                     group filePath by fileNameWithoutPath into nameGroup
                     select new FileNameGroup
                     {
                         FileNameWithoutPath = nameGroup.Key,
                         FilesWithThisName = GetDetails(nameGroup).ToList()
                     };
Again, this enables the query to process all accessible items, while reporting errors for
any problematic files without having to stop completely. If we compile and run again,
we see the following output:
C:\Users\mwa\AppData\Local\dcyx0fv1.hv3
C:\Users\mwa\AppData\Local\0nf2wqwr.y3s
C:\Users\mwa\AppData\Local\kfilxte4.exy
Warning: You do not have permission to access this directory.
Access to the path 'C:\Users\mwa\AppData\Local\r2gl4q1a.ycp\' is denied.
SameNameAndContent.txt
----------------------
C:\Users\mwa\AppData\Local\dcyx0fv1.hv3
C:\Users\mwa\AppData\Local\0nf2wqwr.y3s
C:\Users\mwa\AppData\Local\kfilxte4.exy
We’ve dealt cleanly with the directory to which we did not have access, and have continued with the job to a successful conclusion.
Now that we’ve found a few candidate files that may (or may not) be the same, can we
actually check to see that they are, in fact, identical, rather than just coincidentally
having the same name and length?
Reading Files into Memory
To compare the candidate files, we could load them into memory. The File class offers
three likely looking static methods: File.ReadAllBytes, which treats the file as binary, and
loads it into a byte array; File.ReadAllText, which treats it as text, and reads it all into
a string; and File.ReadAllLines, which again treats it as text, but loads each line into its
own string, and returns an array of all the lines. We could even call File.OpenText to
obtain a StreamReader (equivalent to the StreamWriter, but for reading data; we’ll see
this again later in the chapter).
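To make the difference between the whole-file reading methods concrete, here is a small sketch that reads the same file three ways. The Summarize helper and the scratch file are hypothetical, introduced only for this demonstration.

```csharp
using System;
using System.IO;

static class ReadWholeFileSketch
{
    // Hypothetical helper: reads the same file three ways and reports the sizes.
    public static string Summarize(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);    // raw bytes, no text decoding
        string text = File.ReadAllText(path);      // whole file as a single string
        string[] lines = File.ReadAllLines(path);  // one string per line, as an array
        return string.Format("{0} bytes, {1} chars, {2} lines",
            bytes.Length, text.Length, lines.Length);
    }

    static void Main()
    {
        // Scratch file created just for this demonstration.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "first\nsecond\n");
        Console.WriteLine(Summarize(path));
        File.Delete(path);
    }
}
```

Only ReadAllBytes is safe for arbitrary file types, because the two text-based methods decode the bytes using a character encoding.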
Because we’re looking at all file types, not just text, we need to use one of the binary-based methods. File.ReadAllBytes returns a byte[] containing the entire contents of
the file. We could then compare the files byte for byte, to see if they are the same. Here’s
some code to do that.
First, let’s update our DisplayMatches function to do the load and compare, as shown
by the highlighted lines in Example 11-33.
Example 11-33. Updating DisplayMatches for content comparison
private static void DisplayMatches(
    IEnumerable<FileNameGroup> filesGroupedByName)
{
    var groupsWithMoreThanOneFile = from nameGroup in filesGroupedByName
                                    where nameGroup.FilesWithThisName.Count > 1
                                    select nameGroup;
    foreach (var fileNameGroup in groupsWithMoreThanOneFile)
    {
        // Group the matches by the file size, then select those
        // with more than 1 file of that size.
        var matchesBySize = from match in fileNameGroup.FilesWithThisName
                            group match by match.FileSize into sizeGroup
                            where sizeGroup.Count() > 1
                            select sizeGroup;
        foreach (var matchedBySize in matchesBySize)
        {
            List<FileContents> content = LoadFiles(matchedBySize);
            CompareFiles(content);
        }
    }
}
Notice that we want our LoadFiles function to return a List of FileContents objects.
Example 11-34 shows the FileContents class.
Example 11-34. File content information class
internal class FileContents
{
    public string FilePath { get; set; }
    public byte[] Content { get; set; }
}
It just lets us associate the filename with the contents so that we can use it later to
display the results. Example 11-35 shows the implementation of LoadFiles, which uses
ReadAllBytes to load in the file content.
Example 11-35. Loading binary file content
private static List<FileContents> LoadFiles(IEnumerable<FileDetails> fileList)
{
    var content = new List<FileContents>();
    foreach (FileDetails item in fileList)
    {
        byte[] contents = File.ReadAllBytes(item.FilePath);
        content.Add(new FileContents
        {
            FilePath = item.FilePath,
            Content = contents
        });
    }
    return content;
}
We now need an implementation for CompareFiles, which is shown in Example 11-36.
Example 11-36. CompareFiles method
private static void CompareFiles(List<FileContents> files)
{
    Dictionary<FileContents, List<FileContents>> potentiallyMatched =
        BuildPotentialMatches(files);
    // Now, we're going to look at every byte in each file
    CompareBytes(files, potentiallyMatched);
    DisplayResults(files, potentiallyMatched);
}
This isn’t exactly the most elegant way of comparing several files. We’re building a big
dictionary of all of the potential matching combinations, and then weeding out the
ones that don’t actually match. For large numbers of potential matches of the same size
this could get quite inefficient, but we’ll not worry about that right now! Example 11-37 shows the function that builds those potential matches.
Example 11-37. Building possible match combinations
private static Dictionary<FileContents, List<FileContents>>
    BuildPotentialMatches(List<FileContents> files)
{
    // Builds a dictionary where the entries look like:
    // { 0, { 1, 2, 3, 4, ... N } }
    // { 1, { 2, 3, 4, ... N } }
    // ...
    // { N - 1, { N } }
    // where N is one less than the number of files.
    var allCombinations = Enumerable.Range(0, files.Count - 1).ToDictionary(
        x => files[x],
        x => files.Skip(x + 1).ToList());
    return allCombinations;
}
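To see the shape of the dictionary that this kind of query produces, the sketch below applies the same Range/ToDictionary/Skip pattern to a list of strings in place of FileContents objects. The generic BuildCombinations helper is hypothetical; like the original, it assumes the list has at least one item and that the items are distinct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CombinationShapeSketch
{
    // Hypothetical generic variant of the query in Example 11-37, so the
    // result can be inspected with simple strings.
    public static Dictionary<T, List<T>> BuildCombinations<T>(List<T> items)
    {
        // Every item except the last becomes a key; its value is the
        // list of all the items that come after it.
        return Enumerable.Range(0, items.Count - 1).ToDictionary(
            x => items[x],
            x => items.Skip(x + 1).ToList());
    }

    static void Main()
    {
        var names = new List<string> { "A", "B", "C", "D" };
        foreach (var entry in BuildCombinations(names))
        {
            Console.WriteLine(entry.Key + " -> " + string.Join(", ", entry.Value));
        }
    }
}
```

For four items this gives three entries: A paired with B, C, and D; B paired with C and D; and C paired with D. The last item never appears as a key, because nothing follows it.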
This set of potential matches will be whittled down to the files that really are the same
by CompareBytes, which we’ll get to momentarily. The DisplayResults method, shown
in Example 11-38, runs through the matches and displays their names and locations.
Example 11-38. Displaying matches
private static void DisplayResults(
    List<FileContents> files,
    Dictionary<FileContents, List<FileContents>> currentlyMatched)
{
    if (currentlyMatched.Count == 0) { return; }
    var alreadyMatched = new List<FileContents>();
    Console.WriteLine("Matches");
    foreach (var matched in currentlyMatched)
    {
        // Don't do it if we've already matched it previously
        if (alreadyMatched.Contains(matched.Key))
        {
            continue;
        }
        else
        {
            alreadyMatched.Add(matched.Key);
        }
        Console.WriteLine("-------");
        Console.WriteLine(matched.Key.FilePath);
        foreach (var file in matched.Value)
        {
            Console.WriteLine(file.FilePath);
            alreadyMatched.Add(file);
        }
    }
    Console.WriteLine("-------");
}
This leaves the method shown in Example 11-39 that does the bulk of the work, comparing the potentially matching files, byte for byte.
Example 11-39. Byte-for-byte comparison of all potential matches
private static void CompareBytes(
    List<FileContents> files,
    Dictionary<FileContents, List<FileContents>> potentiallyMatched)
{
    // Remember, this only ever gets called with files of equal length.
    int fileLength = files[0].Content.Length;
    var sourceFilesWithNoMatches = new List<FileContents>();
    for (int fileByteOffset = 0; fileByteOffset < fileLength; ++fileByteOffset)
    {
        foreach (var sourceFileEntry in potentiallyMatched)
        {
            byte[] sourceContent = sourceFileEntry.Key.Content;
            for (int otherIndex = 0; otherIndex < sourceFileEntry.Value.Count;
                ++otherIndex)
            {
                // Check the byte at fileByteOffset in each of the two files; if
                // they don't match, remove the other file from the collection
                byte[] otherContent =
                    sourceFileEntry.Value[otherIndex].Content;
                if (sourceContent[fileByteOffset] != otherContent[fileByteOffset])
                {
                    sourceFileEntry.Value.RemoveAt(otherIndex);
                    otherIndex -= 1;
                    if (sourceFileEntry.Value.Count == 0)
                    {
                        sourceFilesWithNoMatches.Add(sourceFileEntry.Key);
                    }
                }
            }
        }
        foreach (FileContents fileWithNoMatches in sourceFilesWithNoMatches)
        {
            potentiallyMatched.Remove(fileWithNoMatches);
        }
        // Don't bother with the rest of the file if
        // there are no further potential matches
        if (potentiallyMatched.Count == 0)
        {
            break;
        }
        sourceFilesWithNoMatches.Clear();
    }
}
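For comparison, the core test that this method applies to each pair of files can be sketched on its own. The SameContent helper below is hypothetical and compares just two byte arrays, stopping at the first difference; it is not part of the sample application.

```csharp
using System;
using System.Linq;

static class PairwiseCompareSketch
{
    // Hypothetical helper: true when both arrays have the same length
    // and the same byte at every offset.
    public static bool SameContent(byte[] first, byte[] second)
    {
        if (first.Length != second.Length) { return false; }
        for (int offset = 0; offset < first.Length; ++offset)
        {
            // Stop at the first differing byte.
            if (first[offset] != second[offset]) { return false; }
        }
        return true;
    }

    static void Main()
    {
        byte[] a = { 1, 2, 3 };
        byte[] b = { 1, 2, 3 };
        byte[] c = { 1, 9, 3 };
        Console.WriteLine(SameContent(a, b));
        Console.WriteLine(SameContent(a, c));
        // LINQ's SequenceEqual performs the same element-by-element check.
        Console.WriteLine(a.SequenceEqual(b));
    }
}
```

The dictionary-based approach in the sample does the same per-byte test, but interleaves it across all candidate pairs so that every file is walked in step.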
We’re going to need to add a test file that differs only in the content. In CreateTestFiles,
add another filename that doesn’t change as we go round the loop:
string fileSameSizeInAllButDifferentContent =
    "SameNameAndSizeDifferentContent.txt";
Then, inside the loop (at the bottom), we’ll create a test file that will be the same length,
but varying by only a single byte:
// And now one that is the same length, but with different content
fullPath = Path.Combine(directory, fileSameSizeInAllButDifferentContent);
builder = new StringBuilder();
builder.Append("Now with ");
builder.Append(directoryIndex);
builder.AppendLine(" extra");
CreateFile(fullPath, builder.ToString());
If you build and run, you should see some output like this, showing the one identical
file we have in each file location:
C:\Users\mwa\AppData\Local\e33yz4hg.mjp
C:\Users\mwa\AppData\Local\ung2xdgo.k1c
C:\Users\mwa\AppData\Local\jcpagntt.ynd
Warning: You do not have permission to access this directory.
Access to the path 'C:\Users\mwa\AppData\Local\cmoof2kj.ekd\' is denied.
Matches
-------
C:\Users\mwa\AppData\Local\e33yz4hg.mjp\SameNameAndContent.txt
C:\Users\mwa\AppData\Local\ung2xdgo.k1c\SameNameAndContent.txt
C:\Users\mwa\AppData\Local\jcpagntt.ynd\SameNameAndContent.txt
-------
Needless to say, this isn’t very efficient, and it is unlikely to work well when
you get to those DVD rips and massive media repositories. Even your 64-bit machine
probably doesn’t have quite that much memory available to it.* There’s a way to make
this more memory-efficient. Instead of loading the file completely into memory, we can
take a streaming approach.
Streams
You can think of a stream like one of those old-fashioned news ticker tapes. To write
data onto the tape, the bytes (or characters) in the file are typed out, one at a time, on
the continuous stream of tape.
We can then wind the tape back to the beginning, and start reading it back, character
by character, until either we stop or we run off the end of the tape. Or we could give
the tape to someone else, and she could do the same. Or we could read, say, 1,000
characters off the tape, and copy them onto another tape which we give to someone to
work on, then read the next 1,000, and so on, until we run out of characters.
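The last part of the analogy, copying the tape 1,000 characters at a time, can be sketched with the Stream class's Read and Write methods. This is only an illustration under stated assumptions: the CopyInChunks helper is hypothetical, it works in bytes (not characters), and it uses MemoryStream so that it can run without touching any files.

```csharp
using System;
using System.IO;

static class ChunkedCopySketch
{
    // Hypothetical helper: copies source to destination 1,000 bytes at a
    // time and returns the total number of bytes copied.
    public static long CopyInChunks(Stream source, Stream destination)
    {
        byte[] buffer = new byte[1000];
        long total = 0;
        int read;
        // Read returns 0 only at the end of the stream; before that it may
        // return fewer bytes than requested, so always use the returned count.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    static void Main()
    {
        byte[] data = new byte[2500];
        new Random(0).NextBytes(data);
        using (var source = new MemoryStream(data))
        using (var destination = new MemoryStream())
        {
            Console.WriteLine(CopyInChunks(source, destination));
        }
    }
}
```

Because only one 1,000-byte buffer is held at any time, the memory used does not grow with the size of the data being copied.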
* In fact, it is slightly more constrained than that. The .NET Framework limits arrays to 2 GB, and will throw
an exception if you try to load a larger file into memory all at once.