Programming C# 4.0, Part 8
for as long as it needs to do a particular job—it has to be an illusion because if clients
really took it in turns, scalability would be severely limited. So transactions perform
the neat trick of letting work proceed in parallel except for when that would cause a
problem—as long as all the transactions currently in progress are working on independent data they can all proceed simultaneously, and clients have to wait their turn
only if they’re trying to use data already involved (directly, or indirectly) in some other
transaction in progress.‖
The classic example of the kind of problem transactions are designed to avoid is that
of updating the balance of a bank account. Consider what needs to happen to your
account when you withdraw money from an ATM—the bank will want to make sure
that your account is debited with the amount of money withdrawn. This will involve
subtracting that amount from the current balance, so there will be at least two operations: discovering the current balance, and then updating it to the new value. (Actually
it’ll be a whole lot more complex than that—there will be withdrawal limit checks,
fraud detection, audit trails, and more. But the simplified example is enough to illustrate
how transactions can be useful.) But what happens if some other transaction occurs at
the same time? Maybe you happen to be making a withdrawal at the same time as the
bank processes an electronic transfer of funds.
If that happens, a problem can arise. Suppose the ATM transaction and the electronic
transfer both read the current balance—perhaps they both discover a balance of $1,234.
Next, if the transfer is moving $1,000 from your account to somewhere else, it will write
back a new balance of $234—the original balance minus the amount just deducted.
But then there’s the ATM transaction—suppose you withdraw $200. It will write back a new
balance of $1,034. You just withdrew $200 and paid $1,000 to another account, but
your account only has $200 less in it than before rather than $1,200—that’s great for
you, but your bank will be less happy. (In fact, your bank probably has all sorts of
checks and balances to try to minimize opportunities such as this for money to magically come into existence. So they’d probably notice such an error even if they weren’t
using transactions.) In fact, neither you nor your bank really wants this to happen, not
least because it’s easy enough to imagine similar examples where you lose money.
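The interleaving just described can be sketched in a few lines of C#, with a plain variable standing in for the account balance—no database involved, purely to illustrate the lost update:

```csharp
using System;

class LostUpdateDemo
{
    static void Main()
    {
        // Two "clients" read the same starting balance...
        decimal balance = 1234m;
        decimal transferView = balance;  // electronic transfer sees $1,234
        decimal atmView = balance;       // ATM sees $1,234 as well

        // ...and each writes back its own computed result, the second
        // write silently overwriting the first.
        balance = transferView - 1000m;  // transfer writes back $234
        balance = atmView - 200m;        // ATM writes back $1,034

        // $1,200 left the account, but the balance only dropped by $200.
        Console.WriteLine(balance);      // 1034
    }
}
```

A transaction around each read-modify-write sequence would force the second client either to wait for the first client's update or to see it before computing the new balance.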
This problem of concurrent changes to shared data crops up in all sorts of forms. You
don’t even need to be modifying data to observe a problem: code that only ever reads
can still see weird results. For example, you might want to count your money, in which
case looking at the balances of all your accounts would be necessary—that’s a read-only operation. But what if some other code was in the middle of transferring money
between two of your accounts? Your read-only code could be messed up by other code
modifying the data.
‖ In fact, it gets a good deal cleverer than that. Databases go to some lengths to avoid making clients wait for
one another unless it’s absolutely necessary, and can sometimes manage this even when clients are accessing
the same data, particularly if they’re only reading the common data. Not all databases do this in the same
way, so consult your database documentation for further details.
Object Context | 577
A simple way to avoid this is to do one thing at a time—as long as each task completes
before the next begins, you’ll never see this sort of problem. But that turns out to be
impractical if you’re dealing with a large volume of work. And that’s why we have
transactions—they are designed to make it look like things are happening one task at
a time, but under the covers they allow tasks to proceed concurrently as long as they’re
working on unrelated information. So with transactions, the fact that some other bank
customer is in the process of performing a funds transfer will not stop you from using
an ATM. But if a transfer is taking place on one of your accounts at the same time that
you are trying to withdraw money, transactions would ensure that these two operations
take it in turns.
So code that uses transactions effectively gets exclusive access to whatever data it is
working with right now, without slowing down anything it’s not using. This means
you get the best of both worlds: you can write code as though it’s the only code running
right now, but you get good throughput.
How do we exploit transactions in C#? Example 14-20 shows the simplest approach:
if you create a TransactionScope object, the EF will automatically enlist any database
operations in the same transaction. The TransactionScope class is defined in the
System.Transactions namespace in the System.Transactions DLL (another class library
DLL for which we need to add a reference, as it’s not in the default set).
Example 14-20. TransactionScope
using (var dbContext = new AdventureWorksLT2008Entities())
{
    using (var txScope = new TransactionScope())
    {
        var customersWithOrders = from cust in dbContext.Customers
                                  where cust.SalesOrderHeaders.Count > 0
                                  select cust;

        foreach (var customer in customersWithOrders)
        {
            Console.WriteLine("Customer {0} has {1} orders",
                customer.CustomerID, customer.SalesOrderHeaders.Count);
        }
        txScope.Complete();
    }
}
For as long as the TransactionScope is active (i.e., until it is disposed at the end of the
using block), all the requests to the database this code makes will be part of the same
transaction, and so the results should be consistent—any other database client that
tries to modify the state we’re looking at will be made to wait (or we’ll be made to wait
for them) in order to guarantee consistency. The call to Complete at the end indicates
that we have finished all the work in the transaction, and are happy for it to commit—
without this, the transaction would be aborted at the end of the scope’s using block.
578 | Chapter 14: Databases
For a transaction that modifies data, failure to call Complete will lose any changes. Since
the transaction in Example 14-20 only reads data, this might not cause any visible
problems, but it’s difficult to be certain. If a TransactionScope was already active on
this thread (e.g., a function farther up the call stack started one) our TransactionScope could join in with the same transaction, at which point failure to call Complete
on our scope would end up aborting the whole thing, possibly losing data. The documentation recommends calling Complete for all transactions except those you want to
abort, so it’s a good practice always to call it.
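For contrast with the read-only Example 14-20, here is a sketch of a transaction that modifies data. It assumes the same AdventureWorksLT2008Entities model, and that the Customer entity has a ModifiedDate property (as the AdventureWorksLT schema does):

```csharp
using (var dbContext = new AdventureWorksLT2008Entities())
using (var txScope = new TransactionScope())
{
    var customer = dbContext.Customers.First();
    customer.ModifiedDate = DateTime.Now;

    // The update is provisional until the transaction commits.
    dbContext.SaveChanges();

    // Without this call, disposing the scope rolls the update back.
    txScope.Complete();
}
```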
Transaction Length
When transactions conflict because multiple clients want to use the same data, the
database may have no choice but to make one or more of the clients wait. This means
you should keep your transaction lifetimes as short as you possibly can—slow transactions can bog down the system. And once that starts happening, it becomes a bit of
a pile-up—the more transactions that are stuck waiting for something else to finish,
the more likely it is that new transactions will want to use data that’s already under
contention. The rosy “best of both worlds” picture painted earlier evaporates.
Worse, conflicts are sometimes irreconcilable—a database doesn’t know at the start of
a transaction what information will be used, and sometimes it can find itself in a place
where it cannot proceed without returning results that will look inconsistent, in which
case it’ll just fail with an error. (In other words, the clever tricks databases use to minimize how often transactions block sometimes backfire.) It’s easy enough to contrive
pathological code that does this on purpose, but you hope not to see it in a live system.
The shorter you make your transactions the less likely you are to see troublesome
conflicts.
You should never start a transaction and then wait for user input before finishing the
transaction—users have a habit of going to lunch mid-transaction. Transaction duration should be measured in milliseconds, not minutes.
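One way to put a hard ceiling on transaction duration (a sketch using the standard System.Transactions types) is to pass a TransactionOptions with a Timeout when creating the scope—if the work takes too long, the transaction aborts rather than blocking other clients indefinitely:

```csharp
var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted,
    Timeout = TimeSpan.FromSeconds(5)
};

using (var txScope = new TransactionScope(
    TransactionScopeOption.Required, options))
{
    // ...do the database work quickly...
    txScope.Complete();
}
```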
TransactionScope represents an implicit transaction—any data access performed inside
its using block will automatically be enlisted on the transaction. That’s why Example 14-20 never appears to use the TransactionScope it creates—it’s enough for it to
exist. (The transaction system keeps track of which threads have active implicit transactions.) You can also work with transactions explicitly—the object context provides
a Connection property, which in turn offers explicit BeginTransaction and EnlistTransaction methods. You can use these in advanced scenarios where you might need to
control database-specific aspects of the transaction that an implicit transaction cannot
reach.
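A minimal sketch of the explicit style, assuming the same model as Example 14-20 (note that the connection must be opened before BeginTransaction is called):

```csharp
using (var dbContext = new AdventureWorksLT2008Entities())
{
    dbContext.Connection.Open();
    using (var tx = dbContext.Connection.BeginTransaction())
    {
        // ...queries and updates here are enlisted in tx...
        dbContext.SaveChanges();
        tx.Commit();   // to roll back instead, dispose without Commit
    }
}
```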
These transaction models are not specific to the EF. You can use the
same techniques with ADO.NET v1-style data access code.
Besides enabling isolation of multiple concurrent operations, transactions provide another very useful property: atomicity. This means that the operations within a single
transaction succeed or fail as one: all succeed, or none of them succeed—a transaction
is indivisible in that it cannot complete partially. The database stores updates performed within a transaction provisionally until the transaction completes—if it succeeds, the updates are permanently committed, but if it fails, they are rolled back and
it’s as though the updates never occurred. The EF uses transactions automatically when
you call SaveChanges—if you have not supplied a transaction, it will create one just to
write the updates. (If you have supplied one, it’ll just use yours.) This means that
SaveChanges will always either succeed completely, or have no effect at all, whether or
not you provide a transaction.
Transactions are not the only way to solve problems of concurrent access to shared
data. They are bad at handling long-running operations. For example, consider a system
for booking seats on a plane or in a theater. End users want to see what seats are
available, and will then take some time—minutes probably—to decide what to do. It
would be a terrible idea to use a transaction to handle this sort of scenario, because
you’d effectively have to lock out all other users looking to book into the same flight
or show until the current user makes a decision. (It would have this effect because in
order to show available seats, the transaction would have had to inspect the state of
every seat, and could potentially change the state of any one of those seats. So all those
seats are, in effect, owned by that transaction until it’s done.)
Let’s just think that through. What if every person who flies on a particular flight takes
two minutes to make all the necessary decisions to complete his booking? (Hours of
queuing in airports and observing fellow passengers lead us to suspect that this is a
hopelessly optimistic estimate. If you know of an airline whose passengers are that
competent, please let us know—we’d like to spend less time queuing.) The Airbus A380
aircraft has FAA and EASA approval to carry 853 passengers, which suggests that even
with our uncommonly decisive passengers, that’s still a total of more than 28 hours of
decision making for each flight. That sounds like it could be a problem for a daily
flight.# So there’s no practical way of avoiding having to tell the odd passenger that,
sorry, in between showing him the seat map and choosing the seat, someone else got
in there first. In other words, we are going to have to accept that sometimes data will
#And yes, bookings for daily scheduled flights are filled up gradually over the course of a few months, so 28
hours per day is not necessarily a showstopper. Even so, forcing passengers to wait until nobody else is
choosing a seat would be problematic—you’d almost certainly find that your customers didn’t neatly space
out their usage of the system, and so you’d get times where people wanting to book would be unable to.
Airlines would almost certainly lose business the moment they told customers to come back later.
change under our feet, and that we just have to deal with it when it happens. This
requires a slightly different approach than transactions.
Optimistic Concurrency
Optimistic concurrency describes an approach to concurrency where instead of enforcing isolation, which is how transactions usually work, we just make the cheerful
assumption that nothing’s going to go wrong. And then, crucially, we verify that assumption just before making any changes.
In practice, it’s common to use a mixture of optimistic concurrency and
transactions. You might use optimistic approaches to handle long-running logic, while using short-lived transactions to manage each individual step of the process.
For example, an airline booking system that shows a map of available seats in an aircraft
on a web page would make the optimistic assumption that the seat the user selects will
probably not be selected by any other user in between the moment at which the application showed the available seats and the point at which the user picks a seat. The
advantage of making this assumption is that there’s no need for the system to lock
anyone else out—any number of users can all be looking at the seat map at once, and
they can all take as long as they like.
Occasionally, multiple users will pick the same seat at around the same time. Most of
the time this won’t happen, but the occasional clash is inevitable. We just have to make
sure we notice. So when the user gets back to us and says that he wants seat 7K, the
application then has to go back to the database to see if that seat is in fact still free. If
it is, the application’s optimism has been vindicated, and the booking can proceed. If
not, we just have to apologize to the user (or chastise him for his slowness, depending
on the prevailing attitude to customer service in your organization), show him an updated seat map so that he can see which seats have been claimed while he was dithering,
and ask him to make a new choice. This will happen only a small fraction of the time,
and so it turns out to be a reasonable solution to the problem—certainly better than a
system that is incapable of taking enough bookings to fill the plane in the time available.
Sometimes optimistic concurrency is implemented in an application-specific way. The
example just described relies on an understanding of what the various entities involved
mean, and would require us to write code that explicitly performs the check described.
But slightly more general solutions are available—they are typically less efficient, but
they can require less code. The EF offers some of these ignorant-but-effective approaches to optimistic concurrency.
The default EF behavior seems, at a first glance, to be ignorant and broken—not only
does it optimistically assume that nothing will go wrong, but it doesn’t even do anything
to check that assumption. We might call this blind optimism—we don’t even get to
discover when our optimism turned out to be unfounded. While that sounds bad, it’s
actually the right thing to do if you’re using transactions—transactions enforce isolation and so additional checks would be a waste of time. But if you’re not using transactions, this default behavior is not good enough for code that wants to change or add
data—you’ll risk compromising the integrity of your application’s state.
To get the EF to check that updates are likely to be sound, you can tell it to check that
certain entity properties have not changed since the entity was populated from the
database. For example, in the SalesOrderDetail entity, if you select the ModifiedDate
property in the EDM designer, you could go to the Properties panel and set its Concurrency Mode to Fixed (its default being None). This will cause the EF to check, whenever you update the entity, that this particular column’s value is still the same as it was when the entity was fetched. And as long as all the code that modifies this particular table remembers to update the ModifiedDate, you’ll be able to detect when things have changed.
While this example illustrates the concept, it’s not entirely robust. Using
a date and time to track when a row changes has a couple of problems.
First, different computers in the system are likely to have slight differences between their clocks, which can lead to anomalies. And even if
only one computer ever accesses the database, its clock may be adjusted
from time to time. You’d end up wanting to customize the SQL code
used for updates so that everything uses the database server’s clock for
consistency. Such customizations are possible, but they are beyond the
scope of this book. And even that might not be enough—if the row is
updated often, it’s possible that two updates might have the same timestamp due to insufficient precision. A stricter approach based on GUIDs
or sequential row version numbers is more robust. But this is the realm
of database design, rather than Entity Framework usage—ultimately
you’re going to be stuck with whatever your DBA gives you.
If any of the columns with a Concurrency Mode of Fixed change between reading an
entity’s value and attempting to update it, the EF will detect this when you call
SaveChanges and will throw an OptimisticConcurrencyException, instead of completing
the update.
The EF detects changes by making the SQL UPDATE conditional—its
WHERE clause will include checks for all of the Fixed columns. It inspects
the updated row count that comes back from the database to see
whether the update succeeded.
How you deal with an optimistic concurrency failure is up to your application—you
might simply be able to retry the work, or you may have to get the user involved. It will
depend on the nature of the data you’re trying to update.
The object context provides a Refresh method that you can call to bring entities back
into sync with the current state of the rows they represent in the database. You could
call this after catching an OptimisticConcurrencyException as the first step in your code
that recovers from a problem. (You’re not actually required to wait until you get a
concurrency exception—you’re free to call Refresh at any time.) The first argument to
Refresh tells it what you’d like to happen if the database and entity are out of sync.
Passing RefreshMode.StoreWins tells the EF that you want the entity to reflect what’s
currently in the database, even if that means discarding updates previously made in
memory to the entity. Or you can pass RefreshMode.ClientWins, in which case any
changes in the entity remain present in memory. The changes will not be written back
to the database until you next call SaveChanges. So the significance of calling Refresh
in ClientWins mode is that you have, in effect, acknowledged changes to the underlying
database—if changes in the database were previously causing SaveChanges to throw an
OptimisticConcurrencyException, calling SaveChanges again after the Refresh will not
throw again (unless the database changes again in between the call to Refresh and the
second SaveChanges).
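Putting these pieces together, a sketch of recovering from a concurrency failure might look like this (here entity stands for whichever entity you were updating; OptimisticConcurrencyException lives in the System.Data namespace):

```csharp
try
{
    dbContext.SaveChanges();
}
catch (OptimisticConcurrencyException)
{
    // ClientWins: keep our in-memory changes, but acknowledge the
    // database's intervening update so the next save can proceed.
    dbContext.Refresh(RefreshMode.ClientWins, entity);
    dbContext.SaveChanges();

    // RefreshMode.StoreWins would instead discard our changes and
    // reload the entity with the database's current values.
}
```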
Context and Entity Lifetime
If you ask the context object for the same entity twice, it will return you the same object
both times—it remembers the identity of the entities it has returned. Even if you use
different queries, it will not attempt to load fresh data for any entities already loaded
unless you explicitly pass them to the Refresh method.
Executing the same LINQ query multiple times against the same context
will still result in multiple queries being sent to the database. Those
queries will typically return all the current data for the relevant entity.
But the EF will look at primary keys in the query results, and if they
correspond to entities it has already loaded, it just returns those existing
entities and won’t notice if their values in the database have changed.
It looks for changes only when you call either SaveChanges or Refresh.
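The identity-map behavior this note describes is easy to observe. This sketch assumes a customer with a CustomerID of 1 exists in the sample database:

```csharp
using (var dbContext = new AdventureWorksLT2008Entities())
{
    var first = dbContext.Customers
        .Where(c => c.CustomerID == 1).First();
    var again = dbContext.Customers
        .Where(c => c.CustomerID == 1).First();

    // Both queries went to the database, but the context hands back
    // the same object instance for the same primary key.
    Console.WriteLine(object.ReferenceEquals(first, again));  // True
}
```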
This raises the question of how long you should keep an object context around. The
more entities you ask it for, the more objects it’ll hang on to. Even when your code has
finished using a particular entity object, the .NET Framework’s garbage collector won’t
be able to reclaim the memory it uses for as long as the object context remains alive,
because the object context keeps hold of the entity in case it needs to return it again in
a later query.
The way to get the object context to let go of everything is to call
Dispose. This is why all of the examples that show the creation of an
object context do so in a using statement.
There are other lifetime issues to bear in mind. In some situations, an object context
may hold database connections open. And also, if you have a long-lived object context,
you may need to add calls to Refresh to ensure that you have fresh data, which you
wouldn’t have to do with a newly created object context. So all the signs suggest that
you don’t want to keep the object context around for too long.
How long is too long? In a web application, if you create an object context while handling a request (e.g., for a particular page) you would normally want to Dispose it before
the end of that request—keeping an object context alive across multiple requests is
typically a bad idea. In a Windows application (WPF or Windows Forms), it might
make sense to keep an object context alive a little longer, because you might want to
keep entities around while a form for editing the data in them is open. (If you want to
apply updates, you normally use the same object context you used when fetching the
entities in the first place, although it’s possible to detach an entity from one context
and attach it later to a different one.) In general, though, a good rule of thumb is to
keep the object context alive for no longer than is necessary.
WCF Data Services
The last data access feature we’ll look at is slightly different from the rest. So far, we’ve
seen how to write code that uses data in a program that can connect directly to a
database. But WCF Data Services lets you present data over HTTP, making data access
possible from code in some scenarios where direct connections are not possible. It
defines a URI structure for identifying the data you’d like to access, and the data itself
can be represented in either JSON or the XML-based Atom Publishing Protocol
(AtomPub).
As the use of URIs, JSON, and XML suggests, WCF Data Services can be useful in web
applications. Silverlight cannot access databases directly, but it can consume data via
WCF Data Services. And the JSON support means that it’s also relatively straightforward for script-based web user interfaces to use.
WCF Data Services is designed to work in conjunction with the Entity Framework.
You don’t just present an entire database over HTTP—that would be a security liability.
Instead, you define an Entity Data Model, and you can then configure which entity
types should be accessible over HTTP, and whether they are read-only or support other
operations such as updates, inserts, or deletes. And you can add code to implement
further restrictions based on authentication and whatever security policy you require.
(Of course, this still gives you plenty of scope for creating a security liability. You need
to think carefully about exactly what information you want to expose.)
To show WCF Data Services in action, we’ll need a web application, because it’s an
HTTP-based technology. If you create a new project in Visual Studio, you’ll see a Visual
C#→Web category on the left, and the Empty ASP.NET Web Application template will
suit our needs here. We need an Entity Data Model to define what information we’d