Advanced PHP Programming - P9
378 Chapter 15 Building a Distributed Environment
Figure 15.6 Stale cache data resulting in inconsistent cluster behavior. (Client X gets a fresh copy of Joe's page from Server A's newly cached copy; Client Y gets a stale copy from Server B's older cache.)
Centralized Caches
One of the easiest and most common techniques for guaranteeing cache consistency is to
use a centralized cache solution. If all participants use the same set of cache files, most of
the worries regarding distributed caching disappear (basically because the caching is no
longer completely distributed—just the machines performing it are).
Network file shares are an ideal tool for implementing a centralized file cache. On Unix
systems the standard tool for doing this is NFS. NFS is a good choice for this application
for two main reasons:
- NFS servers and client software are bundled with essentially every modern Unix system.
- Newer Unix systems supply reliable file-locking mechanisms over NFS, meaning that the cache libraries can be used without change.
Caching in a Distributed Environment 379
Figure 15.7 Inconsistent cached session data breaking shopping carts. (Joe starts his shopping cart on Server A; when he is later served by Server B, he gets a brand-new empty cart because Cart A is not merged into B.)
The real beauty of using NFS is that from a user level, it appears no different from any other filesystem, so it provides a very easy path for growing a cache implementation from a single machine to a cluster of machines.
If you have a server that utilizes /cache/www.foo.com as its cache directory, using the Cache_File module developed in Chapter 10, "Data Component Caching," you can extend this caching architecture seamlessly by creating an exportable directory /shares/cache/www.foo.com on your NFS server and then mounting it on any interested machine as follows:
# /etc/fstab
nfs-server:/shares/cache/www.foo.com /cache/www.foo.com nfs rw,noatime - -

Then you can mount it with this:

# mount -a
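The server side needs a matching export before the mount will succeed. A minimal sketch, assuming a Linux-style /etc/exports syntax and placeholder web-server hostnames web1 and web2 (the exact export syntax varies by platform):

```
# /etc/exports on nfs-server
/shares/cache/www.foo.com  web1(rw)  web2(rw)
```

After editing the file, running exportfs -ra on a Linux NFS server tells it to re-read its export list.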
These are the drawbacks of using NFS for this type of task:

- It requires an NFS server. In most setups, this is a dedicated NFS server.
- The NFS server is a single point of failure. A number of vendors sell enterprise-quality NFS server appliances. You can also rather easily build a highly available NFS server setup.
- The NFS server is often a performance bottleneck. The centralized server must sustain the disk input/output (I/O) load for every Web server's cache interaction and must transfer that over the network. This can cause both disk and network throughput bottlenecks. A few recommendations can reduce these issues:
    - Mount your shares by using the noatime option. This turns off file metadata updates when a file is accessed for reads.
    - Monitor your network traffic closely and use trunked Ethernet/Gigabit Ethernet if your bandwidth grows past 75Mbps.
    - Take your most senior systems administrator out for a beer and ask her to tune the NFS layer. Every operating system has its quirks in relationship to NFS, so this sort of tuning is very difficult. My favorite quote in regard to this is the following note from the 4.4BSD man pages regarding NFS mounts:

    Due to the way that Sun RPC is implemented on top of UDP (unreliable datagram) transport, tuning such mounts is really a black art that can only be expected to have limited success.
Another option for centralized caching is using an RDBMS. This might seem completely antithetical to one of our original intentions for caching—to reduce the load on the database—but that isn't necessarily the case. Our goal throughout all this is to eliminate or reduce expensive code, and database queries are often expensive. Often is not always, however, so we can still effectively cache if we make the results of expensive database queries available through inexpensive queries.
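One way to realize this, sketched here with hypothetical table and column names, is to store the serialized result of an expensive query under a cache key, so that readers replace a costly join or aggregate with a single indexed primary-key lookup:

```sql
-- Hypothetical cache table: expensive query results are stored under a
-- key so that reads become a cheap indexed lookup.
CREATE TABLE query_cache (
    cache_key  VARCHAR(64) NOT NULL PRIMARY KEY, -- e.g. a hash of the query
    data       TEXT        NOT NULL,             -- serialized result set
    expires_at INT         NOT NULL              -- Unix timestamp
);

-- Reading the cache is an inexpensive single-row lookup:
-- SELECT data FROM query_cache WHERE cache_key = ? AND expires_at > ?;
```

The write path runs the expensive query once, serializes the result, and inserts or replaces the row; every other request hits only the primary-key lookup.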
Fully Decentralized Caches Using Spread
A more ideal solution than using centralized caches is to have cache reads be completely
independent of any central service and to have writes coordinate in a distributed fashion
to invalidate all cache copies across the cluster.
To achieve this, you can use Spread, a group communication toolkit designed at the
Johns Hopkins University Center for Networking and Distributed Systems to provide an
extremely efficient means of multicast communication between services in a cluster with
robust ordering and reliability semantics. Spread is not a distributed application in itself;
it is a toolkit (a messaging bus) that allows the construction of distributed applications.
The basic architecture plan is shown in Figure 15.8. Cache files will be written in a nonversioned fashion locally on every machine. When an update to the cached data occurs, the updating application will send a message to the cache Spread group. On every machine, there is a daemon listening to that group. When a cache invalidation request comes in, the daemon will perform the cache invalidation on that local machine.
Figure 15.8 A simple Spread ring.
This methodology works well as long as there are no network partitions. A network partition event occurs whenever a machine joins or leaves the ring. Say, for example, that a machine crashes and is rebooted. During the time it was down, updates to cache entries may have occurred. It is possible, although complicated, to build a system using Spread whereby changes could be reconciled on network rejoin. Fortunately for you, the nature of most cached information is that it is temporary and not terribly painful to re-create. You can use this assumption and simply destroy the cache on a Web server whenever the cache maintenance daemon is restarted. This measure, although draconian, allows you to easily prevent usage of stale data.
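That wipe-on-restart policy is simple to implement. A minimal sketch of a startup hook for the cache maintenance daemon, assuming a helper you control (the function name and cache path are illustrative):

```shell
# Hypothetical startup hook: destroy the local cache before the daemon
# rejoins the Spread group, so entries that went stale while the daemon
# was down can never be served.
clear_cache() {
    dir="$1"
    # Delete everything under the cache directory, but keep the
    # directory itself so the Web server can continue writing to it.
    [ -d "$dir" ] && find "$dir" -mindepth 1 -delete
}

clear_cache "${CACHE_DIR:-/cache/www.foo.com}"
```

Running this before the daemon joins the group guarantees that every cache entry the machine serves postdates the daemon's current group membership.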
To implement this strategy, you need to install some tools. To start with, you need to download and install the Spread toolkit from www.spread.org. Next, you need to install the Spread wrapper from PEAR:

# pear install spread

The Spread wrapper library is written in C, so you need all the PHP development tools installed to compile it (these are installed when you build from source). So that you can avoid having to write your own protocol, you can use XML-RPC to encapsulate your purge requests. This might seem like overkill, but XML-RPC is actually an ideal choice: It is much lighter-weight than a protocol such as SOAP, yet it still provides a relatively extensible and "canned" format, which ensures that you can easily add clients in other languages if needed (for example, a standalone GUI to survey and purge cache files).
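For illustration, an XML-RPC purge request is a compact, language-neutral XML document; the method name purgeCacheEntry and its single filename parameter here are hypothetical, not part of any defined protocol:

```xml
<?xml version="1.0"?>
<!-- Hypothetical purge request sent over the Spread group -->
<methodCall>
  <methodName>purgeCacheEntry</methodName>
  <params>
    <param><value><string>/cache/www.foo.com/joes_page</string></value></param>
  </params>
</methodCall>
```

Any language with an XML-RPC library can produce or consume this payload, which is what makes adding non-PHP clients easy.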
To start, you need to install an XML-RPC library. The PEAR XML-RPC library works well and can be installed with the PEAR installer, as follows:

# pear install XML_RPC
After you have installed all your tools, you need a client. You can augment the Cache_File class with a method that allows for purging data:

require_once 'XML/RPC.php';

class Cache_File_Spread extends File {
    private $spread;
Spread works by having clients attach to a network of servers, usually a single server per machine. If the daemon is running on the local machine, you can simply specify the port that it is running on, and a connection will be made over a Unix domain socket. The default Spread port is 4803:

    private $spreadName = '4803';
Spread clients join groups to send and receive messages on. If you are not joined to a group, you will not see any of the messages for it (although you can send messages to a group you are not joined to). Group names are arbitrary, and a group will be automatically created when the first client joins it. You can call your group xmlrpc:

    private $spreadGroup = 'xmlrpc';
    private $cachedir = '/cache/';
    public function __construct($filename, $expiration = false)
    {
        parent::__construct($filename, $expiration);

You create a new Spread object in order to have the connection made for you automatically:

        $this->spread = new Spread($this->spreadName);
    }