Advanced PHP Programming - P9
378 Chapter 15 Building a Distributed Environment
Figure 15.6 Stale cache data resulting in inconsistent cluster behavior. (Client X gets a fresh copy of Joe's page from Server A's newly cached copy; Client Y gets a stale copy from Server B's older cache.)
Centralized Caches
One of the easiest and most common techniques for guaranteeing cache consistency is to
use a centralized cache solution. If all participants use the same set of cache files, most of
the worries regarding distributed caching disappear (basically because the caching is no
longer completely distributed—just the machines performing it are).
Network file shares are an ideal tool for implementing a centralized file cache. On Unix
systems the standard tool for doing this is NFS. NFS is a good choice for this application
for two main reasons:
- NFS servers and client software are bundled with essentially every modern Unix system.
- Newer Unix systems supply reliable file-locking mechanisms over NFS, meaning that the cache libraries can be used without change.
Caching in a Distributed Environment 379
Figure 15.7 Inconsistent cached session data breaking shopping carts. (Joe starts his shopping cart on Server A; when he is later served by Server B, he gets a brand-new empty cart because Cart A is not merged into B.)
The real beauty of using NFS is that from a user level, it appears no different from any other filesystem, so it provides a very easy path for growing a cache implementation from a single machine to a cluster of machines.
If you have a server that utilizes /cache/www.foo.com as its cache directory, using the Cache_File module developed in Chapter 10, "Data Component Caching," you can extend this caching architecture seamlessly by creating an exportable directory /shares/cache/www.foo.com on your NFS server and then mounting it on any interested machine as follows:
# /etc/fstab
nfs-server:/shares/cache/www.foo.com /cache/www.foo.com nfs rw,noatime - -

Then you can mount it with this:

# mount -a
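The server side needs a matching export before the mount will succeed. A minimal sketch, assuming a Linux-style /etc/exports syntax and placeholder web-server hostnames web1 and web2 (the exact export syntax varies by platform):

```
# /etc/exports on nfs-server
/shares/cache/www.foo.com  web1(rw)  web2(rw)
```

After editing the file, running exportfs -ra on a Linux NFS server tells it to re-read its export list.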
These are the drawbacks of using NFS for this type of task:

- It requires an NFS server. In most setups, this is a dedicated NFS server.
- The NFS server is a single point of failure. A number of vendors sell enterprise-quality NFS server appliances. You can also rather easily build a highly available NFS server setup.
- The NFS server is often a performance bottleneck. The centralized server must sustain the disk input/output (I/O) load for every Web server's cache interaction and must transfer that over the network. This can cause both disk and network throughput bottlenecks. A few recommendations can reduce these issues:
    - Mount your shares by using the noatime option. This turns off file metadata updates when a file is accessed for reads.
    - Monitor your network traffic closely and use trunked Ethernet/Gigabit Ethernet if your bandwidth grows past 75Mbps.
    - Take your most senior systems administrator out for a beer and ask her to tune the NFS layer. Every operating system has its quirks in relationship to NFS, so this sort of tuning is very difficult. My favorite quote in regard to this is the following note from the 4.4BSD man pages regarding NFS mounts:

    Due to the way that Sun RPC is implemented on top of UDP (unreliable datagram) transport, tuning such mounts is really a black art that can only be expected to have limited success.
Another option for centralized caching is using an RDBMS. This might seem completely antithetical to one of our original intentions for caching—to reduce the load on the database—but that isn't necessarily the case. Our goal throughout all this is to eliminate or reduce expensive code, and database queries are often expensive. Often is not always, however, so we can still effectively cache if we make the results of expensive database queries available through inexpensive queries.
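One way to realize this, sketched here with hypothetical table and column names, is to store the serialized result of an expensive query under a cache key, so that readers replace a costly join or aggregate with a single indexed primary-key lookup:

```sql
-- Hypothetical cache table: expensive query results are stored under a
-- key so that reads become a cheap indexed lookup.
CREATE TABLE query_cache (
    cache_key  VARCHAR(64) NOT NULL PRIMARY KEY, -- e.g. a hash of the query
    data       TEXT        NOT NULL,             -- serialized result set
    expires_at INT         NOT NULL              -- Unix timestamp
);

-- Reading the cache is an inexpensive single-row lookup:
-- SELECT data FROM query_cache WHERE cache_key = ? AND expires_at > ?;
```

The write path runs the expensive query once, serializes the result, and inserts or replaces the row; every other request hits only the primary-key lookup.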
Fully Decentralized Caches Using Spread
A more ideal solution than using centralized caches is to have cache reads be completely
independent of any central service and to have writes coordinate in a distributed fashion
to invalidate all cache copies across the cluster.
To achieve this, you can use Spread, a group communication toolkit designed at the
Johns Hopkins University Center for Networking and Distributed Systems to provide an
extremely efficient means of multicast communication between services in a cluster with
robust ordering and reliability semantics. Spread is not a distributed application in itself;
it is a toolkit (a messaging bus) that allows the construction of distributed applications.
The basic architecture plan is shown in Figure 15.8. Cache files will be written in a nonversioned fashion locally on every machine. When an update to the cached data occurs, the updating application will send a message to the cache Spread group. On every machine, there is a daemon listening to that group. When a cache invalidation request comes in, the daemon will perform the cache invalidation on that local machine.
Figure 15.8 A simple Spread ring.
This methodology works well as long as there are no network partitions. A network partition event occurs whenever a machine joins or leaves the ring. Say, for example, that a machine crashes and is rebooted. During the time it was down, updates to cache entries may have occurred. It is possible, although complicated, to build a system using Spread whereby changes could be reconciled on network rejoin. Fortunately for you, the nature of most cached information is that it is temporary and not terribly painful to re-create. You can use this assumption and simply destroy the cache on a Web server whenever the cache maintenance daemon is restarted. This measure, although draconian, allows you to easily prevent usage of stale data.
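That wipe-on-restart policy is simple to implement. A minimal sketch of a startup hook for the cache maintenance daemon, assuming a helper you control (the function name and cache path are illustrative):

```shell
# Hypothetical startup hook: destroy the local cache before the daemon
# rejoins the Spread group, so entries that went stale while the daemon
# was down can never be served.
clear_cache() {
    dir="$1"
    # Delete everything under the cache directory, but keep the
    # directory itself so the Web server can continue writing to it.
    [ -d "$dir" ] && find "$dir" -mindepth 1 -delete
}

clear_cache "${CACHE_DIR:-/cache/www.foo.com}"
```

Running this before the daemon joins the group guarantees that every cache entry the machine serves postdates the daemon's current group membership.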
To implement this strategy, you need to install some tools. To start with, you need to download and install the Spread toolkit from www.spread.org. Next, you need to install the Spread wrapper from PEAR:

# pear install spread

The Spread wrapper library is written in C, so you need all the PHP development tools installed to compile it (these are installed when you build from source). So that you can avoid having to write your own protocol, you can use XML-RPC to encapsulate your purge requests. This might seem like overkill, but XML-RPC is actually an ideal choice: It is much lighter-weight than a protocol such as SOAP, yet it still provides a relatively extensible and "canned" format, which ensures that you can easily add clients in other languages if needed (for example, a standalone GUI to survey and purge cache files).
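For illustration, an XML-RPC purge request is a compact, language-neutral XML document; the method name purgeCacheEntry and its single filename parameter here are hypothetical, not part of any defined protocol:

```xml
<?xml version="1.0"?>
<!-- Hypothetical purge request sent over the Spread group -->
<methodCall>
  <methodName>purgeCacheEntry</methodName>
  <params>
    <param><value><string>/cache/www.foo.com/joes_page</string></value></param>
  </params>
</methodCall>
```

Any language with an XML-RPC library can produce or consume this payload, which is what makes adding non-PHP clients easy.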
To start, you need to install an XML-RPC library. The PEAR XML-RPC library works well and can be installed with the PEAR installer, as follows:

# pear install XML_RPC
After you have installed all your tools, you need a client. You can augment the Cache_File class with a method that allows for purging data:

require_once 'XML/RPC.php';

class Cache_File_Spread extends File {
    private $spread;
Spread works by having clients attach to a network of servers, usually a single server per machine. If the daemon is running on the local machine, you can simply specify the port that it is running on, and a connection will be made over a Unix domain socket. The default Spread port is 4803:

    private $spreadName = '4803';
Spread clients join groups to send and receive messages on. If you are not joined to a group, you will not see any of the messages for it (although you can send messages to a group you are not joined to). Group names are arbitrary, and a group will be automatically created when the first client joins it. You can call your group xmlrpc:

    private $spreadGroup = 'xmlrpc';
    private $cachedir = '/cache/';
    public function __construct($filename, $expiration = false)
    {
        parent::__construct($filename, $expiration);

You create a new Spread object in order to have the connection made for you automatically:

        $this->spread = new Spread($this->spreadName);
    }