Chapter 11. HTTP Web Services
11.1. Diving in
You've learned about HTML processing and XML processing, and along the
way you saw how to download a web page and how to parse XML from a
URL. Now let's dive into the more general topic of HTTP web services.
Simply stated, HTTP web services are programmatic ways of sending data to and
receiving data from remote servers using the operations of HTTP directly. If
you want to get data from the server, use a plain HTTP GET; if you want
to send new data to the server, use HTTP POST. (Some more advanced
HTTP web service APIs also define ways of modifying and deleting existing
data, using HTTP PUT and HTTP DELETE.) In other words, the
“verbs” built into the HTTP protocol (GET, POST, PUT, and DELETE)
map directly to application-level operations for receiving, sending,
modifying, and deleting data.
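To make the verb-to-operation mapping concrete, here is a minimal sketch that only builds requests and never sends them. It assumes Python 3, whose urllib.request module (the successor to the urllib2 module used in this chapter) lets a Request carry an explicit method. The URL http://example.com/api/items, the build_request helper, and the operation names are hypothetical, chosen only for illustration.

```python
from urllib.request import Request

# Hypothetical service URL; the requests below are built but never sent.
BASE_URL = "http://example.com/api/items"

# Application-level operation -> HTTP verb (hypothetical operation names).
VERB_FOR = {
    "receive": "GET",
    "send": "POST",
    "modify": "PUT",
    "delete": "DELETE",
}


def build_request(operation, item_id=None, data=None):
    """Return an unsent Request whose HTTP method matches the operation."""
    url = BASE_URL if item_id is None else "%s/%s" % (BASE_URL, item_id)
    return Request(url, data=data, method=VERB_FOR[operation])


if __name__ == "__main__":
    for op, item, body in [("receive", 42, None),
                           ("send", None, b"<item>new</item>"),
                           ("modify", 42, b"<item>changed</item>"),
                           ("delete", 42, None)]:
        req = build_request(op, item, body)
        print(req.get_method(), req.full_url)
```

Running it prints one line per operation, starting with GET http://example.com/api/items/42 and then POST http://example.com/api/items. Which URLs accept which verbs is up to each real service; this layout is just one plausible choice.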
The main advantage of this approach is simplicity, and that simplicity has
proven popular with many different sites. Data (usually XML data) can
be built and stored statically or generated dynamically by a server-side
script, and all major languages include an HTTP library for downloading it.
Debugging is also easier, because you can load the web service in any
web browser and see the raw data. Modern browsers will even format
and pretty-print XML data for you, so you can navigate through
it quickly.
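The claim that a standard HTTP library is all a client needs is easy to check. The following sketch assumes Python 3 (http.server and urllib.request rather than this chapter's urllib2): it serves a static, pre-built XML document from a throwaway local server and downloads it the way a client would fetch a remote service. The XML content, the /alerts.xml path, and the fetch_alerts helper are hypothetical.

```python
import threading
import urllib.request
import xml.dom.minidom
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical XML payload, stored statically, standing in for a real service.
ALERTS_XML = (b'<?xml version="1.0"?>'
              b'<alerts><alert level="high">Storm warning</alert></alerts>')


class StaticXMLHandler(BaseHTTPRequestHandler):
    """Answers every GET with the same pre-built XML document."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(ALERTS_XML)))
        self.end_headers()
        self.wfile.write(ALERTS_XML)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines for this demo


def fetch_alerts():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), StaticXMLHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler keeps proxy environment variables out of the demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://127.0.0.1:%d/alerts.xml" % server.server_address[1]
    try:
        with opener.open(url) as response:
            raw = response.read()  # the same raw bytes a browser would show
    finally:
        server.shutdown()
        server.server_close()
    # Parse the downloaded bytes into a DOM tree and pull out each alert.
    doc = xml.dom.minidom.parseString(raw)
    return [(node.getAttribute("level"), node.firstChild.data)
            for node in doc.getElementsByTagName("alert")]


if __name__ == "__main__":
    print(fetch_alerts())
```

Here fetch_alerts() returns [('high', 'Storm warning')]. Replacing the handler's fixed bytes with XML generated per request would model the dynamic, script-generated case without changing the client at all.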
Examples of pure XML-over-HTTP web services:
* Amazon API allows you to retrieve product information from the
Amazon.com online store.
* National Weather Service (United States) and Hong Kong Observatory
(Hong Kong) offer weather alerts as a web service.
* Atom API for managing web-based content.
* Syndicated feeds from weblogs and news sites bring you up-to-the-minute news from a variety of sites.
In later chapters, you'll explore APIs that use HTTP as a transport for
sending and receiving data but don't map application semantics to the
underlying HTTP semantics. (They tunnel everything over HTTP POST.)
This chapter, however, concentrates on using HTTP GET to get data from a
remote server, and you'll explore several HTTP features you can use to get
the maximum benefit out of pure HTTP web services.
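As one example of such a feature, here is a rough sketch of a conditional GET: the client remembers the ETag header from its first response and sends it back in an If-None-Match header, and a server with nothing new answers 304 Not Modified with no body. It assumes Python 3's urllib.request and http.server in place of urllib2; the local server, the ETag value "v1", and the helper names are hypothetical. Python 3's urllib.request raises HTTPError for a 304 response, so the sketch catches it and treats it as "use your cached copy".

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<feed>unchanged</feed>"
ETAG = '"v1"'  # hypothetical entity tag for the one version of BODY


class ETagHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            # The client already has the current version: send no body.
            self.send_response(304)
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def conditional_fetch(url, opener, etag=None):
    """Return (status, etag, body); body is None on 304 Not Modified."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        with opener.open(request) as response:
            return response.status, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return 304, err.headers.get("ETag"), None
        raise


def demo():
    server = HTTPServer(("127.0.0.1", 0), ETagHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any proxy settings so the request really reaches the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://127.0.0.1:%d/feed.xml" % server.server_address[1]
    try:
        first = conditional_fetch(url, opener)
        second = conditional_fetch(url, opener, etag=first[1])
    finally:
        server.shutdown()
        server.server_close()
    return first, second


if __name__ == "__main__":
    print(demo())
```

The first fetch returns (200, '"v1"', b'<feed>unchanged</feed>') and the second returns (304, '"v1"', None): the body is transferred only once.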
Here is a more advanced version of the openanything module that you saw
in the previous chapter:
Example 11.1. openanything.py
If you have not already done so, you can download this and other examples
used in this book.
import sys, urllib2, urlparse, gzip
from StringIO import StringIO

USER_AGENT = 'OpenAnything/1.0 +http://diveintopython.org/http_web_services/'

class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
    # Record the redirect status code on the returned object so the caller
    # can tell a permanent redirect (301) from a temporary one (302).
    def http_error_301(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_301(
            self, req, fp, code, msg, headers)
        result.status = code
        return result

    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)
        result.status = code
        return result

class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler):
    # Return the error response instead of raising it, so the caller can
    # inspect status codes such as 304 (Not Modified).
    def http_error_default(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(
            req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result

def openAnything(source, etag=None, lastmodified=None,
                 agent=USER_AGENT):
    '''URL, filename, or string --> stream

    This function lets you define parsers that take any input source
    (URL, pathname to local or network file, or actual data as a string)
    and deal with it in a uniform manner. Returned object is guaranteed
    to have all the basic stdio read methods (read, readline, readlines).
    Just .close() the object when you're done with it.

    If the etag argument is supplied, it will be used as the value of an
    If-None-Match request header.

    If the lastmodified argument is supplied, it must be a formatted
    date/time string in GMT (as returned in the Last-Modified header of
    a previous request). The formatted date/time will be used
    as the value of an If-Modified-Since request header.

    If the agent argument is supplied, it will be used as the value of a
    User-Agent request header.
    '''
    # Already a file-like object: return it unchanged.
    if hasattr(source, 'read'):
        return source

    # '-' conventionally means standard input.
    if source == '-':
        return sys.stdin

    if urlparse.urlparse(source)[0] == 'http':
        # open URL with urllib2
        request = urllib2.Request(source)
        request.add_header('User-Agent', agent)
        if etag: