Python urllib2: How to change the host to connect to, without having the API change the URL
By AkH, modified Sept. 12, 2007
I spent an amazing amount of time figuring this out.
urllib2 is a high-level Python API for retrieving resources over HTTP, FTP, ... It's really useful because you can route requests through a proxy, or change the headers to simulate a particular client, etc.
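A minimal sketch of those two features, written for Python 3 (where urllib2's functionality lives in urllib.request). The proxy address and User-Agent string are made-up placeholders, and make_opener/make_request are hypothetical helper names, not part of the library.

```python
import urllib.request

# Hypothetical values, for illustration only.
PROXY = "http://proxy.example:3128"
MOBILE_UA = "ExamplePhone/1.0"


def make_opener(proxy_url=None):
    """Build an opener that routes plain-HTTP requests through proxy_url.

    With no proxy_url, build_opener() falls back to its default
    ProxyHandler, which reads the environment's proxy settings.
    """
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url}))
    return urllib.request.build_opener(*handlers)


def make_request(url, user_agent):
    """Build a request that presents itself with a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


opener = make_opener(PROXY)
req = make_request("http://example.com/", MOBILE_UA)
# opener.open(req) would send the request through the proxy,
# carrying the custom User-Agent header.
```

Note that Request normalizes header names with str.capitalize(), so the header is stored and looked up as "User-agent".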
But a high-level API has a cost: changing low-level things is a real mess.
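For contrast, one level down (httplib, called http.client in Python 3) the operation itself is simple: open the connection to one host and pass an explicit Host header, which makes the library skip adding its own. A minimal Python 3 sketch; fetch_with_host is a hypothetical helper, and going this route gives up everything urllib2 layers on top (proxies, handlers, redirects).

```python
import http.client


def fetch_with_host(connect_to, host_header, path="/", timeout=10):
    """Open a TCP connection to connect_to but send Host: host_header.

    connect_to may be "host" or "host:port"; HTTPConnection parses it.
    Returns (status, body_bytes).
    """
    conn = http.client.HTTPConnection(connect_to, timeout=timeout)
    try:
        # Supplying Host explicitly makes http.client skip its own Host header.
        conn.request("GET", path, headers={"Host": host_header,
                                           "Connection": "close"})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```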
Here is how to make a request that connects to host A while sending the header "Host: B" (that is, asking A for http://B/...).
import socket
import urllib2
from urllib import addinfourl


class ConnectHTTPHandler(urllib2.HTTPHandler):
    """HTTP handler that opens the TCP connection to req.false_hostname
    (if set) while keeping the Host header derived from the request URL."""

    def do_open(self, http_class, req):
        """Return an addinfourl object for the request, using http_class.

        http_class must implement the HTTPConnection API from httplib.
        The addinfourl return value is a file-like object. It also
        has methods and attributes including:
         - info(): return a mimetools.Message object for the headers
         - geturl(): return the original request URL
         - code: HTTP status code
        """
        # Connect to the overriding host if one was set on the request;
        # otherwise fall back to the URL's host (the normal behavior),
        # instead of failing with AttributeError.
        host = getattr(req, 'false_hostname', None) or req.get_host()
        if not host:
            raise urllib2.URLError('no host given')

        h = http_class(host)  # will parse host:port
        h.set_debuglevel(self._debuglevel)

        # unredirected_hdrs already holds the Host header that urllib2
        # derived from the URL, so the server still sees "Host: B".
        headers = dict(req.headers)
        headers.update(req.unredirected_hdrs)
        # We want to make an HTTP/1.1 request, but the addinfourl
        # class isn't prepared to deal with a persistent connection.
        # It will try to read all remaining data from the socket,
        # which will block while the server waits for the next request.
        # So make sure the connection gets closed after the (only)
        # request.
        headers["Connection"] = "close"
        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
            r = h.getresponse()
        except socket.error, err:  # XXX what error?
            raise urllib2.URLError(err)

        # Pick apart the HTTPResponse object to get the addinfourl
        # object initialized properly.
        # Wrap the HTTPResponse object in socket's file object adapter
        # for Windows. That adapter calls recv(), so delegate recv()
        # to read(). This weird wrapping allows the returned object to
        # have readline() and readlines() methods.
        # XXX It might be better to extract the read buffering code
        # out of socket._fileobject() and into a base class.
        r.recv = r.read
        fp = socket._fileobject(r)
        resp = addinfourl(fp, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason
        return resp


# This is the normal sequence for forcing urllib2 to use your own handler.
# Passing a subclass of HTTPHandler makes build_opener() drop the default one.
# URL, TXDATA and TXHEADERS stand for your own URL, POST data (or None)
# and header dict.
change_host = ConnectHTTPHandler()
opener = urllib2.build_opener(change_host)
urllib2.install_opener(opener)

req = urllib2.Request(URL, TXDATA, TXHEADERS)
req.false_hostname = "localhost"  # connect here; Host header still comes from URL
u = urllib2.urlopen(req)
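urllib2 does not exist in Python 3; its successor is urllib.request. Below is a rough Python 3 port of the same idea, offered as a sketch rather than a drop-in replacement. Instead of copying do_open(), it hands do_open() a connection factory that ignores the URL's host and dials the override. This assumes do_open() only ever calls http_class(host, timeout=..., ...) to build the connection, which holds for the CPython 3 versions I know of but is an implementation detail. false_hostname is the same ad-hoc attribute the recipe uses, not part of the library API.

```python
import http.client
import urllib.request


class ConnectHTTPHandler(urllib.request.HTTPHandler):
    """Connect to req.false_hostname (if set) but keep the URL's Host header."""

    def http_open(self, req):
        target = getattr(req, "false_hostname", None)
        if not target:
            # No override: behave exactly like the stock HTTPHandler.
            return super().http_open(req)

        def connect_elsewhere(_url_host, **kwargs):
            # do_open() calls http_class(req.host, timeout=..., ...);
            # ignore the URL's host and connect to the override instead.
            return http.client.HTTPConnection(target, **kwargs)

        # do_open() still builds the headers from the request, including the
        # Host header that http_request() derived from the URL.
        return self.do_open(connect_elsewhere, req)


# Usage: ProxyHandler({}) disables environment proxies, which would
# otherwise rewrite the request before it reaches this handler.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}),
                                     ConnectHTTPHandler())
req = urllib.request.Request("http://B.example/some/path")
req.false_hostname = "localhost:8080"  # hypothetical target
# opener.open(req) would connect to localhost:8080 and send Host: B.example
```

Overriding http_open() rather than copying do_open() keeps all of urllib.request's own header, timeout and response handling, at the cost of relying on how do_open() builds its connection.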

