PYTHON - urllib.request.urlopen()



urllib.request.urlopen()


As we have seen, urllib is an extensive Python library for working with URLs.

Now we will discuss the urllib.request.urlopen() function, which opens and fetches a URL.

The following code demonstrates how urlopen() works.

First, we need to import urlopen from urllib.request:


>>> import urllib.request

>>> from urllib.request import urlopen


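Note: A plain "import urllib" does not guarantee that the urllib.request submodule is loaded. It can appear to work in some environments (for example Spyder) because another module has already imported urllib.request, while in others (for example PyCharm) it fails. Importing urllib.request explicitly works everywhere.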

Parameters:

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

  1. url is the URL to open; it can be either a string or a Request object.
  2. data must be an object specifying additional data to be sent to the server (for example a bytes object), or None if no such data is needed.
  3. The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This only works for HTTP, HTTPS, and FTP connections.
  4. The optional cafile and capath parameters specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations(). Both parameters are deprecated since Python 3.6 and were removed in Python 3.13; use the context parameter (an ssl.SSLContext instance) instead.
  5. The cadefault parameter is ignored.
  6. The function always returns an object which can work as a context manager and has the properties url, headers, and status.
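The parameters above can be put together in a short sketch. This is only an illustration under a few assumptions: the helper names build_post_request and open_and_read, the example.com address, and the form fields are all made up for the example and are not part of urllib.

```python
import ssl
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Return a Request whose body is the URL-encoded form fields (hypothetical helper)."""
    # data must be bytes; supplying it turns an HTTP request into a POST.
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body)


def open_and_read(url_or_request, timeout=10):
    """Open a URL string or a Request object and return the raw body bytes."""
    # create_default_context() loads the system's trusted CA certificates.
    # The context is only consulted for HTTPS URLs.
    context = ssl.create_default_context()
    # The returned object is a context manager, so "with" closes it for us.
    with urllib.request.urlopen(url_or_request, timeout=timeout,
                                context=context) as response:
        return response.read()


# Building the request does not touch the network yet.
request = build_post_request("http://example.com/search", {"q": "urllib", "page": 1})
print(request.get_method())  # POST, because data was supplied
```

Passing the Request to open_and_read(request) would then send it. The context argument replaces cafile and capath on newer Python versions.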

Here we pass only the URL that needs to be fetched.

>>> response = urllib.request.urlopen('http://cppsecrets.com')

>>> print(response)

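It returns an http.client.HTTPResponse object. Printing it shows the object's default representation, of the form <http.client.HTTPResponse object at 0x...>, as shown below.

Output: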



Response Code:

>>> code = response.getcode()

>>> print('Response Code:{}'.format(code))

Output:



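Response Code 200 indicates that our HTTP request was processed successfully.

A 4xx response code such as 400 (Bad Request) or 404 (Not Found) means the server received the request but could not fulfil it. urlopen() does not return such a response; it raises urllib.error.HTTPError instead.

If the server cannot be reached at all, urlopen() raises urllib.error.URLError. In that case, check your internet connection and try again.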
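Since a failed request surfaces as an exception rather than as a return value, it helps to wrap urlopen() in try/except. Here is a minimal sketch; the function name describe_fetch and the message wording are our own choices for illustration.

```python
import urllib.error
import urllib.request


def describe_fetch(url, timeout=10):
    """Try to fetch url and return a short, human-readable result (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "OK: {} bytes received".format(len(response.read()))
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return "HTTP error {}: {}".format(err.code, err.reason)
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection,
        # unsupported URL scheme, and so on.
        return "Request failed: {}".format(err.reason)
```

Calling describe_fetch('http://cppsecrets.com') returns one of the three messages depending on how the request goes.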


Reading the HTML content:

The response object has a read() method, which returns the body of the response (here, the HTML content of the URL) as a bytes object.

Code:

>>> print(response.read())


Output:


You can see that it returns the HTML source of the URL mentioned, as a bytes object.
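Because read() gives us bytes, we usually want to decode them into a string before working with the HTML. A minimal sketch follows, assuming the server states its charset in the Content-Type header and falling back to UTF-8 when it does not. The function name fetch_text and the fallback choice are assumptions made for this example.

```python
import urllib.request


def fetch_text(url, timeout=10, fallback_encoding="utf-8"):
    """Fetch url and return its body decoded to str (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # headers is an email.message.Message; get_content_charset()
        # returns the charset from the Content-Type header, or None.
        charset = response.headers.get_content_charset() or fallback_encoding
        return response.read().decode(charset)
```

Note that the body can only be read once: a second call to read() on the same response returns an empty bytes object.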

The third-party library BeautifulSoup, which is installed separately and is not part of the standard library, provides a cleaner way to work with the HTML content of any URL.

