Python provides the urllib module, the URL handling package used to fetch URLs (Uniform Resource Locators) over the web.
This opens up as many doors for your programs as the internet opens up for you. urllib in Python 3 differs slightly from urllib2 in Python 2, but the two are mostly the same.
Through urllib, you can access websites, download data, parse data, modify your headers, and make any GET and POST requests you might need.
urllib is a package as a whole and provides several modules for working with URLs:
- urllib.request for opening and reading URLs
- urllib.parse for parsing URLs
- urllib.error for the exceptions raised by urllib.request
- urllib.robotparser for parsing robots.txt files
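As a quick taste of one of these submodules, here is a short sketch using urllib.parse to split a URL into its components (the URL itself is just an illustrative example):

```python
from urllib.parse import urlparse

# urlparse() splits a URL into scheme, host, path, query, and more
parts = urlparse("https://cppsecrets.com/users/article?id=42")

print(parts.scheme)  # https
print(parts.netloc)  # cppsecrets.com
print(parts.path)    # /users/article
print(parts.query)   # id=42
```

Each component is available as a named attribute on the result, which is far more robust than slicing the URL string by hand.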
Note that urllib is part of the Python standard library, so it does not need to be installed separately; running pip install urllib is unnecessary. If import urllib fails, the Python installation itself is likely broken.
The urllib.request module defines the functions and classes that help in opening URLs (mostly HTTP).
It is capable of fetching URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations, like basic authentication, cookies, proxies, and so on. These are provided by objects called handlers and openers.
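To make the idea of handlers and openers concrete, here is a minimal sketch of an opener that adds HTTP Basic Authentication. The URL, username, and password below are placeholders, not real credentials:

```python
import urllib.request

# A password manager holds credentials; the handler uses them when
# a server challenges the request with HTTP Basic Auth.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://example.com/", "user", "secret")  # placeholders

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# build_opener() chains the handler into an OpenerDirector
opener = urllib.request.build_opener(auth_handler)

# install_opener() makes this opener the default for all urlopen() calls
urllib.request.install_opener(opener)
```

Other handlers (for proxies, cookies, redirects, and so on) can be combined in the same call to build_opener().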
urllib.request supports fetching URLs for many "URL schemes" (identified by the string before the ":" in the URL; for example, "ftp" is the URL scheme of "ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP). However, the most common case it is used for is HTTP.
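To see a scheme other than HTTP in action, the sketch below reads a local file through a "file" URL; the temporary file is created only so the example is self-contained:

```python
import os
import pathlib
import tempfile
import urllib.request

# Create a throwaway file to fetch back through a file:// URL
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello from a file URL")
    path = f.name

# pathlib builds a correct file:// URL on any platform
url = pathlib.Path(path).as_uri()

with urllib.request.urlopen(url) as response:
    print(response.read().decode())  # hello from a file URL

os.remove(path)
```

The same urlopen() call works unchanged for HTTP, FTP, and the other supported schemes; the scheme prefix selects the protocol.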
urllib.request can be imported by using the following code:
import urllib.request
We can fetch and open any URL simply by calling urlopen(), which is demonstrated by the following code:
request_url = urllib.request.urlopen('https://cppsecrets.com/')
The result of this call is an HTTP response object (an http.client.HTTPResponse instance).
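The sketch below shows what can be done with such a response object. It fetches a "data:" URL so the example needs no network connection; an HTTP URL would return an http.client.HTTPResponse with the same read() and headers interface:

```python
import urllib.request

# A data: URL embeds its own content, so this runs offline
with urllib.request.urlopen("data:text/plain,Hello") as response:
    body = response.read()          # raw bytes of the resource
    ctype = response.headers["content-type"]

print(body.decode())  # Hello
print(ctype)
```

read() always returns bytes, so the payload must be decoded (typically as UTF-8 for text) before being treated as a string.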
We will discuss the urlopen() method in more detail in the next article.