Introduction to the urllib Module
Prerequisites for urllib:
Before delving into the various options in urllib, readers are expected to have a basic knowledge of Python. urllib in Python 3 differs from urllib2 in Python 2 (in Python 3, urllib2's functionality was split into urllib.request and urllib.error), so any familiarity with that module will help you follow this article.
1. urllib is the URL-handling package provided by Python's standard library.
2. It provides a high-level interface for retrieving data across the World Wide Web (WWW).
3. It is used to fetch URLs (Uniform Resource Locators). Its urlopen() function (urllib.request.urlopen() in Python 3) can fetch URLs over a variety of protocols, such as HTTP, HTTPS, FTP and local files.
4. urlopen() is similar to the built-in open() function, but it accepts URLs instead of filenames.
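Because urlopen() behaves much like open(), a small sketch can make the comparison concrete. The helper name fetch_text and the file note.txt are illustrative assumptions; data: and file: URLs are used only so the example runs without network access, and the same call works for http:// and https:// URLs.

```python
import pathlib
import tempfile
import urllib.request


def fetch_text(url: str) -> str:
    """Open a URL with urlopen() and return its body decoded as UTF-8."""
    # Like open(), urlopen() returns a file-like object usable in a with-block.
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# data: URLs carry their payload inside the URL itself, so no network is needed.
data_text = fetch_text("data:text/plain,hello%20urllib")

# file: URLs point at local files; create a throwaway file to read back.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "note.txt"
    path.write_text("from a local file", encoding="utf-8")
    file_text = fetch_text(path.as_uri())

print(data_text)
print(file_text)
```

The with-block closes the response automatically, exactly as it would close a file returned by open().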
With urllib you can:
- Access websites.
- Read data from websites.
- Parse URLs and query strings.
- Modify request headers.
- Make GET and POST requests.
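Headers and request methods are both controlled through urllib.request.Request. The sketch below only builds the requests without sending them; the endpoint https://example.com/api/items and the header values are hypothetical placeholders.

```python
import urllib.request

# Hypothetical endpoint; the Request objects are only built here, never sent.
URL = "https://example.com/api/items"

# GET request with a custom header (Request defaults to GET when there is no body).
get_req = urllib.request.Request(URL, headers={"User-Agent": "my-script/0.1"})

# POST request: supplying a bytes body switches the default method to POST.
post_req = urllib.request.Request(URL, data=b'{"name": "alice"}')
post_req.add_header("Content-Type", "application/json")

# Request stores header names via str.capitalize(), e.g. "User-agent".
print(get_req.get_method(), get_req.get_header("User-agent"))
print(post_req.get_method(), post_req.get_header("Content-type"))

# To actually send one of them: urllib.request.urlopen(get_req)
```

Note the header-name normalisation: get_header() must be called with the capitalized form ("User-agent", "Content-type"), otherwise it returns None.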
Because urllib is part of Python's standard library, it is already present in every Python installation and there is nothing to install with pip (urllib3 on PyPI is a separate third-party library). You just need to import the sub-module you want:
>>> import urllib.request
urllib vs urllib2 vs Requests:
In Python 2, urllib and urllib2 are both modules that provide URL-request functionality, but they offer different features:
1. urllib2 accepts a Request object, which lets you set the headers of a URL request; urllib has no such option and accepts only a URL.
2. urllib provides the urlencode() function, which is used to generate GET query strings; urllib2 has no equivalent. This is one of the reasons why urllib is often used together with urllib2.
3. The third-party Requests library encodes parameters automatically, so you can simply pass them as arguments, unlike urllib, where you must call urlencode() yourself (urllib.urlencode() in Python 2, urllib.parse.urlencode() in Python 3).
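In Python 3 both names live inside the urllib package, and query strings still have to be encoded explicitly. A minimal sketch, assuming a hypothetical search endpoint at https://example.com/search (with Requests you would instead write requests.get(url, params=params)):

```python
from urllib.parse import urlencode    # Python 2: urllib.urlencode
from urllib.request import Request    # Python 2: urllib2.Request

params = {"q": "urllib tutorial", "lang": "en"}

# urllib does not encode parameters for you: build the query string explicitly.
query = urlencode(params)  # spaces are encoded as "+"
req = Request("https://example.com/search?" + query)  # hypothetical URL, not sent

# doseq=True expands a list value into repeated keys.
tags = urlencode({"tag": ["python", "web"]}, doseq=True)

print(req.full_url)
print(tags)
```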
urllib sub-modules:
urllib is a package that collects several modules for working with URLs:
1. urllib.request - used for opening and reading URLs.
2. urllib.error - contains the exceptions raised by urllib.request.
3. urllib.parse - used for parsing URLs.
4. urllib.robotparser - used for parsing robots.txt files.
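Two of these sub-modules work entirely offline, which makes them easy to try out. The sketch below uses made-up example.com URLs and a two-line robots.txt fed directly to RobotFileParser.parse(), instead of downloading a real one with read(); the user-agent name my-bot is likewise a placeholder.

```python
import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into components and decode its query string.
parts = urllib.parse.urlparse("https://example.com/docs/page?lang=en&tag=a&tag=b#intro")
query_params = urllib.parse.parse_qs(parts.query)  # each value is a list
joined = urllib.parse.urljoin("https://example.com/docs/page", "../about")

print(parts.scheme, parts.netloc, parts.path, parts.fragment)
print(query_params)
print(joined)

# urllib.robotparser: feed robots.txt rules as lines instead of downloading them.
robots = urllib.robotparser.RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])
private_ok = robots.can_fetch("my-bot", "https://example.com/private/data")
public_ok = robots.can_fetch("my-bot", "https://example.com/public/page")
print(private_ok, public_ok)
```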
Let's look at a sample program.
import urllib.request
print(urllib.request.urlopen('https://cppsecrets.com/').read())
This prints the raw HTML source of https://cppsecrets.com/ as a bytes object (displayed as b'...'); call .decode('utf-8') on the result to get a str.
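The sample program does no error handling, so any network problem ends in a traceback. A sketch of defensive fetching with the exception classes from urllib.error follows; the helper name fetch and the 10-second timeout are assumptions, and the made-up scheme nosuchscheme:// is used only so a failure can be reproduced without network access.

```python
import urllib.error
import urllib.request


def fetch(url: str, timeout: float = 10.0):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"server returned HTTP {err.code}: {err.reason}")
    except urllib.error.URLError as err:
        # Unreachable hosts, unknown URL schemes, missing local files, etc.
        print(f"could not open {url!r}: {err.reason}")
    return None


# Offline demonstration: an unsupported scheme raises URLError.
result = fetch("nosuchscheme://example")
```

Calling fetch('https://cppsecrets.com/') behaves like the sample program when the request succeeds, but returns None instead of raising when it fails.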