urllib.request.urlretrieve()
As we have seen previously, urllib.request is a Python module for opening URLs and fetching their content, such as the HTML of a web page.
A simple code demonstration of opening a URL with urllib.request.urlopen():
import urllib.request
with urllib.request.urlopen('http://cppsecrets.com') as response:
    res = response.read()
print(res)
Printing res displays the HTML content of the page at that URL.
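The value returned by read() is a bytes object, so it has to be decoded before it can be handled as text. The following is a minimal sketch of that step. The helper name fetch_text and the data: URL are assumptions made here so that the example runs without network access; any http(s) URL could be passed instead.

```python
import urllib.request
from urllib.parse import quote


def fetch_text(url, default_charset="utf-8"):
    """Open url and return its body decoded to str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # Use the charset declared in the Content-Type header, if there is one.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset, errors="replace")


# A data: URL stands in for a real page so the sketch works offline.
demo_url = "data:text/html;charset=utf-8," + quote("<h1>Hello</h1>")
html = fetch_text(demo_url)
print(html)
```

Falling back to UTF-8 when the server declares no charset is a choice of this sketch, not something urlopen does on its own.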
But what if we want to extract resources, for example all the images present on a particular page?
For this purpose, the urllib.request module provides a function named urlretrieve() that copies a resource identified by a URL to a local file.
Syntax:
urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
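Parameters:
url : the URL of the resource to retrieve.
filename : the local path to save the resource to. If omitted, the resource is saved to a temporary file.
reporthook : an optional callable that is invoked once when the connection is established and once after each block is read. It receives three arguments: the number of blocks transferred so far, the block size in bytes, and the total size of the file (which may be -1 if the server does not report it).
data : optional bytes to send to the server. If given, an HTTP request is made as a POST.
urlretrieve() returns a tuple (filename, headers), where filename is the local file name and headers is the object returned by the info() method of the response.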
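A minimal sketch of how filename and reporthook fit together is given here. It assumes a local file:// URL in place of a real web address so that it runs offline; the names show_progress, source and target are hypothetical.

```python
import tempfile
import urllib.request
from pathlib import Path

calls = []  # records every reporthook invocation for inspection


def show_progress(block_num, block_size, total_size):
    """reporthook: called once at the start and once after each block read."""
    calls.append((block_num, block_size, total_size))
    if total_size > 0:
        done = min(block_num * block_size, total_size)
        print(f"{done}/{total_size} bytes")


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in resource: a 20,000-byte local file addressed by a file:// URL.
    source = Path(tmp) / "source.bin"
    source.write_bytes(b"x" * 20_000)
    target = Path(tmp) / "copy.bin"

    saved_path, headers = urllib.request.urlretrieve(
        source.as_uri(), filename=str(target), reporthook=show_progress
    )
    copied = target.read_bytes()

print(saved_path)
```

The first call to the hook arrives with a block number of 0, before any data has been read, which is why the progress computation is capped at total_size rather than assumed to be exact.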
Note: urlretrieve() will raise ContentTooShortError when it detects that the amount of data available was less than the expected amount (which is the size reported by a Content-Length header). This can occur, for example, when the download is interrupted.
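One way to deal with an interrupted download is to catch the exception and try again. The sketch below is only an illustration: the helper name download, the retry count, and the decision to re-raise after the last attempt are assumptions, not behaviour of urlretrieve() itself.

```python
import urllib.error
import urllib.request


def download(url, filename, attempts=3):
    """Retrieve url into filename, retrying when the transfer is cut short.

    Hypothetical helper: the retry policy is an assumption of this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            path, _headers = urllib.request.urlretrieve(url, filename)
            return path
        except urllib.error.ContentTooShortError as err:
            # err.content holds the (filename, headers) of the partial download.
            print(f"Attempt {attempt}: incomplete download ({err})")
            if attempt == attempts:
                raise
```

A call such as download(image_src, "image.jpg") would then return the local path on success, or raise ContentTooShortError once every attempt has come up short.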
Now let's see the working of urlretrieve through the following code example.
We are going to fetch the top 3 images of Baby Yoda from the subreddit r/BabyYoda.
from bs4 import BeautifulSoup
import urllib.request

# Setting URL destination
url = "https://www.reddit.com/r/BabyYoda"
# Fetching URL
response = urllib.request.urlopen(url)
# Checking status code (if you get 502, try rerunning the code)
if response.getcode() != 200:
    print(f"Status: {response.getcode()} - Try rerunning the code\n")
else:
    print(f"Status: {response.getcode()}\n")
# Using BeautifulSoup to parse the response object
# (the parser is named explicitly to avoid a "no parser specified" warning)
soup = BeautifulSoup(response.read(), "html.parser")
# Finding post images in the soup
images = soup.find_all("img", attrs={"alt": "Post image"})
# Downloading the first three images, each under its own file name
for number, image in enumerate(images[:3], start=1):
    image_src = image["src"]
    print(f"Image {number}:")
    print(image_src)
    urllib.request.urlretrieve(image_src, str(number))
Output:
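The images are saved to disk in the current working directory, under the file names passed to urlretrieve(). A temporary location is used only when the filename argument is omitted.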
Status: 200
Image 1:
https://preview.redd.it/6a8ej9umtd171.jpg?width=640&crop=smart&auto=webp&s=f719b5d996b3742fe4935ed75f31280119edfd2b
Image 2:
https://preview.redd.it/kytll23s69171.jpg?width=640&crop=smart&auto=webp&s=e9df7ace6c3bd2e2dedd6ab58eef00921962130a
Image 3:
https://preview.redd.it/yll4jcopzb171.jpg?width=640&crop=smart&auto=webp&s=eb6bbcbc4c110668518ae924fd49e88c34db5c52
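The image URLs above carry a query string after the file name, so they cannot be used directly as local file names. A small sketch of deriving a name from the URL path follows; the helper filename_from_url and its fallback value are hypothetical and not part of urllib.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, fallback="download"):
    """Return the last path component of url, ignoring query and fragment."""
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    return name or fallback


src = ("https://preview.redd.it/6a8ej9umtd171.jpg"
       "?width=640&crop=smart&auto=webp&s=f719b5d996b3742fe4935ed75f31280119edfd2b")
print(filename_from_url(src))  # 6a8ej9umtd171.jpg
```

The result can then be passed as the second argument, for example urllib.request.urlretrieve(src, filename_from_url(src)).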