Python urllib -
How to handle the 403 Forbidden error
Introduction-
In this article, we look at why a 403 error occurs and the steps you can take to deal with it.
403 Forbidden Error-
1. This error arises when you successfully connect to the server and the server understands the request, but refuses to fulfil it.
2. A common reason is that many website owners do not want a bot or a program to access their website; they want only real users to use their services.
3. Hence, when you send a request from a Python program, many websites detect that it comes from a computer program rather than a real user and refuse to serve your request.
In this article, we will show you how you can bypass that detection and access the site.
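In urllib, a 403 reaches your program as a urllib.error.HTTPError whose code attribute is 403. The sketch below shows one way to tell that case apart from a failure where no HTTP response arrived at all. The names fetch_status and fake_forbidden are hypothetical helpers introduced only for this illustration; fake_forbidden simulates the server's refusal so that the example runs without contacting any site.

```python
import io
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Return (status_code, reason) for url.

    opener is a hypothetical hook so the function can be tried offline;
    by default it is the real urllib.request.urlopen.
    """
    try:
        with opener(url) as response:
            return response.status, response.reason
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 403.
        return error.code, error.reason
    except urllib.error.URLError as error:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, str(error.reason)


def fake_forbidden(url):
    """Stand-in for urlopen that always raises a simulated 403."""
    raise urllib.error.HTTPError(url, 403, "Forbidden", None, io.BytesIO(b""))


# Simulated refusal, so nothing is sent over the network here.
print(fetch_status("https://example.com/", opener=fake_forbidden))  # prints (403, 'Forbidden')
```

Calling fetch_status(url) without the opener argument sends a real request, which is how you would use it in practice.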
Example-
Consider the following code, which gives a 403 Forbidden error. We try to send a search request to Google from our Python program.
import urllib.request

class Post():
    def __init__(self, url):
        self.url = url

    def post_method(self):
        try:
            req = urllib.request.urlopen(self.url)
            print(req.read())
        except Exception as e:
            print(str(e))

def main():
    url = 'https://www.google.com/search?q=test'
    post_object = Post(url)
    post_object.post_method()

if __name__ == "__main__":
    main()
Output-
HTTP Error 403: Forbidden
As mentioned before, the site recognises that a computer program is trying to send the request and returns a 403 error.
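One way a site can recognise the program is the User-Agent header that urllib adds on its own. The short sketch below prints that default value without contacting any site; the variable names are our own, and the exact version number in the output depends on your Python installation.

```python
import urllib.request

# A fresh opener carries the headers urllib adds to every request by default.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something of the form Python-urllib/3.x, which is easy for a site to spot.
print(default_headers["User-agent"])
```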
How to overcome the error-
1. To make the request succeed, we have to make the site believe that a real user is trying to access it. This can be done by sending a browser-like User-Agent header with the request.
2. The User-Agent header identifies the client, typically the browser name and version along with the operating system. We store the User-Agent in a dictionary and pass it to the headers parameter of the Request class.
3. The following code will give you an idea of how to make the request.
Code-
import urllib.request

class ImprovedPost():
    def __init__(self, url, headers):
        self.url = url
        self.headers = headers

    def improved_post_method(self):
        try:
            # Build the request with our headers and send it
            request = urllib.request.Request(self.url, headers=self.headers)
            response = urllib.request.urlopen(request)
            # Read the raw bytes of the response from the site
            data = response.read()
            # Decode the bytes and write the text to the ResponseData file
            # (str(data) would write the b'...' representation instead)
            with open('ResponseData.txt', 'w', encoding='utf-8') as response_file:
                response_file.write(data.decode('utf-8', errors='replace'))
            print("Data Successfully Saved!")
        except Exception as e:
            print(str(e))

def main():
    # URL we want to access
    url = 'https://www.google.com/search?q=test'
    # The user-agent stored in the headers dictionary
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}
    post = ImprovedPost(url, headers)  # Creating the object
    post.improved_post_method()  # Calling the method

if __name__ == "__main__":
    main()
Output-
Data Successfully Saved!
Fig-1
Fig-2
As you can see, a text file has been saved in the same location as the program. It contains the entire source code of the page at the URL we requested.
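If you want to check what will be sent before contacting the site, you can inspect the Request object itself. The sketch below is only an illustration and sends nothing over the network. It also shows two details that are easy to miss: a Request without a data argument uses the GET method, and urllib stores header names in capitalised form, so the header has to be looked up as 'User-agent'.

```python
import urllib.request

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}
request = urllib.request.Request('https://www.google.com/search?q=test', headers=headers)

# No data argument was given, so this is a GET request.
print(request.get_method())

# Header names are stored via str.capitalize(), hence 'User-agent'.
print(request.get_header('User-agent'))

# Looking it up with the original spelling finds nothing.
print(request.get_header('User-Agent'))
```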
You can see that the headers dictionary has the User-Agent stored in it. You can get your own browser's User-Agent using the following steps.
1. Open your browser and inspect the web page (right-click, then Inspect).
2. Go to the Network section and click on the first request that occurred.
3. In the Headers section, scroll down to the end, where you will see the User-Agent. Copy it and paste it into your headers dictionary.
With these steps, you can now deal with the 403 Forbidden error.
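If your program makes many requests, one option is to set the copied User-Agent once instead of passing a headers dictionary every time. The sketch below assumes the same User-Agent string used earlier in this article; the USER_AGENT constant is a name introduced here for illustration. It replaces the default headers of an opener and installs that opener globally, so later calls to urllib.request.urlopen use it. Nothing is sent over the network by this code itself.

```python
import urllib.request

# Hypothetical constant holding the User-Agent copied from the browser.
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'

# Replace urllib's default Python-urllib/x.y header with the browser one.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', USER_AGENT)]

# From now on, urllib.request.urlopen() goes through this opener.
urllib.request.install_opener(opener)
```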