Python Beautiful Soup : Scrape the Redmi mobile detail from Amazon















































Requirements:
  • Python 3.8.3
  • BeautifulSoup 4.9.3
  • pandas 1.0.5
Description:
Amazon is a cloud computing giant and the largest American e-commerce company, and many of us use its sites for online shopping. In this article, we will see how to scrape data from Amazon, specifically how to collect data on Redmi mobiles.
Let's start.
Open the Amazon site and search for the Redmi brand. You get a results page like the following:



Now right-click on the first Redmi mobile and choose Inspect. The Inspect option opens the HTML code of the web page. First, retrieve all the mobile posts.


Every post has many classes, but one class, "s-asin", is common to all posts and sets them apart from the rest of the page, so we can find all posts by using the find_all() method with class_='s-asin'. Remember that find_all() scans the whole document for results, so pass the tag name and attributes that narrow the search to the desired result without wasting time.

phone = soup.find_all('div',class_='s-asin')
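
To make the class-based lookup concrete, here is a minimal sketch run against a tiny hand-written snippet instead of the live page. The markup and the data-asin values are hypothetical stand-ins, and html.parser is used only so that the example needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup standing in for a results page.
SAMPLE_HTML = """
<div class="s-result-item s-asin" data-asin="A1"><span>Phone one</span></div>
<div class="s-result-item s-asin" data-asin="A2"><span>Phone two</span></div>
<div class="s-result-item"><span>Not a product post</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any div whose class list contains "s-asin",
# even when the tag carries other classes as well.
phone = soup.find_all("div", class_="s-asin")
asins = [p["data-asin"] for p in phone]

print(len(phone), asins)
```

The third div is skipped because its class list does not contain "s-asin".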

We can scrape the following points:

  • description
  • global rating
  • price
  • discount
  • star
  • detail





Let's analyze the first post so that we can gather the above data.

In the code above, the description occurs within a span tag that has the 'a-size-base-plus' class. The span has three CSS classes, but 'a-size-base-plus' is unique, so we can easily find it with the find() method.

p.find('span',class_='a-size-base-plus').text
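
As a small illustration of that lookup, the sketch below runs find() on a hand-written post. The markup and the product title are made up for the example, and the None check is an added precaution for posts that lack the span; it is not part of the original one-liner.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a single post; the title text is invented.
POST_HTML = """
<div class="s-asin">
  <span class="a-size-base-plus a-color-base a-text-normal">
    Redmi 9 (Sky Blue, 4GB RAM, 64GB Storage)
  </span>
</div>
"""

p = BeautifulSoup(POST_HTML, "html.parser").find("div", class_="s-asin")

# find() returns the first matching tag, or None when nothing matches.
tag = p.find("span", class_="a-size-base-plus")
description = tag.get_text(strip=True) if tag is not None else None

print(description)
```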


Now scrape the star and global rating.




The star rating occurs within a span tag that has the specific class "a-icon-alt", as the code above shows, so we use the find() method.

p.find('span',class_='a-icon-alt').text

The global rating occurs within a span tag that has the "a-size-base" class.

p.find('span',class_='a-size-base').text
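
Both values come back as text. If numbers are needed later, a conversion along the following lines may help. This is only a sketch: the helper names parse_star and parse_count are hypothetical, and the text formats "4.2 out of 5 stars" and "12,345" are assumptions about what the two spans contain.

```python
def parse_star(text):
    """Turn text such as '4.2 out of 5 stars' into the float 4.2."""
    # Assumes the rating is the first whitespace-separated token.
    return float(text.split()[0])


def parse_count(text):
    """Turn text such as '12,345' into the int 12345."""
    # Assumes commas are used only as thousands separators.
    return int(text.replace(",", "").strip())


print(parse_star("4.2 out of 5 stars"), parse_count("12,345"))
```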





In the code above, the price is located in a span tag that has the "a-price-whole" class.

p.find('span',class_='a-price-whole').text

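Scrape the discount in the same way as the price. Here, however, the span tag is identified only by dir="auto", and other tags may also have a dir attribute. So we locate the span with the 'a-letter-space' class, take the span that follows it with find_next(), and remove the rupee sign from its text.

p.find('span',class_='a-letter-space').find_next('span').text.replace('\u20b9','')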
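
The find_next() step can be tried on a hand-written post as follows. The markup and the discount text are hypothetical; the example only assumes that the discount span is the first span after the 'a-letter-space' span, and it writes the rupee sign as the escape \u20b9.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; \u20b9 in the string is the rupee sign.
POST_HTML = """
<div class="s-asin">
  <span class="a-price-whole">10,999</span>
  <span class="a-letter-space"></span>
  <span dir="auto">Save \u20b92,000 (15%)</span>
</div>
"""

p = BeautifulSoup(POST_HTML, "html.parser").find("div", class_="s-asin")

# Use the spacer span as an anchor, then move to the next span in document order.
spacer = p.find("span", class_="a-letter-space")
discount = spacer.find_next("span").text.replace("\u20b9", "")

print(discount)
```

Anchoring on a uniquely classed neighbour avoids searching by dir="auto", which other tags may share.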


The detail link of the mobile can be scraped from the "a" tag that has the "a-link-normal a-text-normal" class.

p.find('a',class_='a-link-normal a-text-normal')['href']
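
The href found this way is usually a path relative to the site root. One way to turn it into a full link is urljoin from the standard library, sketched below. The helper name absolute_link and the sample paths are hypothetical, and https://www.amazon.in is assumed as the base because the search URL in this article uses that host.

```python
from urllib.parse import urljoin

# Assumed base: the host used by the search URL in this article.
BASE_URL = "https://www.amazon.in"


def absolute_link(href):
    """Resolve a relative href against the site root; absolute URLs pass through unchanged."""
    return urljoin(BASE_URL, href)


print(absolute_link("/dp/B08EXAMPLE"))
```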

Now, let's write the code to create a good dataset.

Program:

from bs4 import BeautifulSoup
import requests
import pandas as pd

# build the search URL for a given results page
def get_url(page_no):
    url = 'https://www.amazon.in/s?i=electronics&bbn=1389401031&rh=n%3A1389401031%2Cp_89%3ARedmi&dc&qid=1617362315&rnid=3837712031&ref=sr_pg_{}'.format(page_no)
    return url

# return the text of the first matching tag, or 'None' when the tag is missing
def get_text(p, name, class_name):
    tag = p.find(name, class_=class_name)
    return tag.text if tag is not None else 'None'

# collect all information
def gather_information():

    # create a dict with one list per column
    data = {'Description':[],'global_rating':[],'price':[],'discount':[],'rate':[],'detail':[]}
    # loop over the result pages
    for pg in range(1,7):
        res = requests.get(get_url(pg))
        markup = res.content
        soup = BeautifulSoup(markup,'lxml')
        phone = soup.find_all('div',class_='s-asin')
        for p in phone:
            # append exactly one value per field for every post,
            # otherwise the columns end up with different lengths
            # and pd.DataFrame(data) raises a ValueError
            data['Description'].append(get_text(p,'span','a-size-base-plus'))
            data['global_rating'].append(get_text(p,'span','a-size-base'))
            data['price'].append(get_text(p,'span','a-price-whole'))
            try:
                data['discount'].append(p.find('span',class_='a-letter-space').find_next('span').text.replace('\u20b9',''))
            except AttributeError:
                # find() returned None: this post shows no discount
                data['discount'].append('None')
            data['rate'].append(get_text(p,'span','a-icon-alt'))
            link = p.find('a',class_='a-link-normal a-text-normal')
            data['detail'].append('https://www.amazon.in'+link['href'] if link is not None else 'None')

    # now store the dataframe as a csv file
    df = pd.DataFrame(data)
    df.to_csv('Redmi.csv')

if __name__ == "__main__":
    gather_information()

Conclusion:
Sometimes specific data is not available, for example the discount on a mobile, so we handle that case with an exception statement. We can read multiple pages by using a for loop; get_url() plays the main role in reading the multiple pages.
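
The handling of missing data can also be sketched without exceptions. In the example below, safe_text is a hypothetical helper (not part of Beautiful Soup) that returns a placeholder when a tag is absent, so every post adds exactly one value to each column. The two posts are hand-written, and the second one deliberately has no star rating.

```python
from bs4 import BeautifulSoup


def safe_text(post, name, class_name, default="None"):
    """Return the stripped text of the first matching tag, or default if it is missing."""
    tag = post.find(name, class_=class_name)
    return tag.get_text(strip=True) if tag is not None else default


# Hypothetical markup: the second post has no rating span.
SAMPLE_HTML = """
<div class="s-asin">
  <span class="a-price-whole">10,999</span>
  <span class="a-icon-alt">4.2 out of 5 stars</span>
</div>
<div class="s-asin">
  <span class="a-price-whole">8,499</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
data = {"price": [], "rate": []}
for post in soup.find_all("div", class_="s-asin"):
    # One append per column per post keeps all lists the same length.
    data["price"].append(safe_text(post, "span", "a-price-whole"))
    data["rate"].append(safe_text(post, "span", "a-icon-alt"))

print(data)
```

Keeping the lists the same length matters because pandas refuses to build a DataFrame from columns of unequal length.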

Redmi.csv:
Please click on these links to see the Redmi.csv file.
  
