Simple web scraping with Python Beautifulsoup
By Mathi Maheswaran. Posted on March 20, 2021 in Python.
The internet is a massive source of data, but most of that data is not structured for further analysis. For example, if you want to analyze one year of weather information, you first have to collect a year's worth of data and then run the analysis. Doing this by hand takes a lot of manual effort.
To avoid this manual process, people use web scraping to collect data from the web.
Python is a powerful language for web scraping, and many additional packages are available for it. I will walk through step-by-step instructions to extract data from a website.
The following Python libraries are required for web scraping. If you have not installed them yet, please install them.
1. Requests
The requests library is used to send a request to the website and retrieve the HTML data.
pip install requests
2. Beautifulsoup4
The beautifulsoup4 library is used to navigate the HTML tree structure and extract what you need from the raw HTML data.
pip install beautifulsoup4
3. lxml
BeautifulSoup also relies on a parser. This tutorial uses lxml, which is installed separately.
pip install lxml
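Before moving on, it can help to confirm that the three packages are actually importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of any of the libraries.

```python
from importlib.util import find_spec

# pip package name -> module name used in "import ..." (beautifulsoup4 is imported as bs4)
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4", "lxml": "lxml"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

Note that the import name can differ from the pip name, which is why the mapping keeps both.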
To begin, we need to import BeautifulSoup and requests, and grab the source data:
from bs4 import BeautifulSoup
import requests
Next, make the request to get the page data and parse it:
url = 'https://www.cricbuzz.com/cricket-series/3362/england-tour-of-india-2021/stats'
html = requests.get(url).text  # raw HTML source of the page
soup = BeautifulSoup(html, 'lxml')  # parse the HTML with the lxml parser
BeautifulSoup parses the raw HTML text and represents it as a tree-based structure. We then need to identify the HTML DOM element that holds the data. You can use the Chrome developer tools to inspect the DOM.
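The request above has no timeout and does not check the HTTP status, so a failed request can hang or silently return an error page. A slightly more defensive sketch is shown below; the function name `fetch_html`, the 10-second timeout, and the injectable `get` parameter are assumptions made for illustration, not part of the original tutorial.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Download a page and return its HTML text, raising on HTTP errors.

    `get` defaults to requests.get; it is a parameter only so the function
    can be tried with a stand-in when no network is available.
    """
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

The returned text can be passed to BeautifulSoup in place of the `requests.get(...).text` expression.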
Inspecting the page shows that the players’ names are in “a” tags with the class name “cb-text-link”.
playerNames = soup.find_all('a', attrs={'class': 'cb-text-link'})
All the matching player links are now in the “playerNames” list. We can display the players' names using a for loop.
for player in playerNames:
    print(player.text)
The loop prints the text of each matched link, one per line.
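To try the selection and the loop without touching the live site, the same calls can be run against a small hand-written HTML snippet. This is only a sketch: the markup and the names "Player One" and "Player Two" are placeholders, not data from the real page, and the built-in `html.parser` is used so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hand-written sample; placeholder markup and names, not taken from the live page.
SAMPLE_HTML = """
<table>
  <tr><td><a class="cb-text-link" href="/profiles/1">Player One</a></td></tr>
  <tr><td><a class="cb-text-link" href="/profiles/2">Player Two</a></td></tr>
  <tr><td><a class="other-link" href="/about">About</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all with class_ matches tags whose class attribute includes the given class
links = soup.find_all("a", class_="cb-text-link")
names = [link.get_text(strip=True) for link in links]

# The same selection written as a CSS selector
names_css = [a.get_text(strip=True) for a in soup.select("a.cb-text-link")]

for name in names:
    print(name)
```

The link with class "other-link" is skipped, which shows that only tags carrying the requested class are returned.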
Complete code.
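The steps above can be assembled into a single script. The sketch below wraps them in two functions, `extract_player_names` and `main`; that layout and those names are assumptions for illustration rather than the original listing, and the result depends on the live page still using the "cb-text-link" class.

```python
from bs4 import BeautifulSoup
import requests

URL = 'https://www.cricbuzz.com/cricket-series/3362/england-tour-of-india-2021/stats'


def extract_player_names(html, parser='lxml'):
    """Return the text of every <a class="cb-text-link"> element in the HTML."""
    soup = BeautifulSoup(html, parser)
    links = soup.find_all('a', attrs={'class': 'cb-text-link'})
    return [link.text for link in links]


def main():
    """Fetch the stats page and print each player name on its own line."""
    html = requests.get(URL).text  # raw HTML source of the page
    for name in extract_player_names(html):
        print(name)
```

Call `main()` to run the scrape against the live page. Keeping the parsing in its own function makes it possible to check the extraction against a saved or hand-written HTML string first.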
Now we are able to scrape data from the website.
Happy scraping!