
How to perform web scraping with Python

In this tutorial I will show you how to perform web scraping with Python.


 

Learning Materials


webscraper.py (Download PY • 486B)

 

Why Web Scraping:

Web scraping is an incredibly useful technique that allows us to extract relevant data from websites on a large scale. With the vast amount of information available online, web scraping enables us to gather valuable insights, analyze trends, and make informed decisions. It provides businesses with a competitive edge by automating data collection, eliminating the need for manual and time-consuming processes. Web scraping also aids in market research, allowing companies to monitor their competitors, track pricing information, and gather customer feedback. Moreover, researchers and academics benefit from web scraping by accessing and analyzing data for various studies and experiments. Overall, web scraping empowers us to harness the wealth of information available on the internet, unlocking a world of possibilities in data-driven decision-making and knowledge discovery.


 

The Tutorial:

We will use Python to build our web scraping tool, and the website we are scraping today is my own! Specifically, we are going to scrape the main blog page, which lists all the tutorials on this website, at https://www.tapaway.com.au/blog


First, we need to make sure we have the two libraries this project requires. To install them, enter the following commands in your project terminal.

#1 command:

Use pip3 or pip, depending on your environment. The bs4 package on PyPI is a thin wrapper that installs beautifulsoup4.

pip3 install bs4

#2 command:

pip3 install requests

Now that we have installed every library we need, we can import them into our project.

Copy the following lines into your project:


from bs4 import BeautifulSoup

import requests
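As an optional sanity check, a short script along these lines should confirm that both installs worked. It is only a sketch: it imports the two packages and prints the version strings they expose, so an ImportError here would point to a failed install.

```python
import bs4
import requests

# Both packages expose a __version__ string; printing it confirms
# that the imports resolve in the current environment.
print("beautifulsoup4:", bs4.__version__)
print("requests:", requests.__version__)
```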


Now that we have imported our libraries, we can start using them.

We will create a function that handles all the scraping for the blog page.

def scrapBlogs():
    # Download the blog page
    pageToScrape = requests.get("https://www.tapaway.com.au/blog")
    # Parse the returned HTML
    soup = BeautifulSoup(pageToScrape.text, "html.parser")
    # Collect the elements we care about by tag and class
    authors = soup.find_all('span', attrs={'class': 'tQ0Q1A'})
    titles = soup.find_all('p', attrs={'class': 'bD0vt9 KNiaIk'})
    views = soup.find_all('span', attrs={'class': 'eYQJQu'})  # collected but not printed
    # Pair each author with the matching title
    for author, title in zip(authors, titles):
        print(author.text + " - " + title.text)


This function sends a GET request to the blog page. We then use the BeautifulSoup library to parse the HTML returned by that request. With the HTML parsed, we can assign specific elements to variables: authors holds every span element with the class "tQ0Q1A", titles holds every p element with the class "bD0vt9 KNiaIk", and views holds every span element with the class "eYQJQu". Finally, the loop pairs each author with the corresponding title and prints them.
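To see the parse-and-pair step on its own, without hitting the network, here is a minimal sketch that runs the same idea against a small HTML string. The sample markup and the class names "author" and "title" are invented for this illustration; they are not the ones used on the real blog page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page. The class names are
# made up for this sketch and do not match the real site.
SAMPLE_HTML = """
<div class="post">
  <span class="author">Alice</span>
  <p class="title">First post</p>
</div>
<div class="post">
  <span class="author">Bob</span>
  <p class="title">Second post</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching tag in document order.
authors = soup.find_all("span", attrs={"class": "author"})
titles = soup.find_all("p", attrs={"class": "title"})

# zip pairs the first author with the first title, and so on.
pairs = [(a.text, t.text) for a, t in zip(authors, titles)]
for author, title in pairs:
    print(author + " - " + title)
```

Keep in mind that zip stops at the shorter list, so if a post were missing its author or title, the remaining pairs could silently line up with the wrong post.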


The last thing we need to do is call that function.


scrapBlogs()


The following is the full code needed to scrape the page. The same approach can be applied to any other webpage, as long as you provide the correct tags and class names.


from bs4 import BeautifulSoup
import requests


def scrapBlogs():
    # Download the blog page and parse the returned HTML
    pageToScrape = requests.get("https://www.tapaway.com.au/blog")
    soup = BeautifulSoup(pageToScrape.text, "html.parser")
    # Collect the elements we care about by tag and class
    authors = soup.find_all('span', attrs={'class': 'tQ0Q1A'})
    titles = soup.find_all('p', attrs={'class': 'bD0vt9 KNiaIk'})
    views = soup.find_all('span', attrs={'class': 'eYQJQu'})  # collected but not printed
    # Pair each author with the matching title
    for author, title in zip(authors, titles):
        print(author.text + " - " + title.text)


scrapBlogs()
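The idea that the same code carries over to other pages can be sketched as two small helpers that take the URL, tag, and class name as arguments. The names fetch_html and extract_texts, and the 10-second timeout, are my own choices for this illustration rather than part of the original script.

```python
from bs4 import BeautifulSoup
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    # The timeout value is an assumption; adjust it for your use case.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text


def extract_texts(html, tag, class_name):
    """Return the stripped text of every <tag> element with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=class_name)]
```

With these in place, something like extract_texts(fetch_html("https://www.tapaway.com.au/blog"), "span", "tQ0Q1A") would be the equivalent of the authors lookup above. Separating the download from the parsing also makes it easy to try the parsing step on a saved HTML string first.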


Make sure you watch the YouTube video for further details and explanations.


 


