Web Scraping in Python - Wikipedia Scraping

What is web scraping?

Web scraping is the process of collecting and parsing raw data from the Web. Typical uses include pulling product listings from eCommerce sites, gathering data on competitors, or collecting any other publicly available information. The Internet is one of the largest hosts of information and misinformation on the planet, and the Python community has built some excellent tools for scraping it. Wikipedia is one of the largest sources of information, so in this post we will try scraping data from Wikipedia.

Step-by-step guide to Wikipedia scraping in Python

❖ Scraper: An automated program that gathers public data from websites. It can access and extract large amounts of data within seconds.

• To start, create a new Python file called wiki.py:

wiki.py

• First, install the Python library wikipedia, which lets us extract data from Wikipedia:

➢ pip install wikipedia

• The wikipedia library is a wrapper around Wikipedia's API, so once it is installed we can pull article data directly from Wikipedia without parsing HTML ourselves. For more details, you can read the library's documentation.

• Then import it in wiki.py under the shorter alias wiki:

➢ import wikipedia as wiki
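
Before going further, it can help to confirm that the library is importable from the interpreter you are using. The following is a minimal sketch that uses only the standard library; the helper name is_installed is made up for this example and is not part of the wikipedia package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    # find_spec returns None when no top-level module with that name is found.
    return importlib.util.find_spec(module_name) is not None


if is_installed("wikipedia"):
    print("wikipedia is installed")
else:
    print("wikipedia is missing; run: pip install wikipedia")
```

If the check reports the library as missing even after installing it, pip may have installed it for a different Python interpreter than the one running the script.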

• The code for the project:

• Use print statements to search Wikipedia, get a suggested title for a misspelled query, and fetch the summary of an article. Here we look up Python:

➢ print(wiki.search("Python"))

➢ print(wiki.suggest("Pyth"))

➢ print(wiki.summary("Python"))

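• To set the language of the results, call set_lang with a language code, for example French:

➢ wiki.set_lang("fr")

• To switch back to English:

➢ wiki.set_lang("en")

• Then print the summary of Python again, in the language that was set last: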

➢ print(wiki.summary("Python"))
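
A lookup like the one above can fail when a title is ambiguous or has no matching page. The sketch below wraps the summary call and catches the library's DisambiguationError and PageError exceptions. The helper name safe_summary and its default of two sentences are choices made for this example, and the wikipedia module is passed in as an argument so that the function itself does not depend on a network connection.

```python
def safe_summary(wiki, title, sentences=2):
    """Return a short summary for `title`, or a readable message on failure.

    `wiki` is the imported wikipedia module (import wikipedia as wiki).
    """
    try:
        return wiki.summary(title, sentences=sentences)
    except wiki.exceptions.DisambiguationError as err:
        # err.options lists the candidate article titles; show the first five.
        choices = ", ".join(err.options[:5])
        return f"'{title}' is ambiguous. Try one of: {choices}"
    except wiki.exceptions.PageError:
        return f"No Wikipedia page found for '{title}'."


# Usage (needs the wikipedia package and network access):
# import wikipedia as wiki
# print(safe_summary(wiki, "Python"))
```

Returning a message instead of raising keeps a beginner script running when one lookup fails; a larger program would more likely let the exception propagate or log it.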

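• To work with a full article, first load it as a page object named p. A short title such as "Python" can be ambiguous, so we use the full article title:

➢ p = wiki.page("Python (programming language)")

• To get the title

➢ print(p.title)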

• To get the URL of the article

➢ print(p.url)

• To scrape the full article

➢ print(p.content)

• To get all the images in the article

➢ print(p.images)

• And to get the titles of all the Wikipedia articles linked from the article

➢ print(p.links)
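
The attributes above can also be gathered into a single record and saved, rather than only printed. This is a minimal sketch, assuming a page object with the title, url, content, images, and links attributes used above, such as the one returned by wiki.page(...). The function names page_to_dict and save_page, the field names, and the 500-character excerpt length are all choices made for this example.

```python
import json


def page_to_dict(page, max_chars=500):
    """Collect the main fields of a page object into a plain dictionary."""
    return {
        "title": page.title,
        "url": page.url,
        # Keep only the start of the article so the output stays small.
        "excerpt": page.content[:max_chars],
        "image_count": len(page.images),
        "link_count": len(page.links),
    }


def save_page(page, path):
    """Write the page dictionary to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(page_to_dict(page), f, ensure_ascii=False, indent=2)


# Usage (needs the wikipedia package and network access):
# import wikipedia as wiki
# save_page(wiki.page("Python (programming language)"), "python.json")
```

Saving to JSON makes the scraped data easy to load later with json.load for analysis, without querying Wikipedia again.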

• Running the program prints the search results, the summary, and the article details for Python from Wikipedia.

Conclusion

With this Wikipedia scraper project we can easily extract the summary, URL, images, content, and links for any topic on Wikipedia.

Cool, right? With Python, scraping Wikipedia takes only a few lines of code. That's it!

So which website will you scrape next?

About the author

Nikita Padol

Nikita is a passionate programmer and aspiring data scientist. She is pursuing her Master's (MCA) and is an intern at Edgrow, where she is building some exciting Python projects with source code.