Python Web Scraping

Main Course Link

This series is part of 30 Days of Learning.

  • We use Scrapy because it is a mature, well-defined library for web scraping.

  • There are several ways to use XPath selectors. The documentation covers them all.

  • Why scrape when you have APIs?

  • Creating a spider: scrapy genspider “spider name” “website to scrape”

  • Running commands: scrapy runspider indiahikes.py -o articles.csv -t csv -s CLOSESPIDER_PAGECOUNT=10

  • Use the settings file to set the parsing rules and other settings.

  • In pipelines.py you can create the web scraping pipelines.

  • Pipelines are the best way to write out the scraped data.

  • I created a web crawler that crawls the Indiahikes website and saves the cost of all treks in a JSON file.

  • GitHub link for the project: ihScraper

  • Advanced Techniques
    • Submitting a form.
    • Finding hidden APIs that are often embedded in a website, using the Network tab of the browser's Inspect tool. (Very interesting detective work)
    • Use the sitemap for better scraping. Check robots.txt to find it.
    • Automatic Logins
  • Using Selenium with Scrapy for automation
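
To make the pipeline notes above a bit more concrete, here is a minimal sketch of what a class in pipelines.py could look like. It is not the code from the ihScraper project: the class name JsonWriterPipeline and the output path treks.json are made up for illustration. The three methods (open_spider, process_item, close_spider) are the hooks Scrapy calls on an item pipeline.

```python
import json


class JsonWriterPipeline:
    """Collect scraped items and write them to a single JSON file."""

    def __init__(self, path="treks.json"):
        # Hypothetical output path, not taken from the original project.
        self.path = path
        self.items = []

    def open_spider(self, spider):
        # Called once when the spider starts: begin with an empty list.
        self.items = []

    def process_item(self, item, spider):
        # Called for every scraped item. Return the item so that any
        # later pipeline still receives it.
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: write everything out.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.items, f, ensure_ascii=False, indent=2)
```

A pipeline only runs once it is listed under ITEM_PIPELINES in the settings file, with the dotted path to the class as the key and a priority number as the value.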

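The robots.txt tip from the advanced techniques can be tried out with nothing but the standard library. This is a rough sketch, not part of the original project: the helper name sitemaps_from_robots and the sample robots.txt text are invented for illustration, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap line.
    return parser.site_maps() or []


# Made-up robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml
"""

print(sitemaps_from_robots(EXAMPLE_ROBOTS))
```

Inside a Scrapy project the same idea is available through SitemapSpider, whose sitemap_urls list can also point at a robots.txt file.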