Python Web Scraping

Main Course Link

This series is part of 30 Days of Learning.

  • We use Scrapy because it is a mature, well-defined library for web scraping.

  • There are different ways of using XPath selectors. The documentation has it all.

  • Why scrape when you have APIs?

  • scrapy genspider <name> <domain> generates a spider skeleton for the website you want to scrape.

  • Running the spider: scrapy runspider indiahikes.py -o articles.csv -t csv -s CLOSESPIDER_PAGECOUNT=10

  • Use the project's settings file (settings.py) to set the parsing rules and other settings.

  • In pipelines.py you can create the web scraping pipelines.

  • Pipelines are the best way to write the scraped data out to a file or database.

  • I created a web crawler to crawl the Indiahikes website and save the cost of all treks in a JSON file.

  • GitHub link for the project: ihScraper

  • Advanced Techniques
    • Submitting a form.
    • Finding hidden APIs that are often embedded in a website, using the Network tab of the browser's inspector. (Very interesting detective work.)
    • Use the sitemap for better scraping. The robots.txt file usually tells you where to find it.
    • Automatic logins
  • Using Selenium with Scrapy for automation
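
To make the pipeline notes above a bit more concrete, here is a minimal sketch of an item pipeline that collects every scraped item and writes them all to one JSON file. It is not the code from ihScraper: the class name JsonWriterPipeline and the file name treks.json are placeholders I made up for illustration. A Scrapy pipeline is a plain Python class, and Scrapy calls open_spider, process_item and close_spider on it if they are defined.

```python
import json


class JsonWriterPipeline:
    """Collect scraped items and write them to one JSON file when the spider closes."""

    def __init__(self, path="treks.json"):
        # Placeholder output path, not taken from the original project.
        self.path = path
        self.items = []

    def open_spider(self, spider):
        # Called once when the spider starts: begin with an empty list.
        self.items = []

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        self.items.append(dict(item))
        # Return the item so any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: dump everything in one go.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.items, f, indent=2, ensure_ascii=False)
```

To switch it on, the pipeline has to be listed in the ITEM_PIPELINES setting in settings.py, for example ITEM_PIPELINES = {"myproject.pipelines.JsonWriterPipeline": 300}, where myproject is a made-up project name. Holding all items in memory is fine for a small crawl like this one; a larger crawl would probably write items out as they arrive.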
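
The robots.txt tip from the advanced techniques can be tried without Scrapy at all. The sketch below uses the standard library's urllib.robotparser to pull the Sitemap entries out of a robots.txt file. The robots.txt text and the example.com URLs are invented for the example, and sitemaps_from_robots is just a helper name I picked; a real run would first download the site's own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, standing in for a file downloaded from a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def sitemaps_from_robots(text):
    """Return the sitemap URLs listed in a robots.txt string (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


sitemap_urls = sitemaps_from_robots(ROBOTS_TXT)
print(sitemap_urls)
```

Inside a Scrapy project, the built-in SitemapSpider class can do this step for you: its sitemap_urls attribute accepts a robots.txt URL and follows the sitemaps listed there.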
