@vivekthedev
Created July 19, 2024 11:51
lxml Scraping Tutorial Code - Proxy Scrape
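import json

import requests
from lxml import html

URL = "https://books.toscrape.com/"

# Fill in the proxy credentials and endpoint before running.
username = ""
password = ""
hostname = ""

# Route both HTTP and HTTPS traffic through the same authenticated proxy.
proxies = {
    "http": f"https://{username}:{password}@{hostname}",
    "https": f"https://{username}:{password}@{hostname}",
}

# Fetch the page and fail early on a non-2xx status.
response = requests.get(URL, proxies=proxies, timeout=30)
response.raise_for_status()

# Pass raw bytes so lxml picks the encoding from the document itself.
parsed = html.fromstring(response.content)

# Each book on the page is an <article class="product_pod"> element.
all_books = parsed.xpath('//article[@class="product_pod"]')

books = []
for book in all_books:
    # xpath() returns a list, so take the first match to get a string.
    book_title = book.xpath(".//h3/a/@title")[0]
    # cssselect() requires the separate "cssselect" package.
    price = book.cssselect("p.price_color")[0].text_content()
    books.append({"title": book_title, "price": price})

# Save the results as JSON, keeping non-ASCII characters such as "£" readable.
with open("books.json", "w", encoding="utf-8") as file:
    json.dump(books, file, ensure_ascii=False)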
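
The script interpolates the username and password directly into the proxy URL, which can break if either contains characters such as "@" or ":". A minimal sketch of one way to guard against that is shown below. `build_proxies` is a hypothetical helper, not part of the gist or of requests, and the default `http` scheme and the example hostname are assumptions; use whatever scheme and endpoint your proxy provider documents.

```python
from urllib.parse import quote


def build_proxies(username: str, password: str, hostname: str, scheme: str = "http") -> dict:
    """Hypothetical helper: return a requests-style proxies mapping."""
    # Percent-encode the credentials so characters like "@" or ":"
    # cannot be mistaken for URL delimiters.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"{scheme}://{user}:{pwd}@{hostname}"
    # Use the same proxy for both plain HTTP and HTTPS requests.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_proxies("user", "p@ss:word", "proxy.example.com:8080"))
```

The returned dictionary can be passed as the `proxies` argument of `requests.get` in place of the hand-built one.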