Pixeljets (Page 3)

How to set proxy in Playwright

In this article I will describe how to set a proxy in Playwright (the Node.js version). Playwright is one of the best and most modern solutions for automating browsers in 2024. It drives Chromium over the CDP protocol and supports Chromium (including Chrome), Firefox and WebKit out of the box. It is open source and very well maintained. Its main use cases are UI test automation and web scraping. Setting up proxies is useful for both of these use cases - especially for web scr
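As a rough illustration of the proxy setup this post is about, here is a minimal sketch. Playwright's `launch()` accepts a `proxy` option with `server`, `username` and `password` fields; the helper `buildProxySettings` and the proxy address below are hypothetical and exist only for this example.

```typescript
// Subset of the `proxy` launch option accepted by Playwright's launch().
interface ProxySettings {
  server: string;
  username?: string;
  password?: string;
}

// Hypothetical helper (not part of Playwright): split a proxy URL such as
// "http://user:pass@host:port" into the separate fields Playwright expects.
function buildProxySettings(proxyUrl: string): ProxySettings {
  const url = new URL(proxyUrl);
  // Keep only scheme + host[:port]; credentials are passed separately.
  const settings: ProxySettings = { server: `${url.protocol}//${url.host}` };
  if (url.username) settings.username = decodeURIComponent(url.username);
  if (url.password) settings.password = decodeURIComponent(url.password);
  return settings;
}

// Usage with Playwright (needs the `playwright` package installed;
// proxy.example.com is a placeholder address):
//
//   import { chromium } from 'playwright';
//   const browser = await chromium.launch({
//     proxy: buildProxySettings('http://user:pass@proxy.example.com:8080'),
//   });
```

Keeping the credentials out of `server` means the same helper works for both authenticated and anonymous proxies.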

Modern web scraping with Playwright: choosing between Python and NodeJS

When diving into the world of automated browser testing and scraping with Playwright, one of the first decisions you'll encounter is the choice of programming language. Playwright is not a one-language wonder; it caters to a polyglot audience. Let's see how the Node.js and Python versions of Playwright compare. A bit of history: Playwright was created by one of the authors of Puppeteer, Andrey Lushnikov (who was part of the Chrome DevTools team back then). Playwright was built on the les

Blocking images in Playwright

Blocking unnecessary resources in Playwright is a pretty easy task, thanks to the built-in route() function.
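A minimal sketch of what such a route() handler could look like. The set of blocked resource types and the names `shouldBlock` and `blockHeavyResources` are assumptions made for this example; the `RouteLike` and `RequestLike` interfaces only mirror the few Playwright methods used here, so the snippet type-checks without the library installed.

```typescript
// Minimal structural types covering only the methods used below;
// Playwright's real Route and Request objects satisfy them.
interface RequestLike {
  resourceType(): string;
}
interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Assumption for this sketch: images, media and fonts are "unnecessary".
const BLOCKED_RESOURCE_TYPES = new Set(["image", "media", "font"]);

function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Route handler: abort blocked resource types, let everything else through.
async function blockHeavyResources(route: RouteLike): Promise<void> {
  if (shouldBlock(route.request().resourceType())) {
    await route.abort();
  } else {
    await route.continue();
  }
}

// Usage with Playwright (needs the `playwright` package installed):
//
//   await page.route('**/*', blockHeavyResources);
//   await page.goto('https://example.com');
```

Filtering on `resourceType()` rather than on file extensions also catches images served from URLs without a recognizable suffix.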

How to download PDF in Playwright

In the ever-evolving world of web scraping, I often come across hurdles that require creative solutions and some quick code workarounds and hacks, and oh boy, this is especially true when I am working with programmatically driven browsers, which I happen to do a lot lately. Today, I'd like to share a challenge I faced while trying to download PDF files using Playwright, and how I managed to overcome it. The Unexpected Twist with Chromium and Playwright: Initially, after quickly browsing Playw

Choosing a proxy for web scraping

Once you're familiar with basic web scraping tools like Scrapy, and you've scraped your first 1-2 websites, you'll probably get your first ban because your IP address has made too many requests. What "too many" means really depends on the site: for one site it's just 3 requests per hour, for another it's 100 requests in a 5-minute window. It's important to make sure that the site ban is actually related to the IP address from which you're sending your requests: to check that it's not a coo

Building company data enrichment API

The journey began when several fellow B2B SaaS founders expressed a common desire: to seamlessly enrich their signup processes with pertinent company data. They were frustrated with the inaccessibility of the Crunchbase API, which seemed only available to Fortune 500 behemoths with deep pockets.
