Anthony Sidashin

As a web developer and CTO with over 15 years of experience, I am passionate about building profitable small SaaS products and pursuing go-to-market strategies for them. My areas of expertise include high-performance systems, networking technology and APIs, SRE, automation with Puppeteer.js, web scraping, and SQL databases.

ScrapeNinja: never handle retries and proxies in your code again

I am glad to announce that the ScrapeNinja scraping solution has just received a major update and got new features. Retries: retries are a must-have for every scraping project. Proxies fail to process your request, the target website shows captchas, and all sorts of other bad things happen every time you try to get an HTTP response. ScrapeNinja is smart enough to detect most failed responses and retry via another proxy until it gets a good response (or until it fails, when the number of retries exceeds retryN…

2 min read
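
Conceptually, the retry behavior described in the excerpt above boils down to a loop like this. It is just a rough bash sketch of the idea; the proxy list, target URL and success check are placeholder assumptions, not ScrapeNinja's actual implementation:

#!/bin/bash
# Rough sketch of "retry via another proxy until a good response comes back".
# PROXIES and TARGET_URL are placeholders.
PROXIES=("http://user:pass@proxy1:8080" "http://user:pass@proxy2:8080" "http://user:pass@proxy3:8080")
TARGET_URL="https://example.com/"
MAX_RETRIES=3

for ((i = 0; i < MAX_RETRIES; i++)); do
  proxy="${PROXIES[$((i % ${#PROXIES[@]}))]}"
  # -s silences progress, -o discards the body, -w prints only the HTTP status code
  status=$(curl -s -o /dev/null -w "%{http_code}" --proxy "$proxy" --max-time 20 "$TARGET_URL")
  if [ "$status" = "200" ]; then
    echo "Got a good response via $proxy"
    exit 0
  fi
  echo "Attempt $((i + 1)) via $proxy failed with status $status, retrying..."
done

echo "All $MAX_RETRIES attempts failed"
exit 1
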
Simple proxy checker script via CURL

While working on the ScrapeNinja scraping solution, I often need to verify whether a particular proxy is alive and performing well. Since I don't want to use various online services, especially for private proxies with user & password authentication, I have written a simple bash script which is much more concise than typing all the cURL commands into the terminal manually: #!/bin/bash # download, do chmod +x and copy to /usr/local/bin via ln -s /downloaded-dir/pcheck.sh /usr/local/bin/pcheck # then…

2 min read
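
A minimal version of such a checker could look like the sketch below; the real pcheck.sh lives in the post, and the test URL, timeout and output format here are my assumptions:

#!/bin/bash
# Minimal proxy checker sketch: pass the proxy as the first argument,
# e.g. ./pcheck.sh http://user:password@1.2.3.4:8080
PROXY="$1"
TEST_URL="${2:-https://www.google.com/}"

# -w prints the status code and timings, so slow-but-alive proxies are visible too
curl --proxy "$PROXY" \
     --max-time 15 \
     -s -o /dev/null \
     -w "status: %{http_code}\ntime_connect: %{time_connect}s\ntime_total: %{time_total}s\n" \
     "$TEST_URL"
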
Sending Requests in Web Scraping: cURL, Chrome, Firefox, REST.client, netcat

Contents:
1. Chrome Dev Tools
2. Copy as cURL
3. cURL options: proxy, show only headers
4. Firefox: edit & resend; multi-account containers
5. cURL to Python scraper converter
6. VS Code REST.client extension
7. HTTP server one-liner for debugging

While working on scraping, I have to quickly debug and replay a lot of web requests every day. The web scraping process usually consists of two phases: 1. retrieve the response effectively; 2. extract and transform the response into structured data.

6 min read
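
For reference, these are the kinds of one-liners the table of contents above points at (hosts, ports and credentials are placeholders; the exact commands in the post may differ):

# Replay a request through a proxy and show only the response headers (-I sends a HEAD request)
curl --proxy http://user:pass@1.2.3.4:8080 -sI https://example.com/

# -v prints both request and response headers, handy when comparing with the browser
curl -v https://example.com/ -o /dev/null

# Throwaway HTTP servers for inspecting what your scraper actually sends:
nc -l 8080                      # netcat: dumps the raw incoming request to stdout (some builds need -p 8080)
python3 -m http.server 8080     # serves the current directory over HTTP on port 8080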

Making PDF look like scanned. Top 4 tools to apply scanner effect, reviewed.

Some bigger companies still require wet signatures on documents, which has been a source of constant hassle for me in recent years. My workflow was:
* Receive an email with the PDF document
* Download the document
* Print the document on my old friend, the HP LaserJet 1020
* Sign the document by hand, using a pen
* Scan the document (usually via my phone camera and the Google Drive app)

This is a fine routine to go through once a month, but once I started doing it every week, I realized I could save…

5 min read
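
One commonly shared command-line recipe for this "scanner effect" is ImageMagick; whether it made the top-4 list is covered in the post, and the parameter values below are folklore defaults rather than anything taken from the article:

# Rasterize the PDF, convert to grayscale, then add slight blur, noise and rotation
# so the pages look photographed/scanned. Tweak values to taste.
convert -density 150 input.pdf \
        -colorspace gray \
        -linear-stretch 3.5%x10% \
        -blur 0x0.5 \
        -attenuate 0.25 +noise Gaussian \
        -rotate 0.4 \
        output.pdf
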
How to bypass CloudFlare 403 (code:1020) errors [UPDATED 2023]

I've recently started getting Cloudflare 1020 (403) errors when scraping some random e-commerce website. At first, I thought the website didn't like my scraper's IP address, but switching to a clean residential proxy and even to my home network didn't fix the issue. Strangely, when the website was opened in Chrome, it loaded without any problems. I opened Chrome DevTools and did a "Copy as cURL" from the Network tab, exactly how I always do it when debugging scraping…

7 min read
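
The "Copy as cURL" replay mentioned in the excerpt looks roughly like this (the URL and header values are anonymized placeholders, not the actual site from the post):

# A trimmed example of what Chrome's "Copy as cURL" produces for a page request
curl 'https://example.com/catalog' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --compressed -i

# If this still returns a 403/1020 while Chrome gets a 200 with the same headers,
# the block is usually happening below the HTTP layer (e.g. TLS fingerprinting).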

VS Code Remote for Node.js caveat: dealing with detached nodemon process

I develop all my new projects on a remote Hetzner Cloud machine, using the wonderful and almost too-good-to-be-true VS Code Remote. I recommend this setup to everyone who does not like the spinning fans of their MacBook - and I really prefer Ubuntu and a real Linux environment to brew when I set up my web servers, databases, Node.js, and other infrastructure. Using the VS Code Remote terminal for Node.js development has one caveat, though: if I use nodemon, which reloads the script on every file save…

2 min read
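
When a detached nodemon keeps running on the remote box (and keeps the app port busy), a quick cleanup from any SSH session can look like this - a generic sketch, not necessarily the exact fix from the post; port 3000 is an assumption:

# List leftover nodemon/node processes that survived the closed VS Code Remote terminal
ps aux | grep -E "nodemon|node" | grep -v grep

# See which process is still holding the app port
lsof -i :3000

# Kill every process whose command line matches "nodemon"
pkill -f nodemon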