Anthony Sidashin

As a web developer and CTO with over 15 years of experience, I am passionate about building profitable small SaaS products and pursuing a go-to-market strategy for them. My areas of expertise include high performance, networking technology and APIs, SRE, automation with Puppeteer.js, web scraping, and SQL databases.

Web scraping in Javascript: node-fetch vs axios vs got vs superagent

There are a number of ways to perform web requests in Node.js: node-fetch, axios, got, and superagent. Node.js can also perform HTTP requests without additional packages. While I don't ever use this approach because of its poor developer ergonomics (using EventEmitter to collect the response data is just too verbose for me), Node.js is perfectly capable of sending HTTP requests without any libraries from npm: const https = require('https'); https.get('https://example.com/some-page', (resp) => { …
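For completeness, here is a minimal sketch of where that truncated snippet is headed — the standard pattern of collecting the response body via data/end events (the URL is just a placeholder):

const https = require('https');

// Plain Node.js, no npm packages: collect the response body chunk by chunk.
https.get('https://example.com/some-page', (resp) => {
  let data = '';
  resp.on('data', (chunk) => {
    data += chunk;
  });
  resp.on('end', () => {
    console.log(data); // the full response body, e.g. the page HTML
  });
}).on('error', (err) => {
  console.error('Request failed:', err.message);
});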

5 min read

cURL examples: requests with proxy, set user agent, send POST JSON request, and more

cURL is a small *nix utility to perform network requests. This is a quick cheat sheet on how cURL can be used for web scraping, or for any other case where you need to appear to be sending web requests from another IP address. Setting a proxy URL for cURL: curl --proxy http://login:pw@proxy-hostname.com:port. The shortcut for the --proxy option is -x, so this is the exact equivalent: curl -x http://login:pw@proxy-hostname.com:port. cURL supports http, https, and socks proxies. For a simple…

3 min read

I have tested out Zapier, Make.com and Pipedream.com from a developer perspective

A few days ago, I took a deep dive into integrating my ScrapeNinja web scrapers into Zapier, Pipedream.com, and Integromat (Make.com) to better understand the market situation among low-code and no-code automation platforms. I wanted to do a simple job: extract some website data in JSON format from HTML every hour using ScrapeNinja, apply some simple JS processing, and put everything into Google Sheets. This journey took longer than anticipated, and this writeup contains a summary of my conclusions.
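For context, the "simple JS processing" step was nothing fancy — roughly this kind of transform (a hedged sketch, not tied to any particular platform's step API; the field names are made up):

// Hypothetical shape of the scraped JSON coming back from ScrapeNinja.
const scraped = [
  { title: 'Some product', price: '$19.99' },
  { title: 'Another product', price: '$4.50' },
];

// Turn it into rows that an "append row to Google Sheets" action can consume.
const rows = scraped.map((item) => [
  item.title,
  Number(item.price.replace('$', '')),
  new Date().toISOString(),
]);

console.log(rows);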

14 min read

Web scraping in Google Sheets: ImportXML & alternatives

In case you want to import some random website data into Google Sheets, the obvious way to start this exciting adventure is to use the IMPORTXML() function. The main advantage is that this function is available in Google Sheets out of the box! The syntax for the IMPORTXML function is =IMPORTXML(url, xpath_query), where url refers to the URL of the website. You can represent this value in the formula by including the protocol “http://” or “https://” and the URL between double quotation marks.
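For example (the URL here is just a placeholder), =IMPORTXML("https://example.com/some-page", "//title") pulls the page title into a single cell, while =IMPORTXML("https://example.com/some-page", "//h2") spills every <h2> heading it finds down the column.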

3 min read

Running untrusted JavaScript in Node.js

ScrapeNinja Scraping API recently got an exciting feature called Extractors. Extractors are pieces of user-supplied Javascript code which are executed on the ScrapeNinja backend, so ScrapeNinja returns pure JSON with data from any HTML webpage in the world. This feature alone, together with the ScrapeNinja web-based IDE to write extractors, can shave off hours of development & testing when building a web scraper. Here is a demo of the extractor feature, which turns the HackerNews HTML frontpage into pure JSON of posts…
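To give a flavour of what "running untrusted JavaScript" involves, here is a minimal sketch using Node's built-in vm module — an illustration of the general idea only (vm isolates scope but is not a real security boundary on its own, so production setups need extra hardening):

const vm = require('vm');

// User-supplied code: in the extractor scenario this would come from the user,
// here it is just a hardcoded string for illustration.
const untrustedCode = 'result = input.split(",").map((s) => s.trim());';

// A bare sandbox object: only what we explicitly expose is visible inside.
const sandbox = { input: 'foo, bar, baz', result: null };
vm.createContext(sandbox);

// Run with a timeout so an infinite loop cannot hang the process.
vm.runInContext(untrustedCode, sandbox, { timeout: 1000 });

console.log(sandbox.result); // [ 'foo', 'bar', 'baz' ]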

11 min read

Cheerio: parse HTML in Javascript. Playground and cheatsheet

Cheerio is now the de-facto standard for parsing HTML in server-side Javascript (Node.js). It is a fast, flexible, and lean implementation of jQuery-like syntax designed specifically for the server. GitHub: https://github.com/cheeriojs/cheerio (25.8K stars), NPM: https://www.npmjs.com/package/cheerio. Cheerio is a pretty performant solution for extracting data from raw HTML web pages, and is perfect for web scraping tasks when you don't need real browser rendering or you just don't want to use Puppeteer.
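A minimal taste of the syntax (the HTML snippet is made up, but the calls are the standard cheerio API):

const cheerio = require('cheerio');

// Load raw HTML; $ now works much like jQuery, but on the server.
const $ = cheerio.load('<ul><li class="item">First</li><li class="item">Second</li></ul>');

// Collect the text of every .item node into a plain array.
const items = $('li.item').map((i, el) => $(el).text()).get();

console.log(items); // [ 'First', 'Second' ]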

6 min read