Pixeljets

Build, Grow🌱, Repeat.

Stories from ScrapeNinja founder: bootstrapping SaaS products, web scraping, and more

How to set proxy in node-fetch

Executing http(s) requests via a proxy can be helpful in a lot of cases: it makes your HTTP request look like it was executed from a different country or location. Setting a proxy in the node-fetch Node.js package is not as simple as in Axios (where we can set a proxy by passing a plain JS object with options); in node-fetch we need to pass an Agent with the proxy set up, so it is a bit more manual work. But this is also a good thing, because we can use the latest and greatest proxy package from npm…

2 min read

Web scraping in Javascript: node-fetch vs axios vs got vs superagent

There are a number of ways to perform web requests in Node.js: node-fetch, axios, got, superagent. Node.js can perform HTTP requests without additional packages. While I don't ever use this approach because of its poor developer ergonomics (using EventEmitter to collect the response data is just too verbose for me), Node.js is perfectly capable of sending HTTP requests without any libraries from npm! const https = require('https'); https.get('https://example.com/some-page', (resp) => { let…

5 min read

cURL examples: requests with proxy, set user agent, send POST JSON request, and more

cURL is a small *nix utility for performing network requests. This is a quick cheat sheet on how cURL can be used for web scraping, or in any other case when you need to appear to be sending a web request from another IP address. cURL set proxy Setting a proxy URL for cURL: curl --proxy http://login:pw@proxy-hostname.com:port The shortcut for the --proxy option is -x, so this is the exact equivalent: curl -x http://login:pw@proxy-hostname.com:port cURL supports http, https, and socks proxies. For a simple…

3 min read
I have tested out Zapier, Make.com and Pipedream.com from a developer perspective

A few days ago, I took a deep dive into integrating my ScrapeNinja web scrapers into Zapier, Pipedream.com, and Integromat (Make.com) to better understand the market situation among low-code and no-code automation platforms. I wanted to do a simple job: extract some website data in JSON format from HTML every hour using ScrapeNinja, apply some simple JS processing, and put everything into Google Sheets. This journey took longer than anticipated, and this writeup contains a summary of my conclusions…

14 min read

Web scraping in Google Sheets: ImportXML & alternatives

In case you want to import some random website data into Google Sheets, the obvious way to start this exciting adventure is to use the importXML() function. The main advantage is that this function is available in Google Sheets out of the box! IMPORTXML Syntax The syntax for the IMPORTXML function is: =IMPORTXML(url, xpath_query) * url refers to the URL of the website. You can represent this value in the formula by including the protocol “http://” or “https://” and the URL between double quotation marks…
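For instance, a formula along these lines (the URL and XPath here are placeholders) pulls every h1 heading from a page into the sheet:

```
=IMPORTXML("https://example.com/", "//h1")
```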

3 min read
Running untrusted JavaScript in Node.js

ScrapeNinja Scraping API recently got an exciting feature called Extractors. Extractors are pieces of user-supplied JavaScript code which are executed on the ScrapeNinja backend, so ScrapeNinja returns pure JSON with data from any HTML webpage in the world. This feature alone, together with the ScrapeNinja web-based IDE to write extractors, can shave hours of development & testing off building a web scraper. Here is a demo of the Extractor feature, which turns the HackerNews HTML front page into pure JSON of posts.

11 min read