Instagram is a tough target for scraping.
For one of my side projects, I needed to collect information from several public accounts on a daily basis: for example, their follower counts and their recent posts. I tried the most popular GitHub scrapers, such as https://github.com/postaddictme/instagram-php-scraper, on a DigitalOcean droplet, and it quickly turned out that Instagram either redirects to the /login location or responds with "429 The maximum number of requests per hour has been exceeded", even though it was the very first request to its GraphQL endpoint. Apparently, Instagram has banned all datacenter IP ranges. Issues about 302 and 429 errors are opened in the GitHub issue queues almost every day, so I was definitely not alone.
I did not want to log in to some fake Instagram account, because scraping through an account violates Instagram's Terms and is not the most ethical thing to do. I also neither wanted nor needed to do shady things like mass following: public account information was all I was interested in.
It turned out that there is a solution to the problem: the unofficial Instagram API https://rapidapi.com/restyler/api/instagram40, which uses residential proxy networks and smart retries to bypass Instagram restrictions. It helped me build my project, and it has kept working well for more than 3 months, so it looks pretty stable to me. Around 3-4% of requests end with 5xx errors, but that is an explicit error which is instantly visible to my software, so I can simply retry the failed requests. Considering Instagram's strict policy, and compared to the other solutions, that is good enough for me. A proxified PHP scraper (it uses this RapidAPI provider under the hood) is available on GitHub: https://github.com/restyler/instagram-php-scraper (it is a fork of postaddictme/instagram-php-scraper, which was mentioned above).
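Retrying those occasional 5xx responses can be sketched roughly like this. This is only a minimal illustration in Python, not the code from my project or from the PHP scraper: the function name `call_with_retries`, the attempt count and the delays are all my assumptions, and `fetch` stands for whatever function actually performs the request.

```python
import time


def call_with_retries(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-5xx status or attempts run out.

    fetch is any zero-argument callable returning (status_code, body).
    All names and default values here are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result = None
    for attempt in range(max_attempts):
        result = fetch()
        status, _body = result
        if status < 500:
            # Success or a client-side error: retrying will not help.
            return result
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    # Every attempt ended with a 5xx; hand the last response to the caller.
    return result
```

The `sleep` parameter exists only so that the waiting can be swapped out in tests; in normal use the default `time.sleep` is fine.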
How to scrape Instagram in 2021: step by step
- Sign up on RapidAPI. RapidAPI is a big marketplace where developers publish their APIs, and I am really excited about this platform, since it embraces a divide-and-conquer approach: it lets app developers focus on what their end customers need and delegate part of the work to solutions built by other developers. The best part about RapidAPI is that its API explorer lets you subscribe to and test several APIs to see how they perform in real time, and quickly decide whether a specific API is good enough for your use case. This is especially easy for APIs that offer free plans.
- Subscribe to a specific API on the RapidAPI marketplace. For Instagram, I recommend https://rapidapi.com/restyler/api/instagram40.
- Use the API. For this Instagram API, a ready-made PHP solution is available on GitHub: https://github.com/restyler/instagram-php-scraper, but of course you can also call the API directly from your own code.
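If you go the "own code" route, the last step might look roughly like the Python sketch below. Treat it as a sketch under assumptions, not as the API's documented usage: RapidAPI requests are normally authenticated with the `X-RapidAPI-Key` and `X-RapidAPI-Host` headers, but the host name is my guess based on the usual `<api>.p.rapidapi.com` pattern, and the `/account-info` path and `username` parameter are hypothetical placeholders. Copy the real values from the API explorer after subscribing.

```python
import json
import urllib.parse
import urllib.request

# Assumption: RapidAPI hosts usually follow the <api>.p.rapidapi.com pattern.
RAPIDAPI_HOST = "instagram40.p.rapidapi.com"
# Hypothetical endpoint path; check the API explorer for the real one.
ACCOUNT_INFO_PATH = "/account-info"


def build_request(username, api_key):
    """Build (but do not send) a GET request for a public account."""
    query = urllib.parse.urlencode({"username": username})
    url = f"https://{RAPIDAPI_HOST}{ACCOUNT_INFO_PATH}?{query}"
    headers = {
        "X-RapidAPI-Key": api_key,       # your personal key from RapidAPI
        "X-RapidAPI-Host": RAPIDAPI_HOST,
    }
    return urllib.request.Request(url, headers=headers)


def fetch_account(username, api_key, timeout=30):
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses,
    so the caller can catch it and retry the 5xx ones.
    """
    request = build_request(username, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_account("some_public_account", "YOUR_RAPIDAPI_KEY")` would then return the decoded JSON, whose exact shape depends on the endpoint you picked.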