I mentioned Instagram scraping in my previous post, and I've just spent a few spare hours open-sourcing a simple set of scripts for follower scraping:
As far as I know, a lot of people are struggling to get their list of followers nowadays, so maybe this (slightly ugly) code will be helpful to someone.
The main advantage of this solution is that it does not require a login/password or cookies for an Instagram account to scrape data, since it uses a smart Instagram cloud proxy.
I don't find .csv, .json, or MongoDB (the usual choices in a lot of scrapers) very convenient for later analysis, and I am a fan of SQL, so I used the knex library to put everything into good old boring MySQL tables. MySQL works pretty well with JSON blobs nowadays (and I didn't want to extract every single possible field from Instagram into a separate MySQL column), so this turned out to be pretty convenient, for example for basic engagement analysis by retrieving an account's recent post stats in a SELECT statement:
anonData->>"$.edge_owner_to_timeline_media.edges[1].node.edge_media_to_comment.count" as second_post_comments
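To show how such columns can be generated rather than typed by hand, here is a minimal sketch that builds the SELECT for the first N posts. It is not taken from the repo: the table name `ig_profiles`, the `username` column, and both helper functions are hypothetical, and only the `anonData` column and the JSON field names come from the fragment above. It also assumes the posts are addressed with an explicit zero-based array index such as `edges[1]` for the second post.

```javascript
// Build the MySQL JSON path to the `count` field of one edge
// (e.g. "edge_media_to_comment") of the Nth most recent post (0-based).
function postCountPath(postIndex, edgeName) {
  if (!Number.isInteger(postIndex) || postIndex < 0) {
    throw new RangeError("postIndex must be a non-negative integer");
  }
  return `$.edge_owner_to_timeline_media.edges[${postIndex}].node.${edgeName}.count`;
}

// Build a SELECT exposing the comment counts of the first `posts` posts as columns.
// `table` is a trusted constant here (hypothetical name), never user input.
function buildEngagementQuery(table, posts) {
  const columns = ["username"]; // hypothetical column
  for (let i = 0; i < posts; i++) {
    const path = postCountPath(i, "edge_media_to_comment");
    columns.push(`anonData->>"${path}" AS post${i + 1}_comments`);
  }
  return `SELECT ${columns.join(", ")} FROM ${table}`;
}

console.log(buildEngagementQuery("ig_profiles", 2));
```

The resulting string can be run from any MySQL client, or passed to knex's `raw()` if you prefer to stay inside the scripts.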
Currently the repo contains two scripts: the first one grabs followers, and the second one enriches every (public) profile grabbed in step 1 with useful profile information.
Since scraping may take some time (usually more than 10 minutes) and I prefer doing the job from my remote dev server (via the awesome VS Code Remote extension), here is what I usually do. I launch a new tmux session on the remote server from the remote terminal. This guarantees that a sudden ssh disconnect does not cut me off from the scraping process and its output (the VS Code remote connection sometimes fails to restore its remote terminal sessions after my laptop hibernates). In this new session I create two panes: the first pane runs followers-step1.js and the second pane runs followers-step2.js. This way the scraping is done by 2 parallel processes, which works fine.
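Two processes can run side by side like this as long as the second one only picks up rows the first one has already stored and that still lack profile data. A rough sketch of that selection step follows; the row shape (`username`, `isPrivate`, and an `anonData` field that stays `null` until enrichment) and the function name are assumptions made for illustration, and the actual logic in the repo may differ.

```javascript
// Hypothetical row shape: { username, isPrivate, anonData }.
// anonData is assumed to be null until step 2 has stored the profile JSON.
function pickProfilesToEnrich(rows, batchSize) {
  return rows
    .filter((row) => !row.isPrivate && row.anonData == null) // public and not enriched yet
    .slice(0, batchSize) // limit the work done per iteration
    .map((row) => row.username);
}

// Sample rows, as step 1 might have stored them.
const rows = [
  { username: "alice", isPrivate: false, anonData: null },
  { username: "bob", isPrivate: true, anonData: null },
  { username: "carol", isPrivate: false, anonData: { edge_followed_by: { count: 10 } } },
  { username: "dave", isPrivate: false, anonData: null },
];

// Only alice and dave qualify: bob is private, carol is already enriched.
console.log(pickProfilesToEnrich(rows, 10));
```

Because step 2 only reads rows that are already complete from step 1's point of view, neither process has to wait for the other.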
Short how-to guide for the script: