_________ _ _______ _________ _______ ______ _______ _
\__ __/( ( /|( ____ \\__ __/( ___ )( __ \ ( ___ )|\ /|( ( /|
) ( | \ ( || ( \/ ) ( | ( ) || ( \ )| ( ) || ) ( || \ ( |
| | | \ | || (_____ | | | (___) || | ) || | | || | _ | || \ | |
| | | (\ \) |(_____ ) | | | ___ || | | || | | || |( )| || (\ \) |
| | | | \ | ) | | | | ( ) || | ) || | | || || || || | \ |
___) (___| ) \ |/\____) | | | | ) ( || (__/ )| (___) || () () || ) \ |
\_______/|/ )_)\_______) )_( |/ \|(______/ (_______)(_______)|/ )_)
DISCLAIMER 1: I am discontinuing the project right now, as someone at Meta seems to have found a way to make it impossible to consistently use Selenium on Instagram without showing its changes in the HTML/Javascript of the website. I concluded it is some intervention on the server side. If I will have some working tool developed in the future, I will update this, so that I can send it to you privately.
DISCLAIMER 2: There seems to be an incompatibility between some Apple CPUs and the latest GeckoDriver instance (in fact they gave out multiple geckodrivers to overcome the issue, but the downloader does not know that). If the related error pops up as you run the cookie creator, you can manually download the correct driver from the official repository. I am guessing that is the aarch64 version. Then put its location (absolute path) in the cookie_creator.py and the InstaDown.py files swapping this string:
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install(), options=fireFoxOptions)
with this string, with your path to the downloaded geckodriver obviously:
driver = webdriver.Firefox(executable_path="INSERT YOUR PATH", options=fireFoxOptions)
A simple python script that uses GeckoDriver to intiate an headless Firefox browser agent to download pictures from Instagram posts' URLs. The script uses a .txt file with one post link per line to extract the images (the file contains examples already). It renames the downloaded JPG images with the unique code identifying each post (the one after www.instagram.com/p/...). Moreover, to ease the login phase, another .py file can be run to login and create a cookies file, that is needed by InstaDown to operate properly
Clone this repository from Github or download it. Alternatively you can install it with '''pip install InstaDown_new'''. However it will be located inside your main Python folder, in ≈\Python\Lib\site-packages\package_InstaDown. You will need to run it from there then.
First you need to install the packages in requirements.txt:
pip install -r requirements.txt
Then, you need to open the links_list.txt and add your list of links, one per line, save it.
IMPORTANT: Add its absolute path in the InstaDown.py file.
To make it as easy as possible, you just need to open the .py file and search
INSERT ABSOLUTE PATH TO
and substitute it with your path to the file.
You will need to do the same for the cookies file
This passage is necessary to prevent InstaDown from requesting a login everytime you open a new session. You only need to create the file once.
Open a command prompt or powershell window and set its working directory in the InstaDown folder. Then run:
python Cookie_creator.py
It will open a browser in the background (invisible to the user), go to Instagram.com and ask you to input username and password in the command prompt, powershell or terminal window. THIS IS SAFER THAN LOGGING TRADITIONALLY, ESPECIALLY FOR POTENTIALLY COMPROMISED MACHINES, BECAUSE THE INPUT IS HIDDEN TO THE USER AS WELL. If it logs in correctly, a cookie.pkl file will be created in your InstaDown folder.
IMPORTANT: Add its absolute path in the InstaDown.py file.
Just run this in a powershell window with the InstaDown folder as working directory:
python InstaDown.py
It will tell you the unique code corresponding to the image it is downloading as it progresses.
You can find all the information necessary to understand the script and tweak it within the code, as it is heavily commented.
You need to give it a list of links to instagram posts, written one per line in a .txt file. Rename the .txt file links_list.txt (or change the code) and set the absolute path to your file in the script. The images are saved in the same folder as that of InstaDown.py.
When it crashes (because it will crash), you can see the last downloaded post in the terminal window, double check that the image has been downloaded in the output folder (named with the hashtag or username handle you scraped), fix the links_list.txt accordingly and proceed.
**IMPORTANT: if the script finishes the links, it will let you know by printing "No links to download detected!" and shut down
If it crashed because the post has been deleted and it finds no image to download, erase that link as well, otherwise it will keep crashing.
If it does not work at all, check if Meta changed the class id identifying the HTML element containing the picture link. Currently (7/22), it is "_aagv".
This is the very first functional script I have ever made and I am just starting with Python and scraping. I hope someone will find it useful!