......_..._..._..._......
...../ \./ \./ \./ \.....
....( L | A | M | E )....
.....\_/.\_/.\_/.\_/.....
...._..._..._..._..._....
.../ \./ \./ \./ \./ \...
..( L | E | M | O | N )..
...\_/ \_/ \_/ \_/ \_/...
.........................

GrabIt

Demo

This is a Python program to traverse and download posts from subreddits and users on an incremental basis. It also records which posts have been saved so as to only download new submissions.

The program uses multiple methods of extracting data from links posted to Reddit. The Reddit API is used to get submission from Reddit, the links are then parsed and forwarded to the appropriate extractor. With the API restrictions if you wish to download more than 1000 submissions, the Pushshift API is used for the remainder of the submisssions. Images from Imgur are downloaded preserving the original descriptions and title. If the link is not supported by any in-built scrapers it is passed into the youtube_dl library with its large number of information extractors focused on video streaming sites.

All successfuly downloaded submissions are stored into a SQLite database. Authors, posts and subreddits each have their own table, both authors and subreddits have a foreign key in the posts table. The database serves two purposes, first it prevents duplicate downloads since the submission id is queried before it is saved. Secondly, it shows the user which submissions are saved by reddit author or subreddit.

With all the features implemented since 2017, there is still more that needs to be done. Currently the program solely operates through the command line, a graphical-user interface needs to implemented. Furthemore, there are always links from domains that are not supported by the in-built extractors or youtube_dl that need to be created and scraped.

If you would like to use or contribute to the project, visit the project on GitHub.