
Python Crawler

Today's blog article is an interview with Nick Buse, long-term contributor to Open GApps and originator of the APKCrawler project.

When did you start the APKCrawler project?

The first commit was on August 5, 2015, but some prototyping began in July 2015.

Why did you start the project?

Between the start of the Open GApps project in March 2015 and August 2015, mfonville had compiled a long list of sites to check for new and updated APK files. When I first saw mfonville’s keyboard (below), I knew something needed to be done to automate the process.

mfonville’s keyboard

What was the first crawler you wrote?

The obvious choice then (and still now) was the site with the most complete and accessible list of variants (CPU, DPI, SDK) for all APKs. This is probably due to their crowd-sourced upload functionality.

What is the most useful crawler?

The AptoideCrawler. It is the #1 source of APKs that we find and then re-contribute to APKMirror! A close second is the new PlayStoreCrawler that mfonville and therealssj have been working on. There is no more reliable source of APKs than the Google Play Store!

What are the benefits?

mfonville has been using the same keyboard for 6 months now, so that is a huge cost saving for him! Seriously though, any software person knows the benefits of automation. Now we can focus on family and friends!

Any challenges or issues?

When a site does not provide a usable API, we rely on the Python module BeautifulSoup to parse the site’s DOM to get the APK’s information as well as to download the APK itself. The obvious drawback is that the crawler breaks when a site goes through a redesign. Even with an API in place, crawlers break from time to time. One of Aptoide’s official API functions has been broken for months now. That is when I found an undocumented API and rewrote the crawler to be much more efficient. As of today it still kicks ass!
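To give a rough idea of what scraping a DOM with BeautifulSoup looks like, here is a minimal sketch. The HTML snippet, the class names, and the `parse_apk_rows` function are all made up for illustration; they are not taken from the APKCrawler code or from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: real sites differ, and a redesign would change
# exactly the tag and class names this sketch depends on.
SAMPLE_HTML = """
<div class="apk-list">
  <div class="apk-row">
    <a class="download" href="/files/example-1.0-arm64.apk">Example 1.0</a>
    <span class="arch">arm64</span>
    <span class="dpi">nodpi</span>
    <span class="sdk">21</span>
  </div>
  <div class="apk-row">
    <a class="download" href="/files/example-1.0-x86.apk">Example 1.0</a>
    <span class="arch">x86</span>
    <span class="dpi">480</span>
    <span class="sdk">23</span>
  </div>
</div>
"""


def _text(row, css_class):
    """Return the stripped text of the first matching <span>, or None."""
    tag = row.find("span", class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


def parse_apk_rows(html):
    """Collect one dict per APK variant found in the (assumed) page layout."""
    soup = BeautifulSoup(html, "html.parser")
    variants = []
    for row in soup.find_all("div", class_="apk-row"):
        link = row.find("a", class_="download")
        if link is None or not link.get("href"):
            continue  # skip rows without a download link
        variants.append({
            "url": link["href"],
            "arch": _text(row, "arch"),
            "dpi": _text(row, "dpi"),
            "sdk": _text(row, "sdk"),
        })
    return variants


if __name__ == "__main__":
    for variant in parse_apk_rows(SAMPLE_HTML):
        print(variant)
```

Because everything hinges on those tag and class names, a redesigned page would most likely make a parser like this return an empty list rather than raise an error, so a crawler built this way may want to treat "no results" as a warning sign.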

Where do we get more info about this cool thing of yours?

Check out our GitHub repository for the latest development (contributions are welcome!).