Find secret API tokens in Android applications

Jan 14, 2020 by Alexandre | 1770 views

Offensive Security Mobile Device Security

In May 2019, Google announced that there were 2.5 billion active Android devices. As a result, most companies develop their own Android application: not only the richest companies such as Google, Facebook and Amazon, but also many smaller businesses.

While most big companies have enough resources to build a secure application, smaller ones often have to develop and release applications very quickly to survive, sometimes without the opportunity to carry out a proper security analysis. Any vulnerabilities left in the application can become a serious problem if an attacker exploits them.

One of the easiest issues to find is a secret API token hard-coded in the application. Fortunately, this kind of problem is usually not very difficult to fix.

What is a secret API token?

When developing an application, companies rarely implement every feature themselves. For example, there are ready-made solutions for geolocation (Google Maps, ...) or payments (Stripe, ...).

To give access to their services, companies like Stripe or Intercom (a messaging service) provide an API (Application Programming Interface). To use these external features, the application needs to authenticate itself to the provider's server. This can be done by sending the secret API token to the server in an HTTP/HTTPS request. In this case, the token is a static string that plays the role of a username/password combination. Obviously, it is insecure to leave the token in the application code, particularly if it gives access to a payment API or to an API that manages a user database.
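To make the risk concrete, here is a minimal sketch of how a static token typically ends up in a request. The URL, the token value and the Bearer header scheme are all assumptions chosen for illustration; real providers each document their own scheme. The request is only built, never sent.

```python
import urllib.request

# Hypothetical values for illustration only; not a real endpoint or token.
API_URL = "https://api.example.com/v1/charges"
SECRET_TOKEN = "EXAMPLE_TOKEN_DO_NOT_USE"


def build_request(url, token):
    """Build (but do not send) a request authenticated with a static token."""
    request = urllib.request.Request(url, method="POST")
    # The same string is attached to every call, so anyone who extracts it
    # from the application can replay it from their own machine.
    request.add_header("Authorization", f"Bearer {token}")
    return request


request = build_request(API_URL, SECRET_TOKEN)
print(request.get_header("Authorization"))
```

Because the server only sees the header, it cannot tell the legitimate application from a script that reuses the same string.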

Recover Android application code

An Android application is usually written in Java, compiled to Java bytecode and then converted to Dalvik bytecode (the bytecode format specific to Android). As with Java bytecode, it is very easy to recover source code from Dalvik bytecode. The bytecode is stored in the file classes.dex inside the .apk file (which is actually a .zip archive). Several tools help with this: dex2jar converts Dalvik bytecode back to Java bytecode (a .jar file) that a Java decompiler can then read, while jadx decompiles directly to Java source code.

jadx has a graphical interface, which makes it easier to use than dex2jar.

After downloading jadx, unzip the archive, go to the bin folder and run:
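Since an .apk is a .zip archive, the standard library is enough to look inside it. The sketch below lists the bytecode files in an archive; the helper name list_dex_files is made up for this example, and the "APK" is a tiny stand-in built in memory so the snippet runs without a real application. Larger applications may ship additional files named classes2.dex, classes3.dex and so on, which is why the filter is a little broader than the single file name.

```python
import io
import zipfile


def list_dex_files(apk):
    """Return the names of the Dalvik bytecode files found in an APK.

    `apk` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(apk) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("classes") and name.endswith(".dex")
        )


# Build a small stand-in archive in memory (not a valid, installable APK).
fake_apk = io.BytesIO()
with zipfile.ZipFile(fake_apk, "w") as archive:
    archive.writestr("AndroidManifest.xml", b"")
    archive.writestr("classes.dex", b"")
    archive.writestr("classes2.dex", b"")
    archive.writestr("res/layout/main.xml", b"")

print(list_dex_files(fake_apk))  # ['classes.dex', 'classes2.dex']
```

With a real application, passing the path to the .apk file instead of the in-memory buffer works the same way.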

user@computer:/home/jadx/bin$ ./jadx-gui

Select the .apk file you want to decompile.

Find API tokens

Once the application has been decompiled, it is possible to browse the code to find tokens. The problem is that doing so manually is very tedious. Fortunately, there is a tool that automates the whole procedure and even finds API tokens in the code automatically. The tool, apk_api_key_extractor, was written by Alessandro Di Diego for his thesis. It decompiles the application and discovers API key strings in the code. It is based on a pre-trained neural network (a multilayer perceptron). For more information about his work, read Alessandro Di Diego's thesis.
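To give an intuition for what "finding key-like strings" means, here is a deliberately naive sketch. It is not the method used by apk_api_key_extractor, which relies on a trained neural network; it simply keeps long string literals whose Shannon entropy is high. The regular expression, the 20-character minimum and the 3.5-bit threshold are arbitrary assumptions, and the sample source is invented.

```python
import math
import re
from collections import Counter

# Quoted literals of 20+ characters drawn from a key-like alphabet.
CANDIDATE = re.compile(r'"([A-Za-z0-9_\-]{20,})"')


def shannon_entropy(text):
    """Entropy in bits per character of the characters in `text`."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_candidates(source, min_entropy=3.5):
    """Return the long, random-looking string literals found in `source`."""
    return [
        literal for literal in CANDIDATE.findall(source)
        if shannon_entropy(literal) >= min_entropy
    ]


# Invented decompiled-looking snippet; the key is random filler, not a real token.
sample = '''
String apiKey = "k9Qz3LxT7vB2mN8pR4wY6sHd";
String filler = "aaaaaaaaaaaaaaaaaaaaaaaa";
String name = "short";
'''
print(find_candidates(sample))  # ['k9Qz3LxT7vB2mN8pR4wY6sHd']
```

A heuristic like this produces many false positives on real code (hashes, resource identifiers, encoded data), which is one reason a trained classifier is a better fit for the task.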

The tool requires Python 3 to work.

$ git clone --recursive
$ cd apk_api_key_extractor
$ cp config.example.yml config.yml
$ pip3 install -r requirements.txt
$ python3

The command to run the tool is:

$ python3 --analyze-apk path/to/apk

The results are displayed in the terminal. To store them in the file apikey.jsonl instead, change the line dump_location: console to dump_location: jsonlines in config.yml.

If you have several applications to analyze, the tool can process every application placed in the apks folder:
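A .jsonl file holds one JSON document per line, so the stored results are easy to post-process. The sketch below only assumes that format and the file name apikey.jsonl mentioned above; it makes no assumption about which fields the tool writes, and the helper name read_jsonl is made up for this example.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one decoded JSON value per non-empty line of the file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


results = Path("apikey.jsonl")
if results.exists():
    # Print each record as-is; inspect the output to see the available fields.
    for record in read_jsonl(results):
        print(record)
```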

$ python3 --monitor-apks-folder

With this argument, the tool monitors the apks folder and analyzes each application as soon as it is placed inside. Analyzed applications are moved to the apks_analyzed folder.

Thanks to this tool, it is quick and easy to check that no secret tokens were left in the application during development.
