How to Use APIs (explained from scratch)
This post explains APIs with Python but assumes no prior knowledge of either.
Python Headers, Requests, JSON, and APIs
An API is a means for someone, or more specifically their Python script, to communicate directly with a website’s server to obtain information and/or manipulate data that might not otherwise be available. This avoids the difficulty of getting a Python script to interact with a webpage.
To use APIs, one needs to understand Python’s requests and json libraries as well as Python dictionaries. This article provides a walkthrough for using these tools.
Background – How to Get Started With Python and Requests From Scratch
To start with, if you are completely new you can download Python from https://www.python.org/downloads/. Then you can download Sublime Text, a text editor for writing Python scripts, at https://www.sublimetext.com/3.
Now, open the Command Prompt (if you are using Windows) or Terminal (if you are using Mac).
Next we want to get a new Python library called “requests”, and we will do so by using pip. Pip is included with Python, but you can see guidance for installation and updates at https://pip.pypa.io/en/stable/installing/.
To obtain “requests” (if you are using Windows) from the command line, type “python -m pip install requests”.
If you are using Mac, from Terminal type “pip install requests”.
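To confirm that the install worked, a quick check like the one below can help. It is only a sketch: it uses Python’s built-in importlib module to see whether a library named requests can be found, and the variable name requests_installed is just a label chosen for this example.

```python
import importlib.util

# find_spec returns None when Python cannot locate the library
requests_installed = importlib.util.find_spec("requests") is not None

if requests_installed:
    print("requests is installed and ready to import")
else:
    print("requests was not found; try the pip command again")
```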
Requests – a GET Request
The following is a walkthrough for a standard GET request. We start with the “requests” library, which is the standard choice for using Python with the Internet, whether for APIs or web scraping. A “request” is basically a way to ask a website or API for information.
First, to start our script we import the requests library by typing “import requests” at the top.
Second, identify the url that gives us the location of the API. This url should be identified in the API documentation (API documentation is explained below). If we were web-scraping, this would be the url of the webpage we are scraping, but that is a separate topic. We assign the url to what is called a variable, which in this case is named “api_url”, by typing: api_url = "http://FAKE-WEBSITE-URL/processing.php"
SIDENOTE: A “variable” is kind of like a name or container for information, and that information is called a “value”. So in a script you create a name for the variable and assign the information/value to that variable by typing ” the name of the variable = the information/value “. So in this script the variable name is api_url and the value is the string of characters that create the url and the quotes around it, “http://FAKE-WEBSITE-URL/processing.php”.
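As a small illustration of the sidenote, the sketch below builds the same fake url from two pieces. The split into two pieces is only for demonstration; it shows that Python works out the value on the right of the equals sign first and then attaches the variable name to the result.

```python
# The right-hand side is worked out first, then the result gets the name on the left
api_url = "http://FAKE-WEBSITE-URL" + "/processing.php"

# Using the variable name gives back the value that was assigned to it
print(api_url)
```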
Finally, we use the requests library to create a GET request with the format “requests.get(api_url)”. The request first appears in the context of being assigned to “api_response”. It might seem weird that the request first appears in the context of saying “something equals the request”. It is easier to think of it as: your computer first carries out the request on the right of the equals sign, gets the data (also known as the API response), brings it back, and then gives the data a tag, which is the variable name. This is in fact how Python handles an assignment: the right-hand side is evaluated first.
import requests
api_url = "http://FAKE-WEBSITE-URL/processing.php"
api_response = requests.get(api_url)
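If you want to see what a GET request looks like without contacting any server, the requests library can build the request and stop before sending it. The sketch below assumes a placeholder address, https://api.example.com/processing.php, which is not a real API.

```python
import requests

# Placeholder address for illustration only; use the url from your API's documentation
api_url = "https://api.example.com/processing.php"

# Build the GET request but do not send it yet
prepared = requests.Request("GET", api_url).prepare()

print(prepared.method)  # the type of request
print(prepared.url)     # the address the request would go to

# To actually send it, you would call requests.get(api_url, timeout=10)
```

The timeout value in the last comment is optional; it tells requests how many seconds to wait before giving up on a server that does not answer.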
Requests – a POST Request
Usually with requests, you will do a GET request because you are basically just getting data that is already there on the website or from the API. In other cases you will do a POST request when you need to send information to a server/url in order to get certain information in return. However, these lines dividing GET and POST are often blurred. Sometimes the different requests can be used interchangeably.
Here is an example of a POST request. In the case below, I want to obtain information about a specific person from a database. Therefore I change “requests.get” to “requests.post” and, instead of only putting the url in the request like in the script above, I also include parameters in the form of “data=params” to tell the database the relevant information.
The Request Parameters (identified as “params” in the script) specify what information you are looking for with your script.
import requests
params = {'firstname': 'John', 'lastname': 'Smith'}
r = requests.post("http://FAKE-WEBSITE-URL/processing.php", data=params)
print(r.text)
The response to the request will include information on the specified person instead of all the information at the url.
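To see what “data=params” actually adds to a POST request, the sketch below builds the request without sending it. The address https://api.example.com/processing.php is a placeholder assumed for this example, not a real API.

```python
import requests

params = {'firstname': 'John', 'lastname': 'Smith'}

# Placeholder address; build the POST request without sending it
prepared = requests.Request(
    "POST", "https://api.example.com/processing.php", data=params
).prepare()

print(prepared.method)
print(prepared.body)                     # the parameters, packed into the body of the request
print(prepared.headers["Content-Type"])  # how requests labels that body for the server
```

The parameters travel in the body of the request, in the same form-encoded style a browser uses when you submit a web form.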
API Documentation
Each API has certain requirements for how the code is written and formatted. These specific demands should be explained in a guide that accompanies the API on the website that explains or identifies the API itself. The guide is referred to as the “API documentation.”
For example, the website faceplusplus.com offers a tool that will compare faces in photos, and there is the option to use an API to access the tool. The website includes the API documentation for Face++, which identifies the requirements or specifications for your script to access their API.
Note that the documentation identifies the url that needs to be used and that the script must use a POST request. The documentation also identifies the names for the Request Parameters used in the script (parameters can be considered one of many ways to include a bit of data in a request; others are explained in the Headers section later on in this article).
How to use Face++ is explained in OSINT: Automate Face Comparison With Python and Face++ and in Python for Investigations if You Don’t Know Python.
API Response
Now, back to the original GET request below.
import requests
api_url = "http://FAKE-WEBSITE-URL/processing.php"
api_response = requests.get(api_url)
The response from the server, which we assigned to “api_response”, will usually be written in JSON, a text-based data format (not a programming language). To make the JSON response easier to work with, we use Python’s “json” library (the term json here is put in quotes to specify that it refers to the Python library named “json”, not the JSON format itself). This library is included with Python, so there is nothing extra to install.
Next we add the line “import json” to our script.
Then we use the “json.loads()” function from the json library to process the “api_response”. The function needs the text of the response, so we pass it “api_response.text”. In full we type “json.loads(api_response.text)” and assign the result to “Python_Dictionary”. This transforms the JSON data into a Python dictionary (explained more below), which is much easier to read and search.
Here is how it looks:
import json
import requests
api_url = "url for the api here (listed in api documentation)"
api_response = requests.get(api_url)
Python_Dictionary = json.loads(api_response.text)
For more information on this topic, look at the book Mining Social Media, in particular p. 48, or consider purchasing it.
Recap and Explanation: we used the “json.loads()” function from the json library to transform JSON data into a Python dictionary. However (per Mining Social Media, p. 49), the loads() function requires text, and the requests library does not return plain text. It returns a response object, which by default displays only its HTTP status code, a numbered response like 200 for a working website or 404 for one that wasn’t found.
So if you typed “print(api_response)” you would see something like <Response [200]> rather than the data.
We need to access the text of our response, or in this case the JSON rendering of our data. We can do so by adding “.text” after the “api_response” variable, so the entire construction looks like this: json.loads(api_response.text). This hands the text of the API response to json.loads(), which interprets it as JSON keys and values and puts them in a Python dictionary, a structure used for key-value pairs in general.
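The conversion can be tried without any API at all. In the sketch below, response_text is a made-up stand-in for api_response.text, and its keys and values are invented for illustration.

```python
import json

# A made-up stand-in for api_response.text: JSON always arrives as plain text
response_text = '{"name": "John Smith", "age": 42, "active": true}'

# json.loads turns the JSON text into a Python dictionary
python_dictionary = json.loads(response_text)

print(type(python_dictionary))
print(python_dictionary["name"])
print(python_dictionary["active"])  # JSON true becomes Python True
```

The requests library also offers a shortcut, api_response.json(), which does the same conversion in one step.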
Python Dictionary
A Python dictionary contains key-value pairs (explained below) and it is also defined by its formatting. So here is an example of what a Python dictionary and its formatting look like:
headers = {'Content-Type': 'application/json',
'Authorization': 'api_token'}
The dictionary is enclosed in {} and its contents are formatted in key-value pairs (a “key” is assigned to a “value” and if you want to obtain a particular value you can call on its key).
For example, a dictionary would appear in our script like this “Dictionary_Title = {‘key1’ : ‘value1’, ‘key2’ : ‘value2’}”. Separately, we can call upon a value by typing “Dictionary_Title[‘key1’]” and it will give us ‘value1’ because value1 is the value that was assigned to key1.
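Here is the same idea as a short script you can run. The .get() call and the extra key3 are additions for this sketch; they show what happens when a key is missing and how to add a new pair.

```python
dictionary_title = {'key1': 'value1', 'key2': 'value2'}

# Call on a key to get its value
print(dictionary_title['key1'])

# Asking for a missing key with square brackets raises a KeyError,
# so .get() is handy when you are not sure the key exists
print(dictionary_title.get('key3', 'no such key'))

# Add a new key-value pair
dictionary_title['key3'] = 'value3'
print(dictionary_title)
```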
However, dictionaries can also contain dictionaries within them. See below, where “key2” is a dictionary within a dictionary:
Dictionary_Title = {'key': 'value',
                    'key2': {
                        'value2': 'subvalue',
                        'value3': 'subvalue2'}}
In the example above key2 is a dictionary within the larger dictionary named Dictionary_Title. Therefore, if we want to get a value in a dictionary within a dictionary, like subvalue2, we would structure our call like this, “Dictionary_Title[‘key2’][‘value3’]” and that would give us subvalue2.
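The lookup can also be done one step at a time, which may make the chained version easier to follow. The variable name inner is just a label chosen for this sketch.

```python
Dictionary_Title = {
    'key': 'value',
    'key2': {'value2': 'subvalue', 'value3': 'subvalue2'},
}

# Step by step: first pull out the inner dictionary, then look inside it
inner = Dictionary_Title['key2']
print(inner['value3'])

# The same lookup written as a single chained call
print(Dictionary_Title['key2']['value3'])
```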
Note that sometimes the data you want sits inside a much larger dictionary. Watch for a key followed by a square bracket, something that looks like “item: [ ”. That means that item is a key whose value is a list, and the list often contains one or more dictionaries.
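When a key such as item is followed by a square bracket, its value is a list, and API responses often fill that list with dictionaries. The sketch below uses invented data (the key name item and the two logins are made up) to show how to loop through such a list.

```python
import json

# Invented JSON text: the key "item" holds a list of two small dictionaries
response_text = '{"item": [{"login": "user_one"}, {"login": "user_two"}]}'
data = json.loads(response_text)

items = data["item"]
print(type(items))

# Go through the list one dictionary at a time and collect a value from each
logins = []
for entry in items:
    logins.append(entry["login"])
print(logins)
```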
Authentication
APIs will commonly require some form of authentication, such as an authentication code, also referred to as a key or a token. This is a means for the API owner to limit access or possibly charge users. Typically the user will have to sign up with the API owner for an account, often referred to as a developer account because it is geared toward app developers, in order to obtain the code.
The owner’s API documentation will give instructions for how to include the code in your Python script.
API authentication can be a difficult matter. For reference, consider looking at the authentication section of the requests library’s documentation.
Often, the documentation will instruct the user to include the code as a “param”, “in the header”, or to use OAuth.
Params Authentication
The API documentation for Face++ (for more information about using Face++, see my article on Secjuice) specifically requests that your API key and API secret (assigned to you when you get an account) be included as request parameters.
Therefore, in your script you would create a params dictionary with the keys identified in the documentation and include that dictionary in your request by typing “params=params”, as seen below.
params = {'api_key': 'YOUR UNIQUE API_KEY',
'api_secret' : 'YOUR UNIQUE API_SECRET',
}
r = requests.post(api_url, params=params)
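This article has used both “data=params” and “params=params”, and the two put the same dictionary in different places. The sketch below builds both versions without sending anything; the address and the key values are placeholders assumed for this example. Which one an API expects is something its documentation should tell you.

```python
import requests

# Placeholder address and placeholder credentials for illustration only
api_url = "https://api.example.com/compare"
params = {'api_key': 'my_key', 'api_secret': 'my_secret'}

# params= adds the dictionary to the end of the url
as_query = requests.Request("POST", api_url, params=params).prepare()

# data= leaves the url alone and puts the dictionary in the body of the request
as_form = requests.Request("POST", api_url, data=params).prepare()

print(as_query.url)
print(as_query.body)
print(as_form.url)
print(as_form.body)
```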
Headers are a bit more complicated and therefore require an entire section just to explain headers first.
HTTP Headers
Every time you send a request to a server (which includes things like clicking on a link or doing almost anything on the internet) an HTTP header will be attached to the request automatically.
A header is a bit of data that is attached to a request sent to a server and that provides information about the request itself. For example, when you open a website your browser sends a request with a header that identifies information about yourself, such as the fact that you are using a Chrome browser. Your Python script, behind the scenes, also includes a header that identifies itself as a script instead of a browser. This process is automated but you can choose to create a custom header to include in your Python script.
A custom header is often needed when you are using an API. Many APIs require that you obtain a sort of authorization code in order to use the API. You must include that authorization code in your script’s header so that the API will give you permission to use it.
In order to create a custom header, you type a bit of code into your script that is a python dictionary named headers. Also, specify in your request to include this dictionary as the header by typing “headers=headers”. See below:
headers = {'Content-Type': 'application/json',
'Authorization': 'Bearer {0}'.format(api_token)}
api_response = requests.get(api_url, headers=headers)
This custom header will get priority over the automated header so, for example, you can set your custom header to identify your Python script as (essentially) a person using a web browser so that you can avoid bot-detection software. In a separate article, we will address how to make your script look human in order to avoid bot-detection software.
There are several predefined header names, each with its own meaning.
The API documentation will often give specific instructions for how you must set up the headers for your scripts. The headers dictionary shown above sets two headers at once. The Content-Type header tells the server to expect JSON-formatted data in the body of the request. The Authorization header needs to include our token, so we use Python’s string formatting logic to insert our api_token variable into the string as we create the string. We could have put the token in here as a literal string, but keeping it in its own variable makes several things easier down the road.
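As a small sketch of that idea, the token below lives in its own variable and is inserted into the header when the dictionary is created. The token value fake_token_1234 is made up for this example.

```python
# A made-up token; a real one comes from your account with the API owner
api_token = "fake_token_1234"

# .format() drops the token into the spot marked {0}
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer {0}'.format(api_token),
}

print(headers['Authorization'])
```

If the token ever changes, only the api_token line needs to be edited and the headers dictionary picks up the new value.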
See the more official documentation of custom headers below, from the documentation for requests:
“What goes into the request or response headers? Often, the request headers include your authentication token, and the response headers provide current information about your use of the service, such as how close you are to a rate limit”
Github API Authentication
This walkthrough of the GitHub API shows how to use an authentication token as opposed to the authentication-free version (GitHub allows people who do not have an account/token to use its API a limited amount). For more information, there is a great tutorial for the GitHub API.
Without the API token:
import json
import requests

username = "search-ish"
following = []

# Ask the GitHub API which accounts this user follows
api_url = "https://api.github.com/users/%s/following" % username
api_response = requests.get(api_url)
users = json.loads(api_response.text)
for user in users:
    following.append(user['login'])
print(following)

# Check how many requests are left before hitting the rate limit
api_url = "https://api.github.com/rate_limit"
api_response = requests.get(api_url)
print(api_response.text)
As a result, the script shows that the user “search-ish” follows one person, “lamthuyvo”. But we see that the limit of searches is set at 60 and that I have used 10 of these already.
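The rate limit response is itself JSON, so the same dictionary tools apply to it. The sketch below uses a trimmed-down, illustrative stand-in for api_response.text; it assumes the response contains a rate key holding limit, used, and remaining, so compare it against the text your own script prints.

```python
import json

# Illustrative stand-in for the text returned by the rate_limit url (assumed shape, trimmed)
rate_limit_text = '{"rate": {"limit": 60, "used": 10, "remaining": 50}}'

# Convert the text to a dictionary, then step into the inner "rate" dictionary
rate_info = json.loads(rate_limit_text)["rate"]

print("Limit:", rate_info["limit"])
print("Used:", rate_info["used"])
print("Remaining:", rate_info["remaining"])
```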
With the API token:
import json
import requests

username = "search-ish"
headers = {"Authorization": "bearer fake_token_pasted_here"}
following = []

# Same request as before, now with the token included in the headers
api_url = "https://api.github.com/users/%s/following" % username
api_response = requests.get(api_url, headers=headers)
users = json.loads(api_response.text)
for user in users:
    following.append(user['login'])
print(following)

# Check the rate limit again, this time as an authenticated user
api_url = "https://api.github.com/rate_limit"
api_response = requests.get(api_url, headers=headers)
print(api_response.text)
In this script, we have gotten a “developer account” (this is generally the name of the kind of account you need in order to obtain an access token). GitHub uses the widely used OAuth 2.0 standard, and GitHub’s API documentation says to put a bearer token in the header. So we use the fake token “1234” and type it with the following formatting:
headers = {"Authorization": "bearer 1234"}
This specifies that the type of authorization is a bearer token and provides the token itself.
Next, we tell our GET request to include this header information by typing the following
api_response = requests.get(api_url, headers=headers)
This is a bit confusing, but when you type “headers=headers” you are essentially saying that the HTTP headers for this request should come from the “headers” variable you just created. This only adds or replaces the Authorization part; the rest of the original, automatic HTTP headers stay as they were.
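To see the automatic headers and the custom one side by side, the sketch below builds the request through a requests Session without sending it. The token 1234 is fake, and the exact list of automatic headers may differ between versions of requests.

```python
import requests

headers = {"Authorization": "bearer 1234"}  # fake token for illustration

# A Session carries the automatic headers that requests adds behind the scenes
session = requests.Session()
request = requests.Request("GET", "https://api.github.com/rate_limit", headers=headers)

# Combine the automatic headers with our custom one, without sending anything
prepared = session.prepare_request(request)

for name, value in prepared.headers.items():
    print(name, ":", value)
```

The printed list should include the automatic User-Agent header, which identifies the script as python-requests, alongside the Authorization header we supplied.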
When we run this script we get the following:
Note that we get the same answer to our question about who the user follows, and now the limit is set to 5,000 because that is the limit for my account.
That’s it, good luck.