26. API Documentation

26.1 Data-Pipelines

Running on Databricks

| Source | Platform | Type | Description | Status |
|---|---|---|---|---|
| ACLED | ArcGIS | Point Layer | Conflict Events | ✅ every Friday |
| ACLED | GeoSight | Indicator | Number of Conflicts in the last 30 days | ✅ every Friday |
| ACLED | GeoSight | Indicator | Number of Fatalities in the last 30 days | ✅ every Friday |
| ACLED | GeoSight | Indicator | % Change of Fatalities in the last 7 days | ✅ every Friday |
| ACLED | GeoSight | Related Table | Conflict Events | 🔜 coming soon |

The following documentation explains the ETL scripts that pull data into a pandas DataFrame, geocode the data using GeoRepo, calculate indicators that can be pushed to GeoSight, and push the data points to an ArcGIS layer.

The stages and tasks are split into separate functions so that each one can be updated or modified independently.

The directory ACLED contains all scripts specifically made for this source of data, along with further explanation for using the ACLED API and calculating basic conflict indicators. The general flow of data is as follows:

|-- ACLED Data - Pull ACLED data into pandas DataFrame
|  Pulls data from the source and structures it into a Pandas DataFrame.
|
|-- GeoRepo - Geocoding pandas DataFrames
|  Retrieves the 'ucodes' from GeoRepo for the specified administrative level.
  |
  |-- GeoSight - API functions
  |  |-- GeoSight - ACLED Conflict Indicators
  |  |  Calculates indicators and pushes them to GeoSight.
  |  |
  |  |-- GeoSight - ACLED Related Tables
  |  |  COMING SOON
  |
  |-- ArcGIS - ACLED data pipeline
  |  Updates ArcGIS layer.

 

26.2 GeoRepo - Geocoding pandas DataFrames

Please note that the UUID for all boundaries is hard-coded to use the latest 'Global Administrative Boundaries'.

26.2.1 get_georepo_batch_request_id(long_lat_list, admin_level)

This function sends the batch-geocoding POST request and returns the request ID that the subsequent GET request needs to retrieve the results. It is used within the get_georepo_ucodes(long_lat_list, admin_level) function.

INPUTS
long_lat_list takes a list of longitude and latitude values in the following format: [[long, lat], ...]
admin_level administrative level of the ucodes that will be requested, e.g. 0 for national level
OUTPUTS
georepo_batch_request_id['request_id'] returns the batch request ID that is necessary to retrieve the actual ucodes

The provided coordinates list is converted into a GeoJSON file:

features_list = []
for c in long_lat_list:
    features_list.append({"type": "Feature", "properties": {}, "geometry": {"coordinates": c, "type": "Point"}})

# Create a GeoJSON dictionary (FeatureCollection)
geojson_data = {"type": "FeatureCollection", "features": features_list}

with NamedTemporaryFile(mode='w+', suffix='.json') as temp_file:
    # Write the GeoJSON directly to the temporary file
    temp_file.write(geopandas.GeoDataFrame.from_features(geojson_data).geometry.to_json())
    temp_file.flush()  # Ensure all data is written before sending
    temp_file.seek(0)  # Reset file pointer to the beginning

    # The file handle is only valid inside this with-block
    geo_json_file = {'file': (temp_file.name, temp_file, 'application/geo+json')}
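The conversion step above can also be sketched without GeoPandas, as a small pure function that is easy to test in isolation. This is only an illustration: the function name build_feature_collection and the sample coordinates are assumptions, not part of the pipeline.

```python
import json


def build_feature_collection(long_lat_list):
    """Wrap [[long, lat], ...] pairs in a GeoJSON FeatureCollection of Points."""
    features = [
        {
            "type": "Feature",
            "properties": {},
            # GeoJSON expects longitude first, then latitude
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
        }
        for lon, lat in long_lat_list
    ]
    return {"type": "FeatureCollection", "features": features}


# Illustrative coordinates only (hypothetical sample input)
sample = build_feature_collection([[36.82, -1.29], [32.53, 15.59]])
geojson_text = json.dumps(sample)
```

The resulting string could be written to the temporary file in place of the GeoPandas output, provided GeoRepo accepts a plain FeatureCollection; that has not been verified here.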

As mentioned above, the UUID is set to use the latest version of the 'Global Administrative Boundaries'. The spatial_query is set to ST_Intersects, the distance to 0, and id_type to ucode. Since some events might occur outside any of the official boundaries, the find_nearest parameter is set to true so that the nearest boundary's ucode is returned instead of an empty result.

georepo_batch_request_id = requests.post(
    url=georepo_endpoint+f'/operation/view/{uuid_view_global_admin__boundaries_latest}/batch-containment-check/ST_Intersects/{distance}/{admin_level}/{id_type}/?find_nearest=true',
    headers=georepo_header,
    params={'admin_level': admin_level},
    files=geo_json_file
).json()
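To make the path parameters described above easier to see, the URL can be assembled in a small helper. This is a hedged sketch: the function name batch_containment_url and the example endpoint and view UUID are hypothetical, and only the path layout is taken from the request shown above.

```python
def batch_containment_url(endpoint, view_uuid, admin_level,
                          spatial_query="ST_Intersects", distance=0,
                          id_type="ucode", find_nearest=True):
    """Assemble the batch containment-check URL from its path parameters."""
    path = (
        f"/operation/view/{view_uuid}/batch-containment-check/"
        f"{spatial_query}/{distance}/{admin_level}/{id_type}/"
    )
    # find_nearest is passed as a lowercase query-string flag
    query = f"?find_nearest={'true' if find_nearest else 'false'}"
    return endpoint.rstrip("/") + path + query


# Hypothetical endpoint and view UUID, for illustration only
example_url = batch_containment_url("https://georepo.example/api/v1", "abc", admin_level=0)
```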
 

26.2.2 get_georepo_response(geometry, admin_level)

This function is not currently utilized in the ACLED pipelines. However, it can be used to directly retrieve the ucode for a provided geometry.

INPUTS
geometry accepts a dictionary containing the coordinates and type of the provided geometry, e.g. {"coordinates": [long, lat], "type": "Point"}
admin_level administrative level of the ucodes that will be requested, e.g. 0 for national level
OUTPUTS
georepo_response returns the API response in JSON format.

The parameters for this POST request are similar to those described above. Please note that the header must specify the content type: 'Content-Type': 'application/json'

georepo_response = requests.post(
    url=georepo_endpoint+f'/operation/view/{uuid_view_global_admin__boundaries_latest}/containment-check/ST_Intersects/{distance}/{id_type}/?admin_level={admin_level}', 

    # the 'Content-Type' in the header has to be added for this API call 
    headers={
      'Accept': 'application/json',
      'Authorization': dbutils.secrets.get(scope = "felixs_secrets", key = "georepo_token"),
      'GeoRepo-User-Key': dbutils.secrets.get(scope = "felixs_secrets", key = "georepo_key"),
      'Content-Type': 'application/json'
      },
    data=json.dumps(
      {"type": "FeatureCollection", 
       "features": [{"type": "Feature", "properties": {}, "geometry": geometry}]
       }
      )
    ).json()

 

26.2.3 get_georepo_ucodes(long_lat_list, admin_level)

This function returns the actual ucodes requested with the get_georepo_batch_request_id function and presents them as a list.

INPUTS
long_lat_list takes a list of longitude and latitude values in the following format: [[long, lat], ...]
admin_level administrative level of the ucodes that will be requested, e.g. 0 for national level
OUTPUTS
[ucode[0] for ucode in ucodes_list] returns a list of ucodes once the batch job started by the get_georepo_batch_request_id function has finished
'ERROR!' when the batch job has failed
'CANCELLED!' when the batch job has been cancelled

Depending on the size of the requested batch job, the results might not be available immediately. Therefore, a GET request is used to check the status of the batch geocoding. This runs in a while-loop for as long as the returned status equals 'PROCESSING', waiting 1 second before sending the next status request. Once the status is 'DONE', a second GET request fetches the batch geocoding results. From this result, only the ucodes are extracted and returned as a list.

status = 'PROCESSING'
while status == 'PROCESSING':
    status = requests.get(
        url=georepo_endpoint+f'/operation/view/{uuid_view_global_admin__boundaries_latest}/batch-containment-check/status/{request_id}/',
        headers=georepo_header,
    ).json()['status']

    if status == 'DONE':
        ucodes_response = requests.get(
            url=georepo_endpoint+f'/operation/view/{uuid_view_global_admin__boundaries_latest}/batch-containment-check/result/{request_id}/',
            headers=georepo_header,
        ).json()['features']

        # get list of all the ucodes
        ucodes_list = [c['properties']['ucode'] for c in ucodes_response]

        # keep only the first ucode returned for each point
        return [ucode[0] for ucode in ucodes_list]

    # print the failure and return it, matching the documented outputs
    if status == 'ERROR':
        print('ERROR!')
        return 'ERROR!'
    if status == 'CANCELLED':
        print('CANCELLED!')
        return 'CANCELLED!'

    # wait for 1 second before checking again
    time.sleep(1)
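The loop above polls indefinitely while the job is processing. One possible variation, shown here purely as a sketch, adds an upper bound on the waiting time. The names wait_for_batch, get_status, poll_seconds and timeout_seconds are hypothetical; get_status stands in for the status GET request so the logic can be exercised without network access.

```python
import time


def wait_for_batch(get_status, poll_seconds=1.0, timeout_seconds=300.0, sleep=time.sleep):
    """Poll get_status() until it stops returning 'PROCESSING' or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status != "PROCESSING":
            # 'DONE', 'ERROR', 'CANCELLED' or any other terminal status
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"batch job still processing after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated status sequence instead of a real API call
_statuses = iter(["PROCESSING", "PROCESSING", "DONE"])
final_status = wait_for_batch(lambda: next(_statuses), sleep=lambda seconds: None)
```

The caller would then fetch the results only when the returned status is 'DONE' and handle the other statuses explicitly.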

 

26.2.4 get_list_of_iso_3_country_code()

As requesting data for multiple countries at once might lead to a timeout due to the dataset size, the data is requested for one country at a time. For the ACLED API, it is necessary to provide ISO3 numeric country codes. This function retrieves all countries that are available in the latest version of the 'Global Admin Boundaries' view on GeoRepo, converts them into numeric ISO3 codes, and returns them as a list.

OUTPUTS
numeric_codes returns a list of numeric ISO3 country codes

Since only the countries and not their subnational boundaries are needed, the GET request for finding geographical entities by level in the view is used with admin_level set to 0. As the maximum number of records per page is 50, a while-loop is implemented to increment the page number while the GET request is still returning values.

alpha3_codes = []
start_page = 1
response = [1]

# get the codes for every page (because the max. number per page is 50)
while response:
    county_codes_params = {
        'page': start_page,
    }
    response = requests.get(
        url=georepo_endpoint+f'/search/view/{uuid_view_global_admin__boundaries_latest}/entity/level/0/',
        headers=georepo_header,
        params=county_codes_params,
    ).json()['results']

    # add all the ISO3 codes to a list
    alpha3_codes.extend(list(set([c['ext_codes']['ISO3'] for c in response])))

    # get the next page
    start_page += 1
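The same pagination pattern can be sketched with the HTTP call factored out, which makes the stop condition explicit. The names collect_iso3_codes and fetch_page are hypothetical, and the sketch assumes, as the loop above does, that a page past the end yields an empty results list. Unlike the excerpt above, it removes duplicates across pages and preserves the order of first appearance.

```python
def collect_iso3_codes(fetch_page, first_page=1):
    """Call fetch_page(page) until it returns no results; gather unique ISO3 codes."""
    codes = []
    seen = set()
    page = first_page
    while True:
        results = fetch_page(page)
        if not results:
            break  # an empty page marks the end of the listing
        for entity in results:
            code = entity["ext_codes"]["ISO3"]
            if code not in seen:
                seen.add(code)
                codes.append(code)
        page += 1
    return codes


# Fake two-page response used instead of a real GeoRepo request
_fake_pages = {
    1: [{"ext_codes": {"ISO3": "KEN"}}, {"ext_codes": {"ISO3": "SDN"}}],
    2: [{"ext_codes": {"ISO3": "KEN"}}, {"ext_codes": {"ISO3": "AFG"}}],
}
sample_codes = collect_iso3_codes(lambda page: _fake_pages.get(page, []))
```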

The resulting alpha3_codes list contains the three-letter ISO3 (alpha-3) codes rather than the numeric ISO3 codes, so they still need to be converted. Some of these alpha-3 codes cannot be converted because they are not in the official list.

Codes that are not found are collected in a separate list, printed, and excluded from the returned list of numeric ISO3 codes: NOT FOUND: ['xxx', 'xUK', 'xSK', 'xAC', 'xAP', 'xSI', 'xAB', 'xPI', 'xJL', 'xSR', 'xRI', 'xFR', 'xJK']

numeric_codes = []
not_found = []

for c in alpha3_codes:
    try:
        numeric_codes.append(int(countries.get(alpha_3=c).numeric))
    except AttributeError:
        # the lookup returned nothing for this code
        not_found.append(c)

print(f"Number of countries in the list: {len(alpha3_codes)}")
print(f"NOT FOUND: {not_found}")
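The split into converted and unconverted codes can be illustrated with a tiny stand-in lookup table. This is only a sketch: SAMPLE_NUMERIC and split_alpha3_codes are hypothetical names, and the dictionary lists just three ISO 3166-1 entries in place of the full country lookup used by the pipeline.

```python
# Hypothetical stand-in for the full country lookup;
# three ISO 3166-1 entries (alpha-3 -> numeric) for illustration.
SAMPLE_NUMERIC = {"AFG": 4, "KEN": 404, "SDN": 729}


def split_alpha3_codes(alpha3_codes, lookup=SAMPLE_NUMERIC):
    """Return (numeric_codes, not_found) for a list of alpha-3 codes."""
    numeric_codes, not_found = [], []
    for code in alpha3_codes:
        if code in lookup:
            numeric_codes.append(lookup[code])
        else:
            # non-standard codes such as 'xxx' end up here
            not_found.append(code)
    return numeric_codes, not_found


numeric, missing = split_alpha3_codes(["KEN", "xxx", "AFG", "xUK"])
```

Returning both lists, rather than only printing the unconverted codes, lets a caller log or inspect them later.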

 

26.3 GeoSight - API functions

Please note that the UUID for all boundaries is hard-coded to use the latest 'Global Administrative Boundaries'.

26.3.1 push_data_geosight(indicator, attributes, df, admin_level, date)

This function pushes data from a DataFrame as indicator data via POST request to GeoSight.

INPUTS
indicator short-code of the indicator that the data should be pushed to
attributes takes a dictionary with attributes, e.g. description
df pandas DataFrame containing the values
admin_level administrative level of the geom_id (ucode) of the data, e.g. 0 for national level
date date that is assigned to the data points, in UTC and formatted as YYYY-MM-DD
  • The column selection for the value and geom_id must be adjusted to match the layout of the provided DataFrame.
  • The value must be passed as a string, regardless of the original data type in the DataFrame, because the data type is defined directly on the indicator in GeoSight.
  • The geom_id takes the ucodes that were returned by the GeoRepo API.
  • The administrative level of the provided ucode and the admin_level must match.
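The notes above can be sketched as a helper that builds the payload for a single row. This is an illustration under assumptions: the function name build_geosight_payload, the indicator short-code, the ucode placeholder and the dataset UUID placeholder are all hypothetical; only the payload keys are taken from this function's request body.

```python
from datetime import datetime


def build_geosight_payload(indicator, geom_id, value, date, admin_level,
                           dataset_uuid, attributes=None):
    """Build the JSON body for one indicator value."""
    # Raises ValueError if the date does not follow the YYYY-MM-DD pattern
    datetime.strptime(date, "%Y-%m-%d")
    return {
        "indicator_shortcode": indicator,
        "value": str(value),  # the value is always sent as a string
        "date": date,
        "geom_id": geom_id,  # ucode returned by GeoRepo
        "dataset_uuid": dataset_uuid,
        "admin_level": admin_level,  # must match the level of the ucode
        "attributes": attributes or {},
    }


# Hypothetical values, for illustration only
payload = build_geosight_payload(
    indicator="acled_conflicts_30d",
    geom_id="UCODE_PLACEHOLDER",
    value=12,
    date="2024-01-05",
    admin_level=0,
    dataset_uuid="DATASET_UUID_PLACEHOLDER",
)
```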

The returned status codes will be added to a list. After looping over all the rows in the DataFrame, the number of values pushed to this indicator will be displayed, along with the list of status codes to verify if any problems occurred.

```python
status_codes = []
for i in df.iterrows():
    geosight_params = {
        "indicator_shortcode": indicator,
        "value": str(df.at[i[0], df.columns[1]]),  # str(i[1][])
        "date": date,
        "geom_id": df.at[i[0], df.columns[0]],  # i[1][0]
        "dataset_uuid": uuid_view_global_admin__boundaries_latest,
        "admin_level": admin_level,
        "attributes": attributes
    }
    response = requests.post(url=geosight_endpoint+"/data-browser/", headers=geosight_header, json=geosight_params)
    status_codes.append(response.status_code)

# get status codes and number of values pushed to GeoSight
print(f"{indicator}: {len(status_codes)}")
print(status_codes)