datagovindia

datagovindia is a wrapper around more than 80,000 APIs from data.gov.in, the Government of India's open data platform. Here is a short guide to the package. Its functionality is centered around three aspects:

Installation

The package is available on CRAN. Install it with:

install.packages("datagovindia")

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("econabhishek/datagovindia")

Prerequisites

Setup

library(datagovindia)

See the package vignette to learn more about the various functions.

Example workflow

Once your API key is ready and you have used the package's search functions to choose an API and find its index_name (see the vignette for details), you are ready to extract data from it.

The function get_api_data is the powerhouse of this package. Because the underlying data has a data.frame structure, it can do more than a manually constructed API query: it lets you filter, sort, select variables, and decide how much of the data to extract. The API itself accepts only one value per filter field in each request, but a single get_api_data call can issue several requests (one per combination of filter values) and append their results.
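To make the multi-request behaviour concrete, here is a rough base-R approximation of how one call can fan out into several requests. It is not the package's internal code: build_filter_urls() is a hypothetical helper, and it assumes each comma-separated filter value becomes its own filters[field]=value parameter, with one request per combination of values (this mirrors the request URLs logged in the example later in this guide).

```r
# Hypothetical helper (not part of datagovindia): build one request URL per
# combination of filter values, approximating how the wrapper splits a query.
build_filter_urls <- function(api_index, api_key, filter_by,
                              offset = 0, limit = 10) {
  # Split each comma-separated filter value into its own vector of values
  values <- lapply(filter_by, function(v) trimws(strsplit(v, ",")[[1]]))
  # Cross product of all filter values; the first field varies fastest
  combos <- expand.grid(values, stringsAsFactors = FALSE)
  base <- sprintf(
    "https://api.data.gov.in/resource/%s?api-key=%s&format=json&offset=%d&limit=%d",
    api_index, api_key, offset, limit)
  # One URL per row of the grid, with one filters[field]=value pair per field
  apply(combos, 1, function(row) {
    paste0(base, paste0("&filters[", names(row), "]=", row, collapse = ""))
  })
}

urls <- build_filter_urls(
  api_index = "3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69",
  api_key   = "579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b",
  filter_by = c(city = "Gurugram,Chandigarh", pollutant_id = "PM10,NO2"))
cat(urls, sep = "\n")
```

The sketch does not URL-encode values, so a filter value containing spaces or special characters would need utils::URLencode(). The wrapper also pauses between requests ("gave the API a rest" in its log), which this sketch leaves out.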

Before we dive into data extraction, we first need to validate the API key obtained from data.gov.in. To get a key, register on the website, log in, and copy the key from your “My Account” page; more instructions are available in the official guide. Once you have your key, validate it as follows. You only need to do this once per session; the key below is a sample key from the website, used for demonstration:

## Using a sample key
register_api_key("579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b")
#> Connected to the internet
#> The server is online
#> The API key is valid and you won't have to set it again
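Hard-coding a personal key in a script makes it easy to leak when the code is shared. A minimal alternative sketch reads the key from an environment variable at run time and passes it to register_api_key(). The variable name DATAGOVINDIA_API_KEY is hypothetical, chosen only for this illustration; you could define it in your ~/.Renviron file.

```r
# DATAGOVINDIA_API_KEY is a hypothetical variable name chosen for this sketch;
# define it in ~/.Renviron (DATAGOVINDIA_API_KEY=your-key) and restart R.
api_key <- Sys.getenv("DATAGOVINDIA_API_KEY")

if (nzchar(api_key)) {
  # Register the key read from the environment instead of a hard-coded string
  datagovindia::register_api_key(api_key)
} else {
  message("DATAGOVINDIA_API_KEY is not set; skipping key registration")
}
```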

Once your key is registered, you are ready to extract data from a chosen API. Here is what each argument means:

In a nutshell: find the API you want using the search functions, get its index_name from the results, optionally look at the fields present in the API's data, and then use the get_api_data function to extract the data. Suppose we choose the API “Real time Air Quality Index from various location”, whose index_name is 3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69. Suppose also that we want data for only two cities, Chandigarh and Gurugram, and only for the pollutants PM10 and NO2, while returning all fields (dataset columns).

First, we look at which fields are available, so that we can construct the right query.

get_api_fields("3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69")
#> id              name            type
#> document_id     document_id     double
#> id              id              double
#> country         country         keyword
#> state           state           keyword
#> city            city            keyword
#> station         station         keyword
#> last_update     last_update     date
#> pollutant_id    pollutant_id    keyword
#> pollutant_min   pollutant_min   double
#> pollutant_max   pollutant_max   double
#> pollutant_avg   pollutant_avg   double
#> pollutant_unit  pollutant_unit  keyword
#> resource_uuid   resource_uuid   keyword

Accordingly, we use the city and pollutant_id fields to construct our query. Note that queries refer to fields by their id, not by their name.
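A misspelt field id is easy to overlook, so a small check against the field listing can catch it before any request is sent. The following is a minimal sketch, assuming get_api_fields() returns a data.frame with id, name and type columns as printed above. unknown_filter_fields() is a hypothetical helper, and only a few rows of the listing are copied in.

```r
# A few rows copied from the get_api_fields() output above.
# Assumption: get_api_fields() returns a data.frame with columns id, name, type.
fields <- data.frame(
  id   = c("city", "state", "pollutant_id", "pollutant_avg"),
  name = c("city", "state", "pollutant_id", "pollutant_avg"),
  type = c("keyword", "keyword", "keyword", "double"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the filter names that are not valid field ids
unknown_filter_fields <- function(filter_by, fields) {
  setdiff(names(filter_by), fields$id)
}

good <- c(city = "Gurugram,Chandigarh", pollutant_id = "PM10,NO2")
typo <- c(city = "Gurugram,Chandigarh", polutant_id = "PM10,NO2")

unknown_filter_fields(good, fields)  # character(0): every filter is a known id
unknown_filter_fields(typo, fields)  # "polutant_id": not in the field listing
```

In practice you would pass the full result of get_api_fields() instead of the hand-copied subset.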


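get_api_data(api_index = "3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69",
             results_per_req = 10,
             # comma-separated values: one request per combination of values
             filter_by = c(city = "Gurugram,Chandigarh",
                           pollutant_id = "PM10,NO2"),
             field_select = c(),  # empty selection: return all fields
             sort_by = c("state", "city"))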
#> Connected to the internet
#> The server is online
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Gurugram&filters[polutant_id]=PM10
#> gave the API a rest
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Chandigarh&filters[polutant_id]=PM10
#> gave the API a rest
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Gurugram&filters[polutant_id]=NO2
#> gave the API a rest
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Chandigarh&filters[polutant_id]=NO2
#> gave the API a rest
#> No results returned - check your api_index
#> id   country  state       city        station                        last_update          pollutant_id  pollutant_min  pollutant_max  pollutant_avg  pollutant_unit
#> 428  India    Haryana     Gurugram    Sector-51, Gurugram - HSPCB    31-05-2021 08:00:00  PM10          36             222            100            NA
#> 435  India    Haryana     Gurugram    Teri Gram, Gurugram - HSPCB    31-05-2021 08:00:00  PM10          24             104            55             NA
#> 108  India    Chandigarh  Chandigarh  Sector-25, Chandigarh - CPCC   31-05-2021 08:00:00  PM10          56             134            84             NA
#> 429  India    Haryana     Gurugram    Sector-51, Gurugram - HSPCB    31-05-2021 08:00:00  NO2           17             23             19             NA
#> 436  India    Haryana     Gurugram    Teri Gram, Gurugram - HSPCB    31-05-2021 08:00:00  NO2           5              8              6              NA
#> 442  India    Haryana     Gurugram    Vikas Sadan, Gurugram - HSPCB  31-05-2021 08:00:00  NO2           19             108            47             NA
#> 109  India    Chandigarh  Chandigarh  Sector-25, Chandigarh - CPCC   31-05-2021 08:00:00  NO2           14             40             23             NA
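Because the results of all requests come back as one combined table, ordinary R tools can summarise them. The sketch below assumes get_api_data() returns a data.frame with the column names printed above; the values are typed in from that output. It averages pollutant_avg across stations for each city and pollutant using base R's aggregate().

```r
# Rows copied from the printed get_api_data() output above.
# Assumption: get_api_data() returns a data.frame with these column names.
aq <- data.frame(
  city = c("Gurugram", "Gurugram", "Chandigarh",
           "Gurugram", "Gurugram", "Gurugram", "Chandigarh"),
  pollutant_id  = c("PM10", "PM10", "PM10", "NO2", "NO2", "NO2", "NO2"),
  pollutant_avg = c(100, 55, 84, 19, 6, 47, 23),
  stringsAsFactors = FALSE
)

# Guard against numeric columns arriving as strings (no-op if already numeric)
aq$pollutant_avg <- as.numeric(aq$pollutant_avg)

# Unweighted mean of the station-level averages for each city and pollutant
city_means <- aggregate(pollutant_avg ~ city + pollutant_id, data = aq, FUN = mean)
print(city_means)
```

This is a simple unweighted mean across stations; whether that is an appropriate summary depends on your analysis.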

Python Version

This wrapper is also available for Python on PyPI.

Install it with:

pip install datagovindia

Authors: