Working with Infodengue datasets

InfoDengue is an alert system designed to track arboviruses through a hybrid data approach that combines social-web, climatic, and epidemiological data. In this tutorial, we will walk through the process of using InfoDengue's API with Python to fetch up-to-date arbovirus data.

[1]:
from pysus.online_data.Infodengue import search_string, download

InfoDengue is a nationwide system. Use the search_string method to check how a city is registered in the API; the search is fuzzy, so a misspelled query such as 'Rio d janeiro' still returns candidate cities along with their IBGE geocodes:

[2]:
search_string('Rio d janeiro')
[2]:
{'Arroio do Meio': 4301008,
 'Granjeiro': 2304806,
 'Jerônimo Monteiro': 3203106,
 'Minador do Negrão': 2705309,
 'Rio Branco': 5107206,
 'Rio Claro': 3304409,
 'Rio Grande': 4315602,
 'Rio Largo': 2707701,
 'Rio Manso': 3155306,
 'Rio Negrinho': 4215000,
 'Rio Negro': 4122305,
 'Rio Pardo': 4315701,
 'Rio da Conceição': 1718659,
 'Rio das Antas': 4214409,
 'Rio de Janeiro': 3304557,
 'Rio do Antônio': 2926806,
 'Rio do Pires': 2926905,
 'Rio dos Cedros': 4214706,
 'Rodeiro': 3156304,
 'Roteiro': 2707800}
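The result is a plain dict mapping each candidate city name to its IBGE geocode, so selecting the intended city is ordinary dictionary access. A minimal sketch, using a subset of the matches shown above:

```python
# search_string returns a dict of city name -> IBGE geocode
# (subset of the matches printed above)
matches = {
    'Rio de Janeiro': 3304557,
    'Rio do Antônio': 2926806,
    'Rio do Pires': 2926905,
}

# Pick the geocode for the city we actually want
geocode = matches['Rio de Janeiro']
print(geocode)  # 3304557
```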

The download method fetches data for a given disease, city, and range of Epidemiological Weeks (SE, from the Portuguese Semana Epidemiológica) in the format YYYYWW. The output is a pandas DataFrame covering all the EWs within this range.

[3]:
df = download('dengue', 202301, 202304, 'Rio de Janeiro')
[4]:
df
[4]:
SE                              202304              202303              202302              202301
data_iniSE                  2023-01-22          2023-01-15          2023-01-08          2023-01-01
casos_est                        311.0               241.0               212.0               225.0
casos_est_min                      311                 241                 212                 225
casos_est_max                      311                 241                 212                 225
casos                              311                 241                 212                 225
p_rt1                         0.999923            0.969245            0.986386            0.999998
p_inc100k                     4.608899            3.571527            3.141758            3.334413
Localidade_id                        0                   0                   0                   0
nivel                                3                   3                   3                   3
id                  330455720230419614  330455720230319614  330455720230219614  330455720230119614
versao_modelo               2023-09-14          2023-09-14          2023-09-14          2023-09-14
tweet                              0.0                 0.0                 0.0                 0.0
Rt                            1.390005            1.194723            1.254508            1.643794
pop                          6747815.0           6747815.0           6747815.0           6747815.0
tempmin                      25.142857           26.714286           23.428571           22.428571
umidmax                      82.143793           77.157084           89.980829           92.587399
receptivo                            1                   1                   1                   1
transmissao                          1                   1                   1                   1
nivel_inc                            1                   1                   1                   1
umidmed                      82.143793           77.157084           82.592395           78.188093
umidmin                      82.143793           77.157084           69.331682           63.034302
tempmed                      25.142857           26.714286           25.071429           24.976191
tempmax                      25.142857           26.714286           28.285714           28.428571
casprov                            NaN                 NaN                 NaN                 NaN
casprov_est                        NaN                 NaN                 NaN                 NaN
casprov_est_min                    NaN                 NaN                 NaN                 NaN
casprov_est_max                    NaN                 NaN                 NaN                 NaN
casconf                            NaN                 NaN                 NaN                 NaN
notif_accum_year                   989                 989                 989                 989
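As a quick sanity check on this output (working from the printed values rather than the DataFrame itself, whose orientation may vary between versions), the weekly confirmed cases add up to the notif_accum_year field:

```python
# Weekly confirmed cases ('casos'), copied from the output above
casos = {202301: 225, 202302: 212, 202303: 241, 202304: 311}

total = sum(casos.values())
print(total)  # 989, matching notif_accum_year
```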

You can save the DataFrame to a CSV file:

[5]:
df.to_csv('rio_se01_04.csv')
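A self-contained round-trip sketch (with a toy frame standing in for the downloaded data) shows that the index survives the export as long as it is restored with index_col=0 when reading back:

```python
import pandas as pd

# Toy frame standing in for the downloaded data
toy = pd.DataFrame({'casos': [311, 241]}, index=[202304, 202303])
toy.to_csv('toy_se.csv')

# Restore the index from the first CSV column when reading back
restored = pd.read_csv('toy_se.csv', index_col=0)
assert restored.equals(toy)
```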

To fetch data for several combinations of parameters, you can iterate over their Cartesian product, for instance:

[6]:
from itertools import product

diseases = ['dengue', 'zika']
cities = ['Rio de Janeiro', 'Rio do Antônio', 'Rio do Pires']

for disease, city in product(diseases, cities):
    df = download(disease, 202301, 202304, city)
    df.to_csv(f'{disease}_{city.lower().replace(" ", "_")}_se01_04.csv')

Expected files:

  • dengue_rio_de_janeiro_se01_04.csv

  • dengue_rio_do_antônio_se01_04.csv

  • dengue_rio_do_pires_se01_04.csv

  • zika_rio_de_janeiro_se01_04.csv

  • zika_rio_do_antônio_se01_04.csv

  • zika_rio_do_pires_se01_04.csv
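The naming pattern used in the loop above can be checked in isolation; this sketch rebuilds the expected file names without downloading anything:

```python
from itertools import product

diseases = ['dengue', 'zika']
cities = ['Rio de Janeiro', 'Rio do Antônio', 'Rio do Pires']

# Rebuild the file names produced by the download loop above
filenames = [
    f'{disease}_{city.lower().replace(" ", "_")}_se01_04.csv'
    for disease, city in product(diseases, cities)
]
print(filenames)
```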