Working with Infodengue datasets
InfoDengue is an alert system designed to track arboviruses using a unique hybrid data approach that integrates social web data with climatic and epidemiological data. In this tutorial, we will walk through the process of using InfoDengue’s API with Python to fetch up-to-date arbovirus data.
[1]:
from pysus.online_data.Infodengue import search_string, download
Infodengue is a nationwide system. Use the search_string function to check how a city is named in the API; it returns the closest matching city names together with their numeric codes:
[2]:
search_string('Rio d janeiro')
[2]:
{'Arroio do Meio': 4301008,
'Granjeiro': 2304806,
'Jerônimo Monteiro': 3203106,
'Minador do Negrão': 2705309,
'Rio Branco': 5107206,
'Rio Claro': 3304409,
'Rio Grande': 4315602,
'Rio Largo': 2707701,
'Rio Manso': 3155306,
'Rio Negrinho': 4215000,
'Rio Negro': 4122305,
'Rio Pardo': 4315701,
'Rio da Conceição': 1718659,
'Rio das Antas': 4214409,
'Rio de Janeiro': 3304557,
'Rio do Antônio': 2926806,
'Rio do Pires': 2926905,
'Rio dos Cedros': 4214706,
'Rodeiro': 3156304,
'Roteiro': 2707800}
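Because the search is approximate, the result usually contains more than one candidate. The sketch below shows one way to pull a single code out of such a dictionary. It works on a hand-copied subset of the output above rather than calling the API, and `pick_exact` is a hypothetical helper, not part of PySUS.

```python
# A subset of the dictionary shown above (city name -> numeric code),
# copied by hand so the example runs without network access.
matches = {
    'Rio Claro': 3304409,
    'Rio de Janeiro': 3304557,
    'Rio do Antônio': 2926806,
}


def pick_exact(candidates, name):
    """Return the code whose city name equals `name`, ignoring case.

    Hypothetical helper for illustration; returns None when no name matches.
    """
    wanted = name.casefold()
    for city, code in candidates.items():
        if city.casefold() == wanted:
            return code
    return None


geocode = pick_exact(matches, 'rio de janeiro')
print(geocode)
```

Returning None for a missing name lets the caller decide whether to retry with a different spelling or stop.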
The download function fetches data for a specified range of epidemiological weeks (EW; SE, for "semana epidemiológica", in Portuguese), given in the format YYYYWW. The output is a Pandas DataFrame containing all the EWs within this range.
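The YYYYWW format is simply the four-digit year followed by the two-digit week number, written as one integer. A minimal sketch of building and splitting such values follows; `make_ew` and `split_ew` are hypothetical helper names introduced here, not PySUS functions.

```python
def make_ew(year, week):
    """Encode a year and a week number as a YYYYWW integer."""
    # An epidemiological year has 52 or 53 weeks, so 53 is the upper bound.
    if not 1 <= week <= 53:
        raise ValueError(f"week must be between 1 and 53, got {week}")
    return year * 100 + week


def split_ew(ew):
    """Decode a YYYYWW integer into a (year, week) tuple."""
    return divmod(ew, 100)


start = make_ew(2023, 1)
end = make_ew(2023, 4)
print(start, end, split_ew(end))
```

Values built this way can be passed as the start and end arguments of the range.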
[3]:
df = download('dengue', 202301, 202304, 'Rio de Janeiro')
[4]:
df
[4]:
SE | 202304 | 202303 | 202302 | 202301 |
---|---|---|---|---|
data_iniSE | 2023-01-22 | 2023-01-15 | 2023-01-08 | 2023-01-01 |
casos_est | 295.0 | 236.0 | 211.0 | 228.0 |
casos_est_min | 295 | 236 | 211 | 228 |
casos_est_max | 295 | 236 | 211 | 228 |
casos | 305 | 236 | 211 | 228 |
p_rt1 | 0.999356 | 0.945977 | 0.983291 | 0.999999 |
p_inc100k | 4.371786 | 3.497428 | 3.126938 | 3.378871 |
Localidade_id | 0 | 0 | 0 | 0 |
nivel | 2 | 2 | 2 | 2 |
id | 330455720230419461 | 330455720230319461 | 330455720230219461 | 330455720230119461 |
versao_modelo | 2023-04-14 | 2023-04-14 | 2023-04-14 | 2023-04-14 |
tweet | 0.0 | 0.0 | 0.0 | 0.0 |
Rt | 1.0 | 1.0 | 1.0 | 2.0 |
pop | 6747815.0 | 6747815.0 | 6747815.0 | 6747815.0 |
tempmin | 25.142857 | 26.714286 | 23.428571 | 22.428571 |
umidmax | 82.143793 | 77.157084 | 89.980829 | 92.587399 |
receptivo | 1 | 1 | 1 | 1 |
transmissao | 0 | 0 | 0 | 0 |
nivel_inc | 0 | 0 | 0 | 0 |
umidmed | 82.143793 | 77.157084 | 82.592395 | 78.188093 |
umidmin | 82.143793 | 77.157084 | 69.331682 | 63.034302 |
tempmed | 25.142857 | 26.714286 | 25.071429 | 24.976191 |
tempmax | 25.142857 | 26.714286 | 28.285714 | 28.428571 |
casprov | NaN | NaN | NaN | NaN |
casprov_est | NaN | NaN | NaN | NaN |
casprov_est_min | NaN | NaN | NaN | NaN |
casprov_est_max | NaN | NaN | NaN | NaN |
casconf | NaN | NaN | NaN | NaN |
notif_accum_year | 980 | 980 | 980 | 980 |
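The rows of the table are related to one another. As a quick sanity check, the sketch below recomputes incidence per 100,000 inhabitants from casos_est and pop and compares it with the reported p_inc100k. The numbers are copied by hand from the table above, and the formula is inferred from those values rather than taken from the Infodengue documentation.

```python
# Values copied from the table above (Rio de Janeiro, weeks 202301 to 202304).
weeks = [202301, 202302, 202303, 202304]
casos_est = [228.0, 211.0, 236.0, 295.0]
p_inc100k = [3.378871, 3.126938, 3.497428, 4.371786]
pop = 6747815.0

# Assumed relationship: estimated cases per 100,000 inhabitants.
incidence = [cases / pop * 100_000 for cases in casos_est]

for week, computed, reported in zip(weeks, incidence, p_inc100k):
    print(week, round(computed, 6), reported)
```

For these four weeks the recomputed values agree with the reported column to within rounding.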
You can save the DataFrame to a CSV file:
[5]:
df.to_csv('rio_se01_04.csv')
To fetch data for several diseases and cities, iterate over every combination of them, for instance:
[9]:
from itertools import product
diseases = ['dengue', 'zika']
cities = ['Rio de Janeiro', 'Rio do Antônio', 'Rio do Pires']
for disease, city in product(diseases, cities):
    df = download(disease, 202301, 202304, city)
    df.to_csv(f'{disease}_{city.lower().replace(" ", "_")}_se01_04.csv')
Expected files:
- dengue_rio_de_janeiro_se01_04.csv
- dengue_rio_do_antônio_se01_04.csv
- dengue_rio_do_pires_se01_04.csv
- zika_rio_de_janeiro_se01_04.csv
- zika_rio_do_antônio_se01_04.csv
- zika_rio_do_pires_se01_04.csv
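Some of these file names contain accented characters, which can be awkward on certain file systems or in shell scripts. If ASCII-only names are preferred, one option is to strip the diacritics before building the name. The sketch below uses only the standard library; `ascii_slug` is a hypothetical helper, not part of PySUS.

```python
import unicodedata


def ascii_slug(name):
    """Lowercase `name`, drop diacritics, and replace spaces with underscores."""
    # NFKD splits an accented letter into a base letter plus a combining mark;
    # encoding to ASCII with errors ignored then discards the combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower().replace(" ", "_")


filename = f"dengue_{ascii_slug('Rio do Antônio')}_se01_04.csv"
print(filename)
```

The city name passed to the API should remain unchanged; only the file name is simplified.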