SIM FTP Database

Sistema de Informação sobre Mortalidade (Mortality Information System)

[1]:
from pysus.ftp.databases.sim import SIM
sim = SIM().load() # Loads the files from DATASUS
[2]:
sim.metadata
[2]:
{'long_name': 'Sistema de Informação sobre Mortalidade',
 'source': 'http://sim.saude.gov.br',
 'description': ''}
[3]:
sim.groups
[3]:
{'CID10': 'DO', 'CID9': 'DOR'}
[4]:
sim.paths
[4]:
[/dissemin/publicos/SIM/CID10/DORES, /dissemin/publicos/SIM/CID9/DORES]

For more information about CID9 and CID10, visit http://tabnet.saude.es.gov.br/cgi/tabnet/sim/sim96/obtdescr.htm

Getting specific files

[5]:
sim.get_files("CID9", uf="SP", year=1995)
[5]:
[DORSP95.DBC]
[6]:
sim.get_files("CID10", uf=["SP", "RJ"], year=[2019, 2020, 2021])
[6]:
[DORJ2019.dbc,
 DORJ2020.dbc,
 DORJ2021.dbc,
 DOSP2019.dbc,
 DOSP2020.dbc,
 DOSP2021.dbc]
[7]:
files = sim.get_files(["CID9", "CID10"], uf=["SP"], year=[1995, 2020])
sp_cid9, sp_cid10 = files
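
The file names returned above appear to follow a simple pattern: the group prefix from `sim.groups`, the two-letter UF, and the year. As a minimal sketch, the hypothetical helper `parse_sim_filename` below (not part of PySUS) splits a name back into those parts. It assumes, based only on the listings shown here, that CID9 files use a two-digit year in the 1900s and CID10 files use a four-digit year.

```python
import re

# Prefixes come from the `sim.groups` output above; the year widths are an
# assumption inferred from the listed file names, not a documented rule.
PATTERNS = {
    "CID9": re.compile(r"^DOR(?P<uf>[A-Z]{2})(?P<year>\d{2})$"),
    "CID10": re.compile(r"^DO(?P<uf>[A-Z]{2})(?P<year>\d{4})$"),
}


def parse_sim_filename(filename):
    """Hypothetical helper: split a SIM file name into group, UF and year."""
    stem = filename.rsplit(".", 1)[0].upper()
    for group, pattern in PATTERNS.items():
        match = pattern.match(stem)
        if match:
            year = int(match["year"])
            if group == "CID9":
                year += 1900  # assumption: two-digit CID9 years are 19xx
            return {"group": group, "uf": match["uf"], "year": year}
    raise ValueError(f"unrecognized SIM file name: {filename!r}")


print(parse_sim_filename("DORSP95.DBC"))
print(parse_sim_filename("DORJ2019.dbc"))
```

Matching on the year width is what keeps `DORJ2019.dbc` (CID10, Rio de Janeiro) from being mistaken for a CID9 file, even though it also starts with `DOR`.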

Describing a file on the DATASUS server

[8]:
sim.describe(sp_cid9)
[8]:
{'name': 'DORSP95.DBC',
 'uf': 'São Paulo',
 'year': 1995,
 'group': 'CID9',
 'size': '8.2 MB',
 'last_update': '2020-01-31 02:48PM'}
[9]:
sim.describe(sp_cid10)
[9]:
{'name': 'DOSP2020.dbc',
 'uf': 'São Paulo',
 'year': 2020,
 'group': 'CID10',
 'size': '28.7 MB',
 'last_update': '2022-03-31 04:19PM'}
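
The `size` and `last_update` values above are plain strings, so comparing them needs a parsing step. The sketch below uses only the standard library and works on dictionaries copied from the two outputs shown. `parse_last_update` and `size_in_mb` are hypothetical helpers, and the timestamp format string is an assumption based on those two samples.

```python
from datetime import datetime

# Fields copied from the two `sim.describe()` outputs shown above.
descriptions = [
    {"name": "DORSP95.DBC", "size": "8.2 MB", "last_update": "2020-01-31 02:48PM"},
    {"name": "DOSP2020.dbc", "size": "28.7 MB", "last_update": "2022-03-31 04:19PM"},
]


def parse_last_update(value):
    """Hypothetical helper: parse the 12-hour timestamp string."""
    return datetime.strptime(value, "%Y-%m-%d %I:%M%p")


def size_in_mb(value):
    """Hypothetical helper: read a size string such as '8.2 MB'."""
    number, unit = value.split()
    if unit != "MB":
        raise ValueError(f"unexpected unit in {value!r}")
    return float(number)


newest = max(descriptions, key=lambda d: parse_last_update(d["last_update"]))
total_mb = sum(size_in_mb(d["size"]) for d in descriptions)

print(newest["name"])
print(round(total_mb, 1))
```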

Downloading files

You can either download multiple files at once or download them individually:

[10]:
sim.download(sp_cid9) # Downloads to default directory
DORSP95.parquet: 100%|█████████████| 434k/434k [00:12<00:00, 36.0kB/s]
[10]:
[/home/bida/pysus/DORSP95.parquet]
[11]:
parquet = sp_cid9.download() # Or in a custom directory with `local_dir=`
parquet
[11]:
/home/bida/pysus/DORSP95.parquet

Note: If the file has already been downloaded, you must delete the local copy before downloading again to get the latest version from DATASUS.
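
Deleting the local copy can be scripted. The following is a minimal sketch using only the standard library. `remove_local_copy` is a hypothetical helper, not a PySUS function, and it handles both a single file and a directory because the on-disk layout of the downloaded Parquet is not stated here. The demonstration runs against a temporary directory rather than a real download folder.

```python
import shutil
import tempfile
from pathlib import Path


def remove_local_copy(path):
    """Hypothetical helper: delete a downloaded file or directory.

    Returns True if something was removed, False if nothing was there.
    """
    target = Path(path)
    if target.is_dir():
        shutil.rmtree(target)
    elif target.exists():
        target.unlink()
    else:
        return False
    return True


# Demonstration on a throwaway directory standing in for the download folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_parquet = Path(tmp) / "DORSP95.parquet"
    fake_parquet.mkdir()
    (fake_parquet / "part-0.parquet").write_bytes(b"")

    removed_first = remove_local_copy(fake_parquet)   # directory existed
    removed_again = remove_local_copy(fake_parquet)   # already gone
    still_there = fake_parquet.exists()

print(removed_first, removed_again, still_there)
```

With a real download, the path printed by the download step would be passed to the helper before calling the download method again.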

Reading files

PySUS writes its output as Parquet files. Use the to_dataframe() method to read a file as a pandas DataFrame.

[12]:
parquet.to_dataframe()
[12]:
contador CARTORIO REGISTRO DATAREG TIPOBITO DATAOBITO ESTCIVIL SEXO DATANASC IDADE ... FONTINFO ACIDTRAB LOCACID CRITICA NUMEXPORT CRSOCOR CRSRES RACACOR ETNIA UFINFORM
0 180001 951006 2 951002 2 1 19291003 465 ... 0 0 35
1 180002 951006 2 951002 3 2 18980317 497 ... 0 0 35
2 180003 951006 2 951003 2 2 19281002 467 ... 0 0 35
3 180004 951006 2 951003 3 1 19110613 484 ... 0 0 35
4 180005 951006 2 951004 1 1 19610914 434 ... 0 0 35
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
227832 179996 951004 2 951001 4 1 19380423 457 ... 0 0 35
227833 179997 951004 2 951001 2 1 19470130 448 ... 0 0 35
227834 179998 951004 2 951001 3 2 19160113 479 ... 0 0 35
227835 179999 951006 2 951001 1 1 19550901 440 ... 0 0 35
227836 180000 951006 2 951001 1 1 19700510 425 ... 0 0 35

227837 rows × 50 columns
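
The date columns in this file are stored as digit strings. As a rough sketch, the hypothetical helpers below turn them into `date` objects using only the standard library. The formats are assumptions read off the first row above: the eight-digit value is taken to be DATANASC in YYYYMMDD form, and the six-digit value is taken to be DATAOBITO in YYMMDD form with an implied century of 19. Other years or groups may use a different layout.

```python
from datetime import datetime


def parse_birth_date(value):
    """Hypothetical helper: parse an eight-digit YYYYMMDD string."""
    return datetime.strptime(value, "%Y%m%d").date()


def parse_death_date(value, century="19"):
    """Hypothetical helper: parse a six-digit YYMMDD string.

    The century prefix is an assumption that fits this 1995 file.
    """
    return datetime.strptime(century + value, "%Y%m%d").date()


# Values taken from the first row of the DataFrame shown above.
birth = parse_birth_date("19291003")
death = parse_death_date("951002")

# Completed years between the two dates.
had_birthday = (death.month, death.day) >= (birth.month, birth.day)
age_years = death.year - birth.year - (0 if had_birthday else 1)

print(birth, death, age_years)
```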