API Reference
The pysus.api package provides a layered architecture for discovering,
downloading, and reading data from Brazilian public health databases
(DATASUS). It supports three remote data sources: an S3-hosted DuckLake catalog, the DATASUS FTP server, and the dados.gov.br API.
Architecture Overview
The package is organized into a hierarchy of abstract base classes and concrete implementations:
pysus/api/
├── __init__.py # Package entry (re-exports PySUS)
├── client.py # Main PySUS orchestrator
├── extensions.py # File format handlers
├── models.py # Abstract base classes
├── types.py # Type aliases
├── _impl/
│ └── databases.py # High-level convenience functions
├── ducklake/ # S3 DuckLake catalog client
├── ftp/ # FTP client
└── dadosgov/ # dados.gov.br API client
Quick Start
The simplest way to use PySUS is via the high-level convenience functions:
from pysus import sinan
df = sinan(disease="dengue", year=2023)
Or with the async API:
import asyncio
from pysus.api.client import PySUS

async def main():
    async with PySUS() as pysus:
        files = await pysus.query(dataset="sinan", group="DENG", year=2023)
        for f in files:
            await pysus.download(f)

asyncio.run(main())
Main Client
Main orchestrator for the PySUS data pipeline.
Manages file downloads, local state tracking, catalog attachment, Parquet conversion, and query execution across multiple backends.
- class pysus.api.client.Base(**kwargs: Any)
Bases: DeclarativeBase
Base declarative class for SQLAlchemy ORM models.
- metadata: ClassVar[MetaData] = MetaData()
Refers to the MetaData collection that will be used for new Table objects.
- registry: ClassVar[_RegistryType] = <sqlalchemy.orm.decl_api.registry object>
Refers to the registry in use where new Mapper objects will be associated.
- class pysus.api.client.DownloadStatus(value)
Bases: Enum
Download status values tracked for each local file.
- COMPLETED = 'completed'
- DOWNLOADING = 'downloading'
- FAILED = 'failed'
- MISSING = 'missing'
- PENDING = 'pending'
- class pysus.api.client.LocalFileState(**kwargs)
Bases: Base
ORM model tracking the state of a downloaded local file.
- status: Mapped[DownloadStatus]
- class pysus.api.client.PySUS(db_path: Path = PosixPath('/home/docs/pysus/config.db'))
Bases: object
Central orchestrator for downloading and querying PySUS datasets.
- async download(file: BaseRemoteFile, token: str | None = None, callback: Callable | None = None, timeout: float | None = None) -> BaseLocalFile
Download a remote file and return a local file handle.
Skips re-download if a matching local copy already exists.
- Parameters:
file (BaseRemoteFile) – The remote file to download.
token (str, optional) – Access token for authenticated clients (e.g. DadosGov).
callback (Callable, optional) – Progress callback invoked during the download.
timeout (float, optional) – Maximum seconds to wait for the download. None (default) means no timeout.
- Returns:
The downloaded file wrapped in the appropriate handler.
- Return type:
BaseLocalFile
- Raises:
ValueError – If the file’s client is not recognised.
RuntimeError – If the download fails for any reason.
- async download_to_parquet(file: BaseRemoteFile, token: str | None = None, callback: Callable[[int, int], None] | None = None, timeout: float | None = None, add_dv: bool = True) -> Parquet
Download a file and convert it to Parquet format.
- Parameters:
file (BaseRemoteFile) – The remote file to download and convert.
token (str, optional) – Access token for authenticated clients.
callback (Callable[[int, int], None], optional) – Progress callback.
timeout (float, optional) – Maximum seconds to wait for the download.
add_dv (bool, optional) – Whether to apply the IBGE verification digit on load (default True).
- Returns:
The converted Parquet file handler.
- Return type:
Parquet
- Raises:
NotImplementedError – If the downloaded file type cannot be converted to Parquet.
- get_completed_remote_paths() -> set[str]
Return remote paths for all successfully downloaded files.
- async get_dadosgov(access_token: str | None) -> DadosGov
Return the DadosGov client, connecting lazily if needed.
- async get_ducklake() -> DuckLake
Return the DuckLake client, initializing it lazily if needed.
- async get_local_file(file: BaseRemoteFile) -> BaseLocalFile | None
Look up a previously downloaded file by its remote path.
- get_local_hierarchy()
Build a nested dict of cached files grouped by client, dataset, and group.
- Returns:
Nested dict keyed by {client: {dataset: {group: [files]}}}.
- Return type:
dict
- async query(client: Literal['FTP', 'DadosGov'] | None = None, dataset: str | None = None, group: str | None = None, state: str | None = None, year: int | None = None, month: int | None = None)
Query available datasets through the DuckLake catalog.
- read_parquet(paths: list[Path], sql: str | None = None, mode: Literal['union', 'intersection', 'strict'] = 'union', add_dv: bool = True) -> DuckDBPyConnection | pd.DataFrame
Read Parquet files with optional schema handling and SQL filter.
- Parameters:
paths (list of Path) – One or more Parquet file paths to read.
sql (str, optional) – Optional SQL filter expression applied to the result.
mode ({"union", "intersection", "strict"}, optional) – Schema resolution mode (default "union").
add_dv (bool, optional) – When True, automatically applies the IBGE verification digit to municipality code columns. If matching columns are found, a DataFrame is returned instead of a DuckDBPyConnection.
- Returns:
The query result.
- Return type:
DuckDBPyConnection or pd.DataFrame
- Raises:
ValueError – If no paths are provided, or if the schema mode is "strict" and the files have differing schemas.
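The three mode values of read_parquet can be illustrated with a small sketch that merges per-file column lists. It assumes that "union" keeps every column seen in any file, "intersection" keeps only the columns shared by all files, and "strict" demands identical column sets. resolve_columns is a hypothetical helper working on plain lists; the real method operates on Parquet schemas.

```python
def resolve_columns(schemas: list[list[str]], mode: str = "union") -> list[str]:
    """Hypothetical helper: combine per-file column lists according to mode."""
    if not schemas:
        # Mirrors the documented ValueError for an empty paths list.
        raise ValueError("no schemas provided")
    if mode == "union":
        # Keep every column seen in any file, in first-seen order.
        seen: dict[str, None] = {}
        for cols in schemas:
            for col in cols:
                seen.setdefault(col, None)
        return list(seen)
    if mode == "intersection":
        # Keep only columns present in every file, ordered as in the first file.
        common = set(schemas[0]).intersection(*map(set, schemas[1:]))
        return [col for col in schemas[0] if col in common]
    if mode == "strict":
        # Every file must expose exactly the same set of columns.
        first = set(schemas[0])
        if any(set(cols) != first for cols in schemas[1:]):
            raise ValueError("files have differing schemas")
        return list(schemas[0])
    raise ValueError(f"unknown mode: {mode!r}")
```

With [["a", "b", "c"], ["a", "b"]], "union" yields all three columns while "intersection" keeps only "a" and "b"; "strict" would raise.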
Types
Type aliases used across the PySUS API.
- FileType:
Discriminated union of supported local file types (FILE, DIR, PARQUET, CSV, JSON, PDF, DBC, DBF, ZIP).
- State:
Brazilian state abbreviations (AC, AL, AP, …, DF).
Utilities
File Format Handlers
Map file extensions and MIME types to their handler classes.
- class pysus.api.extensions.CSV(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'CSV')
Bases: BaseTabularFile
Represents a CSV file with automatic encoding and separator detection.
- property columns: list[str]
Return the column names from the CSV header row.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) -> None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property rows: int
Return the number of data rows in the file.
- async stream(chunk_size: int = 10000) -> AsyncGenerator[DataFrame, None]
Yield the CSV in chunks of the given number of rows.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
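The automatic encoding and separator detection mentioned for the CSV handler could work along these lines. This is a sketch under stated assumptions, not PySUS's code: it decodes a sample as UTF-8, falls back to Latin-1, and lets the standard library's csv.Sniffer choose among a few candidate separators. The function sniff_csv and the candidate list are hypothetical.

```python
import codecs
import csv
from pathlib import Path


def sniff_csv(path: Path, sample_size: int = 64 * 1024) -> tuple[str, str]:
    """Hypothetical helper: guess (encoding, delimiter) from the start of a CSV file."""
    with open(path, "rb") as fh:
        raw = fh.read(sample_size)
    try:
        # final=False tolerates a multi-byte character cut off at the sample boundary.
        text = codecs.getincrementaldecoder("utf-8")().decode(raw, final=False)
        encoding = "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback cannot fail.
        text = raw.decode("latin-1")
        encoding = "latin-1"
    # Restrict the sniffer to common separators; raises csv.Error if none fits.
    dialect = csv.Sniffer().sniff(text, delimiters=";,\t|")
    return encoding, dialect.delimiter
```

Reading only a bounded sample keeps detection cheap even for very large files.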
- class pysus.api.extensions.DBC(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'DBC')
Bases: BaseTabularFile
Represents a compressed DBC file, convertible to DBF then Parquet.
- property columns: list[str]
Not supported for DBC files. Convert to Parquet first.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property rows: int
Not supported for DBC files. Convert to Parquet first.
- async stream(chunk_size: int = 10000) -> AsyncGenerator[DataFrame, None]
Convert to Parquet and stream its chunks.
- async to_parquet(output_path: str | Path | None = None, chunk_size: int = 30000, callback: Callable[[int, int], None] | None = None) -> Parquet
Decompress DBC to DBF, then convert to Parquet.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.DBCNotImported(*, path: Path = <factory>, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'remote')
Bases: BaseTabularFile
Placeholder for DBC files when the optional dependency is not installed.
- property columns: list[str]
Raise ImportError indicating the missing DBC dependency.
- property extension: str
Return the .dbc extension.
- import_err: ClassVar[str] = '\n run "pip install pysus[dbc]" to handle DBC files.\n Make sure you also have libffi installed on the system. It may not work\n on Windows\n '
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property modify: datetime
Raise ImportError indicating the missing DBC dependency.
- property name: str
Raise ImportError indicating the missing DBC dependency.
- path: Path
- property rows: int
Raise ImportError indicating the missing DBC dependency.
- property size: int
Raise ImportError indicating the missing DBC dependency.
- stream(chunk_size: int = 10000) -> AsyncGenerator[DataFrame, None]
Raise ImportError indicating the missing DBC dependency.
- async to_parquet(output_path: str | Path | None = None, chunk_size: int = 10000, callback: Callable[[int, int], None] | None = None) -> Parquet
Raise ImportError indicating the missing DBC dependency.
- type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.DBF(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'DBF')
Bases: BaseTabularFile
Represents a dBASE (DBF) file.
- property columns: list[str]
Return the field names from the DBF file.
- decode_column(value)
Decode a raw DBF value, handling byte strings and null bytes.
- Parameters:
value (bytes or str or Any) – The value to decode.
- Returns:
The decoded and stripped string, or the original value if it is neither bytes nor str.
- Return type:
str or Any
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property rows: int
Return the number of records in the DBF file.
- async stream(chunk_size: int = 30000) -> AsyncGenerator[DataFrame, None]
Yield the DBF records in chunks of the given size.
- async to_parquet(output_path: str | Path | None = None, chunk_size: int = 30000, callback: Callable[[int, int], None] | None = None) -> Parquet
Convert the DBF file to Parquet format.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
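DBF.decode_column is documented to decode byte strings, handle null bytes, strip the result, and pass any other value through unchanged. A standalone sketch of that contract follows. The Latin-1 default codec is an assumption made for illustration, not necessarily what PySUS uses.

```python
from typing import Any


def decode_column(value: Any, encoding: str = "latin-1") -> Any:
    """Hypothetical standalone version of the documented DBF.decode_column contract."""
    if isinstance(value, bytes):
        # Assumed default codec (an illustration choice, not PySUS's).
        value = value.decode(encoding, errors="replace")
    if isinstance(value, str):
        # Drop NUL padding and surrounding whitespace.
        return value.replace("\x00", "").strip()
    # Numbers, dates, None, etc. are returned unchanged.
    return value
```

For example, b"S\xe3o Paulo" decodes to "São Paulo", and b"ABC\x00\x00  " becomes "ABC".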
- class pysus.api.extensions.Directory(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'DIR')
Bases: BaseLocalFile
Represents a directory on the local filesystem.
- async load() -> list[BaseLocalFile]
Load all entries inside the directory as file objects.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- async stream(chunksize: int = 10000) -> AsyncGenerator[BaseLocalFile, None]
Yield each entry inside the directory as a file object.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.ExtensionFactory
Bases: object
Factory that maps file extensions and MIME types to handler classes.
- async classmethod get_file_class(path: Path) -> type[BaseLocalFile]
Return the file handler class for a given path.
First attempts MIME-type identification; falls back to extension matching.
- Parameters:
path (Path) – The file path to classify.
- Returns:
The handler class for the file type.
- Return type:
type[BaseLocalFile]
- async classmethod instantiate(path: str | Path) -> BaseLocalFile
Create and return the appropriate file handler for a path.
Determines whether the path is a directory or a file, resolves the handler class, and instantiates it.
- Parameters:
path (str or Path) – The filesystem path to wrap in a handler.
- Returns:
The instantiated file handler.
- Return type:
BaseLocalFile
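The two-step lookup described for get_file_class (MIME type first, then file extension) might look like the sketch below. It uses the standard library's mimetypes.guess_type, which guesses from the file name rather than inspecting content, so it only approximates whatever MIME identification PySUS performs. The mapping tables and returning handler names as strings are illustrative assumptions.

```python
import mimetypes
from pathlib import Path

# Illustrative tables: the handler names come from this reference,
# but these exact mappings are assumptions.
MIME_MAP = {
    "text/csv": "CSV",
    "application/json": "JSON",
    "application/pdf": "PDF",
    "application/zip": "Zip",
    "application/x-tar": "Tar",
}
EXT_MAP = {
    ".csv": "CSV",
    ".json": "JSON",
    ".pdf": "PDF",
    ".parquet": "Parquet",
    ".dbf": "DBF",
    ".dbc": "DBC",
    ".zip": "Zip",
    ".gz": "GZip",
    ".tar": "Tar",
}


def get_file_class_name(path: Path) -> str:
    """Hypothetical: return a handler name, trying MIME type before extension."""
    mime, encoding = mimetypes.guess_type(str(path))
    # A compressed file such as data.csv.gz reports text/csv plus encoding
    # "gzip"; only trust the MIME type when no compression encoding is present.
    if encoding is None and mime in MIME_MAP:
        return MIME_MAP[mime]
    # Fall back to the lower-cased suffix; unknown types get the generic File handler.
    return EXT_MAP.get(Path(path).suffix.lower(), "File")
```

The encoding check matters: without it, a gzip-compressed CSV would be routed to the CSV handler instead of being decompressed first.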
- class pysus.api.extensions.File(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'FILE')
Bases: BaseLocalFile
Represents a generic local file with no special handling.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- async stream(chunk_size: int = 1048576) -> AsyncGenerator[bytes, None]
Yield the file contents in chunks of the given size.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.GZip(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'ZIP')
Bases: BaseCompressedFile
Represents a GZip-compressed file.
- async extract(target_dir: Path = PosixPath('/home/docs/pysus')) -> list[BaseLocalFile]
Decompress the file into a target directory and return the result as a list of file objects.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
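Decompressing a .gz file into a target directory, as GZip.extract is documented to do, can be sketched with the standard gzip and shutil modules. This is an assumption-laden illustration: the hypothetical gunzip_to names the output after the archive minus its .gz suffix and returns plain Path objects instead of PySUS file handlers.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to(path: Path, target_dir: Path) -> list[Path]:
    """Hypothetical helper: decompress path into target_dir."""
    path, target_dir = Path(path), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # "data.csv.gz" -> "data.csv"; other names get an arbitrary ".out" suffix
    # so the source can never be overwritten in place.
    name = path.stem if path.suffix == ".gz" else path.name + ".out"
    out = target_dir / name
    # Stream in chunks so large files are never fully loaded into memory.
    with gzip.open(path, "rb") as src, open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return [out]
```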
- class pysus.api.extensions.JSON(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'JSON')
Bases: BaseTabularFile
Represents a JSON file with tabular data.
- property columns: list[str]
Return the column names from the JSON file.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property rows: int
Return the number of rows in the JSON file.
- async stream(chunk_size: int = 10000) -> AsyncGenerator[DataFrame, None]
Yield the entire JSON file as a single DataFrame.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.PDF(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'PDF')
Bases: BaseLocalFile
Represents a PDF file.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- async stream(chunk_size: int | None = None) -> AsyncGenerator[bytes, None]
Yield the PDF file contents in chunks of the given size.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.Parquet(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'PARQUET', add_dv: bool = True)
Bases: BaseTabularFile
Represents a Parquet file with optional date and integer type parsing.
- add_dv: bool
- property columns: list[str]
Return the column names from the Parquet schema.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- static parse_dftypes(df: DataFrame) -> DataFrame
Convert known date and integer columns to their proper types.
- property rows: int
Return the number of rows from the Parquet metadata.
- property schema: Schema
Return the Parquet schema as a PyArrow Schema object.
- async stream(chunk_size: int = 10000, parse: bool = False) -> AsyncGenerator[DataFrame, None]
Yield the Parquet file in batches of the given size.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.Tar(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'ZIP')
Bases: BaseCompressedFile
Represents a Tar archive file.
- async extract(target_dir: Path = PosixPath('/home/docs/pysus')) -> list[BaseLocalFile]
Extract members to a target directory and return as file objects.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- async open_member(member_name: str) -> bytes
Read and return the contents of a named archive member.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
- class pysus.api.extensions.Zip(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'ZIP')
Bases: BaseCompressedFile
Represents a ZIP archive file.
- async extract(target_dir: Path = PosixPath('/home/docs/pysus')) -> list[BaseLocalFile]
Extract members to a target directory and return as file objects.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- async open_member(member_name: str) -> bytes
Read and return the contents of a named archive member.
- async to_parquet(output_path: str | Path | None = None, chunk_size: int = 30000, callback: Callable[[int, int], None] | None = None) -> Parquet
Extract the archive and convert the first tabular file to Parquet.
- type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
Abstract Base Models
Abstract model hierarchy for PySUS data access.
Provides abstract base classes for local and remote file handling, organized in layered hierarchies:
- Local files: BaseFile -> BaseLocalFile -> BaseTabularFile / BaseCompressedFile
- Remote files: BaseFile -> BaseRemoteFile
- Remote data catalogs: BaseRemoteObject -> BaseRemoteGroup / BaseRemoteDataset / BaseRemoteClient
- class pysus.api.models.BaseCompressedFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])
Bases: BaseLocalFile, ABC
Abstract base for a compressed archive file (e.g. .zip, .gz).
Subclasses must implement list_members, open_member, and extract.
- abstractmethod async extract(target_dir: Path = PosixPath('/home/docs/pysus')) -> list[BaseLocalFile]
Extract all members into target_dir and return the file objects.
- abstractmethod async list_members() -> list[str]
Return the list of member names inside the archive.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- class pysus.api.models.BaseFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])
Bases: BaseModel, ABC
Abstract base for a single file, local or remote.
Subclasses must implement name, extension, size, and modify.
- property basename: str
Return the file name from the path.
- abstract property extension: str
Return the file extension string.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- abstract property modify: datetime
Return the last modification timestamp.
- abstract property name: str
Return the display name of the file.
- path: Path
- abstract property size: int
Return the file size in bytes.
- type: str | FileType
- class pysus.api.models.BaseLocalFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])
Bases: BaseFile, ABC
Abstract base for a file stored on the local filesystem.
Subclasses must implement load and stream.
- property extension: str
Return the file extension from the local path.
- async get_hash(algorithm: str = 'sha256', chunk_size: int = 1048576) -> str
Compute the file’s hash digest.
- Parameters:
algorithm (str, optional) – The hash algorithm name (default "sha256").
chunk_size (int, optional) – Read chunk size in bytes (default 1 MiB).
- Returns:
The hex digest string.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property modify: datetime
Return the last modification timestamp from the local filesystem.
- property name: str
Return the file name from the path.
- path: Path
- property size: int
Return the file size in bytes from the local filesystem.
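get_hash reads the file in chunks (1 MiB by default), so large downloads never need to fit in memory at once. A synchronous stdlib sketch of the same idea follows, assuming the algorithm name is handed straight to hashlib.new. file_hash is a hypothetical name, and the documented method is async.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Hypothetical synchronous counterpart of BaseLocalFile.get_hash."""
    digest = hashlib.new(algorithm)  # raises ValueError for unknown algorithm names
    with open(path, "rb") as fh:
        # Feed the digest one chunk at a time until EOF.
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The chunk size only affects memory use and speed; the resulting digest is the same for any chunk size.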
- class pysus.api.models.BaseRemoteClient
Bases: BaseRemoteObject, ABC
Abstract base for a remote API client (e.g. FTP, HTTP).
Subclasses must implement connect, close, login, datasets, and _download_file.
- abstractmethod async datasets(**kwargs) -> list
Return a list of available datasets matching kwargs.
- abstractmethod async login(**kwargs) -> None
Authenticate with the remote server using kwargs credentials.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- class pysus.api.models.BaseRemoteDataset
Bases: BaseRemoteObject, SearchableMixin, ABC
Abstract base for a dataset containing groups and/or files.
Subclasses must implement _fetch_content.
- client: BaseRemoteClient
- property content: Sequence[BaseRemoteGroup | BaseRemoteFile]
Return the dataset content, fetching on first access.
- group_definitions: dict[str, str]
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) -> None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- async search(**kwargs) -> list[BaseRemoteFile]
Recursively search groups and files by attribute kwargs.
Return matching file objects.
- class pysus.api.models.BaseRemoteFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])
Bases: BaseFile, SearchableMixin, ABC
Abstract base for a file stored on a remote server.
Subclasses must implement _download. The dataset and group attributes link back to the containing objects.
- property client: BaseRemoteClient
Return the remote client associated with this file.
- dataset: BaseRemoteDataset
- async download(output: str | Path | None = None, callback: Callable[[int, int], None] | None = None) -> BaseLocalFile
Download the remote file to a local cache or output path.
Return the instantiated local file wrapper.
- group: BaseRemoteGroup | None
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property month: int | None
Return the month associated with the file, or None.
- property name: str
Return the basename as the display name.
- property state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'] | None
Return the state associated with the file, or None.
- property year: int | None
Return the year associated with the file, or None.
- class pysus.api.models.BaseRemoteGroup
Bases: BaseRemoteObject, SearchableMixin, ABC
Abstract base for a named group of remote files within a dataset.
Subclasses must implement _fetch_files.
- dataset: BaseRemoteDataset
- property files: list[BaseRemoteFile]
Return all files in this group, fetching them on first access.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) -> None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property parent: BaseRemoteDataset
Return the parent dataset.
- async search(**kwargs) -> list[BaseRemoteFile]
Filter files in this group by attribute kwargs.
Return matching file objects.
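Both BaseRemoteDataset.content and BaseRemoteGroup.files are documented as fetching on first access. One common way to get that behaviour is a private cache that a property fills exactly once. The sketch uses a hypothetical LazyGroup class with a synchronous _fetch_files and counts fetches to show the cache at work. How PySUS stores its cache (possibly a Pydantic private attribute set up in model_post_init) is not shown in this reference.

```python
class LazyGroup:
    """Hypothetical group that loads its file list on first access only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._files: list[str] | None = None  # None means "not fetched yet"
        self.fetch_count = 0

    def _fetch_files(self) -> list[str]:
        # Stand-in for a remote listing (FTP directory, API call, ...).
        self.fetch_count += 1
        return [f"{self.name}_{i}.dbc" for i in range(3)]

    @property
    def files(self) -> list[str]:
        # Fetch once, then serve the cached list on every later access.
        if self._files is None:
            self._files = self._fetch_files()
        return self._files
```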
- class pysus.api.models.BaseRemoteObject
Bases: BaseModel, ABC
Abstract base for a named remote entity with a description.
Subclasses must implement name, long_name, and description.
- abstract property description: str
Return a textual description of the entity.
- abstract property long_name: str
Return the long / human-readable name.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- abstract property name: str
Return the short name of the remote entity.
- class pysus.api.models.BaseTabularFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])
Bases: BaseLocalFile, ABC
Abstract base for a local tabular file (e.g. CSV, Parquet).
Subclasses must implement columns, rows, load, and stream.
- abstract property columns: list[str]
Return the list of column names.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- abstract property rows: int
Return the number of data rows.
- abstractmethod stream(chunk_size: int = 10000) -> AsyncGenerator[DataFrame, None]
Yield pandas DataFrames in chunks as an async generator.
- async to_parquet(output_path: str | Path | None = None, chunk_size: int = 10000, callback: Callable[[int, int], None] | None = None) -> Parquet
Convert the file to Parquet format.
- Parameters:
output_path (str or Path, optional) – Destination path for the Parquet file. Defaults to the source path with a .parquet extension.
chunk_size (int, optional) – Number of rows per streaming chunk (default 10 000).
callback (Callable[[int, int], None], optional) – Function called after each chunk with (current_rows, total_rows).
- Returns:
The resulting Parquet wrapper object.
- Return type:
Parquet
High-Level Data Functions
High-level convenience functions for fetching Brazilian health data.
Each function wraps an asynchronous query/download pipeline and returns a pandas DataFrame. The available datasets cover disease notification (SINAN), vital statistics (SINASC, SIM), hospital admissions (SIH), ambulatory care (SIA), immunisation (PNI), census data (IBGE), health facilities (CNES), and hospitalisation records (CIHA).
- pysus.api._impl.databases.ciha(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = 'CIHA', **kwargs) -> DataFrame
Fetch CIHA hospitalisation records for a state, year, month, and group.
CIHA (Comunicação de Internação Hospitalar) provides hospitalisation records.
- Parameters:
state (State) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int]) – Year or list of years to fetch.
month (int | list[int]) – Month or list of months to fetch.
group (str, optional) – Additional grouping code. Default is "CIHA".
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
CIHA hospitalisation records.
- Return type:
pd.DataFrame
- pysus.api._impl.databases.cnes(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = None, **kwargs) -> DataFrame
Fetch CNES health facilities for a state, year, month, and group.
CNES (Cadastro Nacional de Estabelecimentos de Saúde) is the Brazilian registry of health-care facilities.
- Parameters:
state (State) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int]) – Year or list of years to fetch.
month (int | list[int]) – Month or list of months to fetch.
group (str, optional) – Additional grouping code.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
CNES health-facility records.
- Return type:
pd.DataFrame
- pysus.api._impl.databases.ibge(year: int | list[int], group: str | None = None, **kwargs) -> DataFrame
Fetch IBGE census data for given year(s) and optional group.
IBGE (Instituto Brasileiro de Geografia e Estatística) provides census and demographic data.
- Parameters:
year (int | list[int]) – Year or list of years to fetch.
group (str, optional) – Additional grouping code.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
IBGE census data for the specified year(s) and group.
- Return type:
pd.DataFrame
- pysus.api._impl.databases.list_files(dataset: Literal['SINAN', 'SINASC', 'SIM', 'SIH', 'SIA', 'PNI', 'IBGE', 'CNES', 'CIHA'], client: Literal['FTP', 'DadosGov'] | None = None, group: str | None = None, state: str | None = None, year: int | list[int] | None = None, month: int | list[int] | None = None, **kwargs) -> DataFrame
List catalog files filtered by client, group, state, year, and month.
Queries the PySUS API metadata and returns a DataFrame with file name, path, dataset, group, year, month, state, and last-modified timestamp for every matching file without downloading the actual data.
- Parameters:
dataset (Literal) – Dataset name (e.g. "SINAN", "SINASC", etc.).
client (Literal["FTP", "DadosGov"], optional) – Data source client to query.
group (str, optional) – Group or disease code to filter by.
state (str, optional) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int], optional) – Year or list of years to filter by.
month (int | list[int], optional) – Month or list of months to filter by.
**kwargs – Additional arguments forwarded to PySUS.query().
- Returns:
DataFrame with columns name, path, dataset, group, year, month, state, and modify.
- Return type:
pd.DataFrame
- pysus.api._impl.databases.pni(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], group: str | None = None, **kwargs) -> DataFrame
Fetch PNI immunisation records for a given state, year(s), and group.
PNI (Programa Nacional de Imunizações) is the Brazilian national immunisation programme.
- Parameters:
state (State) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int]) – Year or list of years to fetch.
group (str, optional) – Additional grouping code.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
PNI immunisation records.
- Return type:
pd.DataFrame
- pysus.api._impl.databases.sia(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = None, **kwargs) DataFrame[source]
Fetch SIA ambulatory care records for a state, year, month, and group.
SIA (Sistema de Informação Ambulatorial) is the Brazilian ambulatory care information system.
- Parameters:
state (State) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int]) – Year or list of years to fetch.
month (int | list[int]) – Month or list of months to fetch.
group (str, optional) – Additional grouping code.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
SIA ambulatory care records.
- Return type:
pd.DataFrame
- pysus.api._impl.databases.sih(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = None, **kwargs) DataFrame[source]
Fetch SIH hospital admissions for a state, year, month, and group.
SIH (Sistema de Informação Hospitalar) is the Brazilian hospital admission information system.
- Parameters:
state (State) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int]) – Year or list of years to fetch.
month (int | list[int]) – Month or list of months to fetch.
group (str, optional) – Additional grouping code.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
SIH hospital admission records.
- Return type:
pd.DataFrame
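Every fetch function in this module accepts a single year (and, for SIA and SIH, a single month) or a list of them. The sketch below is a minimal illustration, assuming that list arguments are expanded into one request per (year, month) combination; as_list and expand_periods are hypothetical helpers, not PySUS functions, and this reference does not confirm how _fetch_data() performs the expansion.

```python
from __future__ import annotations

from itertools import product


def as_list(value: int | list[int] | None) -> list[int | None]:
    """Normalize a scalar, a list, or None (meaning "no filter") into a list."""
    if value is None:
        return [None]
    if isinstance(value, int):
        return [value]
    return list(value)


def expand_periods(
    year: int | list[int], month: int | list[int] | None = None
) -> list[tuple[int, int | None]]:
    """Return one (year, month) pair per requested combination."""
    return list(product(as_list(year), as_list(month)))


print(expand_periods([2022, 2023], [1, 2]))
# [(2022, 1), (2022, 2), (2023, 1), (2023, 2)]
```

Under this assumption, a call such as sih(state="RJ", year=[2022, 2023], month=[1, 2]) would touch four monthly files per group.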
- pysus.api._impl.databases.sim(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], group: str | None = None, **kwargs) DataFrame[source]
Fetch SIM mortality records for a given state, year(s), and group.
SIM (Sistema de Informação sobre Mortalidade) is the Brazilian mortality information system.
- Parameters:
state (State) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int]) – Year or list of years to fetch.
group (str, optional) – Additional grouping code.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
SIM mortality records for the specified state, year(s), and group.
- Return type:
pd.DataFrame
- pysus.api._impl.databases.sinan(disease: Literal['ACBI', 'ACGR', 'ANIM', 'ANTR', 'BOTU', 'CANC', 'CHAG', 'CHIK', 'COLE', 'COQU', 'DENG', 'DERM', 'DIFT', 'ESQU', 'EXAN', 'FMAC', 'FTIF', 'HANS', 'HANT', 'HEPA', 'IEXO', 'INFL', 'LEIV', 'LEPT', 'LERD', 'LTAN', 'MALA', 'MENI', 'MENT', 'NTRA', 'PAIR', 'PEST', 'PFAN', 'PNEU', 'RAIV', 'SDTA', 'SIFA', 'SIFC', 'SIFG', 'SRC', 'TETA', 'TETN', 'TOXC', 'TOXG', 'TRAC', 'TUBE', 'VARC', 'VIOL', 'ZIKA'], year: int | list[int], **kwargs) DataFrame[source]
Fetch SINAN records for a given disease and year(s).
SINAN (Sistema de Informação de Agravos de Notificação) is the Brazilian notifiable-disease information system.
- Parameters:
disease (Literal) – Disease code (e.g. "DENG" for dengue, "ZIKA" for Zika).
year (int | list[int]) – Year or list of years to fetch.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
SINAN records for the specified disease and year(s).
- Return type:
pd.DataFrame
- pysus.api._impl.databases.sinasc(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], group: str | None = None, **kwargs) DataFrame[source]
Fetch SINASC live-birth records for a given state, year(s), and group.
SINASC (Sistema de Informação sobre Nascidos Vivos) is the Brazilian live birth information system.
- Parameters:
state (State) – Two-letter state abbreviation (e.g. "RJ").
year (int | list[int]) – Year or list of years to fetch.
group (str, optional) – Additional grouping code.
**kwargs – Additional arguments forwarded to _fetch_data().
- Returns:
SINASC birth records for the specified state, year(s), and group.
- Return type:
pd.DataFrame
DuckLake Client
High-level client for the DuckLake S3-based dataset catalog.
Provides authentication, catalog synchronization, dataset querying, and file download capabilities backed by a local DuckDB engine.
- class pysus.api.ducklake.client.CatalogDatasetAdapter(catalog_dataset: CatalogDataset, ducklake)[source]
Bases: object
Adapter wrapping a CatalogDataset ORM record for use by File objects.
- Parameters:
catalog_dataset (CatalogDataset) – The ORM record to wrap.
ducklake (DuckLake) – The parent DuckLake client instance.
- property content
Query the DuckLake client for files in this dataset.
- Returns:
List of files belonging to this dataset.
- Return type:
list
- class pysus.api.ducklake.client.DatasetGroupAdapter(dataset_group: DatasetGroup, dataset)[source]
Bases: object
Adapter wrapping a DatasetGroup ORM record for use by File objects.
- Parameters:
dataset_group (DatasetGroup) – The ORM record to wrap.
dataset (CatalogDataset) – The parent dataset.
- property files
Return the list of files in this group.
- Returns:
List of file objects in this group.
- Return type:
list
- class pysus.api.ducklake.client.DuckLake(engine=None, *, endpoint: str = 'nbg1.your-objectstorage.com', region: str = 'nbg1', bucket: str = 'pysus', credentials: DuckLakeCredentials | None = None)[source]
Bases: BaseRemoteClient
Client for the DuckLake S3-based public health dataset catalog.
- Parameters:
endpoint (str, optional) – S3-compatible object storage endpoint.
region (str, optional) – Storage region name.
bucket (str, optional) – Bucket name containing the catalog.
credentials (DuckLakeCredentials, optional) – Credentials for authenticated S3 operations.
engine (object, optional) – Pre-configured SQLAlchemy engine to reuse.
- bucket: str
- property catalog_path: Path
Return the local path to the downloaded catalog database.
- Returns:
Filesystem path to the local catalog database file.
- Return type:
Path
- async close()[source]
Dispose the engine, then upload the catalog if authenticated.
- Raises:
PermissionError – If the client is not authenticated but an upload is required.
- async connect(force: bool = False)[source]
Connect to the catalog, downloading it first if necessary.
- Parameters:
force (bool, optional) – Whether to re-download and re-connect even if already connected.
- credentials: DuckLakeCredentials | None
- async datasets(**kwargs) list[DuckDataset][source]
Return all datasets from the catalog as DuckDataset instances.
- Parameters:
**kwargs – Additional filter arguments (currently unused).
- Returns:
List of all datasets in the catalog.
- Return type:
list[DuckDataset]
- property description: str
Return a description of this client.
- Returns:
A description string (currently empty).
- Return type:
str
- endpoint: str
- async login(access_key: str | None = None, secret_key: str | None = None, **kwargs) None[source]
Authenticate with S3 credentials and reconnect to the catalog.
- Parameters:
access_key (str, optional) – S3 access key ID. If omitted, credentials are cleared.
secret_key (str, optional) – S3 secret access key. If omitted, credentials are cleared.
**kwargs – Additional arguments (currently unused).
- property long_name: str
Return the human-readable name of this client.
- Returns:
The client display name.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name of this client.
- Returns:
The client short name.
- Return type:
str
- async query(client: Literal['FTP', 'DadosGov'] | None = None, dataset: str | None = None, group: str | None = None, state: str | None = None, year: int | None = None, month: int | None = None) list[File][source]
Filter catalog files by client, dataset, group, state, year, and month.
- Parameters:
client (Literal["FTP", "DadosGov"], optional) – Source client to filter by.
dataset (str, optional) – Dataset name to filter by.
group (str, optional) – Group name pattern to filter by (case-insensitive ILIKE).
state (str, optional) – Two-letter state code to filter by.
year (int, optional) – Year to filter by.
month (int, optional) – Month to filter by.
- Returns:
List of matching file objects.
- Return type:
list[File]
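DuckLake.query() documents the group filter as a case-insensitive ILIKE match, with every filter optional. A minimal pure-Python sketch of those semantics follows, assuming the remaining filters are exact equality checks and that a None argument means "do not filter". The names ilike, filter_files, and FILES are hypothetical; this does not reproduce the real DuckDB query, and it ignores LIKE's ESCAPE clause.

```python
from __future__ import annotations

import re
from typing import Any


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive SQL LIKE: '%' matches any run, '_' exactly one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def filter_files(
    files: list[dict[str, Any]],
    *,
    dataset: str | None = None,
    group: str | None = None,
    state: str | None = None,
    year: int | None = None,
    month: int | None = None,
) -> list[dict[str, Any]]:
    """Keep records that satisfy every filter that is not None."""
    exact = {"dataset": dataset, "state": state, "year": year, "month": month}
    result = []
    for record in files:
        if any(v is not None and record.get(k) != v for k, v in exact.items()):
            continue
        if group is not None and not ilike(record.get("group") or "", group):
            continue
        result.append(record)
    return result


# Hypothetical sample records shaped like catalog file metadata.
FILES = [
    {"dataset": "SIH", "group": "RD", "state": "RJ", "year": 2023, "month": 1},
    {"dataset": "SIH", "group": "SP", "state": "RJ", "year": 2023, "month": 1},
    {"dataset": "SINAN", "group": "DENG", "state": None, "year": 2023, "month": None},
]
```

Here filter_files(FILES, dataset="SIH", group="rd") keeps only the first record, because the group comparison ignores case.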
- region: str
- class pysus.api.ducklake.client.DuckLakeCredentials(*, access_key: SecretStr, secret_key: SecretStr)[source]
Bases: BaseModel
Credentials for authenticating with the S3-compatible object storage.
- Parameters:
access_key (SecretStr) – The S3 access key ID.
secret_key (SecretStr) – The S3 secret access key.
- access_key: SecretStr
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- secret_key: SecretStr
SQLAlchemy ORM models for the DuckLake catalog schema.
Defines tables for datasets, groups, files, and columns stored in the pysus schema of the local DuckDB catalog.
- class pysus.api.ducklake.catalog.Base(**kwargs: Any)[source]
Bases: DeclarativeBase
Base class for all DuckLake catalog ORM models.
- metadata: ClassVar[MetaData] = MetaData()
Refers to the _schema.MetaData collection that will be used for new _schema.Table objects.
- registry: ClassVar[_RegistryType] = <sqlalchemy.orm.decl_api.registry object>
Refers to the _orm.registry in use where new _orm.Mapper objects will be associated.
- class pysus.api.ducklake.catalog.CatalogDataset(**kwargs)[source]
Bases: CatalogTable
ORM model for the datasets table, representing a dataset collection.
- Parameters:
id (int, optional) – Primary key (auto-generated by sequence).
name (str) – Unique short name for the dataset.
long_name (str) – Human-readable full name.
description (str, optional) – Optional description of the dataset contents.
origin (Origin) – Whether the dataset originates from FTP or an API.
- columns
- description
- files
- groups
- id
- long_name
- name
- origin
- class pysus.api.ducklake.catalog.CatalogFile(**kwargs)[source]
Bases: CatalogTable
ORM model for the files table, representing individual data files.
- Parameters:
id (int, optional) – Primary key (auto-generated by sequence).
dataset_id (int) – Foreign key referencing the parent dataset.
group_id (int, optional) – Foreign key referencing the parent group.
path (str) – Object storage path to the file.
size (int) – File size in bytes.
rows (int) – Number of rows in the file.
modified (datetime) – Timestamp of the last known modification.
origin_modified (datetime, optional) – Original modification timestamp from the source.
origin_path (str) – Original source path of the file.
sha256 (str, optional) – SHA-256 hex digest for integrity verification.
year (int, optional) – Data year associated with the file.
month (int, optional) – Data month associated with the file.
state (str, optional) – Two-letter state code associated with the file.
- columns: Mapped[list[ColumnDefinition]]
- dataset: Mapped[CatalogDataset]
- group: Mapped[DatasetGroup | None]
- class pysus.api.ducklake.catalog.CatalogTable(**kwargs: Any)[source]
Bases: Base
Abstract base for catalog tables sharing the pysus schema.
- class pysus.api.ducklake.catalog.ColumnDefinition(**kwargs)[source]
Bases: CatalogTable
ORM model for dataset column metadata.
- Parameters:
id (int, optional) – Primary key (auto-generated by sequence).
dataset_id (int) – Foreign key referencing the parent dataset.
name (str) – Column name.
type (str) – Column data type string.
description (str, optional) – Optional description of the column.
nullable (bool, optional) – Whether the column allows null values.
- dataset
- dataset_id
- description
- files
- id
- name
- nullable
- type
- class pysus.api.ducklake.catalog.DatasetGroup(**kwargs)[source]
Bases: CatalogTable
ORM model for dataset groups, grouping related files within a dataset.
- Parameters:
id (int, optional) – Primary key (auto-generated by sequence).
name (str) – Short name for the group.
dataset_id (int) – Foreign key referencing the parent dataset.
long_name (str) – Human-readable full name.
description (str, optional) – Optional description of the group contents.
- dataset
- dataset_id
- description
- files
- id
- long_name
- name
- class pysus.api.ducklake.catalog.Origin(value)[source]
Bases: Enum
Origin type for a dataset.
- FTP
Dataset sourced from the FTP server.
- Type:
str
- API
Dataset sourced from an API.
- Type:
str
- API = 'api'
- FTP = 'ftp'
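The catalog models describe one-to-many relationships: a CatalogDataset owns DatasetGroup and CatalogFile rows, and a CatalogFile may or may not belong to a group. The sqlite3 sketch below mirrors a subset of the documented fields to show how those relationships resolve with joins. The table names, the column subset, and the sample rows are assumptions; the real catalog is a DuckDB database managed through SQLAlchemy under the pysus schema.

```python
import sqlite3

# Hypothetical, simplified DDL; table names and column subsets are assumptions.
DDL = """
CREATE TABLE datasets (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    long_name   TEXT NOT NULL,
    description TEXT,
    origin      TEXT NOT NULL CHECK (origin IN ('ftp', 'api'))  -- Origin enum values
);
CREATE TABLE dataset_groups (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    dataset_id  INTEGER NOT NULL REFERENCES datasets(id),
    long_name   TEXT NOT NULL,
    description TEXT
);
CREATE TABLE files (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    group_id   INTEGER REFERENCES dataset_groups(id),  -- nullable: group is optional
    path       TEXT NOT NULL,
    size       INTEGER NOT NULL,
    "rows"     INTEGER NOT NULL,
    year       INTEGER,
    month      INTEGER,
    state      TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute(
    "INSERT INTO datasets (id, name, long_name, origin) VALUES (?, ?, ?, ?)",
    (1, "SIH", "Sistema de Informações Hospitalares", "ftp"),
)
conn.execute(
    "INSERT INTO dataset_groups (id, name, dataset_id, long_name) VALUES (?, ?, ?, ?)",
    (1, "RD", 1, "AIH Reduzida"),
)
conn.execute(
    'INSERT INTO files (dataset_id, group_id, path, size, "rows", year, month, state)'
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, 1, "sih/RDRJ2301.parquet", 1024, 10, 2023, 1, "RJ"),
)

# Resolve each file to its dataset and (optional) group, as the ORM relationships do.
rows = conn.execute(
    """
    SELECT d.name, g.name, f.state, f.year, f.month
    FROM files AS f
    JOIN datasets AS d ON d.id = f.dataset_id
    LEFT JOIN dataset_groups AS g ON g.id = f.group_id
    """
).fetchall()
print(rows)  # [('SIH', 'RD', 'RJ', 2023, 1)]
```

The LEFT JOIN keeps files whose group_id is NULL, matching CatalogFile.group being typed DatasetGroup | None.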
Application-level models for DuckLake remote resources.
Wraps catalog ORM records into BaseRemoteFile, BaseRemoteDataset, and BaseRemoteGroup interfaces used by the rest of PySUS.
- class pysus.api.ducklake.models.DuckDataset(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {}, record: CatalogDataset)[source]
Bases: BaseRemoteDataset
A dataset from the DuckLake catalog, containing groups and files.
- Parameters:
record (CatalogDataset) – The underlying ORM record.
client (BaseRemoteClient) – The parent client instance.
- client: BaseRemoteClient
- property description: str
Return the description of the dataset.
- Returns:
The dataset description, or an empty string if unavailable.
- Return type:
str
- property long_name: str
Return the human-readable name of the dataset.
- Returns:
The dataset display name, falling back to the short name.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name of the dataset.
- Returns:
The dataset short name.
- Return type:
str
- record: CatalogDataset
- class pysus.api.ducklake.models.DuckGroup(*, dataset: DuckDataset, record: DatasetGroup)[source]
Bases: BaseRemoteGroup
A group of related files within a DuckLake dataset.
- Parameters:
record (DatasetGroup) – The underlying ORM record.
dataset (DuckDataset) – The parent dataset instance.
- dataset: DuckDataset
- property description: str
Return the description of the group.
- Returns:
The group description, or an empty string if unavailable.
- Return type:
str
- property long_name: str
Return the human-readable name of the group.
- Returns:
The group display name, falling back to the short name.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name of the group.
- Returns:
The group short name.
- Return type:
str
- record: DatasetGroup
- class pysus.api.ducklake.models.File(*, path: Path, type: str = 'remote', dataset: Any, group: Any = None, record: CatalogFile)[source]
Bases: BaseRemoteFile
A remote file in the DuckLake catalog with download and verification.
- Parameters:
record (CatalogFile) – The underlying ORM record.
type (str, optional) – File type identifier (default "remote").
dataset (Any) – The parent dataset object.
group (Any, optional) – The parent group object, if any.
- property basename: str
Return the file name without directory components.
- Returns:
The base file name.
- Return type:
str
- dataset: Any
- property extension: str
Return the file extension including the leading dot.
- Returns:
File extension (e.g. '.csv').
- Return type:
str
- group: Any
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property modify: datetime
Return the last-modified timestamp.
- Returns:
The last modification timestamp.
- Return type:
datetime
- record: CatalogFile
- property rows: int
Return the number of rows in the file.
- Returns:
Row count.
- Return type:
int
- property sha256: str | None
Return the SHA-256 hash of the file, if available.
- Returns:
SHA-256 hex digest, or None if not recorded.
- Return type:
str or None
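Because CatalogFile stores an optional SHA-256 hex digest, a downloaded file can be checked against it. A minimal sketch using hashlib follows, assuming the digest covers the file's raw bytes. sha256_of and verify are hypothetical helpers, and treating a missing digest as "nothing to check" rather than as a failure is a design choice of this sketch, not documented PySUS behavior.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str | None) -> bool:
    """Compare a file against a recorded hex digest (case-insensitive)."""
    if expected is None:
        return True  # assumption: a missing digest is not treated as a failure
    return sha256_of(path) == expected.strip().lower()
```

Streaming in chunks keeps memory use flat, which matters for multi-gigabyte DATASUS extracts.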
- property size: int
Return the file size in bytes.
- Returns:
File size in bytes.
- Return type:
int
- type: str
FTP Client
Async FTP client wrapping the standard ftplib for DATASUS data access.
- class pysus.api.ftp.client.FTP(*, host: str = 'ftp.datasus.gov.br', timeout: int = 60)[source]
Bases: BaseRemoteClient
Async FTP client for navigating and downloading DATASUS data.
- async close() None[source]
Close the FTP connection and reset the internal client state.
- Raises:
Exception – Any exception raised by ftplib during disconnection.
- async connect() None[source]
Establish the FTP connection to the remote host.
- Raises:
Exception – Any exception raised by ftplib during connection.
- async datasets(**kwargs) list[Dataset][source]
Return a list of all available dataset instances for this client.
- Returns:
A list of Dataset instances for all available databases.
- Return type:
list[Dataset]
- Raises:
ConnectionError – If the FTP client is not connected.
- property description: str
Return a description of this client’s purpose.
- Returns:
A description string explaining the FTP client’s capabilities.
- Return type:
str
- property ftp: FTP | None
Return the underlying ftplib.FTP, or None if not connected.
- Returns:
The ftplib.FTP instance, or None if not connected.
- Return type:
FTPLib | None
- host: str
- async login(**kwargs) None[source]
Authenticate and connect to the FTP server (alias for connect).
- Parameters:
**kwargs – Forwarded to connect() (currently unused).
- Raises:
Exception – Any exception raised by ftplib during authentication.
- property long_name: str
Return the human-readable name of this client.
- Returns:
The human-readable client name.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name of this client.
- Returns:
The client short name (“FTP”).
- Return type:
str
- timeout: int
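The FTP client exposes async connect() and close() methods over the blocking ftplib module. A common way to do this is to run each blocking call in a worker thread with asyncio.to_thread. The class below is a hypothetical sketch of that pattern: AsyncFTPSketch is not the PySUS class, and this reference does not say which technique PySUS uses. The sketch assumes anonymous access, which ftplib performs when login() is called without arguments.

```python
from __future__ import annotations

import asyncio
import ftplib


class AsyncFTPSketch:
    """Hypothetical async wrapper: blocking ftplib calls run in a worker thread."""

    def __init__(self, host: str = "ftp.datasus.gov.br", timeout: int = 60) -> None:
        self.host = host
        self.timeout = timeout
        self._ftp: ftplib.FTP | None = None

    @property
    def ftp(self) -> ftplib.FTP | None:
        """The underlying ftplib.FTP, or None when not connected."""
        return self._ftp

    async def connect(self) -> None:
        # ftplib.FTP(host, timeout=...) opens the control connection immediately.
        ftp = await asyncio.to_thread(ftplib.FTP, self.host, timeout=self.timeout)
        await asyncio.to_thread(ftp.login)  # anonymous login
        self._ftp = ftp

    async def close(self) -> None:
        if self._ftp is None:
            return
        try:
            await asyncio.to_thread(self._ftp.quit)
        except ftplib.all_errors:
            self._ftp.close()  # fall back to closing the socket without QUIT
            raise
        finally:
            self._ftp = None  # reset state whether or not QUIT succeeded

    async def __aenter__(self) -> AsyncFTPSketch:
        await self.connect()
        return self

    async def __aexit__(self, *exc_info) -> None:
        await self.close()
```

With network access, a caller could write async with AsyncFTPSketch() as client: and then await asyncio.to_thread(client.ftp.nlst) to list the current directory without blocking the event loop.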
- class pysus.api.ftp.client.FTPFileInfo[source]
Bases: TypedDict
Parsed metadata for a file or directory entry from an FTP listing.
- group: FTPGroupInfo | None
- modify: datetime
- month: int | None
- name: str
- size: int
- state: State | None
- type: str
- year: int | None
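FTPFileInfo holds metadata parsed from an FTP directory listing. The sketch below parses one listing line, assuming DOS/IIS-style LIST output in which each line contains the date, the time, either <DIR> or a byte size, and then the name. ListingEntry and parse_dos_listing are hypothetical, and only the name, type, size, and modify keys of FTPFileInfo are filled.

```python
from __future__ import annotations

from datetime import datetime
from typing import TypedDict


class ListingEntry(TypedDict):
    """Hypothetical subset of the documented FTPFileInfo keys."""

    name: str
    type: str
    size: int
    modify: datetime


def parse_dos_listing(line: str) -> ListingEntry:
    """Parse 'MM-DD-YY  HH:MMAM  <DIR>|size  name' into a ListingEntry."""
    date_part, time_part, size_or_dir, name = line.split(maxsplit=3)
    modify = datetime.strptime(f"{date_part} {time_part}", "%m-%d-%y %I:%M%p")
    if size_or_dir == "<DIR>":
        return ListingEntry(name=name, type="DIR", size=0, modify=modify)
    return ListingEntry(name=name, type="FILE", size=int(size_or_dir), modify=modify)
```

Splitting with maxsplit=3 keeps any spaces inside the name intact.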
- class pysus.api.ftp.client.FTPGroupInfo[source]
Bases: TypedDict
Metadata describing a file group within a dataset.
- description: str | None
- long_name: str | None
- name: str
DATASUS FTP dataset definitions with filename parsers for each database.
- class pysus.api.ftp.databases.CIHA(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Comunicação de Internação Hospitalar e Ambulatorial (CIHA).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse a CIHA filename into group, state, year and month metadata.
- Parameters:
filename (str) – The raw CIHA filename to parse.
- Returns:
A dict with keys group, state, year, and month. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “CIHA”).
- Return type:
str
- class pysus.api.ftp.databases.CNES(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Cadastro Nacional de Estabelecimentos de Saúde (CNES).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse a CNES filename into group, state, year and month metadata.
- Parameters:
filename (str) – The raw CNES filename to parse.
- Returns:
A dict with keys group, state, year, and month. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “CNES”).
- Return type:
str
- class pysus.api.ftp.databases.IBGEDATASUS(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
População Residente e Projeções (IBGE).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse an IBGE filename into group and year metadata.
- Parameters:
filename (str) – The raw IBGE filename to parse.
- Returns:
A dict with keys group and year. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “IBGE”).
- Return type:
str
- class pysus.api.ftp.databases.PNI(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Programa Nacional de Imunizações (PNI).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse a PNI filename into group, state and year metadata.
- Parameters:
filename (str) – The raw PNI filename to parse.
- Returns:
A dict with keys group, state, and year. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “PNI”).
- Return type:
str
- class pysus.api.ftp.databases.SIA(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Sistema de Informações Ambulatoriais (SIA), the outpatient information system.
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse an SIA filename into group, state, year and month metadata.
- Parameters:
filename (str) – The raw SIA filename to parse.
- Returns:
A dict with keys group, state, year, and month. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “SIA”).
- Return type:
str
- class pysus.api.ftp.databases.SIH(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Sistema de Informações Hospitalares (SIH).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse an SIH filename into group, state, year and month metadata.
- Parameters:
filename (str) – The raw SIH filename to parse.
- Returns:
A dict with keys group, state, year, and month. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “SIH”).
- Return type:
str
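The SIH, SIA, CNES, and CIHA formatters return group, state, year, and month. The following minimal regex sketch assumes monthly filenames follow a group-letters, two-letter UF, YY, MM layout (for example RDRJ2301.dbc). The pattern, the two-digit year rule in expand_year, and the helper names are assumptions, not the PySUS implementation. The state is not validated against the list of Brazilian states.

```python
from __future__ import annotations

import re
from datetime import date
from pathlib import PurePosixPath
from typing import Any

# Assumed layout: <group letters><2-letter UF><YY><MM>, e.g. RDRJ2301.dbc
MONTHLY = re.compile(r"(?P<group>[A-Z]+)(?P<state>[A-Z]{2})(?P<yy>\d{2})(?P<mm>\d{2})")


def expand_year(yy: int, today: date | None = None) -> int:
    """Read YY as 20YY unless that lies in the future, then as 19YY (assumption)."""
    today = today or date.today()
    return 2000 + yy if 2000 + yy <= today.year else 1900 + yy


def parse_monthly(filename: str) -> dict[str, Any]:
    """Split a monthly DATASUS filename into group, state, year and month."""
    stem = PurePosixPath(filename).stem.upper()  # drop the extension, normalize case
    m = MONTHLY.fullmatch(stem)
    if m is None or not 1 <= int(m["mm"]) <= 12:
        # Mirror the documented contract: every value is None on failure.
        return {"group": None, "state": None, "year": None, "month": None}
    return {
        "group": m["group"],
        "state": m["state"],
        "year": expand_year(int(m["yy"])),
        "month": int(m["mm"]),
    }
```

Because the group pattern is greedy but must leave two letters for the state, variable-length prefixes such as CIHA in CIHARJ1101.dbc still split correctly.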
- class pysus.api.ftp.databases.SIM(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Sistema de Informação sobre Mortalidade (SIM).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse a SIM filename into group, state and year metadata.
- Parameters:
filename (str) – The raw SIM filename to parse.
- Returns:
A dict with keys group, state, and year. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “SIM”).
- Return type:
str
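The SIM and SINASC formatters return group, state, and year. The sketch below handles yearly filenames, assuming a group-letters, two-letter UF, YYYY layout such as DORJ2019.dbc. parse_yearly is hypothetical. Its plausibility check on the year is a guard specific to this sketch, so that a monthly name like RDRJ2301.dbc is not read as the year 2301.

```python
from __future__ import annotations

import re
from datetime import date
from pathlib import PurePosixPath
from typing import Any

# Assumed layout: <group letters><2-letter UF><YYYY>, e.g. DORJ2019.dbc
YEARLY = re.compile(r"(?P<group>[A-Z]+)(?P<state>[A-Z]{2})(?P<year>\d{4})")


def parse_yearly(filename: str) -> dict[str, Any]:
    """Split a yearly DATASUS filename into group, state and year."""
    m = YEARLY.fullmatch(PurePosixPath(filename).stem.upper())
    # Reject implausible years so that YYMM-style names are not misread.
    if m is None or not 1900 <= int(m["year"]) <= date.today().year:
        return {"group": None, "state": None, "year": None}
    return {"group": m["group"], "state": m["state"], "year": int(m["year"])}
```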
- class pysus.api.ftp.databases.SINAN(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Sistema de Informação de Agravos de Notificação (SINAN).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse a SINAN filename into group and year metadata.
- Parameters:
filename (str) – The raw SINAN filename to parse.
- Returns:
A dict with keys group and year. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym (e.g. “SINAN”).
- Return type:
str
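The SINAN formatter returns only group and year, which matches the sinan() function taking no state argument. The following sketch assumes a disease-code, BR, YY layout (for example DENGBR23.dbc). parse_sinan is hypothetical, and mapping YY to 20YY assumes that no SINAN file predates 2000.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath
from typing import Any

# Assumed layout: <disease code>BR<YY>, e.g. DENGBR23.dbc
SINAN_NAME = re.compile(r"(?P<group>[A-Z]+)BR(?P<yy>\d{2})")


def parse_sinan(filename: str) -> dict[str, Any]:
    """Split a SINAN filename into disease group and year."""
    m = SINAN_NAME.fullmatch(PurePosixPath(filename).stem.upper())
    if m is None:
        return {"group": None, "year": None}
    # Assumption: no SINAN file predates 2000, so YY always means 20YY.
    return {"group": m["group"], "year": 2000 + int(m["yy"])}
```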
- class pysus.api.ftp.databases.SINASC(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]
Bases: Dataset
Sistema de Informações sobre Nascidos Vivos (SINASC).
- property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose in Portuguese.
- Return type:
str
- formatter(filename: str) dict[str, Any][source]
Parse a SINASC filename into group, state and year metadata.
- Parameters:
filename (str) – The raw SINASC filename to parse.
- Returns:
A dict with keys group, state, and year. On parse failure, the values are set to None.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the dataset short name.
- Returns:
The dataset acronym ("SINASC").
- Return type:
str
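The parsing contract of SINASC.formatter() (keys group, state and year, all None on failure) can be illustrated with a standalone sketch. It assumes FTP filenames of the form DNSP2020.dbc, i.e. a two-letter group prefix, a two-letter state code and a four-digit year; that pattern and the helper name parse_sinasc_filename are assumptions for illustration, not PySUS's implementation.

```python
import re
from typing import Any

# Assumed layout: 2-letter group, 2-letter state (UF), 4-digit year, e.g. "DNSP2020.dbc".
_SINASC_RE = re.compile(
    r"^(?P<group>[A-Z]{2})(?P<state>[A-Z]{2})(?P<year>\d{4})\.DBC$", re.IGNORECASE
)


def parse_sinasc_filename(filename: str) -> dict[str, Any]:
    """Hypothetical stand-in for SINASC.formatter: parse group, state and year."""
    match = _SINASC_RE.match(filename)
    if match is None:
        # Mirror the documented behaviour: every key is None when parsing fails.
        return {"group": None, "state": None, "year": None}
    return {
        "group": match["group"].upper(),
        "state": match["state"].upper(),
        "year": int(match["year"]),
    }


print(parse_sinasc_filename("DNSP2020.dbc"))  # {'group': 'DN', 'state': 'SP', 'year': 2020}
print(parse_sinasc_filename("README.txt"))    # {'group': None, 'state': None, 'year': None}
```

Returning the same three keys in both branches keeps downstream code free of KeyError checks; callers only need to test for None.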
Data model classes for FTP directories, files, groups and datasets.
- class pysus.api.ftp.models.Dataset(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: BaseRemoteDataset, ABC
Abstract base for a DATASUS dataset, providing file discovery via FTP.
- abstract property description: str
Return a description of the dataset’s purpose.
- Returns:
A description of the dataset’s purpose.
- Return type:
str
- abstractmethod formatter(filename: str) → dict[str, Any]
Parse a filename into metadata (group, state, year, etc.).
- Parameters:
filename (str) – The raw filename to parse.
- Returns:
A dictionary of parsed metadata fields.
- Return type:
dict[str, Any]
- group_definitions: dict[str, str]
- abstract property long_name: str
Return the dataset full name in Portuguese.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- abstract property name: str
Return the dataset short name.
- Returns:
The dataset acronym.
- Return type:
str
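Because name, long_name, description and formatter() are abstract, a concrete FTP dataset has to implement all four before it can be instantiated. The sketch below reproduces only that contract with the standard abc module; DatasetContract and ToyDataset are hypothetical names, and the real base class is a Pydantic model with further fields such as client.

```python
from abc import ABC, abstractmethod
from typing import Any


class DatasetContract(ABC):
    """Hypothetical mirror of the abstract members of pysus.api.ftp.models.Dataset."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def long_name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @abstractmethod
    def formatter(self, filename: str) -> dict[str, Any]: ...


class ToyDataset(DatasetContract):
    """Minimal concrete subclass; all values are illustrative only."""

    @property
    def name(self) -> str:
        return "TOY"

    @property
    def long_name(self) -> str:
        return "Example dataset"

    @property
    def description(self) -> str:
        return "Illustrative dataset used only in this sketch."

    def formatter(self, filename: str) -> dict[str, Any]:
        # "TOY_2021.dbc" -> {"year": 2021}; anything without a numeric suffix -> None.
        stem = filename.rsplit(".", 1)[0]
        _, _, year = stem.partition("_")
        return {"year": int(year) if year.isdigit() else None}


try:
    DatasetContract()  # fails: abstract members are not implemented
except TypeError as exc:
    print("cannot instantiate:", exc)

print(ToyDataset().formatter("TOY_2021.dbc"))  # {'year': 2021}
```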
- class pysus.api.ftp.models.Directory(path: str, parent: Directory | Dataset | Group | None = None, client: BaseRemoteClient | None = None, formatter: Callable | None = None, dataset: Dataset | None = None)
Bases: object
A remote FTP directory whose files and subdirectories are loaded lazily, on first access.
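A lazily loaded directory contacts the server only the first time its contents are requested and reuses that result afterwards. The sketch below shows the pattern with a caller-supplied listing function; LazyDirectory, content and list_remote are hypothetical names, and the real Directory class performs its listing through its FTP client.

```python
from __future__ import annotations

from typing import Callable


class LazyDirectory:
    """Hypothetical lazy directory: the listing is fetched once, on first access."""

    def __init__(self, path: str, list_remote: Callable[[str], list[str]]) -> None:
        self.path = path
        self._list_remote = list_remote
        self._entries: list[str] | None = None  # None means "not loaded yet"

    @property
    def loaded(self) -> bool:
        return self._entries is not None

    @property
    def content(self) -> list[str]:
        if self._entries is None:
            # First access: ask the remote side for the listing and cache it.
            self._entries = self._list_remote(self.path)
        return self._entries


calls: list[str] = []


def fake_listing(path: str) -> list[str]:
    calls.append(path)  # record every simulated remote call
    return [f"{path}/DNSP2020.dbc", f"{path}/DNRJ2020.dbc"]


d = LazyDirectory("/remote/dir", fake_listing)
print(d.loaded)        # False
print(len(d.content))  # 2
print(len(d.content))  # 2 (served from the cache)
print(calls)           # ['/remote/dir']
```

Deferring the listing keeps construction cheap when a program builds many directory objects but only browses a few of them.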
- class pysus.api.ftp.models.File(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'], dataset: BaseRemoteDataset, group: BaseRemoteGroup | None = None)
Bases: BaseRemoteFile
A single file on the DATASUS FTP server with parsed metadata.
- property extension: str
Return the file extension (e.g. .dbc, .dbf).
- Returns:
The file extension including the leading dot.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property modify: datetime
Return the last modification timestamp.
- Returns:
The file’s last modification datetime.
- Return type:
datetime
- Raises:
ValueError – If no modification date is available.
- property month: int | None
Return the data month extracted from the filename, if available.
- Returns:
The month as an integer, or None if not available.
- Return type:
int | None
- property size: int
Return the file size in bytes.
- Returns:
The file size in bytes.
- Return type:
int
- property state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'] | None
Return the state code extracted from the filename, if available.
- Returns:
The state code, or None if not available.
- Return type:
State | None
- property year: int | None
Return the data year extracted from the filename, if available.
- Returns:
The year as an integer, or None if not available.
- Return type:
int | None
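The state, year and month properties expose values extracted from the filename, returning None when a value is absent. A minimal sketch of that delegation, assuming the parsed values are held in a plain dict; the class ParsedFile and its metadata field are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParsedFile:
    """Hypothetical file record whose properties read from parsed metadata."""

    name: str
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def extension(self) -> str:
        # Includes the leading dot, as documented for File.extension.
        dot = self.name.rfind(".")
        return self.name[dot:] if dot != -1 else ""

    @property
    def state(self) -> str | None:
        return self.metadata.get("state")

    @property
    def year(self) -> int | None:
        return self.metadata.get("year")

    @property
    def month(self) -> int | None:
        return self.metadata.get("month")


f = ParsedFile("DNSP2020.dbc", {"group": "DN", "state": "SP", "year": 2020})
print(f.extension, f.state, f.year, f.month)  # .dbc SP 2020 None
```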
- class pysus.api.ftp.models.Group(*, dataset: BaseRemoteDataset)
Bases: BaseRemoteGroup
A group of related files within a dataset (e.g. all files of a type).
- property description: str
Return the group description.
- Returns:
The group description.
- Return type:
str
- property long_name: str
Return the human-readable group name.
- Returns:
The human-readable group name.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the group short code (e.g. ‘RD’, ‘PA’).
- Returns:
The group short code.
- Return type:
str
- path: str
DadosGov Client
HTTP client and data models for the dados.gov.br API.
- class pysus.api.dadosgov.client.ConjuntoDados(*, client: BaseRemoteClient | None = None, id: str, titulo: str, nome: str, recursos: list[Recurso] = <factory>)
Bases: BaseModel
A dataset group as returned by the dados.gov.br API.
- client: BaseRemoteClient | None
- id: str
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- slug: str
- title: str
- class pysus.api.dadosgov.client.DadosGov(*, base_url: str = 'https://dados.gov.br/dados/api')
Bases: BaseRemoteClient
Client for the dados.gov.br open data portal API.
- base_url: str
- async connect(token: str | None = None) → None
Connect to the dados.gov.br API with the given token.
- Parameters:
token (str, optional) – The API authentication token. If not provided, uses the previously stored token.
- Raises:
ValueError – If no token is provided and none was previously stored.
- property description: str
Return a description of the client.
- Returns:
A Portuguese description of the API interface.
- Return type:
str
- async get_dataset(id: str) → ConjuntoDados
Fetch a single dataset by its ID.
- Parameters:
id (str) – The unique identifier of the dataset.
- Returns:
The requested dataset.
- Return type:
ConjuntoDados
- Raises:
ConnectionError – If the client is not connected.
- async list_datasets(**kwargs) → list[ConjuntoDados]
Search and list available datasets from the portal.
- Parameters:
**kwargs –
Search parameters. Supported keys:
pagina (int): Page number for pagination.
nome_conjunto (str): Filter by dataset name.
dados_abertos (bool): Filter by open data flag.
is_privado (bool): Filter by private datasets.
id_organizacao (str): Filter by organisation ID.
- Returns:
A list of datasets matching the search criteria.
- Return type:
list[ConjuntoDados]
- Raises:
ConnectionError – If the client is not connected.
- async login(token: str | None = None, **kwargs) → None
Authenticate with the API.
Delegates to the connect() method.
- Parameters:
token (str, optional) – The API authentication token.
**kwargs – Additional keyword arguments (currently unused).
- property long_name: str
Return the human-readable client name.
- Returns:
The full Portuguese name of the portal.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short client name.
- Returns:
The abbreviated client name, "DadosGov".
- Return type:
str
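connect() uses the token it receives or falls back to one stored earlier, raising ValueError when neither exists, and methods such as get_dataset() and list_datasets() raise ConnectionError until the client is connected. The synchronous sketch below models only that state handling plus filtering of the documented search keys; it makes no HTTP requests, the real methods are async, and TokenClient, require_connection and search_params are hypothetical names. Rejecting unknown keys is a choice of this sketch, not documented PySUS behaviour.

```python
from __future__ import annotations

from typing import Any

# Search keys documented for DadosGov.list_datasets().
SUPPORTED_SEARCH_KEYS = {"pagina", "nome_conjunto", "dados_abertos", "is_privado", "id_organizacao"}


class TokenClient:
    """Hypothetical, synchronous model of DadosGov's connection state (no HTTP)."""

    def __init__(self) -> None:
        self._token: str | None = None
        self._connected = False

    def connect(self, token: str | None = None) -> None:
        # Use the new token if given, otherwise fall back to the stored one.
        token = token or self._token
        if not token:
            raise ValueError("no API token provided and none previously stored")
        self._token = token
        self._connected = True

    def require_connection(self) -> None:
        # Guard run before any API call, mirroring the documented ConnectionError.
        if not self._connected:
            raise ConnectionError("client is not connected; call connect() first")

    def search_params(self, **kwargs: Any) -> dict[str, Any]:
        self.require_connection()
        unknown = set(kwargs) - SUPPORTED_SEARCH_KEYS
        if unknown:
            # Assumption of this sketch: unknown keys are rejected early.
            raise TypeError(f"unsupported search keys: {sorted(unknown)}")
        # Drop filters left as None so they are not sent.
        return {key: value for key, value in kwargs.items() if value is not None}


client = TokenClient()
client.connect("my-token")  # placeholder, not a real token
client.connect()            # reuses the stored token
print(client.search_params(pagina=1, nome_conjunto="SINAN", is_privado=None))
# {'pagina': 1, 'nome_conjunto': 'SINAN'}
```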
- class pysus.api.dadosgov.client.Recurso(*, id: str, titulo: str, link: str, tamanho: int, dataUltimaAtualizacaoArquivo: Annotated[datetime | None, BeforeValidator(func=to_datetime, json_schema_input_type=PydanticUndefined)] = None, nomeArquivo: str | None = None)
Bases: BaseModel
A single resource (file) within a dataset on dados.gov.br.
- api_size: int
- file_name: str | None
- async get_size() → int
Retrieve the file size from the remote server.
Makes a HEAD request (falling back to GET with a Range header) to determine the Content-Length of the resource.
- Returns:
The file size in bytes, or 0 if the size could not be determined.
- Return type:
int
- id: str
- last_modified: DateTime
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- title: str
- url: str
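get_size() relies on HTTP headers: a HEAD response normally carries Content-Length, and a GET with a Range header (for example Range: bytes=0-0) answered with 206 Partial Content reports the total size after the slash in Content-Range, as in bytes 0-0/52428800. The helper below sketches only the header-parsing half of that fallback and returns 0 when no size can be derived, as documented; size_from_headers is a hypothetical name, and the exact requests PySUS sends are not shown here.

```python
from __future__ import annotations

from collections.abc import Mapping


def size_from_headers(head: Mapping[str, str], ranged_get: Mapping[str, str] | None = None) -> int:
    """Hypothetical helper: derive a resource size from HEAD / ranged-GET headers.

    Real HTTP libraries return case-insensitive header mappings; plain dicts are used here.
    """
    # 1. HEAD response: a positive Content-Length is the full size
    #    (treating 0 as "unknown" is an assumption of this sketch).
    length = head.get("Content-Length")
    if length is not None and length.isdigit() and int(length) > 0:
        return int(length)
    # 2. Fallback: a 206 response reports "bytes <first>-<last>/<total>".
    if ranged_get is not None:
        total = ranged_get.get("Content-Range", "").rpartition("/")[2]
        if total.isdigit():
            return int(total)
    # 3. Size unknown (e.g. "bytes 0-0/*"): report 0, as documented for get_size().
    return 0


print(size_from_headers({"Content-Length": "1048576"}))                # 1048576
print(size_from_headers({}, {"Content-Range": "bytes 0-0/52428800"}))  # 52428800
print(size_from_headers({}, {"Content-Range": "bytes 0-0/*"}))         # 0
```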
- pysus.api.dadosgov.client.to_bool(value: Any) → bool
Parse a Brazilian Portuguese boolean value into a bool.
- Parameters:
value (Any) – The value to parse (e.g., "sim", "não", True, False).
- Returns:
True if the value represents an affirmative, False otherwise.
- Return type:
bool
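A minimal sketch of to_bool() behaviour, assuming that booleans pass through unchanged and that strings are compared case-insensitively against a small set of affirmative spellings. Only "sim" is confirmed as affirmative by the docstring example; the AFFIRMATIVE set and the handling of non-string values are assumptions, and to_bool_sketch is a hypothetical name.

```python
from typing import Any

# Assumed affirmative spellings; only "sim" is confirmed by the docstring example.
AFFIRMATIVE = {"sim", "s", "true", "1"}


def to_bool_sketch(value: Any) -> bool:
    """Hypothetical re-implementation of pysus.api.dadosgov.client.to_bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in AFFIRMATIVE
    # Assumption: anything else (None, numbers, ...) is not affirmative.
    return False


print(to_bool_sketch("Sim"), to_bool_sketch("não"), to_bool_sketch(True))  # True False True
```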
- pysus.api.dadosgov.client.to_datetime(value: Any) → datetime | None
Parse a Brazilian date string into a datetime object.
- Parameters:
value (Any) – The value to parse, expected to be a date string in Brazilian format (e.g., %d/%m/%Y %H:%M:%S or %d/%m/%Y).
- Returns:
Parsed datetime object, or None if the value cannot be parsed.
- Return type:
datetime or None
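The two documented formats can be tried in order with datetime.strptime, falling back to None when neither matches. This is a sketch of the documented contract, not the PySUS source; to_datetime_sketch is a hypothetical name, and passing existing datetime objects through unchanged is an assumption.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any

# Formats listed in the to_datetime() docstring, most specific first.
BR_FORMATS = ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y")


def to_datetime_sketch(value: Any) -> datetime | None:
    """Hypothetical re-implementation of pysus.api.dadosgov.client.to_datetime."""
    if isinstance(value, datetime):
        return value  # assumption: already-parsed values pass through
    if not isinstance(value, str):
        return None
    for fmt in BR_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # try the next, less specific format
    return None


print(to_datetime_sketch("05/03/2024 14:30:00"))  # 2024-03-05 14:30:00
print(to_datetime_sketch("05/03/2024"))           # 2024-03-05 00:00:00
print(to_datetime_sketch("2024-03-05"))           # None
```

Trying the format with a time component first matters: the date-only format would reject "05/03/2024 14:30:00" anyway, but ordering from most to least specific keeps the intent explicit.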
Pre-configured health database definitions accessible via dados.gov.br.
- class pysus.api.dadosgov.databases.CNES(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: Dataset
Cadastro Nacional de Estabelecimentos de Saúde (CNES), the national registry of health establishments.
- property description: str
Return a description of the dataset.
- Returns:
A Portuguese description of the CNES information system.
- Return type:
str
- formatter(filename: str) → dict[str, Any]
Parse a CNES filename and extract metadata.
- Parameters:
filename (str) – The name of the file to parse.
- Returns:
A dictionary with keys state, year and month. Unrecognised files return None for all keys.
- Return type:
dict[str, Any]
- ids: list[str]
- property long_name: str
Return the human-readable name.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name.
- Returns:
The abbreviated dataset name, "CNES".
- Return type:
str
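Every dados.gov.br dataset formatter returns the same three keys, state, year and month, with None for all of them when a file is not recognised. The sketch below illustrates that contract for an invented filename layout <prefix>_<UF>_<YYYY><MM>.<ext> (for example cnes_SP_202401.csv); the layout, the regex and parse_resource_name are hypothetical, since the real file names are not documented here.

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical layout: <prefix>_<UF>_<YYYY><MM>.<ext>, e.g. "cnes_SP_202401.csv".
_NAME_RE = re.compile(r"_(?P<state>[A-Z]{2})_(?P<year>\d{4})(?P<month>\d{2})\.\w+$")


def parse_resource_name(filename: str) -> dict[str, Any]:
    """Return state, year and month, or None for every key when unrecognised."""
    match = _NAME_RE.search(filename)
    if match is None or not 1 <= int(match["month"]) <= 12:
        return {"state": None, "year": None, "month": None}
    return {"state": match["state"], "year": int(match["year"]), "month": int(match["month"])}


print(parse_resource_name("cnes_SP_202401.csv"))  # {'state': 'SP', 'year': 2024, 'month': 1}
print(parse_resource_name("dicionario.pdf"))      # {'state': None, 'year': None, 'month': None}
```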
- class pysus.api.dadosgov.databases.COVID19(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: Dataset
Casos Confirmados de COVID-19 (confirmed COVID-19 cases).
- property description: str
Return a description of the dataset.
- Returns:
A Portuguese description of the COVID-19 confirmed cases dataset.
- Return type:
str
- formatter(filename: str) → dict[str, Any]
Parse a COVID-19 filename and extract metadata.
- Parameters:
filename (str) – The name of the file to parse.
- Returns:
A dictionary with keys state, year and month. Unrecognised files return None for all keys.
- Return type:
dict[str, Any]
- ids: list[str]
- property long_name: str
Return the human-readable name.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name.
- Returns:
The abbreviated dataset name, "COVID19".
- Return type:
str
- class pysus.api.dadosgov.databases.PNI(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: Dataset
Programa Nacional de Imunizações (PNI), the national immunisation programme.
- property description: str
Return a description of the dataset.
- Returns:
A Portuguese description of the PNI vaccination monitoring system.
- Return type:
str
- formatter(filename: str) → dict[str, Any]
Parse a PNI vaccination filename into month and year.
- Parameters:
filename (str) – The name of the file to parse.
- Returns:
A dictionary with keys state, year and month. Unrecognised files return None for all keys.
- Return type:
dict[str, Any]
- group_aliases: dict[str, str]
- ids: list[str]
- property long_name: str
Return the human-readable name.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name.
- Returns:
The abbreviated dataset name, "PNI".
- Return type:
str
- class pysus.api.dadosgov.databases.SIA(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: Dataset
Sistema de Informações Ambulatoriais (SIA), the outpatient information system.
- property description: str
Return a description of the dataset.
- Returns:
A Portuguese description of the SIA outpatient information system.
- Return type:
str
- formatter(filename: str) → dict[str, Any]
Parse an SIA filename into year.
- Parameters:
filename (str) – The name of the file to parse.
- Returns:
A dictionary with keys state, year and month. Unrecognised files return None for all keys.
- Return type:
dict[str, Any]
- ids: list[str]
- property long_name: str
Return the human-readable name.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name.
- Returns:
The abbreviated dataset name, "SIA".
- Return type:
str
- class pysus.api.dadosgov.databases.SIM(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: Dataset
Sistema de Informação sobre Mortalidade (SIM), the mortality information system.
- property description: str
Return a description of the dataset.
- Returns:
A Portuguese description of the SIM mortality information system.
- Return type:
str
- formatter(filename: str) → dict[str, Any]
Parse a SIM filename into year.
- Parameters:
filename (str) – The name of the file to parse.
- Returns:
A dictionary with keys state, year and month. Unrecognised files return None for all keys.
- Return type:
dict[str, Any]
- group_aliases: dict[str, str]
- ids: list[str]
- property long_name: str
Return the human-readable name.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name.
- Returns:
The abbreviated dataset name, "SIM".
- Return type:
str
- class pysus.api.dadosgov.databases.SINAN(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: Dataset
Sistema de Informação de Agravos de Notificação (SINAN), the notifiable diseases information system.
- property description: str
Return a description of the dataset.
- Returns:
A Portuguese description of the SINAN notifiable diseases system.
- Return type:
str
- formatter(filename: str) → dict[str, Any]
Parse a SINAN filename into state and year.
- Parameters:
filename (str) – The name of the file to parse.
- Returns:
A dictionary with keys state, year and month. Unrecognised files return None for all keys.
- Return type:
dict[str, Any]
- group_aliases: dict[str, str]
- ids: list[str]
- property long_name: str
Return the human-readable name.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name.
- Returns:
The abbreviated dataset name, "SINAN".
- Return type:
str
- class pysus.api.dadosgov.databases.SINASC(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: Dataset
Sistema de Informações sobre Nascidos Vivos (SINASC), the live birth information system.
- property description: str
Return a description of the dataset.
- Returns:
A Portuguese description of the SINASC live birth system.
- Return type:
str
- formatter(filename: str) → dict[str, Any]
Parse a SINASC filename into year.
- Parameters:
filename (str) – The name of the file to parse.
- Returns:
A dictionary with keys state, year and month. Unrecognised files return None for all keys.
- Return type:
dict[str, Any]
- group_aliases: dict[str, str]
- ids: list[str]
- property long_name: str
Return the human-readable name.
- Returns:
The full Portuguese name of the dataset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the short name.
- Returns:
The abbreviated dataset name, "SINASC".
- Return type:
str
Internal domain models for datasets, groups, and files from dados.gov.br.
- class pysus.api.dadosgov.models.Dataset(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})
Bases: BaseRemoteDataset
A health dataset available through dados.gov.br.
Subclasses define a list of API dataset IDs (ids) and must implement formatter(), which extracts metadata from file names.
- abstractmethod formatter(filename: str) → dict[str, Any]
Extract structured metadata from a filename.
- group_aliases: dict[str, str]
- ids: list[str]
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- class pysus.api.dadosgov.models.File(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'], dataset: BaseRemoteDataset, group: BaseRemoteGroup | None = None)
Bases: BaseRemoteFile
A downloadable file from a dados.gov.br dataset.
- property extension: str
Return the file extension.
- Returns:
The file extension (e.g., ".csv", ".zip").
- Return type:
str
- async fetch_metadata() → None
Fetch the file size and last-modified date from the remote server.
Updates record.api_size and record.last_modified in place. Connection errors are silently ignored.
- async fetch_size() → int
Fetch the remote file size and update the local record.
Makes a HEAD request (falling back to GET with a Range header) to determine the Content-Length.
- Returns:
The file size in bytes, or 0 if the size could not be determined.
- Return type:
int
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(_File__context: Any) → None
Fetch remote metadata if size or modify date is missing.
If both api_size and last_modified are falsy, schedules a background task to fetch metadata from the remote server.
- Parameters:
__context (Any) – Pydantic validation context (unused).
- property modify: datetime
Return the last modification date.
- Returns:
The last modification datetime.
- Return type:
datetime
- Raises:
ValueError – If the modification date has not been set.
- property month: int | None
Return the inferred month from metadata.
- Returns:
The month if present in metadata, otherwise None.
- Return type:
int or None
- property size: int
Return the file size in bytes.
- Returns:
The file size, or 0 if unknown.
- Return type:
int
- property state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'] | None
Return the inferred state from metadata.
- Returns:
The state abbreviation if present in metadata, otherwise None.
- Return type:
State or None
- type: str
- property year: int | None
Return the inferred year from metadata.
- Returns:
The year if present in metadata, otherwise None.
- Return type:
int or None
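File.model_post_init() schedules a background task to fetch metadata when api_size and last_modified are both unset, and fetch_metadata() silently ignores connection errors. The asyncio sketch below mirrors that flow; RemoteRecord, fetch_headers and the returned values are hypothetical stand-ins for the real HTTP call. Because asyncio.create_task() requires a running event loop, the sketch only schedules tasks from inside a coroutine.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RemoteRecord:
    """Hypothetical stand-in for the API record held by a dados.gov.br File."""

    url: str
    api_size: int = 0
    last_modified: datetime | None = None


async def fetch_headers(url: str) -> tuple[int, datetime]:
    # Placeholder for a real HEAD request; "offline" URLs simulate a network failure.
    if "offline" in url:
        raise ConnectionError(url)
    await asyncio.sleep(0)
    return 2048, datetime(2024, 1, 31)


async def fetch_metadata(record: RemoteRecord) -> None:
    try:
        record.api_size, record.last_modified = await fetch_headers(record.url)
    except ConnectionError:
        pass  # documented behaviour: connection errors are silently ignored


def schedule_if_missing(record: RemoteRecord) -> asyncio.Task | None:
    # Only when both fields are falsy; must be called while an event loop is running.
    if not record.api_size and not record.last_modified:
        return asyncio.create_task(fetch_metadata(record))
    return None


async def main() -> list[RemoteRecord]:
    records = [RemoteRecord("https://example.org/a.csv"), RemoteRecord("https://offline.example/b.csv")]
    tasks = [t for r in records if (t := schedule_if_missing(r)) is not None]
    await asyncio.gather(*tasks)
    return records


for rec in asyncio.run(main()):
    print(rec.url, rec.api_size, rec.last_modified)
```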
- class pysus.api.dadosgov.models.Group(*, dataset: BaseRemoteDataset)
Bases: BaseRemoteGroup
A group of files within a dataset.
- property description: str
Return an empty description for the group.
- Returns:
An empty string.
- Return type:
str
- property long_name: str
Return the group title.
- Returns:
The title of the underlying API record.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property name: str
Return the group name, resolved through dataset aliases.
- Returns:
The alias for the group slug if defined, otherwise the raw slug.
- Return type:
str
- record: ConjuntoDados
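Group.name resolves the record's slug through the dataset's group_aliases mapping and falls back to the raw slug. The lookup can be sketched as a single dict.get call with a default; resolve_group_name and the slug and alias values below are invented for illustration.

```python
from __future__ import annotations


def resolve_group_name(slug: str, group_aliases: dict[str, str]) -> str:
    """Hypothetical sketch of Group.name: the alias if one is defined, else the slug."""
    return group_aliases.get(slug, slug)


aliases = {"sinan-dengue": "DENG"}  # invented example mapping
print(resolve_group_name("sinan-dengue", aliases))  # DENG
print(resolve_group_name("sinan-zika", aliases))    # sinan-zika
```

Using the slug itself as the default means an unmapped group never raises KeyError; it simply keeps its API name.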