API Reference

The pysus.api package provides a layered architecture for discovering, downloading, and reading data from Brazilian public health databases (DATASUS). It supports three remote data sources: a DuckLake catalog hosted on S3, an FTP server, and the dados.gov.br API.

Architecture Overview

The package is organized into a hierarchy of abstract base classes and concrete implementations:

pysus/api/
├── __init__.py          # Package entry (re-exports PySUS)
├── client.py            # Main PySUS orchestrator
├── extensions.py        # File format handlers
├── models.py            # Abstract base classes
├── types.py             # Type aliases
├── _impl/
│   └── databases.py     # High-level convenience functions
├── ducklake/            # S3 DuckLake catalog client
├── ftp/                 # FTP client
└── dadosgov/            # dados.gov.br API client

Quick Start

The simplest way to use PySUS is via the high-level convenience functions:

from pysus import sinan

df = sinan(disease="dengue", year=2023)

Or with the async API:

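import asyncio
from pysus.api.client import PySUS

async def main():
    # "async with" and "await" are only valid inside a coroutine
    async with PySUS() as pysus:
        files = await pysus.query(dataset="sinan", group="DENG", year=2023)
        for f in files:
            await pysus.download(f)

asyncio.run(main())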

Main Client

Main orchestrator for the PySUS data pipeline.

Manages file downloads, local state tracking, catalog attachment, Parquet conversion, and query execution across multiple backends.

class pysus.api.client.Base(**kwargs: Any)[source]

Bases: DeclarativeBase

Base declarative class for SQLAlchemy ORM models.

metadata: ClassVar[MetaData] = MetaData()

Refers to the SQLAlchemy MetaData collection that will be used for new Table objects.

registry: ClassVar[_RegistryType] = <sqlalchemy.orm.decl_api.registry object>

Refers to the SQLAlchemy registry in use, with which new Mapper objects will be associated.

class pysus.api.client.DownloadStatus(value)[source]

Bases: Enum

Download status values tracked for each local file.

COMPLETED = 'completed'
DOWNLOADING = 'downloading'
FAILED = 'failed'
MISSING = 'missing'
PENDING = 'pending'
class pysus.api.client.LocalFileState(**kwargs)[source]

Bases: Base

ORM model tracking the state of a downloaded local file.

client_name: Mapped[str]
group: Mapped[str | None]
last_synced: Mapped[datetime]
month: Mapped[int | None]
path: Mapped[str]
remote_path: Mapped[str]
sha256: Mapped[str | None]
state: Mapped[str | None]
status: Mapped[DownloadStatus]
year: Mapped[int | None]
class pysus.api.client.PySUS(db_path: Path = PosixPath('/home/docs/pysus/config.db'))[source]

Bases: object

Central orchestrator for downloading and querying PySUS datasets.

async download(file: BaseRemoteFile, token: str | None = None, callback: Callable | None = None, timeout: float | None = None) BaseLocalFile[source]

Download a remote file and return a local file handle.

Skips re-download if a matching local copy already exists.

Parameters:
  • file (BaseRemoteFile) – The remote file to download.

  • token (str, optional) – Access token for authenticated clients (e.g. DadosGov).

  • callback (Callable, optional) – Progress callback invoked during the download.

  • timeout (float, optional) – Maximum seconds to wait for the download. None (default) means no timeout.

Returns:

The downloaded file wrapped in the appropriate handler.

Return type:

BaseLocalFile

Raises:
  • ValueError – If the file’s client is not recognised.

  • RuntimeError – If the download fails for any reason.
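The error contract above can be sketched as a small batch helper. This is only an illustration: download_all and FakeClient are hypothetical names introduced here, and the fake stands in for a connected PySUS instance so the snippet runs offline. It assumes nothing beyond what the signature states: download is awaitable, accepts timeout, and raises ValueError or RuntimeError.

```python
import asyncio


async def download_all(client, files, timeout=60.0):
    """Download files one by one, collecting failures instead of aborting."""
    done, failed = [], []
    for f in files:
        try:
            done.append(await client.download(f, timeout=timeout))
        except (ValueError, RuntimeError) as exc:
            # Both exception types are documented for PySUS.download
            failed.append((f, exc))
    return done, failed


class FakeClient:
    """Hypothetical offline stand-in for a PySUS instance."""

    async def download(self, file, token=None, callback=None, timeout=None):
        if file.startswith("bad"):
            raise RuntimeError(f"could not fetch {file}")
        return f"/tmp/{file}"


done, failed = asyncio.run(download_all(FakeClient(), ["a.dbc", "bad.dbc"]))
print(done)
print([name for name, _ in failed])
```

With a real client, the same helper would receive the file objects returned by query() rather than plain strings.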

async download_to_parquet(file: BaseRemoteFile, token: str | None = None, callback: Callable[[int, int], None] | None = None, timeout: float | None = None, add_dv: bool = True) Parquet[source]

Download a file and convert it to Parquet format.

Parameters:
  • file (BaseRemoteFile) – The remote file to download and convert.

  • token (str, optional) – Access token for authenticated clients.

  • callback (Callable[[int, int], None], optional) – Progress callback.

  • timeout (float, optional) – Maximum seconds to wait for the download.

  • add_dv (bool, optional) – Whether to apply the IBGE verification digit on load (default True).

Returns:

The converted Parquet file handler.

Return type:

Parquet

Raises:

NotImplementedError – If the downloaded file type cannot be converted to Parquet.

get_completed_remote_paths() set[str][source]

Return remote paths for all successfully downloaded files.

async get_dadosgov(access_token: str | None) DadosGov[source]

Return the DadosGov client, connecting lazily if needed.

async get_ducklake() DuckLake[source]

Return the DuckLake client, initializing it lazily if needed.

async get_ftp() FTP[source]

Return the FTP client, connecting lazily if needed.

async get_local_file(file: BaseRemoteFile) BaseLocalFile | None[source]

Look up a previously downloaded file by its remote path.

get_local_hierarchy()[source]

Build a nested dict of cached files grouped by client, dataset, and group.

Returns:

Nested dict keyed by {client: {dataset: {group: [files]}}}.

Return type:

dict
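To make the shape of that nested dict concrete, here is a minimal sketch that groups flat records into {client: {dataset: {group: [files]}}}. The function build_hierarchy and the file names are hypothetical; how PySUS assembles the structure internally is not shown in this reference.

```python
def build_hierarchy(records):
    """Group (client, dataset, group, filename) tuples into nested dicts."""
    tree = {}
    for client, dataset, group, filename in records:
        groups = tree.setdefault(client, {}).setdefault(dataset, {})
        groups.setdefault(group, []).append(filename)
    return tree


# Illustrative records only; real entries come from the local state database.
records = [
    ("FTP", "SINAN", "DENG", "DENGBR23.parquet"),
    ("FTP", "SINAN", "DENG", "DENGBR22.parquet"),
    ("FTP", "SIM", "DO", "DORJ2020.parquet"),
]
tree = build_hierarchy(records)
print(tree["FTP"]["SINAN"]["DENG"])
```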

async query(client: Literal['FTP', 'DadosGov'] | None = None, dataset: str | None = None, group: str | None = None, state: str | None = None, year: int | None = None, month: int | None = None)[source]

Query available datasets through the DuckLake catalog.

read_parquet(paths: list[Path], sql: str | None = None, mode: Literal['union', 'intersection', 'strict'] = 'union', add_dv: bool = True) DuckDBPyConnection | pd.DataFrame[source]

Read Parquet files with optional schema handling and SQL filter.

Parameters:
  • paths (list of Path) – One or more Parquet file paths to read.

  • sql (str, optional) – Optional SQL filter expression applied to the result.

  • mode ({"union", "intersection", "strict"}, optional) – Schema resolution mode (default "union").

  • add_dv (bool, optional) – When True, automatically applies the IBGE verification digit to municipality code columns. If matching columns are found, a DataFrame is returned instead of a DuckDBPyConnection.

Returns:

The query result.

Return type:

DuckDBPyConnection or pd.DataFrame

Raises:

ValueError – If no paths are provided, or if the schema mode is "strict" and the files have differing schemas.
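The three schema modes can be pictured with plain column lists. The sketch below is an assumption about what the mode names imply (union keeps every column, intersection keeps only shared columns, strict rejects differing schemas); only the strict ValueError is stated in this reference. resolve_columns is a hypothetical helper, not part of PySUS.

```python
def resolve_columns(schemas, mode="union"):
    """Combine per-file column lists according to a schema mode."""
    if not schemas:
        raise ValueError("no schemas provided")
    first = schemas[0]
    if mode == "strict":
        # Every file must expose exactly the same set of columns
        if any(set(s) != set(first) for s in schemas[1:]):
            raise ValueError("files have differing schemas")
        return list(first)
    if mode == "union":
        # Keep every column, in order of first appearance
        seen = []
        for schema in schemas:
            for col in schema:
                if col not in seen:
                    seen.append(col)
        return seen
    if mode == "intersection":
        # Keep only the columns present in every file
        common = set(first).intersection(*schemas[1:])
        return [col for col in first if col in common]
    raise ValueError(f"unknown mode: {mode!r}")


a = ["ID", "UF", "DT"]
b = ["ID", "UF", "SEXO"]
union_cols = resolve_columns([a, b], "union")
common_cols = resolve_columns([a, b], "intersection")
print(union_cols, common_cols)
```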

Types

Type aliases used across the PySUS API.

FileType:

Literal type alias listing the supported local file types (FILE, DIR, PARQUET, CSV, JSON, PDF, DBC, DBF, ZIP).

State:

Brazilian state abbreviations (AC, AL, AP, …, DF).

Utilities

pysus.api.utils.add_dv(geocode: str) str[source]

Add the IBGE verification digit to a municipality code.

Parameters:

geocode (str) – The municipality code (6 or 7 digits).

Returns:

The code with the verification digit appended, or the original string if it cannot be processed.

Return type:

str
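For readers unfamiliar with the verification digit, the sketch below shows the commonly described mod-10 scheme for IBGE municipality codes: the six digits are weighted 1, 2, 1, 2, 1, 2, the digits of each product are summed, and the check digit brings the total up to a multiple of ten. This is an assumption about the scheme, not the library's code; append_check_digit is a hypothetical name, and the real add_dv may treat special cases differently.

```python
def append_check_digit(geocode: str) -> str:
    """Append a mod-10 check digit to a 6-digit municipality code.

    Any other input (7 digits, non-numeric, wrong length) is returned as-is.
    """
    if not (geocode.isascii() and geocode.isdigit() and len(geocode) == 6):
        return geocode
    total = 0
    for digit, weight in zip(geocode, (1, 2, 1, 2, 1, 2)):
        product = int(digit) * weight
        # A two-digit product contributes the sum of its digits
        total += product // 10 + product % 10
    return geocode + str((10 - total % 10) % 10)


print(append_check_digit("355030"))   # Sao Paulo
print(append_check_digit("330455"))   # Rio de Janeiro
```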

pysus.api.utils.is_geocode_column(name: str) bool[source]

Check if a column name corresponds to an IBGE municipality code.

File Format Handlers

Map file extensions and MIME types to their handler classes.

class pysus.api.extensions.CSV(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'CSV')[source]

Bases: BaseTabularFile

Represents a CSV file with automatic encoding and separator detection.

property columns: list[str]

Return the column names from the CSV header row.

async load() DataFrame[source]

Read the entire CSV into a DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property rows: int

Return the number of data rows in the file.

async stream(chunk_size: int = 10000) AsyncGenerator[DataFrame, None][source]

Yield the CSV in chunks of the given number of rows.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.DBC(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'DBC')[source]

Bases: BaseTabularFile

Represents a compressed DBC file, convertible to DBF then Parquet.

property columns: list[str]

Not supported for DBC files. Convert to Parquet first.

async load() DataFrame[source]

Convert to Parquet and load the result as a DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property rows: int

Not supported for DBC files. Convert to Parquet first.

async stream(chunk_size: int = 10000) AsyncGenerator[DataFrame, None][source]

Convert to Parquet and stream its chunks.

async to_parquet(output_path: str | Path | None = None, chunk_size: int = 30000, callback: Callable[[int, int], None] | None = None) Parquet[source]

Decompress DBC to DBF, then convert to Parquet.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.DBCNotImported(*, path: ~pathlib.Path = <factory>, type: str | ~typing.Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'remote')[source]

Bases: BaseTabularFile

Placeholder for DBC files when optional dependency is not installed.

property columns: list[str]

Raise ImportError indicating the missing DBC dependency.

property extension: str

Return the .dbc extension.

import_err: ClassVar[str] = '\n        run "pip install pysus[dbc]" to handle DBC files.\n        Make sure you also have libffi installed on the system. It may not work\n        on Windows\n    '
async load() DataFrame[source]

Raise ImportError indicating the missing DBC dependency.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property modify: datetime

Raise ImportError indicating the missing DBC dependency.

property name: str

Raise ImportError indicating the missing DBC dependency.

path: Path
property rows: int

Raise ImportError indicating the missing DBC dependency.

property size: int

Raise ImportError indicating the missing DBC dependency.

stream(chunk_size: int = 10000) AsyncGenerator[DataFrame, None][source]

Raise ImportError indicating the missing DBC dependency.

async to_parquet(output_path: str | Path | None = None, chunk_size: int = 10000, callback: Callable[[int, int], None] | None = None) Parquet[source]

Raise ImportError indicating the missing DBC dependency.

type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.DBF(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'DBF')[source]

Bases: BaseTabularFile

Represents a dBASE (DBF) file.

property columns: list[str]

Return the field names from the DBF file.

decode_column(value)[source]

Decode a raw DBF value, handling byte strings and null bytes.

Parameters:

value (bytes or str or Any) – The value to decode.

Returns:

The decoded and stripped string, or the original value if it is neither bytes nor str.

Return type:

str or Any

async load() DataFrame[source]

Read the entire DBF file into a DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property rows: int

Return the number of records in the DBF file.

async stream(chunk_size: int = 30000) AsyncGenerator[DataFrame, None][source]

Yield the DBF records in chunks of the given size.

async to_parquet(output_path: str | Path | None = None, chunk_size: int = 30000, callback: Callable[[int, int], None] | None = None) Parquet[source]

Convert the DBF file to Parquet format.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.Directory(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'DIR')[source]

Bases: BaseLocalFile

Represents a directory on the local filesystem.

async load() list[BaseLocalFile][source]

Load all entries inside the directory as file objects.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

async stream(chunksize: int = 10000) AsyncGenerator[BaseLocalFile, None][source]

Yield each entry inside the directory as a file object.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.ExtensionFactory[source]

Bases: object

Factory that maps file extensions and MIME types to handler classes.

async classmethod get_file_class(path: Path) type[BaseLocalFile][source]

Return the file handler class for a given path.

First attempts MIME-type identification; falls back to extension matching.

Parameters:

path (Path) – The file path to classify.

Returns:

The handler class for the file type.

Return type:

type[BaseLocalFile]
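The two-step lookup described above (content-based identification first, extension as fallback) can be sketched with the standard library. Here a check of the leading magic bytes stands in for MIME detection; the MAGIC and BY_SUFFIX tables and the classify function are hypothetical and return type labels rather than handler classes.

```python
import tempfile
from pathlib import Path

# Leading bytes that identify a format regardless of the file name
MAGIC = {
    b"PAR1": "PARQUET",
    b"PK\x03\x04": "ZIP",
    b"\x1f\x8b": "ZIP",
    b"%PDF": "PDF",
}
# Fallback used when the content is not recognised
BY_SUFFIX = {
    ".csv": "CSV", ".json": "JSON", ".dbc": "DBC", ".dbf": "DBF",
    ".parquet": "PARQUET", ".zip": "ZIP", ".pdf": "PDF",
}


def classify(path: Path) -> str:
    """Return a type label: content sniffing first, then the extension."""
    with path.open("rb") as fh:
        head = fh.read(4)
    for magic, kind in MAGIC.items():
        if head.startswith(magic):
            return kind
    return BY_SUFFIX.get(path.suffix.lower(), "FILE")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.bin").write_bytes(b"PAR1" + b"\x00" * 8)
    (root / "table.csv").write_text("a,b\n1,2\n")
    (root / "notes.txt").write_text("hello")
    kinds = {n: classify(root / n) for n in ("data.bin", "table.csv", "notes.txt")}
print(kinds)
```

Checking content first means a mislabelled file (a Parquet file named data.bin in this sketch) still resolves to the right type.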

async classmethod instantiate(path: str | Path) BaseLocalFile[source]

Create and return the appropriate file handler for a path.

Determines whether the path is a directory or a file, resolves the handler class, and instantiates it.

Parameters:

path (str or Path) – The filesystem path to wrap in a handler.

Returns:

The instantiated file handler.

Return type:

BaseLocalFile

class pysus.api.extensions.File(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'FILE')[source]

Bases: BaseLocalFile

Represents a generic local file with no special handling.

async load() bytes[source]

Read the entire file contents into memory as bytes.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

async stream(chunk_size: int = 1048576) AsyncGenerator[bytes, None][source]

Yield the file contents in chunks of the given size.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.GZip(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'ZIP')[source]

Bases: BaseCompressedFile

Represents a GZip-compressed file.

async extract(target_dir: Path = PosixPath('/home/docs/pysus')) list[BaseLocalFile][source]

Decompress the file to a target directory and return it as a file object.

async list_members() list[str][source]

Return a list containing the single decompressed file name.

async load() bytes[source]

Decompress and read the entire file contents into memory.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

async open_member(member_name: str) bytes[source]

Read and return the decompressed file contents.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.JSON(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'JSON')[source]

Bases: BaseTabularFile

Represents a JSON file with tabular data.

property columns: list[str]

Return the column names from the JSON file.

async load() DataFrame[source]

Read the entire JSON file into a DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property rows: int

Return the number of rows in the JSON file.

async stream(chunk_size: int = 10000) AsyncGenerator[DataFrame, None][source]

Yield the entire JSON file as a single DataFrame.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.PDF(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'PDF')[source]

Bases: BaseLocalFile

Represents a PDF file.

async load() bytes[source]

Read the entire PDF file contents into memory as bytes.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

async stream(chunk_size: int | None = None) AsyncGenerator[bytes, None][source]

Yield the PDF file contents in chunks of the given size.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.Parquet(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'PARQUET', add_dv: bool = True)[source]

Bases: BaseTabularFile

Represents a Parquet file with optional date and integer type parsing.

add_dv: bool
property columns: list[str]

Return the column names from the Parquet schema.

async load(parse: bool = True) DataFrame[source]

Read the entire Parquet file into a DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

static parse_dftypes(df: DataFrame) DataFrame[source]

Convert known date and integer columns to their proper types.

property rows: int

Return the number of rows from the Parquet metadata.

property schema: Schema

Return the Parquet schema as a PyArrow Schema object.

async stream(chunk_size: int = 10000, parse: bool = False) AsyncGenerator[DataFrame, None][source]

Yield the Parquet file in batches of the given size.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.Tar(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'ZIP')[source]

Bases: BaseCompressedFile

Represents a Tar archive file.

async extract(target_dir: Path = PosixPath('/home/docs/pysus')) list[BaseLocalFile][source]

Extract members to a target directory and return as file objects.

async list_members() list[str][source]

Return the list of member names inside the archive.

async load() TarFile[source]

Open and return the tar archive.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

async open_member(member_name: str) bytes[source]

Read and return the contents of a named archive member.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']
class pysus.api.extensions.Zip(*, path: Path, type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'] = 'ZIP')[source]

Bases: BaseCompressedFile

Represents a ZIP archive file.

async extract(target_dir: Path = PosixPath('/home/docs/pysus')) list[BaseLocalFile][source]

Extract members to a target directory and return as file objects.

async list_members() list[str][source]

Return the list of member names inside the archive.

async load() ZipFile[source]

Open and return the ZIP archive.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

async open_member(member_name: str) bytes[source]

Read and return the contents of a named archive member.

async to_parquet(output_path: str | Path | None = None, chunk_size: int = 30000, callback: Callable[[int, int], None] | None = None) Parquet[source]

Extract the archive and convert the first tabular file to Parquet.

type: Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP']

Abstract Base Models

Abstract model hierarchy for PySUS data access.

Provides abstract base classes for local and remote file handling, organized in three layered hierarchies:

  • Local files: BaseFile -> BaseLocalFile -> BaseTabularFile / BaseCompressedFile

  • Remote files: BaseFile -> BaseRemoteFile

  • Remote data catalogs: BaseRemoteObject -> BaseRemoteGroup / BaseRemoteDataset / BaseRemoteClient

class pysus.api.models.BaseCompressedFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])[source]

Bases: BaseLocalFile, ABC

Abstract base for a compressed archive file (e.g. .zip, .gz).

Subclasses must implement list_members, open_member, and extract.

abstractmethod async extract(target_dir: Path = PosixPath('/home/docs/pysus')) list[BaseLocalFile][source]

Extract all members into target_dir and return the file objects.

abstractmethod async list_members() list[str][source]

Return the list of member names inside the archive.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

abstractmethod async open_member(member_name: str) Any[source]

Open and return a single archive member by name.

async stream(chunk_size: int | None = None) AsyncGenerator[Any, None][source]

Yield each archive member as it is opened.
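The three-method contract (list_members, open_member, extract) maps naturally onto the standard zipfile module. The sketch below is a synchronous, simplified illustration; ZipSketch is a hypothetical class, not the library's Zip handler, and it returns plain paths instead of BaseLocalFile objects.

```python
import tempfile
import zipfile
from pathlib import Path


class ZipSketch:
    """Minimal synchronous mirror of the compressed-file contract."""

    def __init__(self, path: Path):
        self.path = path

    def list_members(self) -> list[str]:
        with zipfile.ZipFile(self.path) as zf:
            return zf.namelist()

    def open_member(self, member_name: str) -> bytes:
        with zipfile.ZipFile(self.path) as zf:
            return zf.read(member_name)

    def extract(self, target_dir: Path) -> list[Path]:
        with zipfile.ZipFile(self.path) as zf:
            zf.extractall(target_dir)
            return [target_dir / name for name in zf.namelist()]


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "sample.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.csv", "x,y\n1,2\n")
    sketch = ZipSketch(archive)
    members = sketch.list_members()
    content = sketch.open_member("a.csv")
    extracted = [p.name for p in sketch.extract(Path(tmp) / "out")]
print(members, content, extracted)
```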

class pysus.api.models.BaseFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])[source]

Bases: BaseModel, ABC

Abstract base for a single file, local or remote.

Subclasses must implement name, extension, size, and modify.

property basename: str

Return the file name from the path.

abstract property extension: str

Return the file extension string.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

abstract property modify: datetime

Return the last modification timestamp.

abstract property name: str

Return the display name of the file.

path: Path
abstract property size: int

Return the file size in bytes.

type: str | FileType
class pysus.api.models.BaseLocalFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])[source]

Bases: BaseFile, ABC

Abstract base for a file stored on the local filesystem.

Subclasses must implement load and stream.

property extension: str

Return the file extension from the local path.

async get_hash(algorithm: str = 'sha256', chunk_size: int = 1048576) str[source]

Compute the file’s hash digest.

Parameters:
  • algorithm (str, optional) – The hash algorithm name (default "sha256").

  • chunk_size (int, optional) – Read chunk size in bytes (default 1 MiB).

Returns:

The hex digest string.

Return type:

str
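Chunked hashing keeps memory use constant regardless of file size. A minimal synchronous sketch with hashlib follows; file_digest is a hypothetical helper that mirrors the documented defaults (sha256, 1 MiB chunks) and is not the library's implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256",
                chunk_size: int = 1024 * 1024) -> str:
    """Hash a file by feeding it to the digest one chunk at a time."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"pysus" * 1000)
    # A tiny chunk size forces many update() calls
    chunked = file_digest(sample, chunk_size=64)
    whole = hashlib.sha256(b"pysus" * 1000).hexdigest()
print(chunked == whole)
```

The digest is the same whatever the chunk size, which is why a stored sha256 value can be compared against a freshly computed one.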

abstractmethod async load() Any[source]

Load the entire file content into memory and return it.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property modify: datetime

Return the last modification timestamp from the local filesystem.

property name: str

Return the file name from the path.

path: Path
property size: int

Return the file size in bytes from the local filesystem.

abstractmethod stream(chunk_size: int = 10000) AsyncGenerator[Any, None][source]

Yield chunks of the file content as an async generator.

class pysus.api.models.BaseRemoteClient[source]

Bases: BaseRemoteObject, ABC

Abstract base for a remote API client (e.g. FTP, HTTP).

Subclasses must implement connect, close, login, datasets, and _download_file.

abstractmethod async close() None[source]

Close the connection to the remote server.

abstractmethod async connect() None[source]

Establish a connection to the remote server.

abstractmethod async datasets(**kwargs) list[source]

Return a list of available datasets matching kwargs.

abstractmethod async login(**kwargs) None[source]

Authenticate with the remote server using kwargs credentials.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pysus.api.models.BaseRemoteDataset[source]

Bases: BaseRemoteObject, SearchableMixin, ABC

Abstract base for a dataset containing groups and/or files.

Subclasses must implement _fetch_content.

client: BaseRemoteClient
property content: Sequence[BaseRemoteGroup | BaseRemoteFile]

Return the dataset content, fetching on first access.

group_definitions: dict[str, str]
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

async search(**kwargs) list[BaseRemoteFile][source]

Recursively search groups and files by attribute kwargs.

Return matching file objects.

class pysus.api.models.BaseRemoteFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])[source]

Bases: BaseFile, SearchableMixin, ABC

Abstract base for a file stored on a remote server.

Subclasses must implement _download. dataset and group link back to the containing objects.

property client: BaseRemoteClient

Return the remote client associated with this file.

dataset: BaseRemoteDataset
async download(output: str | Path | None = None, callback: Callable[[int, int], None] | None = None) BaseLocalFile[source]

Download the remote file to a local cache or output path.

Return the instantiated local file wrapper.

group: BaseRemoteGroup | None
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property month: int | None

Return the month associated with the file, or None.

property name: str

Return the basename as the display name.

property state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'] | None

Return the state associated with the file, or None.

property year: int | None

Return the year associated with the file, or None.

class pysus.api.models.BaseRemoteGroup[source]

Bases: BaseRemoteObject, SearchableMixin, ABC

Abstract base for a named group of remote files within a dataset.

Subclasses must implement _fetch_files.

dataset: BaseRemoteDataset
property files: list[BaseRemoteFile]

Return all files in this group, fetching them on first access.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property parent: BaseRemoteDataset

Return the parent dataset.

async search(**kwargs) list[BaseRemoteFile][source]

Filter files in this group by attribute kwargs.

Return matching file objects.

class pysus.api.models.BaseRemoteObject[source]

Bases: BaseModel, ABC

Abstract base for a named remote entity with a description.

Subclasses must implement name, long_name, and description.

abstract property description: str

Return a textual description of the entity.

abstract property long_name: str

Return the long / human-readable name.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

abstract property name: str

Return the short name of the remote entity.

class pysus.api.models.BaseTabularFile(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'])[source]

Bases: BaseLocalFile, ABC

Abstract base for a local tabular file (e.g. CSV, Parquet).

Subclasses must implement columns, rows, load, and stream.

abstract property columns: list[str]

Return the list of column names.

abstractmethod async load() DataFrame[source]

Load the entire file into a pandas DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

abstract property rows: int

Return the number of data rows.

abstractmethod stream(chunk_size: int = 10000) AsyncGenerator[DataFrame, None][source]

Yield pandas DataFrames in chunks as an async generator.

async to_parquet(output_path: str | Path | None = None, chunk_size: int = 10000, callback: Callable[[int, int], None] | None = None) Parquet[source]

Convert the file to Parquet format.

Parameters:
  • output_path (str or Path, optional) – Destination path for the Parquet file. Defaults to the source path with a .parquet extension.

  • chunk_size (int, optional) – Number of rows per streaming chunk (default 10 000).

  • callback (Callable[[int, int], None], optional) – Function called after each chunk with (current_rows, total_rows).

Returns:

The resulting Parquet wrapper object.

Return type:

Parquet
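The callback contract, called after each chunk with (current_rows, total_rows), can be illustrated without any file I/O. In the sketch below, stream_chunks and consume are hypothetical stand-ins for a tabular file's stream() and a conversion loop; only the callback signature is taken from the documentation above.

```python
import asyncio


async def stream_chunks(rows, chunk_size):
    """Yield successive slices of rows, mimicking an async chunked reader."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


async def consume(rows, chunk_size, callback=None):
    """Process chunks and report (current_rows, total_rows) after each one."""
    written = 0
    async for chunk in stream_chunks(rows, chunk_size):
        written += len(chunk)
        if callback is not None:
            callback(written, len(rows))
    return written


progress = []
total = asyncio.run(
    consume(list(range(25)), 10, lambda cur, tot: progress.append((cur, tot)))
)
print(progress)
```

A callback of this shape is enough to drive a progress bar, since it receives both the running count and the total.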

class pysus.api.models.SearchableMixin[source]

Bases: object

Mixin providing attribute-based filtering for remote objects.
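Attribute-based filtering can be sketched as a comparison of keyword arguments against object attributes. The sketch assumes that criteria set to None are ignored, which this reference does not state; RemoteFileSketch, matches, and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteFileSketch:
    """Hypothetical stand-in exposing the documented file properties."""
    name: str
    state: Optional[str] = None
    year: Optional[int] = None
    month: Optional[int] = None


def matches(obj, **criteria) -> bool:
    """True when every non-None criterion equals the object's attribute."""
    return all(
        getattr(obj, key, None) == value
        for key, value in criteria.items()
        if value is not None
    )


files = [
    RemoteFileSketch("DENGBR22", year=2022),
    RemoteFileSketch("DENGBR23", year=2023),
    RemoteFileSketch("DORJ2023", state="RJ", year=2023),
]
by_year = [f.name for f in files if matches(f, year=2023, state=None)]
by_year_state = [f.name for f in files if matches(f, year=2023, state="RJ")]
print(by_year, by_year_state)
```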

High-Level Data Functions

High-level convenience functions for fetching Brazilian health data.

Each function wraps an asynchronous query/download pipeline and returns a pandas DataFrame. The available datasets cover disease notification (SINAN), vital statistics (SINASC, SIM), hospital admissions (SIH), ambulatory care (SIA), immunisation (PNI), census data (IBGE), health facilities (CNES), and hospitalisation records (CIHA).

pysus.api._impl.databases.ciha(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = 'CIHA', **kwargs) DataFrame[source]

Fetch CIHA hospitalisation records for a given state, year(s), month(s), and group.

CIHA (Comunicação de Internação Hospitalar) provides hospitalisation records.

Parameters:
  • state (State) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int]) – Year or list of years to fetch.

  • month (int | list[int]) – Month or list of months to fetch.

  • group (str, optional) – Additional grouping code. Default is "CIHA".

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

CIHA hospitalisation records.

Return type:

pd.DataFrame
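Because year and month each accept either a single int or a list, a caller may want to know which periods a request covers. The sketch below assumes the two arguments combine as a Cartesian product; that is an assumption, since the reference does not say how they are combined internally. as_list and expand_periods are hypothetical helpers.

```python
from itertools import product


def as_list(value):
    """Wrap a scalar in a list; copy lists and tuples into a new list."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def expand_periods(year, month):
    """Return every (year, month) pair implied by int-or-list arguments."""
    return list(product(as_list(year), as_list(month)))


periods = expand_periods([2022, 2023], 1)
quarter = expand_periods(2023, [1, 2, 3])
print(periods, quarter)
```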

pysus.api._impl.databases.cnes(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = None, **kwargs) DataFrame[source]

Fetch CNES health facilities for a state, year, month, and group.

CNES (Cadastro Nacional de Estabelecimentos de Saúde) is the Brazilian registry of health-care facilities.

Parameters:
  • state (State) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int]) – Year or list of years to fetch.

  • month (int | list[int]) – Month or list of months to fetch.

  • group (str, optional) – Additional grouping code.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

CNES health-facility records.

Return type:

pd.DataFrame

pysus.api._impl.databases.ibge(year: int | list[int], group: str | None = None, **kwargs) DataFrame[source]

Fetch IBGE census data for given year(s) and optional group.

IBGE (Instituto Brasileiro de Geografia e Estatística) provides census and demographic data.

Parameters:
  • year (int | list[int]) – Year or list of years to fetch.

  • group (str, optional) – Additional grouping code.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

IBGE census data for the specified year(s) and group.

Return type:

pd.DataFrame

pysus.api._impl.databases.list_files(dataset: Literal['SINAN', 'SINASC', 'SIM', 'SIH', 'SIA', 'PNI', 'IBGE', 'CNES', 'CIHA'], client: Literal['FTP', 'DadosGov'] | None = None, group: str | None = None, state: str | None = None, year: int | list[int] | None = None, month: int | list[int] | None = None, **kwargs) DataFrame[source]

List catalog files filtered by client, group, state, year, and month.

Queries the PySUS API metadata and returns a DataFrame with file name, path, dataset, group, year, month, state, and last-modified timestamp for every matching file without downloading the actual data.

Parameters:
  • dataset (Literal) – Dataset name; one of "SINAN", "SINASC", "SIM", "SIH", "SIA", "PNI", "IBGE", "CNES", or "CIHA".

  • client (Literal["FTP", "DadosGov"], optional) – Data source client to query.

  • group (str, optional) – Group or disease code to filter by.

  • state (str, optional) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int], optional) – Year or list of years to filter by.

  • month (int | list[int], optional) – Month or list of months to filter by.

  • **kwargs – Additional arguments forwarded to PySUS.query().

Returns:

DataFrame with columns name, path, dataset, group, year, month, state, and modify.

Return type:

pd.DataFrame

pysus.api._impl.databases.pni(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], group: str | None = None, **kwargs) DataFrame[source]

Fetch PNI immunisation records for a given state, year(s), and group.

PNI (Programa Nacional de Imunizações) is the Brazilian national immunisation programme.

Parameters:
  • state (State) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int]) – Year or list of years to fetch.

  • group (str, optional) – Additional grouping code.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

PNI immunisation records.

Return type:

pd.DataFrame

pysus.api._impl.databases.sia(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = None, **kwargs) DataFrame[source]

Fetch SIA ambulatory care for a state, year, month, and group.

SIA (Sistema de Informações Ambulatoriais) is the Brazilian ambulatory care information system.

Parameters:
  • state (State) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int]) – Year or list of years to fetch.

  • month (int | list[int]) – Month or list of months to fetch.

  • group (str, optional) – Additional grouping code.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

SIA ambulatory care records.

Return type:

pd.DataFrame

pysus.api._impl.databases.sih(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], month: int | list[int], group: str | None = None, **kwargs) DataFrame[source]

Fetch SIH hospital admissions for a state, year, month, and group.

SIH (Sistema de Informações Hospitalares) is the Brazilian hospital admission information system.

Parameters:
  • state (State) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int]) – Year or list of years to fetch.

  • month (int | list[int]) – Month or list of months to fetch.

  • group (str, optional) – Additional grouping code.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

SIH hospital admission records.

Return type:

pd.DataFrame

pysus.api._impl.databases.sim(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], group: str | None = None, **kwargs) DataFrame[source]

Fetch SIM mortality records for a given state, year(s), and group.

SIM (Sistema de Informação sobre Mortalidade) is the Brazilian mortality information system.

Parameters:
  • state (State) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int]) – Year or list of years to fetch.

  • group (str, optional) – Additional grouping code.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

SIM mortality records for the specified state, year(s), and group.

Return type:

pd.DataFrame

pysus.api._impl.databases.sinan(disease: Literal['ACBI', 'ACGR', 'ANIM', 'ANTR', 'BOTU', 'CANC', 'CHAG', 'CHIK', 'COLE', 'COQU', 'DENG', 'DERM', 'DIFT', 'ESQU', 'EXAN', 'FMAC', 'FTIF', 'HANS', 'HANT', 'HEPA', 'IEXO', 'INFL', 'LEIV', 'LEPT', 'LERD', 'LTAN', 'MALA', 'MENI', 'MENT', 'NTRA', 'PAIR', 'PEST', 'PFAN', 'PNEU', 'RAIV', 'SDTA', 'SIFA', 'SIFC', 'SIFG', 'SRC', 'TETA', 'TETN', 'TOXC', 'TOXG', 'TRAC', 'TUBE', 'VARC', 'VIOL', 'ZIKA'], year: int | list[int], **kwargs) DataFrame[source]

Fetch SINAN records for a given disease and year(s).

SINAN (Sistema de Informação de Agravos de Notificação) is the Brazilian notifiable-disease information system.

Parameters:
  • disease (Literal) – Disease code (e.g. "DENG" for dengue, "ZIKA" for zika).

  • year (int | list[int]) – Year or list of years to fetch.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

SINAN records for the specified disease and year(s).

Return type:

pd.DataFrame
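
The disease parameter is typed as a Literal of four-letter codes, so a caller holding free-form input may want to check it first. The following sketch uses a shortened subset of the codes from the signature; normalize_disease is a hypothetical helper, and whether PySUS itself normalises or rejects other spellings is not stated here.

```python
from typing import Literal, get_args

# Shortened subset of the codes listed in the sinan() signature.
Disease = Literal["CHIK", "DENG", "ZIKA"]


def normalize_disease(code: str) -> str:
    """Upper-case the code and check it against the allowed Literal values."""
    upper = code.strip().upper()
    if upper not in get_args(Disease):
        allowed = ", ".join(get_args(Disease))
        raise ValueError(f"unknown disease code {code!r}; expected one of: {allowed}")
    return upper


code = normalize_disease(" deng ")
```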

pysus.api._impl.databases.sinasc(state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'], year: int | list[int], group: str | None = None, **kwargs) DataFrame[source]

Fetch SINASC birth certificates for a given state, year(s), and group.

SINASC (Sistema de Informação sobre Nascidos Vivos) is the Brazilian live birth information system.

Parameters:
  • state (State) – Two-letter state abbreviation (e.g. "RJ").

  • year (int | list[int]) – Year or list of years to fetch.

  • group (str, optional) – Additional grouping code.

  • **kwargs – Additional arguments forwarded to _fetch_data().

Returns:

SINASC birth records for the specified state, year(s), and group.

Return type:

pd.DataFrame

DuckLake Client

High-level client for the DuckLake S3-based dataset catalog.

Provides authentication, catalog synchronization, dataset querying, and file download capabilities backed by a local DuckDB engine.

class pysus.api.ducklake.client.CatalogDatasetAdapter(catalog_dataset: CatalogDataset, ducklake)[source]

Bases: object

Adapter wrapping a CatalogDataset ORM record for use by File objects.

Parameters:
  • catalog_dataset (CatalogDataset) – The ORM record to wrap.

  • ducklake (DuckLake) – The parent DuckLake client instance.

property content

Query the DuckLake client for files in this dataset.

Returns:

List of files belonging to this dataset.

Return type:

list

class pysus.api.ducklake.client.DatasetGroupAdapter(dataset_group: DatasetGroup, dataset)[source]

Bases: object

Adapter wrapping a DatasetGroup ORM record for use by File objects.

property files

Return the list of files in this group.

Returns:

List of file objects in this group.

Return type:

list

async search(**kwargs)[source]

Search for files within this group matching the given criteria.

Parameters:

**kwargs – Arbitrary filter criteria.

Returns:

List of matching file objects.

Return type:

list

class pysus.api.ducklake.client.DuckLake(engine=None, *, endpoint: str = 'nbg1.your-objectstorage.com', region: str = 'nbg1', bucket: str = 'pysus', credentials: DuckLakeCredentials | None = None)[source]

Bases: BaseRemoteClient

Client for the DuckLake S3-based public health dataset catalog.

Parameters:
  • endpoint (str, optional) – S3-compatible object storage endpoint.

  • region (str, optional) – Storage region name.

  • bucket (str, optional) – Bucket name containing the catalog.

  • credentials (DuckLakeCredentials, optional) – Credentials for authenticated S3 operations.

  • engine (object, optional) – Pre-configured SQLAlchemy engine to reuse.

bucket: str
property catalog_path: Path

Return the local path to the downloaded catalog database.

Returns:

Filesystem path to the local catalog database file.

Return type:

Path

async close()[source]

Dispose of the engine, then upload the catalog if authenticated.

Raises:

PermissionError – If the client is not authenticated but an upload is required.

async connect(force: bool = False)[source]

Connect to the catalog, downloading it first if necessary.

Parameters:

force (bool, optional) – Whether to re-download and re-connect even if already connected.

credentials: DuckLakeCredentials | None
async datasets(**kwargs) list[DuckDataset][source]

Return all datasets from the catalog as DuckDataset instances.

Parameters:

**kwargs – Additional filter arguments (currently unused).

Returns:

List of all datasets in the catalog.

Return type:

list[DuckDataset]

property description: str

Return a description of this client.

Returns:

A description string (currently empty).

Return type:

str

endpoint: str
async login(access_key: str | None = None, secret_key: str | None = None, **kwargs) None[source]

Authenticate with S3 credentials and reconnect to the catalog.

Parameters:
  • access_key (str, optional) – S3 access key ID. If omitted, credentials are cleared.

  • secret_key (str, optional) – S3 secret access key. If omitted, credentials are cleared.

  • **kwargs – Additional arguments (currently unused).

property long_name: str

Return the human-readable name of this client.

Returns:

The client display name.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name of this client.

Returns:

The client short name.

Return type:

str

async query(client: Literal['FTP', 'DadosGov'] | None = None, dataset: str | None = None, group: str | None = None, state: str | None = None, year: int | None = None, month: int | None = None) list[File][source]

Filter catalog files by client, dataset, group, state, year, and month.

Parameters:
  • client (Literal["FTP", "DadosGov"], optional) – Source client to filter by.

  • dataset (str, optional) – Dataset name to filter by.

  • group (str, optional) – Group name pattern to filter by (case-insensitive ILIKE).

  • state (str, optional) – Two-letter state code to filter by.

  • year (int, optional) – Year to filter by.

  • month (int, optional) – Month to filter by.

Returns:

List of matching file objects.

Return type:

list[File]
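
The group filter is described as a case-insensitive ILIKE match. For readers unfamiliar with SQL pattern syntax, the sketch below reproduces the usual semantics in Python (% matches any run of characters, _ matches exactly one). The ilike function is a hypothetical helper for illustration; it ignores ESCAPE clauses and is not how the client evaluates the filter.

```python
import re


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive SQL LIKE: % matches any run, _ matches one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    regex = "".join(parts)
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


exact = ilike("DENG", "deng")
prefix = ilike("DENG", "de%")
too_short = ilike("DENG", "de_")
```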

region: str
class pysus.api.ducklake.client.DuckLakeCredentials(*, access_key: SecretStr, secret_key: SecretStr)[source]

Bases: BaseModel

Credentials for authenticating with the S3-compatible object storage.

Parameters:
  • access_key (SecretStr) – The S3 access key ID.

  • secret_key (SecretStr) – The S3 secret access key.

access_key: SecretStr
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

secret_key: SecretStr

SQLAlchemy ORM models for the DuckLake catalog schema.

Defines tables for datasets, groups, files, and columns stored in the pysus schema of the local DuckDB catalog.

class pysus.api.ducklake.catalog.Base(**kwargs: Any)[source]

Bases: DeclarativeBase

Base class for all DuckLake catalog ORM models.

metadata: ClassVar[MetaData] = MetaData()

Refers to the _schema.MetaData collection that will be used for new _schema.Table objects.

registry: ClassVar[_RegistryType] = <sqlalchemy.orm.decl_api.registry object>

Refers to the _orm.registry in use where new _orm.Mapper objects will be associated.

class pysus.api.ducklake.catalog.CatalogDataset(**kwargs)[source]

Bases: CatalogTable

ORM model for the datasets table, representing a dataset collection.

Parameters:
  • id (int, optional) – Primary key (auto-generated by sequence).

  • name (str) – Unique short name for the dataset.

  • long_name (str) – Human-readable full name.

  • description (str, optional) – Optional description of the dataset contents.

  • origin (Origin) – Whether the dataset originates from FTP or an API.

columns
description
files
groups
id
long_name
name
origin
class pysus.api.ducklake.catalog.CatalogFile(**kwargs)[source]

Bases: CatalogTable

ORM model for the files table, representing individual data files.

Parameters:
  • id (int, optional) – Primary key (auto-generated by sequence).

  • dataset_id (int) – Foreign key referencing the parent dataset.

  • group_id (int, optional) – Foreign key referencing the parent group.

  • path (str) – Object storage path to the file.

  • size (int) – File size in bytes.

  • rows (int) – Number of rows in the file.

  • modified (datetime) – Timestamp of the last known modification.

  • origin_modified (datetime, optional) – Original modification timestamp from the source.

  • origin_path (str) – Original source path of the file.

  • sha256 (str, optional) – SHA-256 hex digest for integrity verification.

  • year (int, optional) – Data year associated with the file.

  • month (int, optional) – Data month associated with the file.

  • state (str, optional) – Two-letter state code associated with the file.

columns: Mapped[list[ColumnDefinition]]
dataset: Mapped[CatalogDataset]
dataset_id: Mapped[int]
group: Mapped[DatasetGroup | None]
group_id: Mapped[int | None]
id: Mapped[int]
modified: Mapped[datetime]
month: Mapped[int | None]
origin_modified: Mapped[datetime | None]
origin_path: Mapped[str]
path: Mapped[str]
rows: Mapped[int]
sha256: Mapped[str | None]
size: Mapped[int]
state: Mapped[str | None]
year: Mapped[int | None]
class pysus.api.ducklake.catalog.CatalogTable(**kwargs: Any)[source]

Bases: Base

Abstract base for catalog tables sharing the pysus schema.

class pysus.api.ducklake.catalog.ColumnDefinition(**kwargs)[source]

Bases: CatalogTable

ORM model for dataset column metadata.

Parameters:
  • id (int, optional) – Primary key (auto-generated by sequence).

  • dataset_id (int) – Foreign key referencing the parent dataset.

  • name (str) – Column name.

  • type (str) – Column data type string.

  • description (str, optional) – Optional description of the column.

  • nullable (bool, optional) – Whether the column allows null values.

dataset
dataset_id
description
files
id
name
nullable
type
class pysus.api.ducklake.catalog.DatasetGroup(**kwargs)[source]

Bases: CatalogTable

ORM model for dataset groups, grouping related files within a dataset.

Parameters:
  • id (int, optional) – Primary key (auto-generated by sequence).

  • name (str) – Short name for the group.

  • dataset_id (int) – Foreign key referencing the parent dataset.

  • long_name (str) – Human-readable full name.

  • description (str, optional) – Optional description of the group contents.

dataset
dataset_id
description
files
id
long_name
name
class pysus.api.ducklake.catalog.Origin(value)[source]

Bases: Enum

Origin type for a dataset.

FTP = 'ftp'

Dataset sourced from the FTP server.

API = 'api'

Dataset sourced from an API.

Application-level models for DuckLake remote resources.

Wraps catalog ORM records into BaseRemoteFile, BaseRemoteDataset, and BaseRemoteGroup interfaces used by the rest of PySUS.

class pysus.api.ducklake.models.DuckDataset(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {}, record: CatalogDataset)[source]

Bases: BaseRemoteDataset

A dataset from the DuckLake catalog, containing groups and files.

client: BaseRemoteClient
property description: str

Return the description of the dataset.

Returns:

The dataset description, or an empty string if unavailable.

Return type:

str

property long_name: str

Return the human-readable name of the dataset.

Returns:

The dataset display name, falling back to the short name.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name of the dataset.

Returns:

The dataset short name.

Return type:

str

record: CatalogDataset
class pysus.api.ducklake.models.DuckGroup(*, dataset: DuckDataset, record: DatasetGroup)[source]

Bases: BaseRemoteGroup

A group of related files within a DuckLake dataset.

dataset: DuckDataset
property description: str

Return the description of the group.

Returns:

The group description, or an empty string if unavailable.

Return type:

str

property long_name: str

Return the human-readable name of the group.

Returns:

The group display name, falling back to the short name.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name of the group.

Returns:

The group short name.

Return type:

str

record: DatasetGroup
class pysus.api.ducklake.models.File(*, path: Path, type: str = 'remote', dataset: Any, group: Any = None, record: CatalogFile)[source]

Bases: BaseRemoteFile

A remote file in the DuckLake catalog with download and verification.

Parameters:
  • record (CatalogFile) – The underlying ORM record.

  • type (str, optional) – File type identifier (default "remote").

  • dataset (Any) – The parent dataset object.

  • group (Any, optional) – The parent group object, if any.

property basename: str

Return the file name without directory components.

Returns:

The base file name.

Return type:

str

dataset: Any
property extension: str

Return the file extension including the leading dot.

Returns:

File extension (e.g. '.csv').

Return type:

str

group: Any
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property modify: datetime

Return the last-modified timestamp.

Returns:

The last modification timestamp.

Return type:

datetime

record: CatalogFile
property rows: int

Return the number of rows in the file.

Returns:

Row count.

Return type:

int

property sha256: str | None

Return the SHA-256 hash of the file, if available.

Returns:

SHA-256 hex digest, or None if not recorded.

Return type:

str or None

property size: int

Return the file size in bytes.

Returns:

File size in bytes.

Return type:

int

type: str
async verify(path: Path) bool[source]

Verify the file matches the recorded SHA-256 hash.

Parameters:

path (Path) – Path to the downloaded file on disk.

Returns:

True if the hash matches or no hash is recorded, False otherwise.

Return type:

bool
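
The verification rule above (a match passes, and so does a file with no recorded hash) can be sketched with hashlib. This is a minimal synchronous illustration: verify_sha256 and block_size are hypothetical names, and the real method is asynchronous and may differ in detail.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def verify_sha256(path: Path, expected: str | None, block_size: int = 1 << 20) -> bool:
    """Return True if no digest is recorded or the file's SHA-256 matches it."""
    if expected is None:
        return True
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in blocks so large files are never loaded whole into memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected.lower()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"pysus")
    ok_match = verify_sha256(sample, hashlib.sha256(b"pysus").hexdigest())
    ok_missing = verify_sha256(sample, None)
    ok_wrong = verify_sha256(sample, "0" * 64)
```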

FTP Client

Async FTP client wrapping the standard ftplib for DATASUS data access.
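
Since ftplib is a blocking library, an asynchronous wrapper generally has to move its calls off the event loop. The sketch below shows one common way to do that with asyncio.to_thread. It is an assumption about the general technique, not a description of this client's internals: _open_connection, connect, and list_names are hypothetical helpers, anonymous login is assumed, and the default host and timeout are copied from the FTP class signature in this section.

```python
import asyncio
import ftplib


def _open_connection(host: str, timeout: int) -> ftplib.FTP:
    """Blocking helper: connect to the host and log in anonymously (assumed)."""
    ftp = ftplib.FTP(host, timeout=timeout)
    ftp.login()
    return ftp


async def connect(host: str = "ftp.datasus.gov.br", timeout: int = 60) -> ftplib.FTP:
    """Run the blocking connection in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(_open_connection, host, timeout)


async def list_names(ftp: ftplib.FTP, directory: str) -> list[str]:
    """List entry names in a directory without blocking the event loop."""
    return await asyncio.to_thread(ftp.nlst, directory)
```

A caller would await connect() once and then pass the returned object to helpers such as list_names().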

class pysus.api.ftp.client.FTP(*, host: str = 'ftp.datasus.gov.br', timeout: int = 60)[source]

Bases: BaseRemoteClient

Async FTP client for navigating and downloading DATASUS data.

async close() None[source]

Close the FTP connection and reset the internal client state.

Raises:

Exception – Any exception raised by ftplib during disconnection.

async connect() None[source]

Establish the FTP connection to the remote host.

Raises:

Exception – Any exception raised by ftplib during connection.

async datasets(**kwargs) list[Dataset][source]

Return a list of all available dataset instances for this client.

Returns:

A list of Dataset instances for all available databases.

Return type:

list[Dataset]

Raises:

ConnectionError – If the FTP client is not connected.

property description: str

Return a description of this client’s purpose.

Returns:

A description string explaining the FTP client’s capabilities.

Return type:

str

property ftp: FTP | None

Return the underlying ftplib.FTP, or None if not connected.

Returns:

The ftplib.FTP instance, or None if not connected.

Return type:

ftplib.FTP | None

host: str
async login(**kwargs) None[source]

Authenticate and connect to the FTP server (alias for connect).

Parameters:

**kwargs – Forwarded to connect() (currently unused).

Raises:

Exception – Any exception raised by ftplib during authentication.

property long_name: str

Return the human-readable name of this client.

Returns:

The human-readable client name.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name of this client.

Returns:

The client short name (“FTP”).

Return type:

str

timeout: int
class pysus.api.ftp.client.FTPFileInfo[source]

Bases: TypedDict

Parsed metadata for a file or directory entry from an FTP listing.

group: FTPGroupInfo | None
modify: datetime
month: int | None
name: str
size: int
state: State | None
type: str
year: int | None
class pysus.api.ftp.client.FTPGroupInfo[source]

Bases: TypedDict

Metadata describing a file group within a dataset.

description: str | None
long_name: str | None
name: str

DATASUS FTP dataset definitions with filename parsers for each database.

class pysus.api.ftp.databases.CIHA(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Comunicação de Internação Hospitalar e Ambulatorial (CIHA).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) dict[str, Any][source]

Parse a CIHA filename into group, state, year and month metadata.

Parameters:

filename (str) – The raw CIHA filename to parse.

Returns:

A dict with keys group, state, year, month. On parse failure values are set to None.

Return type:

dict[str, Any]
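
A filename parser with the contract described above (four keys, all None on failure) might look like the following sketch. The filename layout (CIHA, then a two-letter state, a two-digit year, and a two-digit month), the century pivot, and the name parse_filename are all assumptions made for illustration; the actual formatter may use different rules.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed layout: "CIHA" + UF (2 letters) + YY + MM + extension, e.g. "CIHARJ2301.dbc".
_PATTERN = re.compile(
    r"^(?P<group>CIHA)(?P<state>[A-Z]{2})(?P<yy>\d{2})(?P<mm>\d{2})\.\w+$",
    re.IGNORECASE,
)


def parse_filename(filename: str) -> dict[str, Any]:
    """Split a filename into group, state, year, and month; None values on failure."""
    empty = {"group": None, "state": None, "year": None, "month": None}
    match = _PATTERN.match(filename)
    if match is None:
        return empty
    month = int(match["mm"])
    if not 1 <= month <= 12:
        return empty
    yy = int(match["yy"])
    year = 2000 + yy if yy < 50 else 1900 + yy  # assumed two-digit-year pivot
    return {
        "group": match["group"].upper(),
        "state": match["state"].upper(),
        "year": year,
        "month": month,
    }


parsed = parse_filename("CIHARJ2301.dbc")
failed = parse_filename("notes.txt")
```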

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (e.g. “CIHA”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.CNES(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Cadastro Nacional de Estabelecimentos de Saúde (CNES).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) dict[str, Any][source]

Parse a CNES filename into group, state, year and month metadata.

Parameters:

filename (str) – The raw CNES filename to parse.

Returns:

A dict with keys group, state, year, month. On parse failure values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (e.g. “CNES”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.IBGEDATASUS(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

População Residente e Projeções (IBGE).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) dict[str, Any][source]

Parse an IBGE filename into group and year metadata.

Parameters:

filename (str) – The raw IBGE filename to parse.

Returns:

A dict with keys group, year. On parse failure values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (e.g. “CIHA”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.PNI(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Programa Nacional de Imunizações (PNI).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) dict[str, Any][source]

Parse a PNI filename into group, state and year metadata.

Parameters:

filename (str) – The raw PNI filename to parse.

Returns:

A dict with keys group, state, year. On parse failure values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (e.g. “PNI”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.SIA(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informações Ambulatoriais (SIA), the outpatient information system.

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) dict[str, Any][source]

Parse an SIA filename into group, state, year and month metadata.

Parameters:

filename (str) – The raw SIA filename to parse.

Returns:

A dict with keys group, state, year, month. On parse failure values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (e.g. “SIA”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.SIH(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informações Hospitalares (SIH).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) dict[str, Any][source]

Parse an SIH filename into group, state, year and month metadata.

Parameters:

filename (str) – The raw SIH filename to parse.

Returns:

A dict with keys group, state, year, month. On parse failure values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (“SIH”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.SIM(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informação sobre Mortalidade (SIM).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a SIM filename into group, state and year metadata.

Parameters:

filename (str) – The raw SIM filename to parse.

Returns:

A dict with the keys group, state and year. On parse failure, the values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (“SIM”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.SINAN(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informação de Agravos de Notificação (SINAN).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a SINAN filename into group and year metadata.

Parameters:

filename (str) – The raw SINAN filename to parse.

Returns:

A dict with the keys group and year. On parse failure, the values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (“SINAN”).

Return type:

str

paths: list[Directory]
class pysus.api.ftp.databases.SINASC(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informações sobre Nascidos Vivos (SINASC).

property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose in Portuguese.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a SINASC filename into group, state and year metadata.

Parameters:

filename (str) – The raw SINASC filename to parse.

Returns:

A dict with the keys group, state and year. On parse failure, the values are set to None.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the dataset short name.

Returns:

The dataset acronym (“SINASC”).

Return type:

str

paths: list[Directory]

Data model classes for FTP directories, files, groups and datasets.

class pysus.api.ftp.models.Dataset(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: BaseRemoteDataset, ABC

Abstract base for a DATASUS dataset, providing file discovery via FTP.
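
To show what a concrete subclass has to supply, here is a minimal sketch of the abstract surface documented for this class, written with plain abc instead of pydantic. DatasetSketch, ExampleDataset and the filename scheme are all hypothetical; the real base class also carries client, group_definitions and paths, which are left out here.

```python
from abc import ABC, abstractmethod
from typing import Any


class DatasetSketch(ABC):
    """Hypothetical, pydantic-free mirror of the abstract members listed here."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def long_name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @abstractmethod
    def formatter(self, filename: str) -> dict[str, Any]: ...


class ExampleDataset(DatasetSketch):
    """Made-up dataset used only to show what a subclass must provide."""

    @property
    def name(self) -> str:
        return "EX"

    @property
    def long_name(self) -> str:
        return "Example dataset"

    @property
    def description(self) -> str:
        return "Illustrative dataset with no real data behind it."

    def formatter(self, filename: str) -> dict[str, Any]:
        # Assumed naming scheme: the four characters before the extension are the year.
        stem = filename.rsplit(".", 1)[0]
        has_year = len(stem) >= 4 and stem[-4:].isdigit()
        return {"year": int(stem[-4:]) if has_year else None}
```

Instantiating DatasetSketch directly raises TypeError, which is the usual way an abstract base signals that a member is still missing.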

abstract property description: str

Return a description of the dataset’s purpose.

Returns:

A description of the dataset’s purpose.

Return type:

str

abstractmethod formatter(filename: str) → dict[str, Any][source]

Parse a filename into metadata (group, state, year, etc.).

Parameters:

filename (str) – The raw filename to parse.

Returns:

A dictionary of parsed metadata fields.

Return type:

dict[str, Any]

group_definitions: dict[str, str]
abstract property long_name: str

Return the dataset full name in Portuguese.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

abstract property name: str

Return the dataset short name.

Returns:

The dataset acronym.

Return type:

str

paths: list[Directory]
class pysus.api.ftp.models.Directory(path: str, parent: Directory | Dataset | Group | None = None, client: BaseRemoteClient | None = None, formatter: Callable | None = None, dataset: Dataset | None = None)[source]

Bases: object

A remote FTP directory whose files and subdirectories are loaded lazily.

property content: list[Directory | File]

Return the directory contents, loading from FTP if not yet cached.

Returns:

The list of files and subdirectories.

Return type:

list[Directory | File]

async load() → None[source]

Fetch and parse the directory listing from the FTP server.

Raises:

ValueError – If the client is not an FTP instance.
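
The load-once-then-cache behaviour described for content and load() can be sketched as follows. LazyDirectory and its injected lister are hypothetical; in particular, content is an async method here because a property cannot await, whereas the documented content is a property whose loading mechanism is not shown in this reference.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class LazyDirectory:
    """Hypothetical stand-in for Directory: lists entries once, then caches."""

    def __init__(self, path: str, lister: Callable[[str], Awaitable[list[str]]]):
        self.path = path
        self._lister = lister  # assumed injection point for the FTP listing call
        self._content: Optional[list[str]] = None

    async def load(self) -> None:
        # Always refetch, replacing whatever was cached.
        self._content = await self._lister(self.path)

    async def content(self) -> list[str]:
        # Only reach for the server on first access.
        if self._content is None:
            await self.load()
        return list(self._content or [])
```

Calling load() again forces a refresh, while repeated calls to content() reuse the cached listing.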

class pysus.api.ftp.models.File(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'], dataset: BaseRemoteDataset, group: BaseRemoteGroup | None = None)[source]

Bases: BaseRemoteFile

A single file on the DATASUS FTP server with parsed metadata.

property extension: str

Return the file extension (e.g. .dbc, .dbf).

Returns:

The file extension including the leading dot.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property modify: datetime

Return the last modification timestamp.

Returns:

The file’s last modification datetime.

Return type:

datetime

Raises:

ValueError – If no modification date is available.

property month: int | None

Return the data month extracted from the filename, if available.

Returns:

The month as an integer, or None if not available.

Return type:

int | None

property size: int

Return the file size in bytes.

Returns:

The file size in bytes.

Return type:

int

property state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'] | None

Return the state code extracted from the filename, if available.

Returns:

The state code, or None if not available.

Return type:

State | None

property year: int | None

Return the data year extracted from the filename, if available.

Returns:

The year as an integer, or None if not available.

Return type:

int | None

class pysus.api.ftp.models.Group(*, dataset: BaseRemoteDataset)[source]

Bases: BaseRemoteGroup

A group of related files within a dataset (e.g. all files of a type).

property content: list[Directory | File]

Return the contents of the underlying directory.

Returns:

The directory contents.

Return type:

list[Directory | File]

property description: str

Return the group description.

Returns:

The group description.

Return type:

str

property long_name: str

Return the human-readable group name.

Returns:

The human-readable group name.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the group short code (e.g. ‘RD’, ‘PA’).

Returns:

The group short code.

Return type:

str

path: str

DadosGov Client

HTTP client and data models for the dados.gov.br API.

class pysus.api.dadosgov.client.ConjuntoDados(*, client: BaseRemoteClient | None = None, id: str, titulo: str, nome: str, recursos: list[Recurso] = <factory>)[source]

Bases: BaseModel

A dataset group as returned by the dados.gov.br API.

client: BaseRemoteClient | None
id: str
model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

resources: list[Recurso]
slug: str
title: str
class pysus.api.dadosgov.client.DadosGov(*, base_url: str = 'https://dados.gov.br/dados/api')[source]

Bases: BaseRemoteClient

Client for the dados.gov.br open data portal API.

base_url: str
async close() → None[source]

Close the underlying HTTP client and release resources.

async connect(token: str | None = None) → None[source]

Connect to the dados.gov.br API with the given token.

Parameters:

token (str, optional) – The API authentication token. If not provided, uses the previously stored token.

Raises:

ValueError – If no token is provided and none was previously stored.
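
The token fallback described here can be pictured with the following sketch. TokenStore and its attribute are invented for illustration; how DadosGov actually stores or validates the token is not shown in this reference.

```python
from typing import Optional


class TokenStore:
    """Hypothetical holder that mimics the documented token fallback."""

    def __init__(self) -> None:
        self._token: Optional[str] = None

    def connect(self, token: Optional[str] = None) -> str:
        # Prefer the explicit token, then fall back to the stored one.
        chosen = token or self._token
        if not chosen:
            raise ValueError("no token provided and none previously stored")
        self._token = chosen  # remember it for later reconnects
        return chosen
```

With this shape, a second connect() without arguments reuses the token from the first call, and a first call without a token fails early with ValueError.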

async datasets(**kwargs) → list[Dataset][source]

Return a list of pre-configured health datasets.

Returns:

A list of available Dataset instances for known health databases.

Return type:

list[Dataset]

property description: str

Return a description of the client.

Returns:

A Portuguese description of the API interface.

Return type:

str

async get_dataset(id: str) → ConjuntoDados[source]

Fetch a single dataset by its ID.

Parameters:

id (str) – The unique identifier of the dataset.

Returns:

The requested dataset.

Return type:

ConjuntoDados

Raises:

ConnectionError – If the client is not connected.

async list_datasets(**kwargs) → list[ConjuntoDados][source]

Search and list available datasets from the portal.

Parameters:

**kwargs

Search parameters. Supported keys:

  • pagina (int): Page number for pagination.

  • nome_conjunto (str): Filter by dataset name.

  • dados_abertos (bool): Filter by open data flag.

  • is_privado (bool): Filter by private datasets.

  • id_organizacao (str): Filter by organisation ID.

Returns:

A list of datasets matching the search criteria.

Return type:

list[ConjuntoDados]

Raises:

ConnectionError – If the client is not connected.
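
One way to turn the supported keys into request parameters is sketched below. build_search_params is a hypothetical helper: whether list_datasets() rejects unknown keys, and how it encodes booleans or renames keys for the remote API, is not stated in this reference, so both choices are assumptions.

```python
from typing import Any

# Keys listed in the documentation above.
SUPPORTED_KEYS = ("pagina", "nome_conjunto", "dados_abertos", "is_privado", "id_organizacao")


def build_search_params(**kwargs: Any) -> dict[str, str]:
    """Hypothetical helper: keep supported keys and serialise their values."""
    unknown = sorted(set(kwargs) - set(SUPPORTED_KEYS))
    if unknown:
        # Assumed behaviour: fail loudly instead of silently dropping typos.
        raise TypeError(f"unsupported search keys: {', '.join(unknown)}")
    params: dict[str, str] = {}
    for key in SUPPORTED_KEYS:
        value = kwargs.get(key)
        if value is None:
            continue
        # Assumed encoding: booleans become lowercase "true"/"false".
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return params
```

For example, build_search_params(pagina=2, dados_abertos=True) yields a dict with the two keys "pagina" and "dados_abertos" mapped to "2" and "true".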

async login(token: str | None = None, **kwargs) → None[source]

Authenticate with the API.

Delegates to the connect() method.

Parameters:
  • token (str, optional) – The API authentication token.

  • **kwargs – Additional keyword arguments (currently unused).

property long_name: str

Return the human-readable client name.

Returns:

The full Portuguese name of the portal.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short client name.

Returns:

The abbreviated client name "DadosGov".

Return type:

str

class pysus.api.dadosgov.client.Recurso(*, id: str, titulo: str, link: str, tamanho: int, dataUltimaAtualizacaoArquivo: Annotated[datetime | None, BeforeValidator(func=to_datetime, json_schema_input_type=PydanticUndefined)] = None, nomeArquivo: str | None = None)[source]

Bases: BaseModel

A single resource (file) within a dataset on dados.gov.br.

api_size: int
file_name: str | None
async get_size() → int[source]

Retrieve the file size from the remote server.

Makes a HEAD request (falling back to GET with a Range header) to determine the Content-Length of the resource.

Returns:

The file size in bytes, or 0 if the size could not be determined.

Return type:

int
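
The HEAD-then-ranged-GET strategy can be sketched without any HTTP library by injecting the request function. resolve_size, the Fetch signature and the use of Content-Range for the ranged response are assumptions for this example; the actual method may read the headers differently.

```python
from typing import Callable, Mapping, Optional

# fetch(method, extra_headers) -> response headers; assumed to raise OSError on failure.
Fetch = Callable[[str, Mapping[str, str]], Mapping[str, str]]


def _parse_size(headers: Mapping[str, str]) -> Optional[int]:
    # A ranged response reports the full size after the slash: "bytes 0-0/12345".
    content_range = headers.get("Content-Range")
    if content_range is not None:
        total = content_range.rsplit("/", 1)[-1]
        return int(total) if total.isdigit() else None
    length = headers.get("Content-Length", "")
    return int(length) if length.isdigit() else None


def resolve_size(fetch: Fetch) -> int:
    """Try HEAD first, then a one-byte ranged GET; return 0 if neither helps."""
    for method, extra in (("HEAD", {}), ("GET", {"Range": "bytes=0-0"})):
        try:
            size = _parse_size(fetch(method, extra))
        except OSError:
            continue
        if size is not None:
            return size
    return 0
```

Requesting a single byte keeps the fallback cheap even for large files, and returning 0 matches the documented behaviour when the size cannot be determined.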

id: str
last_modified: DateTime
model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

title: str
url: str
pysus.api.dadosgov.client.to_bool(value: Any) → bool[source]

Parse a Brazilian Portuguese boolean value into a bool.

Parameters:

value (Any) – The value to parse (e.g., "sim", "não", True, False).

Returns:

True if the value represents an affirmative, False otherwise.

Return type:

bool
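
A minimal sketch of this kind of parsing is shown below. parse_bool_pt and the set of accepted affirmative spellings are assumptions; the real to_bool() may recognise more or fewer values.

```python
from typing import Any

# Assumed set of affirmative spellings; the real function may accept others.
_AFFIRMATIVE = {"sim", "s", "true", "1"}


def parse_bool_pt(value: Any) -> bool:
    """Hypothetical stand-in for to_bool()."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in _AFFIRMATIVE
    # Anything else (None, numbers, ...) is treated as not affirmative here.
    return False
```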

pysus.api.dadosgov.client.to_datetime(value: Any) → datetime | None[source]

Parse a Brazilian date string into a datetime object.

Parameters:

value (Any) – The value to parse, expected to be a date string in Brazilian format (e.g., %d/%m/%Y %H:%M:%S or %d/%m/%Y).

Returns:

Parsed datetime object, or None if the value cannot be parsed.

Return type:

datetime or None
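
The two formats named above can be tried in order with datetime.strptime, as in this sketch. parse_datetime_br is a hypothetical name, and passing existing datetime objects through unchanged is an assumption rather than documented behaviour.

```python
from datetime import datetime
from typing import Any, Optional

# The two Brazilian formats mentioned in the parameter description.
_FORMATS = ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y")


def parse_datetime_br(value: Any) -> Optional[datetime]:
    """Hypothetical stand-in for to_datetime()."""
    if isinstance(value, datetime):
        return value  # assumed: already-parsed values pass through
    if not isinstance(value, str):
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # try the next format
    return None
```

Trying the longer format first avoids a date-only match failing on the trailing time component.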

Pre-configured health database definitions accessible via dados.gov.br.

class pysus.api.dadosgov.databases.CNES(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Cadastro Nacional de Estabelecimentos de Saúde (CNES).

property description: str

Return a description of the dataset.

Returns:

A Portuguese description of the CNES information system.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a CNES filename and extract metadata.

Parameters:

filename (str) – The name of the file to parse.

Returns:

A dictionary with keys state, year, and month. Unrecognised files return None for all keys.

Return type:

dict[str, Any]

ids: list[str]
property long_name: str

Return the human-readable name.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name.

Returns:

The abbreviated dataset name "CNES".

Return type:

str

class pysus.api.dadosgov.databases.COVID19(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Casos Confirmados de COVID-19.

property description: str

Return a description of the dataset.

Returns:

A Portuguese description of the COVID-19 confirmed cases dataset.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a COVID-19 filename and extract metadata.

Parameters:

filename (str) – The name of the file to parse.

Returns:

A dictionary with keys state, year, and month. Unrecognised files return None for all keys.

Return type:

dict[str, Any]

ids: list[str]
property long_name: str

Return the human-readable name.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name.

Returns:

The abbreviated dataset name "COVID19".

Return type:

str

class pysus.api.dadosgov.databases.PNI(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Programa Nacional de Imunizações (PNI).

property description: str

Return a description of the dataset.

Returns:

A Portuguese description of the PNI vaccination monitoring system.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a PNI vaccination filename into month and year.

Parameters:

filename (str) – The name of the file to parse.

Returns:

A dictionary with keys state, year, and month. Unrecognised files return None for all keys.

Return type:

dict[str, Any]

group_aliases: dict[str, str]
ids: list[str]
property long_name: str

Return the human-readable name.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name.

Returns:

The abbreviated dataset name "PNI".

Return type:

str

class pysus.api.dadosgov.databases.SIA(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informações Ambulatoriais (SIA).

property description: str

Return a description of the dataset.

Returns:

A Portuguese description of the SIA outpatient information system.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse an SIA filename into year.

Parameters:

filename (str) – The name of the file to parse.

Returns:

A dictionary with keys state, year, and month. Unrecognised files return None for all keys.

Return type:

dict[str, Any]

ids: list[str]
property long_name: str

Return the human-readable name.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name.

Returns:

The abbreviated dataset name "SIA".

Return type:

str

class pysus.api.dadosgov.databases.SIM(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informação sobre Mortalidade (SIM).

property description: str

Return a description of the dataset.

Returns:

A Portuguese description of the SIM mortality information system.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a SIM filename into year.

Parameters:

filename (str) – The name of the file to parse.

Returns:

A dictionary with keys state, year, and month. Unrecognised files return None for all keys.

Return type:

dict[str, Any]

group_aliases: dict[str, str]
ids: list[str]
property long_name: str

Return the human-readable name.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name.

Returns:

The abbreviated dataset name "SIM".

Return type:

str

class pysus.api.dadosgov.databases.SINAN(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informação de Agravos de Notificação (SINAN).

property description: str

Return a description of the dataset.

Returns:

A Portuguese description of the SINAN notifiable diseases system.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a SINAN filename into state and year.

Parameters:

filename (str) – The name of the file to parse.

Returns:

A dictionary with keys state, year, and month. Unrecognised files return None for all keys.

Return type:

dict[str, Any]

group_aliases: dict[str, str]
ids: list[str]
property long_name: str

Return the human-readable name.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name.

Returns:

The abbreviated dataset name "SINAN".

Return type:

str

class pysus.api.dadosgov.databases.SINASC(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: Dataset

Sistema de Informações sobre Nascidos Vivos (SINASC).

property description: str

Return a description of the dataset.

Returns:

A Portuguese description of the SINASC live birth system.

Return type:

str

formatter(filename: str) → dict[str, Any][source]

Parse a SINASC filename into year.

Parameters:

filename (str) – The name of the file to parse.

Returns:

A dictionary with keys state, year, and month. Unrecognised files return None for all keys.

Return type:

dict[str, Any]

group_aliases: dict[str, str]
ids: list[str]
property long_name: str

Return the human-readable name.

Returns:

The full Portuguese name of the dataset.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the short name.

Returns:

The abbreviated dataset name "SINASC".

Return type:

str

Internal domain models for datasets, groups, and files from dados.gov.br.

class pysus.api.dadosgov.models.Dataset(*, client: BaseRemoteClient, group_definitions: dict[str, str] = {})[source]

Bases: BaseRemoteDataset

A health dataset available through dados.gov.br.

Subclasses define a list of API dataset IDs and implement formatter(), which extracts metadata from file names.

client: DadosGov
abstractmethod formatter(filename: str) → dict[str, Any][source]

Extract structured metadata from a filename.

group_aliases: dict[str, str]
ids: list[str]
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class pysus.api.dadosgov.models.File(*, path: Path, type: str | Literal['FILE', 'DIR', 'PARQUET', 'CSV', 'JSON', 'PDF', 'DBC', 'DBF', 'ZIP'], dataset: BaseRemoteDataset, group: BaseRemoteGroup | None = None)[source]

Bases: BaseRemoteFile

A downloadable file from a dados.gov.br dataset.

property extension: str

Return the file extension.

Returns:

The file extension (e.g., ".csv", ".zip").

Return type:

str

async fetch_metadata() → None[source]

Fetch file size and last-modified from the remote server.

Updates record.api_size and record.last_modified in place. Connection errors are silently ignored.

async fetch_size() → int[source]

Fetch the remote file size and update the local record.

Makes a HEAD request (falling back to GET with a Range header) to determine the Content-Length.

Returns:

The file size in bytes, or 0 if the size could not be determined.

Return type:

int

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_File__context: Any) → None[source]

Fetch remote metadata if the size and modification date are missing.

If both api_size and last_modified are falsy, schedules a background task to fetch metadata from the remote server.

Parameters:

__context (Any) – Pydantic validation context (unused).

property modify: datetime

Return the last modification date.

Returns:

The last modification datetime.

Return type:

datetime

Raises:

ValueError – If the modification date has not been set.

property month: int | None

Return the inferred month from metadata.

Returns:

The month if present in metadata, otherwise None.

Return type:

int or None

record: Recurso
property size: int

Return the file size in bytes.

Returns:

The file size, or 0 if unknown.

Return type:

int

property state: Literal['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO', 'DF'] | None

Return the inferred state from metadata.

Returns:

The state abbreviation if present in metadata, otherwise None.

Return type:

State or None

type: str
property year: int | None

Return the inferred year from metadata.

Returns:

The year if present in metadata, otherwise None.

Return type:

int or None

class pysus.api.dadosgov.models.Group(*, dataset: BaseRemoteDataset)[source]

Bases: BaseRemoteGroup

A group of files within a dataset.

property description: str

Return an empty description for the group.

Returns:

An empty string.

Return type:

str

property long_name: str

Return the group title.

Returns:

The title of the underlying API record.

Return type:

str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property name: str

Return the group name, resolved through dataset aliases.

Returns:

The alias for the group slug if defined, otherwise the raw slug.

Return type:

str
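
The alias lookup amounts to a dictionary lookup with the slug as its own default, as in this sketch. resolve_group_name and the alias table are made up for illustration; the real group_aliases values are not listed in this reference.

```python
def resolve_group_name(slug: str, aliases: dict[str, str]) -> str:
    """Return the alias for slug when one exists, otherwise slug itself."""
    return aliases.get(slug, slug)


# Made-up alias table, standing in for a dataset's group_aliases.
example_aliases = {"dengue-casos": "DENG"}
```

A slug with no entry in the table is returned unchanged, so groups without an alias keep their raw slug as their name.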

record: ConjuntoDados