Production data preparation before pandas does the analysis

Arnio combines a native C++ CSV engine with Python APIs for cleaning, quality profiling, schema validation, and data-stack interop.

$ pip install arnio
Latest release: v1.18.0
Runtime: Python 3.9+
Core: C++ + pybind11
Current main: Unreleased updates

What Arnio ships today

This page covers the releases through v1.18.0, plus the work merged to main since that release.

Hardened CSV I/O

Read whole files or chunks with delimiter sniffing, explicit dtypes, decimal separators, encoding error policy, skip rows, and configurable bad-line handling.
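To make the ideas of delimiter sniffing and bad-line handling concrete, here is a minimal sketch that uses only Python's standard library. It is not Arnio's implementation or API: the helper `read_rows` and its `sample_lines` parameter are hypothetical, and the `"skip"` and `"error"` policies are assumptions added next to the `"warn"` value that appears in the workflow example on this page.

```python
import csv
import io
import warnings


def read_rows(text, on_bad_lines="warn", sample_lines=3):
    """Sniff the delimiter, then keep only rows as wide as the header.

    Hypothetical helper for illustration; not part of Arnio.
    """
    if on_bad_lines not in {"warn", "skip", "error"}:
        raise ValueError(f"unknown on_bad_lines policy: {on_bad_lines!r}")

    # csv.Sniffer guesses the dialect from a sample. The sample should be
    # mostly well-formed, so only the first few lines are used here.
    sample = "\n".join(text.splitlines()[:sample_lines])
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

    reader = csv.reader(io.StringIO(text), dialect)
    header = next(reader)
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            message = (
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            if on_bad_lines == "error":
                raise ValueError(message)
            if on_bad_lines == "warn":
                warnings.warn(message)
            continue  # "warn" and "skip" both drop the malformed row
        rows.append(dict(zip(header, row)))
    return dialect.delimiter, rows
```

Calling `read_rows(text, on_bad_lines="skip")` on semicolon-separated input returns the detected delimiter together with the well-formed rows, while `"error"` stops at the first malformed line.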

Quality reports

Profile nulls, duplicates, uniqueness, string cleanliness, high-cardinality columns, near-constant columns, quality score, drift, and CI quality gates.
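As a rough illustration of what such a profile measures, the sketch below computes null ratios, uniqueness, untrimmed strings, and duplicate rows over a list of dictionaries, then applies a simple threshold gate. It uses only the standard library and does not reflect how Arnio computes or names these metrics: `profile_rows`, `gate`, the report keys, and the `max_null_ratio` threshold are all hypothetical.

```python
from collections import Counter


def profile_rows(rows, exclude_columns=()):
    """Summarise nulls, uniqueness, and duplicates for a list of dict rows.

    Hypothetical helper for illustration; not part of Arnio.
    """
    columns = [c for c in rows[0] if c not in exclude_columns] if rows else []
    total = len(rows)
    report = {"row_count": total, "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and the empty string as null in this sketch.
        non_null = [v for v in values if v not in (None, "")]
        report["columns"][col] = {
            "null_ratio": (total - len(non_null)) / total,
            "unique_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
            # Strings with leading or trailing whitespace.
            "untrimmed": sum(
                1 for v in non_null if isinstance(v, str) and v != v.strip()
            ),
        }
    # Count each repeat of an already-seen row (over the profiled columns).
    seen = Counter(tuple(row.get(c) for c in columns) for row in rows)
    report["duplicate_rows"] = sum(n - 1 for n in seen.values())
    return report


def gate(report, max_null_ratio=0.5):
    """Return the columns whose null ratio exceeds the assumed threshold."""
    return [
        col
        for col, stats in report["columns"].items()
        if stats["null_ratio"] > max_null_ratio
    ]
```

In a CI job, a non-empty result from `gate` could be turned into a failing exit code, which is one simple way to read the "CI quality gates" idea.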

Schema contracts

Validate types, ranges, regexes, emails, URLs, dates, datetimes, country and currency codes, custom validators, and row-level failures with max-error limits.
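The sketch below shows one way row-level failures and a max-error limit can fit together, using plain Python rather than Arnio. Everything in it is an assumption made for illustration: the `validate_rows`, `email`, and `in_range` helpers, the shape of the error records, and the deliberately loose email pattern.

```python
import re

# Deliberately loose pattern; real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email(nullable=True):
    """Build a check that returns an error message, or None when valid."""
    def check(value):
        if value in (None, ""):
            return None if nullable else "missing value"
        return None if EMAIL_RE.match(value) else "not an email address"
    return check


def in_range(low, high):
    """Build a check for numeric values within [low, high]."""
    def check(value):
        try:
            number = float(value)
        except (TypeError, ValueError):
            return "not a number"
        return None if low <= number <= high else f"outside [{low}, {high}]"
    return check


def validate_rows(rows, rules, max_errors=100):
    """Collect row-level failures, stopping once max_errors is reached.

    Returns (errors, limit_reached). Hypothetical helper; not part of Arnio.
    """
    errors = []
    for index, row in enumerate(rows):
        for column, check in rules.items():
            message = check(row.get(column))
            if message is not None:
                errors.append({"row": index, "column": column, "error": message})
                if len(errors) >= max_errors:
                    return errors, True
    return errors, False
```

Stopping early keeps the failure report bounded on very dirty inputs, at the cost of not knowing the full error count once the limit is hit.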

Interop built in

Move between ArFrame, pandas, Arrow, DuckDB, JSONL, CSV, and Parquet without scattering glue code across notebooks and pipelines.

A current Arnio workflow

Python
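import arnio as ar

# Read the CSV: keep "id" as a string, warn on malformed lines,
# and apply the "replace" policy to encoding errors.
frame = ar.read_csv(
    "sales.csv",
    dtype={"id": "string"},
    on_bad_lines="warn",
    encoding_errors="replace",
)

# Profile data quality, leaving one column out of the report.
# exclude_columns is listed under "Current main / unreleased" on this page.
report = ar.profile(frame, exclude_columns=["internal_notes"])

# Clean in "safe" mode and also get an explanation of the cleaning.
clean, explanation = ar.auto_clean(frame, mode="safe", explain=True)

# Declare the schema contract for the cleaned frame.
schema = ar.Schema({
    "id": ar.String(nullable=False, unique=True),
    "email": ar.Email(nullable=True),
    "homepage": ar.URL(allowed_schemes=["https"]),
})

# Validate against the schema with a max-error limit of 100.
result = ar.validate(clean, schema, max_errors=100)

# Hand the cleaned data to Arrow and write it out as Parquet.
table = ar.to_arrow(clean)
ar.write_parquet(clean, "sales.parquet", compression="zstd")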

Current main / unreleased

These changes were merged to main after v1.18.0 and have not been released yet; treat them as upcoming release notes.

Frame ergonomics

ArFrame.__getitem__ column selection, ArFrame.drop_columns, and stronger tuple-key replacement handling.

Quality exports

profile(exclude_columns=...), DataQualityReport.to_dict(exclude_columns=...), and DataQualityReport.to_json().

Docs and CI polish

Benchmark regression smoke checks, duplicate-header hardening, and extra dry-run coverage for the pandas accessor.