Production data preparation before pandas does the analysis

Arnio combines a native C++ CSV engine with Python APIs for cleaning, quality profiling, schema validation, and data-stack interop.


$ pip install arnio

Latest release: v1.19.0

Runtime: Python 3.9+

Core: C++ + pybind11

Current main: Unreleased updates

What Arnio ships today

This page covers every release through v1.19.0, plus the work merged to main since that release.

Hardened CSV I/O

Read whole files or chunks with delimiter sniffing, explicit dtypes, decimal separators, encoding error policy, skip rows, and configurable bad-line handling.
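To make the ideas in this card concrete, here is a minimal sketch of delimiter sniffing, decimal-separator handling, and tolerant bad-line handling. It uses only Python's standard `csv` module and is not Arnio's implementation or API; `read_rows`, `SAMPLE`, and the `float_columns` parameter are hypothetical names chosen for illustration.

```python
import csv
import io

# Semicolon-delimited sample with decimal commas and one malformed line.
SAMPLE = "id;amount;city\n007;10,5;Oslo\n008;7,25;Bergen\n009;oops\n"

def read_rows(text, decimal=",", float_columns=("amount",)):
    """Return (good_rows, bad_lines) for delimited text with one header row."""
    header_line = text.splitlines()[0]
    # Sniff the delimiter from the header only, restricted to common candidates.
    dialect = csv.Sniffer().sniff(header_line, delimiters=",;\t|")
    reader = csv.reader(io.StringIO(text), dialect)
    header = next(reader)

    good_rows, bad_lines = [], []
    for fields in reader:
        if len(fields) != len(header):
            # Wrong field count: record the line number instead of crashing.
            bad_lines.append(reader.line_num)
            continue
        row = dict(zip(header, fields))
        for name in float_columns:
            # Normalise the decimal separator before converting to float.
            row[name] = float(row[name].replace(decimal, "."))
        good_rows.append(row)
    return good_rows, bad_lines

rows, bad = read_rows(SAMPLE)
```

In this sketch every column stays a string unless it is listed in `float_columns`, so an identifier such as `007` keeps its leading zeros, which is the same concern that an explicit dtype addresses.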

Quality reports

Profile nulls, duplicates, uniqueness, string cleanliness, high-cardinality columns, near-constant columns, quality score, drift, and CI quality gates.
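As a rough illustration of what such a report measures, the sketch below computes null ratios, exact duplicate rows, per-column uniqueness, and a simple string-cleanliness flag in plain Python, then applies a threshold in the way a CI quality gate might. It is an assumption-laden stand-in, not Arnio's report format: `sketch_profile`, `passes_gate`, and the dictionary keys are hypothetical.

```python
from collections import Counter

def sketch_profile(rows):
    """Summarise nulls, duplicate rows, and per-column uniqueness.

    `rows` is a list of dicts sharing the same keys; None and "" count as null.
    """
    columns = list(rows[0]) if rows else []
    total = len(rows)
    # Count how often each complete row occurs to find exact duplicates.
    row_counts = Counter(tuple(row[c] for c in columns) for row in rows)
    report = {
        "rows": total,
        "duplicate_rows": total - len(row_counts),
        "columns": {},
    }
    for c in columns:
        non_null = [row[c] for row in rows if row[c] not in (None, "")]
        report["columns"][c] = {
            "null_ratio": (total - len(non_null)) / total,
            "unique_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
            # Flag strings with leading or trailing whitespace.
            "has_untrimmed_strings": any(
                isinstance(v, str) and v != v.strip() for v in non_null
            ),
        }
    return report

def passes_gate(report, max_null_ratio=0.25):
    """Hypothetical CI gate: fail if any column exceeds the null threshold."""
    return all(
        col["null_ratio"] <= max_null_ratio for col in report["columns"].values()
    )

sample_rows = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": None},
    {"id": "2", "email": None},
    {"id": "3", "email": " b@example.com "},
]
report = sketch_profile(sample_rows)
gate_ok = passes_gate(report, max_null_ratio=0.25)
```

A gate like this would typically run in CI and fail the build when `gate_ok` is false, so that a data-quality regression is caught before downstream analysis.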

Schema contracts

Validate types, ranges, regexes, emails, URLs, dates, datetimes, country and currency codes, custom validators, and row-level failures with max-error limits.
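The idea of row-level failures with a max-error limit can be sketched in a few lines of plain Python. This is only an illustration of the concept under simplifying assumptions, not how Arnio validates data: `validate_rows`, `is_email`, `in_range`, and the deliberately loose email pattern are all hypothetical.

```python
import re

# Deliberately simple pattern for illustration; real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_email(value):
    return isinstance(value, str) and EMAIL_RE.match(value) is not None

def in_range(low, high):
    """Build a check that accepts numbers between low and high, inclusive."""
    def check(value):
        return isinstance(value, (int, float)) and low <= value <= high
    return check

def validate_rows(rows, rules, max_errors=100):
    """Return (row_index, column, value) failures, capped at max_errors."""
    failures = []
    for index, row in enumerate(rows):
        for column, check in rules.items():
            value = row.get(column)
            if not check(value):
                failures.append((index, column, value))
                if len(failures) >= max_errors:
                    return failures  # stop early once the cap is reached
    return failures

rules = {"email": is_email, "age": in_range(0, 120)}
people = [
    {"email": "a@example.com", "age": 31},
    {"email": "not-an-email", "age": 200},
    {"email": "b@example.com", "age": -5},
]
all_failures = validate_rows(people, rules)
capped = validate_rows(people, rules, max_errors=2)
```

Capping the number of collected failures keeps validation cheap on badly broken inputs: once the limit is reached, the remaining rows are not scanned.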

Interop built in

Move between ArFrame, pandas, Arrow, DuckDB, JSONL, CSV, and Parquet without scattering glue code across notebooks and pipelines.

A current Arnio workflow

Python

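import arnio as ar

# Read the CSV, keeping "id" as a string rather than letting it be inferred.
frame = ar.read_csv(
    "sales.csv",
    dtype={"id": "string"},
)

# Profile data quality, then clean in "safe" mode and also return an explanation.
report = ar.profile(frame)
clean, explanation = ar.auto_clean(frame, mode="safe", explain=True)

# Declare the schema contract for the cleaned frame.
schema = ar.Schema({
    "id": ar.String(nullable=False, unique=True),
    "email": ar.Email(nullable=True),
    "homepage": ar.URL(allowed_schemes=["https"]),
})

# Validate with a cap on collected errors, then export to Arrow and Parquet.
result = ar.validate(clean, schema, max_errors=100)
table = ar.to_arrow(clean)
ar.write_parquet(clean, "sales.parquet", compression="zstd")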

Current main / unreleased

These changes were merged to main after v1.19.0 and have not been released yet; treat them as upcoming release notes.

Frame ergonomics

ArFrame.__getitem__ column selection, ArFrame.drop_columns, and stronger tuple-key replacement handling.

Quality exports

profile(exclude_columns=...), DataQualityReport.to_dict(exclude_columns=...), and DataQualityReport.to_json().

Docs and CI polish

Benchmark regression smoke checks, duplicate-header hardening, and extra dry-run coverage for the pandas accessor.

The Problem Arnio Solves

Stop writing brittle pandas glue code. Arnio handles the messy parts.

❌ The Old Way (Pandas)

Manual null handling scattered across notebooks
No schema validation before processing
Silent type coercion bugs in production
CSV encoding errors crash pipelines

⚡ The Arnio Way

Auto-clean with ar.auto_clean() in one line
Schema contracts catch bad data early
Quality reports surface issues before they bite
Hardened CSV engine handles encoding gracefully

Explore Arnio

Documentation

Install, read, clean, validate, export, and integrate.

API Reference

Current public functions, classes, and options.

Benchmarks

Reproducible benchmark commands and CI checks.

Changelog

v1.19.0 plus current-main changes.