Hardened CSV I/O
Read whole files or chunks with delimiter sniffing, explicit dtypes, decimal separators, encoding error policy, skip rows, and configurable bad-line handling.
Arnio combines a native C++ CSV engine with Python APIs for cleaning, quality profiling, schema validation, and data-stack interop.
This site covers every release through v1.18.0, plus the changes merged to main since that release.
Profile nulls, duplicates, uniqueness, string cleanliness, high-cardinality columns, near-constant columns, quality score, drift, and CI quality gates.
Validate types, ranges, regexes, emails, URLs, dates, datetimes, country and currency codes, custom validators, and row-level failures with max-error limits.
Move between ArFrame, pandas, Arrow, DuckDB, JSONL, CSV, and Parquet without scattering glue code across notebooks and pipelines.
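To make the CSV terms above concrete, here is a minimal sketch of delimiter sniffing, an encoding error policy, and a non-default decimal separator. It uses only Python's standard library and is not Arnio's native engine; the sample bytes and column names are invented for illustration.

```python
import csv
import io

# Raw bytes standing in for a file on disk (made-up sample data):
# semicolon-delimited, with one byte (0xff) that is not valid UTF-8.
raw = b"id;city;amount\n1;Z\xffrich;10,5\n2;Bern;7,25\n"

# Encoding error policy: "replace" swaps undecodable bytes for U+FFFD
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="replace")

# Delimiter sniffing: let the stdlib guess the delimiter from the sample,
# restricted to a few plausible candidates.
dialect = csv.Sniffer().sniff(text, delimiters=";,\t|")

rows = list(csv.DictReader(io.StringIO(text), dialect=dialect))

# Decimal separator: treat "," as the decimal mark when parsing amounts.
amounts = [float(row["amount"].replace(",", ".")) for row in rows]

print(dialect.delimiter, rows[0]["city"], amounts)
```

A hardened reader makes each of these decisions explicit and configurable, so a malformed file does not silently produce wrong values.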
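import arnio as ar

# Read the CSV with an explicit dtype, a bad-line policy, and an encoding error policy.
frame = ar.read_csv(
    "sales.csv",
    dtype={"id": "string"},
    on_bad_lines="warn",
    encoding_errors="replace",
)

# Profile data quality, leaving one column out of the report.
report = ar.profile(frame, exclude_columns=["internal_notes"])

# Clean in safe mode and ask for an explanation alongside the cleaned frame.
clean, explanation = ar.auto_clean(frame, mode="safe", explain=True)

# Declare the expected shape of each column.
schema = ar.Schema({
    "id": ar.String(nullable=False, unique=True),
    "email": ar.Email(nullable=True),
    "homepage": ar.URL(allowed_schemes=["https"]),
})

# Validate against the schema with a max-error limit of 100.
result = ar.validate(clean, schema, max_errors=100)

# Hand the cleaned data to Arrow and write it out as Parquet.
table = ar.to_arrow(clean)
ar.write_parquet(clean, "sales.parquet", compression="zstd")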
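The idea behind row-level failures with a max-error limit can be sketched in plain Python. This is an illustration of the concept, not Arnio's validator: `validate_rows`, the error tuple shape, the simplified email pattern, and the choice to skip missing `homepage` values are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern for illustration; real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_rows(rows, max_errors=100):
    """Collect (row_index, column, message) tuples, stopping at max_errors.

    Hypothetical helper: it mirrors the three rules in the schema above
    (non-null unique id, nullable email, https-only homepage).
    """
    errors = []
    seen_ids = set()
    for index, row in enumerate(rows):
        found = []
        row_id = row.get("id")
        if not row_id:
            found.append((index, "id", "missing value"))
        elif row_id in seen_ids:
            found.append((index, "id", "duplicate value"))
        else:
            seen_ids.add(row_id)
        email = row.get("email")
        if email is not None and not EMAIL_RE.match(email):
            found.append((index, "email", "not an email address"))
        homepage = row.get("homepage")
        # Assumption: a missing homepage is skipped rather than reported.
        if homepage is not None and urlparse(homepage).scheme != "https":
            found.append((index, "homepage", "scheme is not https"))
        for error in found:
            errors.append(error)
            if len(errors) >= max_errors:
                return errors  # stop early once the limit is reached
    return errors


sample = [
    {"id": "a1", "email": "ann@example.com", "homepage": "https://example.com"},
    {"id": "a1", "email": "not-an-email", "homepage": "http://example.com"},
    {"id": "", "email": None, "homepage": "https://example.org"},
]
all_errors = validate_rows(sample)
limited = validate_rows(sample, max_errors=2)
print(len(all_errors), limited)
```

Capping the error count keeps a badly broken file from producing an unbounded report while still pointing at the first offending rows.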
These changes were merged to main after v1.18.0 and have not been released yet; treat them as upcoming release notes.
ArFrame.__getitem__ column selection, ArFrame.drop_columns, and stronger tuple-key replacement handling.
profile(exclude_columns=...), DataQualityReport.to_dict(exclude_columns=...), and DataQualityReport.to_json().
Benchmark regression smoke checks, duplicate-header hardening, and extra dry-run coverage for the pandas accessor.
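Duplicate header names are a common source of silently overwritten columns. The notes above do not say how Arnio treats them, so the following is only a standard-library sketch of detecting the problem before parsing; `find_duplicate_headers` is a hypothetical helper, not part of Arnio.

```python
import csv
import io
from collections import Counter


def find_duplicate_headers(csv_text, delimiter=","):
    """Return the header names that occur more than once, sorted by name."""
    # Read only the first row; an empty input yields an empty header.
    header = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter), [])
    # Strip surrounding whitespace so "id" and " id" count as the same name.
    counts = Counter(name.strip() for name in header)
    return sorted(name for name, count in counts.items() if count > 1)


duplicates = find_duplicate_headers("id,name,id,email\n1,a,2,b\n")
print(duplicates)
```

A reader can then choose to reject the file, rename the repeated columns, or warn, depending on how strict the pipeline needs to be.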