Arnio is a small native core wrapped by Python APIs for data preparation, validation, reporting, and interop with pandas-first workflows.
    CSV / JSONL / pandas records
            |
            v
    C++ parser and columnar Frame
            |
            v
    pybind11 extension (_arnio_cpp)
            |
            v
    Python ArFrame wrapper
            |
            +--> cleaning pipeline
            +--> quality profile and gates
            +--> schema validation
            +--> pandas / Arrow / DuckDB / Parquet

The native layer owns the columnar data model, CSV parsing, CSV writing, and performance-sensitive cleaning primitives.
Frame and Column: Typed columns store values and null masks. Frame-level guards enforce a consistent row count across columns and reject duplicate column names.
CSV reader: Handles quoted records, multiline fields, malformed row widths, decimal separators, dtype overrides, Unicode paths, and decoding policy.
CSV writer: Writes validated delimiters, headers, quoted values, multiline fields, and line terminators.
Cleaning primitives: Common string and numeric operations run close to the stored columns where the implementation benefits from C++.
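The column model and frame-level guards described above can be illustrated with a small Python sketch. This is not Arnio's implementation, which lives in C++ and may differ in detail; the names `SketchColumn` and `SketchFrame` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SketchColumn:
    """Hypothetical typed column: values plus a parallel null mask."""
    name: str
    dtype: type
    values: list
    nulls: list  # nulls[i] is True when row i is missing

    def __post_init__(self):
        if len(self.values) != len(self.nulls):
            raise ValueError(f"column {self.name!r}: values and null mask differ in length")
        # Only non-null slots must match the declared dtype.
        for value, is_null in zip(self.values, self.nulls):
            if not is_null and not isinstance(value, self.dtype):
                raise TypeError(f"column {self.name!r}: {value!r} is not {self.dtype.__name__}")

    def __len__(self):
        return len(self.values)


@dataclass
class SketchFrame:
    """Hypothetical frame that guards row count and column names."""
    columns: list = field(default_factory=list)

    def add_column(self, column):
        # Guard 1: column names must be unique within the frame.
        if any(existing.name == column.name for existing in self.columns):
            raise ValueError(f"duplicate column name: {column.name!r}")
        # Guard 2: every column must have the same number of rows.
        if self.columns and len(column) != len(self.columns[0]):
            raise ValueError(f"column {column.name!r} has {len(column)} rows, "
                             f"expected {len(self.columns[0])}")
        self.columns.append(column)

    @property
    def shape(self):
        rows = len(self.columns[0]) if self.columns else 0
        return rows, len(self.columns)


frame = SketchFrame()
frame.add_column(SketchColumn("id", int, [1, 2, 3], [False, False, False]))
frame.add_column(SketchColumn("city", str, ["Oslo", "", "Lima"], [False, True, False]))
```

Keeping the null mask separate from the values lets a typed column hold a placeholder in a missing slot without widening the column's dtype.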
The Python package wraps native frames with stable, typed user-facing APIs. This layer validates Python inputs, translates lower-level errors into Arnio exceptions, preserves pandas attrs where supported, and exposes helpers that are easier to compose from notebooks and ETL jobs.
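One way to picture the input validation and error translation described above is a decorator that re-raises lower-level errors as a single user-facing exception type. The sketch below is an assumption about the general pattern, not Arnio's code: `SketchArnioError`, `translate_errors`, `read_csv_sketch`, and the stand-in `_native_read` are all hypothetical names.

```python
import functools


class SketchArnioError(Exception):
    """Hypothetical user-facing exception; not Arnio's real class."""


def translate_errors(func):
    """Re-raise low-level errors as SketchArnioError, keeping the cause."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except (RuntimeError, ValueError) as exc:
            raise SketchArnioError(f"{func.__name__} failed: {exc}") from exc
    return wrapper


def _native_read(path, delimiter):
    """Stand-in for a native call that reports problems as RuntimeError."""
    if len(delimiter) != 1:
        raise RuntimeError("delimiter must be a single character")
    return {"path": path, "delimiter": delimiter}


@translate_errors
def read_csv_sketch(path, delimiter=","):
    # Validate Python inputs before crossing into the native layer.
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")
    return _native_read(path, delimiter)
```

Chaining with `from exc` keeps the original error available as `__cause__`, so callers see one exception type while the lower-level detail stays in the traceback.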
The quality layer converts an ArFrame to a pandas DataFrame for inspection-oriented analytics. That keeps reporting expressive while the parser and storage stay native.
| Object | Role |
|---|---|
| ColumnProfile | Column-level dtype, null, uniqueness, examples, warnings, and top-value signals. |
| DataQualityReport | Whole-frame score, score components, suggestions, Markdown/HTML/Pandas/JSON exports, and optional redaction/exclusions. |
| ProfileComparison | Drift summary between two quality reports. |
| QualityGateResult | Pass/fail wrapper for CI and release gates. |
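
To make the relationship between a column profile and a pass/fail gate concrete, here is a minimal sketch in plain Python. It does not use Arnio's classes or pandas; `SketchColumnProfile`, `SketchGateResult`, `profile_column`, and `check_gate` are hypothetical, and the single null-ratio threshold is an assumed, simplified gate rule.

```python
from dataclasses import dataclass


@dataclass
class SketchColumnProfile:
    name: str
    null_ratio: float    # share of rows that are missing
    unique_ratio: float  # distinct non-null values / non-null values


@dataclass
class SketchGateResult:
    passed: bool
    failures: list


def profile_column(name, values):
    """Summarize one column; None marks a missing value."""
    total = len(values)
    present = [v for v in values if v is not None]
    null_ratio = (total - len(present)) / total if total else 0.0
    unique_ratio = len(set(present)) / len(present) if present else 0.0
    return SketchColumnProfile(name, null_ratio, unique_ratio)


def check_gate(profiles, max_null_ratio):
    """Fail the gate when any column exceeds the allowed null ratio."""
    failures = [
        f"{p.name}: null ratio {p.null_ratio:.2f} exceeds {max_null_ratio:.2f}"
        for p in profiles
        if p.null_ratio > max_null_ratio
    ]
    return SketchGateResult(passed=not failures, failures=failures)


records = {"id": [1, 2, 3, 4], "email": ["a@x.io", None, None, "b@x.io"]}
profiles = [profile_column(name, values) for name, values in records.items()]
result = check_gate(profiles, max_null_ratio=0.25)
print(result.passed, result.failures)
# prints: False ['email: null ratio 0.50 exceeds 0.25']
```

In a CI job, a result like this would typically be turned into a non-zero exit code when `passed` is false.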
The schema layer validates public data contracts after ingestion and cleaning. It supports strictness, row-level issues, warning severities, max-error limits, Markdown/Pandas output, JSON round-trip, schema diffs, YAML export helpers, and custom validators.
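The ideas of strictness, row-level issues, and a max-error limit can be sketched with plain Python dictionaries. This is an illustration under assumptions, not Arnio's schema API: `SketchIssue`, `validate_rows`, and the `strict` and `max_errors` parameters are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class SketchIssue:
    row: int
    column: str
    message: str


def validate_rows(rows, schema, strict=True, max_errors=None):
    """Check each row dict against {column: type}; collect row-level issues."""
    issues = []
    for index, row in enumerate(rows):
        if strict:
            # Strict mode also flags columns the schema does not declare.
            for extra in sorted(set(row) - set(schema)):
                issues.append(SketchIssue(index, extra, "unexpected column"))
        for column, expected in schema.items():
            if column not in row:
                issues.append(SketchIssue(index, column, "missing column"))
            elif not isinstance(row[column], expected):
                issues.append(SketchIssue(index, column, f"expected {expected.__name__}"))
        # Stop early once the error budget is used up.
        if max_errors is not None and len(issues) >= max_errors:
            return issues[:max_errors]
    return issues


schema = {"id": int, "name": str}
rows = [
    {"id": 1, "name": "Ada"},
    {"id": "2", "name": "Lin", "note": "extra"},
    {"name": "Kai"},
]
issues = validate_rows(rows, schema)
for issue in issues:
    print(issue.row, issue.column, issue.message)
# prints:
# 1 note unexpected column
# 1 id expected int
# 2 id missing column
```

Capping the number of collected issues keeps validation output readable on large, badly malformed inputs, at the cost of not reporting every problem in one pass.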
to_pandas, from_pandas, and the df.arnio accessor make pandas the analysis boundary.
to_arrow returns a pyarrow table; write_parquet writes files through the optional Parquet extra.
register_duckdb registers an ArFrame as a SQL relation through pandas interop.
ArnioCleaner provides controlled transforms with row-count and feature-name checks.
Recent releases fixed duplicate-header behavior, malformed CSV row handling, zero-column row-count preservation, duplicate column guards, JSONL nrows parsing, Windows clean targets, and clearer validation errors. These are reflected in the API and changelog pages.