arnio.__all__ on current origin/main. Items marked current-main are merged after the v1.18.0 tag.
I/O
Read CSV-like input into an ArFrame. Supports extension-flexible paths, TSV delimiter inference, explicit dtypes, decoding policy, and configurable malformed-row handling.
Returns: ArFrame. Raises CsvReadError for read and parse failures.
Yield ArFrame chunks from large files. on_bad_lines controls whether rows with an unexpected number of fields raise an error, emit a warning, or are skipped.
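The three on_bad_lines behaviours can be pictured with a small standard-library sketch. It is not arnio's implementation: iter_chunks, chunk_size, and the policy strings "error", "warn", and "skip" are hypothetical names chosen for illustration, and the chunks are plain lists of rows rather than ArFrame objects.

```python
import csv
import io
import warnings
from typing import Iterator, List


def iter_chunks(
    text: str, chunk_size: int = 2, on_bad_lines: str = "error"
) -> Iterator[List[List[str]]]:
    """Yield lists of rows; a row is 'bad' when its width differs from the header."""
    if on_bad_lines not in {"error", "warn", "skip"}:
        raise ValueError(f"unknown policy: {on_bad_lines!r}")
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    chunk: List[List[str]] = []
    for row in reader:
        if len(row) != len(header):
            if on_bad_lines == "error":
                raise ValueError(
                    f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
                )
            if on_bad_lines == "warn":
                warnings.warn(f"dropping malformed line {reader.line_num}")
            continue  # both "warn" and "skip" drop the row
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


data = "a,b\n1,2\n3\n4,5\n6,7\n"
chunks = list(iter_chunks(data, chunk_size=2, on_bad_lines="skip"))
```

Because the function is a generator, only one chunk is held in memory at a time, which is the point of chunked reading for large files.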
Infer column names and dtypes without loading the full file.
Read JSON Lines or NDJSON into an ArFrame; mixed object columns are coerced to strings.
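As a rough illustration of the coercion rule, the sketch below parses JSON Lines with the standard library and turns any column holding more than one Python type into strings. read_jsonl_columns is a hypothetical helper, not part of arnio. How arnio decides that a column is "mixed", and how it renders the resulting strings, are assumptions here.

```python
import json
from typing import Any, Dict, List


def read_jsonl_columns(text: str) -> Dict[str, List[Any]]:
    """Parse JSON Lines into columns; columns with mixed value types become strings."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    # Collect column names in first-seen order.
    names: List[str] = []
    for rec in records:
        for key in rec:
            if key not in names:
                names.append(key)
    columns = {name: [rec.get(name) for rec in records] for name in names}
    for name, values in list(columns.items()):
        kinds = {type(v) for v in values if v is not None}
        if len(kinds) > 1:  # assumption: any mix of types counts as "mixed"
            columns[name] = [None if v is None else str(v) for v in values]
    return columns


text = '{"id": 1, "tag": "a"}\n{"id": 2, "tag": 7}\n{"id": 3}\n'
cols = read_jsonl_columns(text)
```

In this sample the id column stays numeric, while tag mixes a string with an integer and is therefore coerced; the missing tag in the last record stays null.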
Write an ArFrame to CSV or TSV with delimiter and line terminator validation.
Write an ArFrame to Parquet via the optional pyarrow dependency. Install with pip install "arnio[parquet]".
Detect a likely delimiter for CSV-like input before reading or scanning.
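One simple way to detect a delimiter is to pick the candidate character that occurs the same nonzero number of times on the most lines. The sketch below does that with the standard library. guess_delimiter and its candidate list are hypothetical, the heuristic ignores quoting, and arnio's actual detection logic may differ.

```python
from collections import Counter
from typing import Optional, Sequence


def guess_delimiter(
    sample: str, candidates: Sequence[str] = (",", "\t", ";", "|")
) -> Optional[str]:
    """Return the candidate with the most consistent nonzero per-line count."""
    lines = [ln for ln in sample.splitlines() if ln.strip()]
    if not lines:
        return None
    best: Optional[str] = None
    best_score = (0, 0)
    for delim in candidates:
        per_line = Counter(ln.count(delim) for ln in lines)
        count, freq = per_line.most_common(1)[0]
        if count == 0:
            continue  # the typical line does not contain this character
        score = (freq, count)  # lines agreeing first, then fields per line
        if score > best_score:
            best, best_score = delim, score
    return best


tsv_guess = guess_delimiter("a\tb\tc\n1\t2\t3\n")
semicolon_guess = guess_delimiter("x;y\n1,5;2,5\n3,5;4,5\n")
```

The second sample shows why consistency matters: commas appear in the data rows as decimal separators, but only the semicolon count is the same on every line.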
Frame API
Python wrapper over the native C++ frame. Core properties and methods include shape, columns, dtypes, is_empty, memory_usage(), head(), tail(), and preview().
Build an ArFrame from record dictionaries or row-like records. Added in v1.17.0.
Current-main API. Select one column as a Python list or multiple columns as an ArFrame.
Return a dict[str, list] representation for serialization or tests.
Current-main method equivalent to the top-level drop_columns helper.
Summary statistics and column summaries. schema_summary returns ColumnSummary objects with name, dtype, and nullability.
Conversion
Convert to pandas, preserving the fast zero-copy path where supported. Use copy=True for defensive isolation.
Convert a pandas DataFrame to ArFrame with dtype validation, duplicate-column checks, attrs preservation, and zero-column row-count preservation.
Export to pyarrow.Table. Install with pip install "arnio[arrow]". Added in v1.18.0.
Cleaning
Cleaning functions return new frames and are usable directly or from pipeline().
| Function | Purpose |
|---|---|
| drop_nulls, keep_rows_with_nulls, fill_nulls | Control null-bearing rows and values. |
| drop_columns, select_columns, drop_empty_columns, drop_columns_matching | Manage columns explicitly or by regex. |
| strip_whitespace, normalize_case, normalize_unicode, trim_column_names | Normalize text and headers. |
| cast_types, parse_bool_strings, round_numeric_columns, clip_numeric, winsorize_outliers | Type and numeric cleanup. |
| replace_values, standardize_missing_tokens, combine_columns, coalesce_columns, safe_divide_columns | Common feature engineering and value repair helpers. |
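The "return new frames" contract means a cleaning call never modifies its input. The sketch below mimics that on a plain dict of column lists. drop_null_rows and coalesce are local stand-ins written for this example; they are not arnio's drop_nulls or coalesce_columns, whose signatures and exact semantics are not shown here.

```python
from typing import Dict, List, Optional, Sequence

Columns = Dict[str, List[Optional[object]]]


def drop_null_rows(columns: Columns) -> Columns:
    """Return a new mapping without rows that contain None; the input is untouched."""
    if not columns:
        return {}
    n_rows = len(next(iter(columns.values())))
    keep = [
        i for i in range(n_rows)
        if all(vals[i] is not None for vals in columns.values())
    ]
    return {name: [vals[i] for i in keep] for name, vals in columns.items()}


def coalesce(columns: Columns, sources: Sequence[str], target: str) -> Columns:
    """Return a new mapping where `target` holds the first non-null source value per row."""
    merged = [
        next((v for v in row if v is not None), None)
        for row in zip(*(columns[s] for s in sources))
    ]
    result = {name: list(vals) for name, vals in columns.items()}  # copy, do not alias
    result[target] = merged
    return result


original: Columns = {"phone": [None, "555-0100", None], "mobile": ["555-0199", None, None]}
with_contact = coalesce(original, ["phone", "mobile"], "contact")
complete = drop_null_rows({"id": [1, 2, 3], "score": [0.5, None, 0.9]})
```

Because each call returns a fresh mapping, the intermediate results can be kept, compared, or discarded without affecting the data they were derived from.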
Pipeline
Run named cleaning steps or registered Python callables. verbose=True enables lightweight diagnostics through the arnio logger and optional PipelineContext.
Manage the step registry and discover available built-in signatures.
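The relationship between named steps, registered callables, and the registry can be sketched in a few lines. Everything below is a toy model: register_step, run_pipeline, and the step formats (a bare name, a (name, kwargs) tuple, or a callable) are assumptions for illustration, and the two registered functions only borrow names from the Cleaning table.

```python
from typing import Callable, Dict, List, Sequence, Tuple, Union

Columns = Dict[str, List[object]]
Step = Union[str, Tuple[str, dict], Callable[[Columns], Columns]]

_REGISTRY: Dict[str, Callable[..., Columns]] = {}


def register_step(name: str):
    """Decorator that stores a function in the registry under `name`."""
    def decorator(func: Callable[..., Columns]) -> Callable[..., Columns]:
        _REGISTRY[name] = func
        return func
    return decorator


@register_step("strip_whitespace")
def strip_whitespace(columns: Columns) -> Columns:
    return {
        k: [v.strip() if isinstance(v, str) else v for v in vals]
        for k, vals in columns.items()
    }


@register_step("fill_nulls")
def fill_nulls(columns: Columns, value: object = "") -> Columns:
    return {k: [value if v is None else v for v in vals] for k, vals in columns.items()}


def run_pipeline(columns: Columns, steps: Sequence[Step]) -> Columns:
    """Apply each step in order; every step returns a new mapping."""
    for step in steps:
        if callable(step):
            columns = step(columns)
        elif isinstance(step, tuple):
            name, kwargs = step
            columns = _REGISTRY[name](columns, **kwargs)
        else:
            columns = _REGISTRY[step](columns)
    return columns


raw: Columns = {"name": [" Ada ", None]}
clean = run_pipeline(
    raw,
    [
        "strip_whitespace",
        ("fill_nulls", {"value": "unknown"}),
        lambda cols: {k.upper(): v for k, v in cols.items()},
    ],
)
available_steps = sorted(_REGISTRY)
```

Looking steps up by name keeps pipeline definitions serializable, while accepting plain callables leaves room for one-off logic that is not worth registering.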
Quality
Return DataQualityReport with row/column counts, duplicate metrics, column profiles, warnings, quality score, score components, and cleaning suggestions.
Export reports for notebooks, CI, docs, and automation. exclude_columns and to_json() are current-main updates after v1.18.0.
Compare quality over time, fail CI on drift, generate pipeline-compatible steps, or run automatic cleaning with optional audit output.
Schema
Validate frames against field definitions. max_errors was added in v1.18.0.
Int64, Float64, String, Bool, Email, URL(allowed_schemes=...), PhoneNumber, CountryCode, CurrencyCode, Date, DateTime, Regex, and Custom.
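To make the idea of field definitions and an error cap concrete, here is a minimal standalone sketch. It assumes that max_errors stops collection once that many errors are recorded, which may not match arnio's behaviour. VALIDATORS and validate_records are hypothetical, only three of the field types are modelled, and the email pattern is deliberately loose.

```python
import re
from typing import Any, Callable, Dict, List, Optional

# Hypothetical validators keyed by field type name; each returns True for an acceptable value.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "Int64": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "String": lambda v: isinstance(v, str),
    "Email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def validate_records(
    records: List[Dict[str, Any]],
    schema: Dict[str, str],
    max_errors: Optional[int] = None,
) -> List[str]:
    """Check every record against the schema and return human-readable errors."""
    errors: List[str] = []
    for row, rec in enumerate(records):
        for field, kind in schema.items():
            if not VALIDATORS[kind](rec.get(field)):
                errors.append(f"row {row}: {field!r} is not a valid {kind}")
                if max_errors is not None and len(errors) >= max_errors:
                    return errors  # assumed meaning of max_errors: stop early
    return errors


records = [
    {"id": 1, "email": "a@b.co"},
    {"id": "x", "email": "nope"},
    {"id": True, "email": "c@d.org"},
]
schema = {"id": "Int64", "email": "Email"}
all_errors = validate_records(records, schema)
capped = validate_records(records, schema, max_errors=2)
```

A cap like this keeps reports readable and validation fast when a large frame fails the same check on nearly every row.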
Compare schema contracts, export schema definitions, and register custom semantic validators.
Integrations
Run Arnio workflows directly from pandas DataFrames.
Register an ArFrame as a DuckDB relation through pandas interop.
Optional scikit-learn transformer for pipeline-safe data preparation. Install with pip install "arnio[sklearn]".