Changelog
Latest published release: v1.18.0. Current main also contains unreleased documentation, quality, frame, and benchmark updates.
Added
- DataQualityReport.to_json() for JSON report export.
- profile(exclude_columns=...) and DataQualityReport.to_dict(exclude_columns=...) for privacy-aware quality exports.
- ArFrame.__getitem__ column selection and ArFrame.drop_columns.
- Lightweight benchmark regression checks and dry-run coverage for the pandas accessor.
Fixed
- Reject headers that differ only by whitespace.
- Safely handle tuple mapping keys in replace_values.
Features
- Added Arrow export API with ar.to_arrow(frame) and bool dtype detection.
- Added max_errors support to Schema.validate() and ar.validate().
Bug fixes
- Improved cleaning mapping validation errors by including the received type.
Documentation
- Clarified that chunked schema validation is not a separate streaming schema-validation contract.
Features
- Added URL allowed_schemes, ArFrame.from_records, ArFrame.schema_summary, schema YAML export, and ArFrame.describe.
- Added configurable bad-line handling, CSV dtype support, skip rows, encoding error handling, decimal separators, and DuckDB registration.
- Added write_parquet, to_dict, drop_empty_columns, winsorize_outliers, select_columns, near-constant warnings, and high-cardinality quality warnings.
- Added pipeline context and verbose diagnostics, plus Python interface and extension stub updates.
Bug fixes
- Hardened CSV parsing, malformed rows, permission messages, Unicode paths, duplicate headers, extra fields, and unterminated quoted-field diagnostics.
- Fixed zero-column row-count preservation, duplicate add-column guards, drop-duplicates row-key collisions, JSONL nrows, Windows clean target, and safer cleaning validation paths.
Performance and docs
- Optimized CSV parser allocation paths, integer parsing, string mutation paths, and unmodified-column moves.
- Added sparse-null benchmarks, optimized theme logos, schema validation tutorial docs, and Windows build troubleshooting.
Older release details remain available in the repository CHANGELOG.md and GitHub releases.