Current main (Unreleased)

Added

  • DataQualityReport.to_json() for JSON report export.
  • profile(exclude_columns=...) and DataQualityReport.to_dict(exclude_columns=...) for privacy-aware quality exports.
  • ArFrame.__getitem__ column selection and ArFrame.drop_columns.
  • Lightweight benchmark regression checks and dry-run coverage for the pandas accessor.

Fixed

  • Reject headers that differ only by whitespace.
  • Safely handle tuple mapping keys in replace_values.
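
To illustrate the header fix above, here is a minimal standalone sketch of how headers that differ only by whitespace might be detected. It does not call the library. The helper names `find_whitespace_collisions` and `check_headers` are hypothetical, and the comparison rule (equality after removing all whitespace) is an assumption about what "differ only by whitespace" means.

```python
from collections import defaultdict


def find_whitespace_collisions(headers):
    """Group headers that become identical once all whitespace is removed."""
    groups = defaultdict(list)
    for name in headers:
        # Assumption: two headers collide when they are equal after removing
        # every whitespace character; the library may use a narrower rule.
        key = "".join(name.split())
        groups[key].append(name)
    # Exact duplicates are a separate problem, so only report groups that
    # contain at least two distinct spellings.
    return [names for names in groups.values() if len(set(names)) > 1]


def check_headers(headers):
    """Hypothetical guard: raise if any headers differ only by whitespace."""
    collisions = find_whitespace_collisions(headers)
    if collisions:
        raise ValueError(f"headers differ only by whitespace: {collisions}")
    return list(headers)
```

Under this rule, `check_headers(["id", " id", "name"])` raises `ValueError`, while `check_headers(["id", "name"])` returns the list unchanged.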

v1.18.0 (Latest release)

Features

  • Added Arrow export API with ar.to_arrow(frame) and bool dtype detection.
  • Added max_errors support to Schema.validate() and ar.validate().
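
The idea behind an error cap such as max_errors can be sketched in plain Python. This is not the implementation of Schema.validate() or ar.validate(). The function `validate_rows`, its parameters, and the error-message format are hypothetical, and the semantics (stop collecting once the cap is reached, with `None` meaning no cap and the cap assumed to be a positive integer) are an assumption.

```python
def validate_rows(rows, required, max_errors=None):
    """Collect 'missing field' errors, stopping once max_errors is reached.

    Hypothetical stand-in for a schema check: a field counts as missing
    when it is absent from the row or set to None.
    """
    errors = []
    for index, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                errors.append(f"row {index}: missing {field!r}")
                # Assumed semantics: return early once the cap is hit so
                # large inputs do not produce an unbounded error list.
                if max_errors is not None and len(errors) >= max_errors:
                    return errors
    return errors


rows = [{"a": 1}, {"a": None}, {}, {"a": None}]
all_errors = validate_rows(rows, required=["a"])
capped = validate_rows(rows, required=["a"], max_errors=2)
```

Here `all_errors` holds three messages (rows 1, 2, and 3), while `capped` holds only the first two.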

Bug fixes

  • Improved validation errors for cleaning mappings by including the received type.

v1.17.1 (Released)

Documentation

  • Clarified that chunked schema validation is not a separate streaming schema-validation contract.

v1.17.0 (Released)

Features

  • Added URL allowed_schemes, ArFrame.from_records, ArFrame.schema_summary, schema YAML export, and ArFrame.describe.
  • Added configurable bad-line handling, CSV dtype support, skip rows, encoding error handling, decimal separators, and DuckDB registration.
  • Added write_parquet, to_dict, drop_empty_columns, winsorize_outliers, select_columns, near-constant warnings, and high-cardinality quality warnings.
  • Added pipeline context and verbose diagnostics, plus Python interface and extension stub updates.

Bug fixes

  • Hardened CSV parsing, including handling of malformed rows, permission messages, Unicode paths, duplicate headers, extra fields, and unterminated quoted-field diagnostics.
  • Fixed zero-column row-count preservation, duplicate add-column guards, drop-duplicates row-key collisions, JSONL nrows, and the Windows clean target, and made cleaning validation paths safer.

Performance and docs

  • Optimized CSV parser allocation paths, integer parsing, string mutation paths, and unmodified-column moves.
  • Added sparse-null benchmarks, optimized theme logos, schema validation tutorial docs, and Windows build troubleshooting.

Older release details remain available in the repository CHANGELOG.md and GitHub releases.