End-to-end CSV pipeline
benchmark_vs_pandas.py compares deterministic tall and wide datasets across pandas and Arnio.
The site now avoids stale one-off timing claims. Arnio publishes reproducible benchmark scripts, dry-run smoke tests, and current-main regression checks.
Dedicated scripts cover from_pandas, auto-clean memory, sparse nulls, and to-pandas overhead.
Benchmarks cover numeric parsing, strip-whitespace, combine-columns, clip-numeric, duplicate profiling, and GIL/threading behavior.
The main branch after v1.18.0 adds lightweight benchmark regression checks and dry-run smoke coverage, so benchmark scripts keep running in automation.
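To make the idea of a lightweight regression check concrete, here is a minimal sketch of comparing current timings against stored baselines with a tolerance. It is not Arnio's actual check: the function names, the 25% default tolerance, and the dictionary layout are all assumptions made for illustration.

```python
# Hypothetical sketch of a tolerance-based regression check.
# None of these names or defaults come from Arnio's repository.

def check_regression(baseline_s, current_s, tolerance=0.25):
    """Return True when the current timing is within `tolerance` of the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    slowdown = (current_s - baseline_s) / baseline_s
    return slowdown <= tolerance


def find_regressions(baselines, results, tolerance=0.25):
    """Return sorted names of benchmarks that slowed down beyond the tolerance.

    Benchmarks without a configured baseline are skipped rather than failed.
    """
    return sorted(
        name
        for name, current in results.items()
        if name in baselines
        and not check_regression(baselines[name], current, tolerance)
    )
```

A generous tolerance suits shared CI runners, where run-to-run noise is large; the goal is to catch large drift, not small fluctuations.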
The full suite can generate large deterministic data files. Use dry-run mode first when checking a branch or CI environment.
python -m pip install -e ".[dev]"
python benchmarks/benchmark_vs_pandas.py --dry-run
pytest tests/test_benchmarks_smoke.py

python benchmarks/generate_data.py
python benchmarks/benchmark_vs_pandas.py
python benchmarks/benchmark_auto_clean_memory.py --rows 100000

| Signal | Interpretation |
|---|---|
| Wall-clock time | Useful for same-machine comparisons. Small differences are normal between runs. |
| Peak memory | Use to spot broad regressions. Compare only runs that share the same Python, pandas, NumPy, and compiler stack. |
| Dry-run success | Confirms benchmark scripts import, generate tiny data, and complete without exercising full-scale performance. |
| Regression checks | Current-main CI guardrails catch accidental script breakage and large benchmark drift when baselines are configured. |
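The first two signals in the table can be collected with the standard library alone. The sketch below is one possible approach, not how Arnio's benchmark scripts measure: the `measure` helper and the `workload` function are hypothetical. Note that `tracemalloc` only sees memory allocated through Python's allocators, so allocations made directly by native code may not appear in the peak.

```python
import time
import tracemalloc


def measure(func, repeats=3):
    """Return (best wall-clock seconds, peak traced bytes) for a zero-argument callable.

    Timing runs happen without tracemalloc, because tracing slows execution.
    Peak memory comes from one separate traced run.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)

    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return min(timings), peak


def workload():
    # Hypothetical stand-in for a real benchmark body.
    return [str(i) for i in range(50_000)]


if __name__ == "__main__":
    best_s, peak_bytes = measure(workload)
    print(f"best of 3: {best_s:.4f} s, peak traced memory: {peak_bytes / 1e6:.2f} MB")
```

Taking the minimum of several runs reduces the effect of background noise, which is why small differences between single runs should not be read as regressions.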
- benchmarks/benchmark_vs_pandas.py: reference pandas vs Arnio workflow.
- benchmarks/benchmark_csv.py: parser-focused CSV work.
- benchmarks/benchmark_strip_whitespace.py: native whitespace cleaning.
- benchmarks/benchmark_sparse_nulls.py: sparse null workloads.
- benchmarks/benchmark_from_pandas_memory.py: conversion memory behavior.
- benchmarks/benchmark_auto_clean_memory.py: automatic cleaning memory behavior.

Include OS, CPU, Python version, pandas and NumPy versions, compiler/build mode, Arnio commit, command, and full output. Without that context, raw seconds are not actionable.
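Part of that context can be gathered automatically. The following is a minimal sketch using only the standard library; the helper names are hypothetical and not part of Arnio. It covers OS, CPU, Python, and package versions. The compiler/build mode, Arnio commit, command, and full output still need to be added by hand.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a placeholder if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("pandas", "numpy")):
    """Build a plain-text summary of the machine and Python stack."""
    lines = [
        f"OS: {platform.platform()}",
        # platform.processor() can be empty on some systems; fall back to the architecture.
        f"CPU: {platform.processor() or platform.machine()}",
        f"Python: {sys.version.split()[0]}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running `git rev-parse HEAD` in the checkout prints the commit hash to paste alongside this report.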