A short example of calculating `order_value` in three ways and comparing their performance.
`pip install -r requirements.txt`

`python main.py` - demo run for 3 implementations:

- `pandera_validate` - validation via `DataFrame[Schema](...)`
- `pandera_typed_no_validate` - typed `cast`, without runtime validation
- `pure_polars` - plain Polars

`python benchmark.py` - benchmark for the same implementations plus a variant with validation disabled via `config_context`.
- Dataset size: `200_000` rows (`build_benchmark_df`).
- Number of measurements: 20 iterations per test.
- Before timing: 1 warm-up call.
- Correctness check: before benchmarking, all implementations are verified to return the same result.
- Metrics: `mean_ms` and `std_ms`.
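The methodology above (one warm-up call, 20 timed iterations, mean/std in milliseconds) can be sketched with a stdlib-only harness; the `bench` function below is hypothetical, not the project's `benchmark.py`:

```python
import time
import statistics


def bench(fn, *args, warmup=1, iterations=20):
    """Time fn: `warmup` untimed calls, then `iterations` measured runs."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1000)  # seconds -> ms
    return {
        "mean_ms": statistics.mean(samples),
        "std_ms": statistics.stdev(samples),  # sample standard deviation
    }
```

The warm-up call keeps one-time costs (imports, JIT caches, first allocations) out of the reported numbers.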
There is an in-code option using a context manager:

```python
with config_context(validation_enabled=False):
    ...
```

You can also control this globally via environment variables:

- `PANDERA_VALIDATION_ENABLED=False` - disable runtime validation.
- `PANDERA_VALIDATION_DEPTH=SCHEMA_ONLY|DATA_ONLY|SCHEMA_AND_DATA` - validation depth.
- `PANDERA_CACHE_DATAFRAME=True|False` - cache the dataframe during validation.
- `PANDERA_KEEP_CACHED_DATAFRAME=True|False` - keep the cache after validation.
Example run without validation:

```shell
PANDERA_VALIDATION_ENABLED=False python benchmark.py
```

- Explicit data schema (columns and types) next to the transformation code.
- Early detection of data format issues at runtime.
- Better integration with static typing (`mypy` + `pandera.typing`).
- A controllable trade-off between safety and speed (validation can be enabled/disabled).
Results from the latest `python benchmark.py` run:
rows = 200,000
| name | mean_ms | std_ms |
|---|---|---|
| pandera_validate | 0.759 | 0.070 |
| pandera_validate_config_off | 0.125 | 0.050 |
| pandera_typed_no_validate | 0.100 | 0.002 |
| pure_polars | 0.106 | 0.011 |