Polars

Blazingly Fast DataFrame Library

Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is available in Python, Rust & NodeJS. Its key features are:

  • Fast: Polars is written from the ground up, designed close to the machine and without external dependencies.
  • I/O: First-class support for all common data storage layers: local, cloud storage & databases.
  • Easy to use: Write your queries the way they were intended. Polars internally determines the most efficient way to execute them using its query optimizer.
  • Out of Core: Polars supports out-of-core data transformation with its streaming API, allowing you to process your results without requiring all your data to be in memory at the same time.
  • Parallel: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
  • Vectorized Query Engine: Polars uses Apache Arrow, a columnar data format, to process your queries in a vectorized manner. It uses SIMD to optimize CPU usage.

About this guide

The Polars user guide is intended to live alongside the API documentation. Its purpose is to explain to (new) users how to use Polars and to provide meaningful examples. The guide is split into two parts:

  • Getting Started: A 10-minute helicopter view of the library and its primary functions.
  • User Guide: A detailed explanation of how the library is set up and how to use it most effectively.

If you are looking for details on a specific level / object, it is probably best to go to the API documentation: Python | NodeJS | Rust.

Performance 🚀 🚀

Polars is very fast, and in fact is one of the best performing solutions available. See the results in h2oai's db-benchmark, revived by the DuckDB project.

Polars TPC-H benchmark results are now available on the official website.

Example

Python: scan_csv · filter · groupby · collect

import polars as pl

q = (
    pl.scan_csv("docs/src/data/iris.csv")
    .filter(pl.col("sepal_length") > 5)
    .groupby("species")
    .agg(pl.all().sum())
)

df = q.collect()

Rust: LazyCsvReader · filter · groupby · collect (requires the csv and streaming features)

use polars::prelude::*;

let q = LazyCsvReader::new("docs/src/data/iris.csv")
    .has_header(true)
    .finish()?
    .filter(col("sepal_length").gt(lit(5)))
    .groupby(vec![col("species")])
    .agg([col("*").sum()]);

// collect() returns a Result, so propagate any error like finish()? above.
let df = q.collect()?;

NodeJS: scanCSV · filter · groupBy · collect

const pl = require("nodejs-polars");

// Declare the variables explicitly instead of creating implicit globals.
const q = pl
  .scanCSV("docs/src/data/iris.csv")
  .filter(pl.col("sepal_length").gt(5))
  .groupBy("species")
  .agg(pl.all().sum());

const df = q.collect();

Community

Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:

ritchie46 stinodego alexander-beedie MarcoGorelli zundertj ghuls reswqa universalmind303 orlp dependabot[bot] mcrumiller nameexhaustion matteosantama c-peters Dandandan magarick ibENPC moritzwilksch jorgecarleitao jonashaag marcvanheerden borchero cjermain josh ryanrussell cnpryer marioloko thatlittleboy braaannigan cmdlineluser illumination-k jakob-keller messense mhconradt rben01 sorhawell SeanTroyUWO ion-elgreco svaningelgem chitralverma YuRiTan elbaro nickray adamgreg CloseChoice jrycw owrior romanovacca paq JulianCologne

Contribute

Thanks for taking the time to contribute! We appreciate all contributions, from reporting bugs to implementing new features. If you're unclear on how to proceed, read our contribution guide or contact us on Discord.

License

This project is licensed under the terms of the MIT license.