Lecture 13

Polars, DuckDB and RAPIDS

Amit Arora, Jeff Jacobs

Georgetown University

Fall 2025

Before we begin...

Pandas: pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

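  1. Pandas is often called slow. That is partly true, but much of the gap closes if you use it the right way (for example, vectorized column operations instead of row-wise loops).
  2. Pandas 2.0 and the Arrow revolution: pandas 2.0 added optional Apache Arrow-backed data types.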
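
As a rough illustration of using pandas "the right way", the sketch below computes the same column twice: once with a row-wise `apply` and once with a vectorized expression. The `trips` frame and its column names are made up for this example and are not part of the course data.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
trips = pd.DataFrame({
    "distance_km": [1.2, 3.4, 0.8, 10.0],
    "fare": [6.5, 12.0, 5.0, 31.0],
})

# Slow pattern: a Python function is called once per row.
slow = trips.apply(lambda row: row["fare"] / row["distance_km"], axis=1)

# Fast pattern: a single vectorized operation over whole columns.
fast = trips["fare"] / trips["distance_km"]
```

Both versions give the same values. On large frames the vectorized form is usually much faster because the loop runs in compiled code rather than in the Python interpreter.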

Polars

Polars: Lightning-fast DataFrame library for Rust and Python

Why is it faster than Pandas?

  1. Written in Rust (compiled not interpreted).

  2. Uses all available cores of your machine.

  3. Uses the Apache Arrow columnar memory format.

  4. [My opinion] Makes it easier to write code the right way (it has a strict schema, among other things)!

Using Polars

Install polars via pip.

pip install polars

Import polars in your Python code as

import polars as pl

Read data as usual.

df = pl.read_parquet("s3a://bigdatateaching/nyctaxi-yellow-tripdata/2021/yellow_tripdata_2021-01.parquet")

Coming from Pandas to Polars

import pandas as pd
df = pd.DataFrame({
    "type": ["m", "n", "o", "m", "m", "n", "n"],
    "c": [1, 1, 1, 2, 2, 2, 2],
})

df["size"] = df.groupby("c")["type"].transform(len)
df

import polars as pl
df = pl.DataFrame({
    "type": ["m", "n", "o", "m", "m", "n", "n"],
    "c": [1, 1, 1, 2, 2, 2, 2],
})
df.select([
    pl.all(),
    pl.col("type").count().over("c").alias("size")
])

https://pola-rs.github.io/polars-book/user-guide/coming_from_pandas.html

Coming From Spark to Polars

https://pola-rs.github.io/polars-book/user-guide/coming_from_spark.html

DuckDB

DuckDB is an in-process SQL OLAP database management system

pip install duckdb==0.7.1

Also check out MotherDuck.

RAPIDS

RAPIDS is a suite of open-source software libraries and APIs for executing data science pipelines entirely on GPUs—and can reduce training times from days to minutes. Built on NVIDIA® CUDA-X AI™, RAPIDS unites years of development in graphics, machine learning, deep learning, high-performance computing (HPC), and more.

https://www.nvidia.com/en-us/deep-learning-ai/software/rapids
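
The RAPIDS DataFrame library, cuDF, mirrors much of the pandas API. The sketch below is written under that assumption: it tries to import `cudf` and falls back to pandas when cuDF is not installed, so it also runs on a machine without a GPU. The data and column names are made up for illustration.

```python
try:
    import cudf as xdf  # RAPIDS GPU DataFrame library (needs an NVIDIA GPU)
    backend = "cudf"
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch still runs
    backend = "pandas"

df = xdf.DataFrame({
    "kind": ["m", "n", "o", "m", "m", "n", "n"],
    "c": [1, 1, 1, 2, 2, 2, 2],
})

# Same pandas-style groupby on either backend.
counts = df.groupby("c")["kind"].count()

# cuDF results live on the GPU; copy them back to the host for inspection.
if backend == "cudf":
    counts = counts.to_pandas()

group_sizes = counts.to_dict()
```

For data this small the GPU gives no benefit. The point is only that pandas-style code can often be moved to cuDF with few changes.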