
dask_dataframe_from_pandas_with_parallel_groupby_computation.py

python

This quickstart demonstrates how to create a Dask DataFrame from a pandas DataFrame and compute a grouped mean in parallel.

15d ago · 21 lines · docs.dask.org
import dask.dataframe as dd
import pandas as pd
import numpy as np

# Create a sample Pandas DataFrame
df = pd.DataFrame({
    'a': np.random.randn(1000),
    'b': np.random.randint(0, 100, size=1000)
})

# Convert to a Dask DataFrame with 4 partitions
ddf = dd.from_pandas(df, npartitions=4)

# Perform a typical operation (mean of column 'a' grouped by 'b')
# This is lazy; it hasn't computed yet
result = ddf.groupby('b').a.mean()

# Compute the result in parallel
final_mean = result.compute()

print(final_mean.head())