
bigframes_pandas_public_dataset_filter_and_correlation.py

python

Loads a public dataset into a BigQuery DataFrames DataFrame, performs a simple filter, and computes a correlation matrix.

Source: cloud.google.com
import bigframes.pandas as bpd

# Configure BigQuery DataFrames options before the session starts
# Note: Project and location are optional if already configured in your environment
bpd.options.bigquery.project = "your-google-cloud-project"
bpd.options.bigquery.location = "us"

# Load a public dataset
df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")

# Filter the data: keep rows with a body mass above 4000 g
filtered_df = df[df["body_mass_g"] > 4000]

# Perform an analysis (e.g., compute the correlation matrix of the numeric columns)
# BigQuery DataFrames defers execution: the work runs in BigQuery
# when the result is needed, such as when it is printed below
correlation = filtered_df.select_dtypes(include="number").corr()

# Print the results
print(correlation)
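
To make explicit what the filter and `corr()` steps compute, here is a minimal plain-Python sketch that needs no Google Cloud project. It assumes `corr()` uses the Pearson coefficient, which is the pandas default. The rows are made-up illustrative values, not data from the penguins table, and `pearson` is a hypothetical helper written for this example. Unlike a DataFrame `corr()`, it does not handle missing values.

```python
import math

# Made-up rows for illustration only; not taken from the penguins table.
rows = [
    {"body_mass_g": 3500, "flipper_length_mm": 190},
    {"body_mass_g": 3800, "flipper_length_mm": 181},
    {"body_mass_g": 4200, "flipper_length_mm": 200},
    {"body_mass_g": 4600, "flipper_length_mm": 210},
    {"body_mass_g": 5000, "flipper_length_mm": 220},
]


def pearson(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Same condition as df[df["body_mass_g"] > 4000]
heavy = [row for row in rows if row["body_mass_g"] > 4000]

# One off-diagonal entry of the correlation matrix
r_value = pearson(
    [row["body_mass_g"] for row in heavy],
    [row["flipper_length_mm"] for row in heavy],
)
print(len(heavy), round(r_value, 6))
```

The three surviving rows were chosen to lie on a straight line, so the coefficient comes out at 1. Real data will give values strictly between -1 and 1.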