sagemaker_data_insights_pandas_dataframe_quality_report.py

python

This quickstart loads a dataset into a pandas DataFrame and uses

import pandas as pd
from sagemaker_data_insights.insights import DataInsights

# 1. Load your dataset
# For this example, we use a sample CSV or create a dummy DataFrame
df = pd.DataFrame({
    'feature_1': [1, 2, 3, 4, 5, None, 7, 8, 9, 10],
    'feature_2': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'C', 'B', 'A'],
    'target': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
})

# 2. Initialize the DataInsights object with your DataFrame
insights = DataInsights(df)

# 3. Generate the insights report
# This will perform analysis on data types, missing values, and distributions
report = insights.get_report()

# 4. Display the report in a Jupyter notebook environment
report.display()

# Optional: Save the report to an HTML file for sharing
# report.save_as_html("data_insights_report.html")
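
The snippet above says the report covers data types, missing values, and distributions. As a rough cross-check that does not depend on sagemaker_data_insights at all, the same three aspects can be inspected with plain pandas. This is only a sketch: the helper name basic_quality_summary is hypothetical, and the output is not claimed to match what the library's report computes or how it presents it.

```python
import pandas as pd


def basic_quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count and ratio, and distinct-value count.

    Hypothetical helper for illustration; not part of sagemaker_data_insights.
    """
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_ratio": df.isna().mean(),
        "n_unique": df.nunique(dropna=True),
    })


# Same dummy data as in the snippet above
df = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5, None, 7, 8, 9, 10],
    "feature_2": ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"],
    "target": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# One row per column: data type and missing-value statistics
summary = basic_quality_summary(df)
print(summary)

# Distributions: category frequencies and numeric summary statistics
print(df["feature_2"].value_counts())
print(df["feature_1"].describe())
```

On this data the summary should flag the single missing entry in feature_1, which is a quick sanity check to compare against whatever the generated report shows for that column.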