azureml_dataprep_rslex_csv_stream_to_pandas_dataframe.py
This example demonstrates how to use the underlying rslex engine
import azureml.dataprep as dprep

# rslex is the Rust-based execution engine used by the DataPrep SDK.
# It ships as the azureml-dataprep-rslex package, a dependency of azureml-dataprep.
# To check that it is installed:
#   pip show azureml-dataprep-rslex
# to_pandas_dataframe() below also needs pandas:
#   pip install azureml-dataprep[pandas]

# 1. Define the data source (CSV file from a URL)
url = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'

# 2. Create a Dataflow
# Note: In recent versions of azureml-dataprep, rslex is the default execution engine
dflow = dprep.read_csv(path=url)

# 3. Preview the first 10 rows of the data
# Dataflows are lazy; this call is what makes the engine fetch and parse the data
print("Displaying the first 10 rows of the dataset:")
print(dflow.head(10))

# 4. Optional: Perform a simple transformation (keep only rows where Survived is 1)
# The filter is recorded as a step and executed by the engine when data is requested.
# Note: this compares against the integer 1; if 'Survived' was read as a string,
# no rows will match, so convert the column first or compare against '1'.
dflow_filtered = dflow.filter(dprep.col('Survived') == 1)

# 5. Convert to a Pandas DataFrame (runs the full Dataflow, including the filter)
pandas_df = dflow_filtered.to_pandas_dataframe()
print("\nFiltered Pandas DataFrame shape:")
print(pandas_df.shape)
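
As a rough cross-check of the filter step above, the sketch below applies the same `Survived == 1` condition with plain pandas to a small inline sample. It does not use azureml-dataprep or rslex at all. The sample rows and the helper name `filter_survivors` are hypothetical, made up for illustration, and the columns are assumed to arrive as strings so that the explicit numeric conversion is visible.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for a few lines of the Titanic CSV.
SAMPLE_CSV = """PassengerId,Survived,Pclass
1,0,3
2,1,1
3,1,3
4,0,2
"""


def filter_survivors(csv_text: str) -> pd.DataFrame:
    """Return only the rows whose Survived column equals 1."""
    # dtype=str mimics a reader that does not infer column types.
    df = pd.read_csv(io.StringIO(csv_text), dtype=str)
    # Convert explicitly so the comparison against the integer 1 is well-defined.
    df["Survived"] = pd.to_numeric(df["Survived"])
    return df[df["Survived"] == 1]


survivors = filter_survivors(SAMPLE_CSV)
print(survivors.shape)
```

If the shape reported by the Dataflow version looks wrong (for example, zero rows), running the same condition through a check like this can help tell a column-type mismatch apart from a data problem.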