azure_synapse_pyspark_dataframe_creation_with_schema_and_filtering.py

python

This quickstart demonstrates how to create a Spark DataFrame from ra

15d ago · 21 lines · learn.microsoft.com
from pyspark.sql import SparkSession
from pyspark.sql.types import LongType, StringType, StructField, StructType

# Synapse notebooks predefine `spark`; getOrCreate() returns that existing session
# and only creates a new one when the script runs elsewhere.
spark = SparkSession.builder.getOrCreate()

# Create a list of data: [city name, population]
data = [["Tokyo", 37400068], ["Delhi", 28514000], ["Shanghai", 25582000], ["Sao Paulo", 21650000], ["Mexico City", 21581000]]

# Define the schema: a nullable string column and a nullable 64-bit integer column
schema = StructType([
    StructField("City", StringType(), True),
    StructField("Population", LongType(), True)
])

# Create a Spark DataFrame from the data and schema
df = spark.createDataFrame(data, schema)

# Display the content of the DataFrame
df.show()

# Keep only the cities with a population above 25 million and display the result
filtered_df = df.filter(df.Population > 25000000)
filtered_df.show()
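
As a sanity check that needs no Spark session, the filter above can be mirrored in plain Python on the same list. This is only a sketch: `expected_rows` is a hypothetical helper name introduced here, not part of the quickstart, and it assumes the same strict `>` comparison on the second element of each row.

```python
# Same sample data as the snippet above: [city name, population]
data = [["Tokyo", 37400068], ["Delhi", 28514000], ["Shanghai", 25582000],
        ["Sao Paulo", 21650000], ["Mexico City", 21581000]]


def expected_rows(rows, threshold):
    """Hypothetical helper: keep rows whose population is strictly above threshold."""
    return [row for row in rows if row[1] > threshold]


# Rows that df.filter(df.Population > 25000000) is expected to keep
for city, population in expected_rows(data, 25000000):
    print(city, population)
```

Comparing this list with the output of `filtered_df.show()` is a quick way to confirm that the schema types and the threshold behave as intended.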