@jess-sol
Created January 23, 2020 22:39
import pyspark
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession, Window, Row
from datetime import datetime
import pyspark.sql.types as T

BUCKETS = 5

sc = pyspark.SparkContext('local[*]')
spark = SparkSession(sc)

schema = T.StructType([
    T.StructField('Date', T.DateType()),
    T.StructField('Mountain', T.StringType()),
    T.StructField('M1', T.IntegerType()),
    T.StructField('M2', T.IntegerType()),
])

rows = []
for i in range(5):
    rows.append(Row(
        Date=datetime.now().date(),  # Removing this field (here and in the schema) makes it work
        Mountain='Mount' + str(i),
        M1=100,
        M2=200,
    ))

spark.createDataFrame(rows, schema).show(5, False)

# Removing the Date field makes it work
# Renaming M{i} to another name also works (like V{i})
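
A possible explanation for both observations, offered as an assumption rather than a confirmed diagnosis: PySpark 2.x releases sort the fields of a `Row` built from keyword arguments alphabetically, so the row's field order may no longer match the schema's field order. The sketch below uses only the standard library to show the two orderings and one hedged workaround, which is to build plain tuples in schema order instead of keyword `Row` objects. The names `SCHEMA_FIELDS`, `sorted_field_order`, and `to_schema_tuples` are hypothetical helpers, not part of PySpark.

```python
from datetime import date

# Field order declared in the schema of the snippet above.
SCHEMA_FIELDS = ['Date', 'Mountain', 'M1', 'M2']


def sorted_field_order(names):
    """Return the order an alphabetically sorting Row(**kwargs) would use.

    Assumption: the PySpark version in use sorts keyword-argument fields.
    """
    return sorted(names)


def to_schema_tuples(records, field_names):
    """Lay out dict records as plain tuples in the given schema order."""
    return [tuple(rec[name] for name in field_names) for rec in records]


# 'M1' and 'M2' sort before 'Mountain', so the sorted order differs from the schema.
print(sorted_field_order(SCHEMA_FIELDS))
# 'V1' and 'V2' sort after 'Mountain', so the sorted order matches the schema.
print(sorted_field_order(['Date', 'Mountain', 'V1', 'V2']))

# Workaround sketch: keep the data as dicts and emit tuples in schema order.
records = [
    {'Date': date(2020, 1, 23), 'Mountain': 'Mount' + str(i), 'M1': 100, 'M2': 200}
    for i in range(5)
]
tuples = to_schema_tuples(records, SCHEMA_FIELDS)
print(tuples[0])
```

Under the same assumption, the resulting `tuples` list could be passed to `spark.createDataFrame(tuples, schema)` in place of `rows`, since tuples carry no field names to reorder. Whether this resolves the failure depends on the installed Spark version.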