@ChengzhiZhao
Created December 9, 2021 07:06
spark_SALT.scala
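import org.apache.spark.sql.functions.rand
import org.apache.spark.sql.types.IntegerType

// n is the number of salt buckets each group key is spread across
df.withColumn("salt_random_column", (rand() * n).cast(IntegerType))
  .groupBy(groupByFields, "salt_random_column") // stage 1: partial aggregate per (key, salt)
  .agg(aggFields)
  .groupBy(groupByFields) // stage 2: merge the partial results per key
  .agg(aggFields) // must combine the stage-1 partials (e.g. sum of partial sums)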
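The snippet above leaves `df`, `n`, `groupByFields`, and `aggFields` undefined. A minimal sketch of the same two-stage pattern with concrete values might look like the following. The column names `key` and `value`, the helper `saltedSum`, the object name, and the toy data are all assumptions made for illustration, not part of the original gist, and a sum is used because partial sums can simply be summed again in the second stage.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{rand, sum}
import org.apache.spark.sql.types.IntegerType

object SaltedSumExample {
  // Hypothetical helper: two-stage salted sum over assumed columns "key" and "value".
  def saltedSum(df: DataFrame, n: Int): DataFrame =
    df.withColumn("salt_random_column", (rand() * n).cast(IntegerType)) // salt in [0, n)
      .groupBy("key", "salt_random_column") // stage 1: up to n partial groups per key
      .agg(sum("value").as("partial_sum"))
      .groupBy("key")                       // stage 2: merge the partials per key
      .agg(sum("partial_sum").as("total"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("salt-example")
      .getOrCreate()
    import spark.implicits._

    // Skewed toy data: the "hot" key has far more rows than the "cold" key.
    val df = (Seq.fill(1000)(("hot", 1L)) ++ Seq.fill(10)(("cold", 1L)))
      .toDF("key", "value")

    saltedSum(df, n = 8).orderBy("key").show()
    spark.stop()
  }
}
```

The totals should match a plain `df.groupBy("key").agg(sum("value"))`; the salt only changes how the work for a heavy key is split across tasks in the first stage. Aggregations that cannot be recombined from partials this directly (an average, for instance) would need the first stage to carry extra state such as a sum and a count.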