ChengzhiZhao · December 9, 2021 07:06
diff --git a/spark_SALT.scala b/spark_SALT.scala
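 import org.apache.spark.sql.functions.rand
 import org.apache.spark.sql.types.IntegerType
 // n is the number of salt buckets each key is spread across; larger n splits a hot key into more groups
 df.withColumn("salt_random_column", (rand() * n).cast(IntegerType))
  .groupBy(groupByFields, "salt_random_column") // stage 1: partial aggregate per (key, salt)
  .agg(aggFields)
  .groupBy(groupByFields) // stage 2: merge the partial results per key
  .agg(aggFields) // must combine the stage-1 output columns, e.g. sum the partial sums or partial counts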
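The snippet above leaves `groupByFields` and `aggFields` as placeholders. A minimal concrete sketch follows, assuming a DataFrame with a string column `key` and a numeric column `value`; those column names, the helper `saltedTotals`, and the local `SparkSession` are illustrative assumptions, not part of the original snippet. It shows the detail that is easy to miss: the second stage has to merge the first stage's outputs (sum the partial sums, sum the partial counts) rather than re-apply the same expressions to the raw columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{count, lit, rand, sum}
import org.apache.spark.sql.types.IntegerType

object SaltedAggregation {
  // Hypothetical helper: per-key sum and row count, computed in two stages.
  def saltedTotals(df: DataFrame, n: Int): DataFrame =
    df.withColumn("salt_random_column", (rand() * n).cast(IntegerType)) // salt in [0, n)
      // Stage 1: partial aggregates per (key, salt); a hot key is split into up to n groups.
      .groupBy("key", "salt_random_column")
      .agg(sum("value").as("partial_sum"), count(lit(1)).as("partial_count"))
      // Stage 2: merge the partials per key. Partial counts are summed, not counted again.
      .groupBy("key")
      .agg(sum("partial_sum").as("total"), sum("partial_count").as("row_count"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("salt-sketch").getOrCreate()
    import spark.implicits._

    // Skewed toy data: the key "hot" dominates.
    val df = (Seq.fill(1000)(("hot", 1L)) ++ Seq(("cold", 5L), ("cold", 7L))).toDF("key", "value")

    saltedTotals(df, n = 8).show()
    spark.stop()
  }
}
```

This two-stage pattern only works as written for aggregates whose partial results can be merged, such as sum, count, min, and max. An average would need to be derived in stage 2 from the merged sum and count rather than by averaging the partial averages.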