Yann Moisan YannMoisan

My incredible 9.3.

	I was wondering whether Catalyst optimizes consecutive unionAll calls (so that the number of shuffles stays limited).

	So let's write a test:

	@ val dfs = (1 to 5).map(i => sc.parallelize((1 to 3).map(j => i + 3*j)).toDF("a"))
	dfs: collection.immutable.IndexedSeq[org.apache.spark.sql.DataFrame] = Vector([a: int], [a: int], [a: int], [a: int], [a: int])
	@ val merged = dfs.reduceLeft(_ unionAll _)
	merged: org.apache.spark.sql.DataFrame = [a: int]

	And show the plan
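	To make the question concrete, here is a minimal sketch in plain Scala (no Spark dependency) of the plan shape that `dfs.reduceLeft(_ unionAll _)` builds in a simplified model. Each step wraps the accumulated result in a new binary union, so five inputs give a left-deep tree four levels deep. The `Plan`, `Leaf` and `Union` types and the `flatten` rewrite are hypothetical illustrations of what "optimizing consecutive unions" could mean, namely collapsing the nesting into a single n-ary union. They are not Catalyst's classes or rules, and only the real plan can tell whether Catalyst does something similar.

```scala
// Toy model of a logical plan (hypothetical types, not Catalyst's classes).
sealed trait Plan
final case class Leaf(name: String) extends Plan
final case class Union(children: Seq[Plan]) extends Plan

object UnionFlattening {
  // What dfs.reduceLeft(_ unionAll _) builds in this model: every step wraps
  // the accumulated plan in a new binary Union, giving a left-deep tree.
  def leftNested(leaves: Seq[Plan]): Plan =
    leaves.reduceLeft[Plan]((acc, next) => Union(Seq(acc, next)))

  // Hypothetical rewrite: hoist the children of nested Unions into their
  // parent, so consecutive unions collapse into one n-ary Union.
  def flatten(plan: Plan): Plan = plan match {
    case Union(children) =>
      Union(children.map(flatten).flatMap {
        case Union(grandChildren) => grandChildren
        case other                => Seq(other)
      })
    case leaf => leaf
  }

  // Number of nested Union levels (a Leaf has depth 0).
  def depth(plan: Plan): Int = plan match {
    case Union(children) => 1 + children.map(depth).max
    case _               => 0
  }

  def main(args: Array[String]): Unit = {
    val nested = leftNested((1 to 5).map(i => Leaf(s"df$i")))
    val flat   = flatten(nested)
    println(s"nested depth = ${depth(nested)}") // prints "nested depth = 4"
    println(s"flat depth   = ${depth(flat)}")   // prints "flat depth   = 1"
    println(flat)
  }
}
```

	In this model, the left-nested tree has four levels of Union, while the flattened one has a single Union holding the five leaves directly.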

	Context: encrypting an external drive that needs to be accessed from both OSX and Linux.

	The drive already has an NTFS partition.

	I use VeraCrypt. It's open source and seems to be the de facto standard nowadays.

	On OSX, I installed OSXFUSE.

	I created a volume within the partition, and the problems began with the choice of filesystem.
	First, this choice depends on the OS from which I create the volume: