Yann Moisan YannMoisan

View GitHub Profile
@YannMoisan
YannMoisan / gist:66e28bb578f321de831c365f254b26e9
Last active November 29, 2025 10:07
My incredible 9.3.
YannMoisan / gist:5b8f6cd51399db1a5e2f6a93a2dbfba2
Created November 10, 2016 13:58
Does Catalyst optimize consecutive unionAll calls?
I was wondering whether Catalyst optimizes consecutive unionAll calls (so that the number of shuffles is limited).
So let's write a test:
@ val dfs = (1 to 5).map(i => sc.parallelize((1 to 3).map(j => i + 3*j)).toDF("a"))
dfs: collection.immutable.IndexedSeq[org.apache.spark.sql.DataFrame] = Vector([a: int], [a: int], [a: int], [a: int], [a: int])
@ val merged = dfs.reduceLeft(_ unionAll _)
merged: org.apache.spark.sql.DataFrame = [a: int]
And show the plan
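A minimal sketch of how the plan could be displayed, assuming the `merged` DataFrame from the session above and the same Spark shell (the original output is not reproduced here). `explain(true)` prints the logical plans along with the physical plan; the optimized logical plan is where a flattening of the nested unions would show up, if Catalyst performs one.

```scala
// Assumes `merged` is the DataFrame built above with reduceLeft(_ unionAll _).
// Print the parsed, analyzed and optimized logical plans, then the physical plan.
merged.explain(true)

// Inspect only the optimized logical plan, to check whether the nested
// Union nodes have been collapsed into a single one.
println(merged.queryExecution.optimizedPlan)
```

Comparing the analyzed plan with the optimized plan is the quickest way to see what the optimizer changed.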
Context: encrypting an external drive that needs to be accessible from both OSX and Linux.
The drive has an existing NTFS partition.
I use VeraCrypt. It's open source and seems to be the de facto standard nowadays.
On OSX, I installed OSXFUSE.
I created a volume within the partition, and the problems began with the choice of filesystem.
First, this choice depends on the OS from which I create the volume: