@YasuhiroYoshida
Last active April 13, 2021 21:58
Snippet to split a CSV dataset file into a training data file and a testing data file from the terminal
/**
 Splits a CSV dataset file into two files, a "training data file" and a
 "testing data file", written to the same location as the original file,
 using MLDataTable#randomSplit.

 Put this in a new file and run it from the terminal, in the directory that
 the original file's path is relative to.

 Example:
   $ swift thisFile.swift originalFile.csv 0.8 5

 The example above writes originalFileTrainingData.csv and
 originalFileTestingData.csv.

 - Parameters: three command-line arguments, all required
   - path to the original file, relative to the current directory
   - proportion passed to MLDataTable#randomSplit
   - seed passed to MLDataTable#randomSplit
 - Returns: Void
 */
import Foundation  // Foundation covers URL, FileManager and exit(); the rest of Cocoa is not used
import CreateML
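let args = CommandLine.arguments
// Expect exactly three arguments after the script name, with a numeric
// proportion between 0 and 1 and an integer seed; otherwise print usage and stop.
guard args.count == 4,
      let by = Double(args[2]), (0.0...1.0).contains(by),
      let seed = Int(args[3]) else {
    print("Usage: swift thisFile.swift <originalFile.csv> <proportion 0.0-1.0> <seed>")
    exit(1)
}
let path = FileManager.default.currentDirectoryPath
let fileName = args[1]
// Strip only the trailing extension, not every ".csv" occurrence in the name.
let fileNameWithoutExt = (fileName as NSString).deletingPathExtension
let trainingDataFileName = "\(fileNameWithoutExt)TrainingData.csv"
let testingDataFileName = "\(fileNameWithoutExt)TestingData.csv"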
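var data: MLDataTable?
do {
    data = try MLDataTable(contentsOf: URL(fileURLWithPath: "\(path)/\(fileName)"))
} catch {
    print("Error generating MLDataTable: \(error)")
}
// Stop here if loading failed instead of crashing on a force unwrap.
guard let table = data else { exit(1) }
// The first element of the tuple receives the given proportion of the rows.
let (trainingData, testingData) = table.randomSplit(by: by, seed: seed)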
do {
    try trainingData.writeCSV(toFile: "\(path)/\(trainingDataFileName)")
    try testingData.writeCSV(toFile: "\(path)/\(testingDataFileName)")
} catch {
    print("Error generating CSV files: \(error)")
    exit(1)  // report the failure to the shell with a non-zero status
}
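
To illustrate what the proportion and seed arguments control, here is a minimal pure-Swift sketch of a reproducible split. It is not CreateML's implementation: `SeededGenerator` and `seededSplit` are hypothetical names introduced for illustration, the generator is a SplitMix64-style stand-in, and MLDataTable#randomSplit may pick different rows and may not produce exactly these subset sizes. The point is only that the same seed gives the same two subsets on every run.

```swift
// Hypothetical helper, not part of CreateML: a small deterministic
// random number generator (SplitMix64-style) driven by an integer seed.
struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64

    init(seed: Int) {
        state = UInt64(bitPattern: Int64(seed))
    }

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Hypothetical helper: shuffle the rows with the seeded generator, then cut
// the shuffled array so the first part holds `proportion` of the rows.
func seededSplit<T>(_ rows: [T], by proportion: Double, seed: Int) -> (first: [T], second: [T]) {
    var generator = SeededGenerator(seed: seed)
    let shuffled = rows.shuffled(using: &generator)
    let cut = Int((Double(rows.count) * proportion).rounded())
    let clamped = min(max(cut, 0), rows.count)
    return (Array(shuffled[..<clamped]), Array(shuffled[clamped...]))
}

// Ten stand-in rows, split with the same arguments as the example command.
let rows = Array(1...10)
let (training, testing) = seededSplit(rows, by: 0.8, seed: 5)
print(training.count, testing.count)  // prints "8 2"
```

Running the sketch again with the same seed yields the same `training` and `testing` arrays; changing the seed changes which rows land in each subset but not their sizes.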