Last active
April 13, 2021 21:58
Snippet to split a dataset file in CSV format into a training data file and a testing data file from the terminal
| /** | |
| Splits a dataset file in CSV format into two files, | |
| a training data file and a testing data file, | |
| written to the same location as the original file, | |
| using MLDataTable.randomSplit(by:seed:). | |
| Put this in a new file and run it from the terminal. | |
| Example: | |
| $ swift thisFile.swift originalFile.csv 0.8 5 | |
| - Parameters: three, all required | |
| - path of the original file, relative to the current directory | |
| - proportion passed to MLDataTable.randomSplit(by:seed:), e.g. 0.8 | |
| - seed passed to MLDataTable.randomSplit(by:seed:), an integer | |
| - Returns: Void | |
| */ | |
| import Cocoa | |
| import CreateML | |
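| let args = CommandLine.arguments | |
| // Expect exactly three arguments after the script name. | |
| guard args.count == 4 else { | |
| print("Usage: swift thisFile.swift originalFile.csv proportion seed") | |
| exit(1) | |
| } | |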
| let path = FileManager.default.currentDirectoryPath | |
| let fileName = args[1] | |
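| // Strip only a trailing ".csv"; replacingOccurrences would also remove ".csv" elsewhere in the name. | |
| let fileNameWithoutExt = fileName.hasSuffix(".csv") ? String(fileName.dropLast(4)) : fileName | |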
| let trainingDataFileName = "\(fileNameWithoutExt)TrainingData.csv" | |
| let testingDataFileName = "\(fileNameWithoutExt)TestingData.csv" | |
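| // Fail with a message instead of crashing on a malformed proportion or seed. | |
| guard let by = Double(args[2]), let seed = Int(args[3]) else { | |
| print("Error: the proportion must be a number and the seed must be an integer") | |
| exit(1) | |
| } | |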
| var data: MLDataTable? | |
| do { | |
| data = try MLDataTable(contentsOf: URL(fileURLWithPath: "\(path)/\(fileName)")) | |
| } catch { | |
| print("Error generating MLDataTable: \(error)") | |
| // Stop here: without a table, the force-unwrap of `data` below would crash. | |
| exit(1) | |
| } | |
| let (trainingData, testingData) = data!.randomSplit(by: by, seed: seed) | |
| do { | |
| try trainingData.writeCSV(toFile: "\(path)/\(trainingDataFileName)") | |
| try testingData.writeCSV(toFile: "\(path)/\(testingDataFileName)") | |
| } catch { | |
| print("Error generating CSV files: \(error)") | |
| // Report the failure to the shell with a non-zero exit status. | |
| exit(1) | |
| } | |
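The script above builds its output paths by string concatenation against the current directory, which assumes the input path is relative. As a minimal sketch of an alternative, the output locations could be derived with Foundation's `URL` APIs so that absolute paths also work. The helper name `splitOutputURLs(for:)` and the sample path are hypothetical and not part of the original snippet; the sketch uses only Foundation and does not call CreateML.

```swift
import Foundation

// Hypothetical helper (not in the original snippet): derives the training and
// testing output URLs in the same directory as the input file.
func splitOutputURLs(for inputPath: String) -> (training: URL, testing: URL) {
    // URL(fileURLWithPath:) resolves relative paths against the current
    // directory and leaves absolute paths as they are.
    let input = URL(fileURLWithPath: inputPath)
    let directory = input.deletingLastPathComponent()
    // deletingPathExtension() removes only the final extension.
    let baseName = input.deletingPathExtension().lastPathComponent
    return (
        training: directory.appendingPathComponent("\(baseName)TrainingData.csv"),
        testing: directory.appendingPathComponent("\(baseName)TestingData.csv")
    )
}

// Sample path chosen for illustration only.
let outputs = splitOutputURLs(for: "/data/sets/iris.csv")
print(outputs.training.path)
print(outputs.testing.path)
```

The resulting URLs could then be turned into strings with `.path` and passed to `writeCSV(toFile:)` in place of the concatenated paths.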