@vmirly
Created October 12, 2013 18:23
AWK: generate random sample for training and testing
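# Shuffle the non-empty lines: prefix each with a random key, sort on it, strip it.
# The key goes first and is tab-separated, so lines containing spaces stay intact.
# Note: any ARFF header lines in merged.arff are shuffled along with the data rows.
awk 'BEGIN {srand()} !/^$/ {printf "%f\t%s\n", rand(), $0}' merged.arff | sort -n -k1,1 | cut -f2- |
awk '{
# Nested training sets: the first 10, 20, 40, ... 320 shuffled lines.
# ">" truncates each file once per run, so reruns do not pile onto old output.
if(NR<=10) {print > "t01.csv"}
if(NR<=20) {print > "t02.csv"}
if(NR<=40) {print > "t03.csv"}
if(NR<=80) {print > "t04.csv"}
if(NR<=160) {print > "t05.csv"}
if(NR<=320) {print > "t06.csv"}
# Everything after line 320 is held out for testing.
if(NR>320) {print > "test.csv"}
}'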
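
A possible way to try the split without touching real data, sketched under a few assumptions: the seed value 42, the scratch directory, and the generated sample.txt (a 400-line stand-in for merged.arff) are all hypothetical and not part of the original gist. Passing a fixed seed to srand() makes the shuffle repeatable on the same awk implementation, which the unseeded srand() does not.

```shell
#!/bin/sh
# Sketch only: seed, file names, and sample data are illustrative assumptions.
set -eu

# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Build a 400-line stand-in for merged.arff (hypothetical rows).
awk 'BEGIN {for (i = 1; i <= 400; i++) printf "row%d,%d\n", i, i % 2}' > sample.txt

# Fixed seed for a repeatable shuffle; random key first, then stripped by cut.
awk 'BEGIN {srand(42)} !/^$/ {printf "%f\t%s\n", rand(), $0}' sample.txt |
sort -n -k1,1 | cut -f2- |
awk '{
  # Smallest training set and the held-out test set only, for brevity.
  if (NR <= 10) print > "t01.csv"
  if (NR > 320) print > "test.csv"
}'

# Report how many lines landed in each file.
wc -l t01.csv test.csv
```

With 400 input lines, the training file takes the first 10 shuffled rows and the test file takes the rows after position 320, so the two never share a row.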