My input file has the following form:
gold,Attribute1,Attribute2
T,1,1
T,1,2
T,1,1
N,1,2
N,2,1
T,2,1
T,2,2
N,2,2
T,3,1
N,3,2
N,3,1
T,3,2
N,3,3
N,3,3
I am trying to predict the first column from the second and third columns. I would like to split this data randomly into a training set and a test set such that all rows sharing a given combination of <Attribute1, Attribute2> values fall entirely in either the training set or the test set. For example, all rows with values <1,1>, <1,2>, <2,1> would go to the training set and all rows with values <2,2>, <3,1>, <3,2>, <3,3> to the test set. This is only an example; the assignment of combinations to sets must be random. How can I make such a split?
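To make the requirement concrete, here is a minimal sketch of one possible approach using only the Python standard library: collect the distinct <Attribute1, Attribute2> combinations, shuffle those combinations rather than the rows, and send every row to whichever set its combination was assigned to. The function name `group_split`, the `test_fraction` and `seed` parameters, and the inline `SAMPLE` string are assumptions made for illustration; they are not part of the original question.

```python
import csv
import io
import random

# The sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """gold,Attribute1,Attribute2
T,1,1
T,1,2
T,1,1
N,1,2
N,2,1
T,2,1
T,2,2
N,2,2
T,3,1
N,3,2
N,3,1
T,3,2
N,3,3
N,3,3
"""


def group_split(rows, key_fields, test_fraction=0.5, seed=None):
    """Split rows so each distinct key combination lands entirely in one set.

    Hypothetical helper: test_fraction is the share of *combinations*
    (not rows) assigned to the test set.
    """
    # Distinct combinations, sorted so a given seed gives a repeatable split.
    keys = sorted({tuple(row[f] for f in key_fields) for row in rows})
    rng = random.Random(seed)
    rng.shuffle(keys)

    # At least one combination goes to the test set.
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train, test = [], []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        (test if key in test_keys else train).append(row)
    return train, test


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
train, test = group_split(rows, ["Attribute1", "Attribute2"], seed=0)

train_keys = {(r["Attribute1"], r["Attribute2"]) for r in train}
test_keys = {(r["Attribute1"], r["Attribute2"]) for r in test}
print("train combinations:", sorted(train_keys))
print("test combinations:", sorted(test_keys))
```

Because the fraction is applied to combinations, the row counts of the two sets depend on how many rows each combination has. If scikit-learn is available, `GroupShuffleSplit` from `sklearn.model_selection` is built for this kind of grouped split and could likely replace the helper above, with the combined attribute values passed as the groups.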
question from:
https://stackoverflow.com/questions/65933830/split-into-training-set-and-test-set-with-specific-attribute-values-for-rows