Suppose ABC is a dataframe as given below:
ABC <- data.frame(
  Column1 = c(1.222, 3.445, 5.621, 8.501, 9.302),
  Column2 = c(654231, 12347, -2365, 90000, 12897),
  Column3 = c('A1', 'B2', 'E3', 'C1', 'F5'),
  Column4 = c('I bought it',
              'The flower has a beautiful fragrance',
              'It was bought by me',
              'I have bought it',
              'The flower smells good'),
  Column5 = c('Good', 'Bad', 'Ok', 'Moderate', 'Perfect'))
My intention is to find synonymous strings in Column4. In this case, 'I bought it', 'It was bought by me' and 'I have bought it' are synonymous or similar strings, while 'The flower has a beautiful fragrance' and 'The flower smells good' convey a similar meaning.
I tried the approach of IVR in the following thread and got stuck: Find similar texts based on paraphrase detection
When I run the HLS.Extract code chunk, I get the following error message:
Error in strsplit(PlainTextDocument(synonyms(word)), ",") : non-character Argument
Using as.character didn't resolve the problem either:
Syns = function(word){
  word <- as.character(word)  ### my attempted fix
  wl = gsub("(.*[[:space:]].*)", "",
            gsub("^c\\(|[[:punct:]]+|^[[:space:]]+|[[:space:]]+$", "",
                 unlist(strsplit(PlainTextDocument(synonyms(word)), ","))))
  wl = wl[wl != ""]
  return(wl)
}
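For what it's worth, the same message can be reproduced in base R whenever strsplit() is handed something that is not a character vector. The sketch below uses a plain list as a stand-in for whatever PlainTextDocument() returns; that the returned document object is list-like rather than a character vector is only my assumption, and the names doc_like, err and parts are made up for illustration.

```r
# Stand-in for the object passed to strsplit(); assumed to be list-like,
# not a character vector (this is a guess, not confirmed for tm).
doc_like <- list("buy, purchase, acquire")

# strsplit() refuses non-character input and signals an error.
err <- tryCatch(
  strsplit(doc_like, ","),
  error = function(e) conditionMessage(e)
)
print(err)

# Converting the object that reaches strsplit(), rather than the input
# word, avoids the error in this toy case.
parts <- unlist(strsplit(as.character(doc_like), ","))
parts <- trimws(parts)
print(parts)
```

If that guess is right, it would explain why calling as.character on word alone made no difference, since the conversion happens before the document object is built.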
What is going wrong?
Is there a better way to do this in R that also creates a new column containing, say, 1 for the first group of synonymous strings and 2 for the next group?
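To make the target concrete, this is the output I am hoping for. The group labels are assigned by hand here, based on the groupings described above, and the column name Group is just a placeholder I picked; the open question is how to compute it automatically.

```r
# Sentences copied from Column4 of ABC
sentences <- c('I bought it',
               'The flower has a beautiful fragrance',
               'It was bought by me',
               'I have bought it',
               'The flower smells good')

# Hand-assigned labels (hypothetical): 1 = "bought" group, 2 = "flower" group
group <- c(1, 2, 1, 1, 2)

# Desired result: the original text plus the new grouping column
result <- data.frame(Column4 = sentences, Group = group)
print(result)
```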
Does it work with German text?