Suppose I have a corpus, e.g.:
myCorpus <- c("Carles werwa went to sadaf buy trsfr in the supermanket",
              "Marta needs to werwa sadaf go to Jamaica")
I have a dictionary (data_int_syllables) containing a list of words which I would like to remove from my text.
Using library('quanteda'), I tried the following:
myTokens <- tokens(myCorpus, remove_punct = TRUE, remove_numbers = TRUE)
myTokens <- tokens_select(myTokens, names(data_int_syllables))
The issue is that this code amends myTokens to keep only the tokens found in the English dictionary (data_int_syllables). Instead, I want to remove all words found in data_int_syllables.
Does anyone know how to adjust the code so that the words are removed, rather than kept?
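For reference, here is a minimal sketch of the behaviour I am after. It assumes tokens_select() accepts a selection argument ("keep" by default, "remove" to drop matches) and that tokens_remove() is shorthand for the latter. toy_syllables is a hypothetical stand-in for data_int_syllables, used only so the example does not depend on which package version provides that object.

```r
library(quanteda)

myCorpus <- c("Carles werwa went to sadaf buy trsfr in the supermanket",
              "Marta needs to werwa sadaf go to Jamaica")

# Hypothetical stand-in for data_int_syllables: a named integer vector
# whose names are the dictionary words (values are syllable counts).
toy_syllables <- c(went = 1L, to = 1L, buy = 1L, "in" = 1L, the = 1L,
                   needs = 1L, go = 1L)

myTokens <- tokens(myCorpus, remove_punct = TRUE, remove_numbers = TRUE)

# selection = "remove" drops the matching tokens instead of keeping them
cleaned <- tokens_select(myTokens, names(toy_syllables), selection = "remove")

# tokens_remove() should give the same result as the call above
cleaned2 <- tokens_remove(myTokens, names(toy_syllables))

print(cleaned)
```

With the real dictionary, names(toy_syllables) would be replaced by names(data_int_syllables). Matching is case-insensitive by default, so capitalised tokens would also be removed if their lowercase form is in the dictionary.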