Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Suppose you have a corpus, e.g.

myCorpus <- c("Carles werwa went to sadaf buy trsfr in the supermanket", 
           "Marta needs to werwa sadaf go to Jamaica")

I have a dictionary (data_int_syllables) whose names are a list of words that I would like to remove from myCorpus.

Using library('quanteda'), I tried the following:

myTokens <- tokens(myCorpus, remove_punct = TRUE, remove_numbers = TRUE)
myTokens <- tokens_select(myTokens, names(data_int_syllables))

The issue is that this code changes myTokens to keep only the tokens found in the English dictionary (data_int_syllables). Instead, I want to remove every word found in data_int_syllables and keep the rest.

Does anyone know how to adjust the code so that the words are removed, rather than kept?
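
For reference, a minimal sketch of one way this might be done, assuming a reasonably recent quanteda: tokens_select() takes a selection argument that defaults to "keep", and tokens_remove() is the shorthand for selection = "remove". The toy_dict object below is a hypothetical stand-in for data_int_syllables (a named vector whose names are the words), used only so the example is self-contained.

```r
library(quanteda)

myCorpus <- c("Carles werwa went to sadaf buy trsfr in the supermanket",
              "Marta needs to werwa sadaf go to Jamaica")

# Hypothetical stand-in for data_int_syllables: a named integer vector
# whose names are the dictionary words (values are not used here).
dict_words <- c("went", "to", "buy", "in", "the", "needs", "go")
toy_dict <- setNames(rep(1L, length(dict_words)), dict_words)

myTokens <- tokens(myCorpus, remove_punct = TRUE, remove_numbers = TRUE)

# selection = "remove" drops the matching tokens instead of keeping them.
removed <- tokens_select(myTokens, pattern = names(toy_dict),
                         selection = "remove")

# tokens_remove() is the equivalent shorthand.
removed_short <- tokens_remove(myTokens, pattern = names(toy_dict))

print(as.list(removed))
```

With the real dictionary, the pattern would be names(data_int_syllables). Matching is case-insensitive and glob-style by default, so adding valuetype = "fixed" may be worth considering if the word list is large or could contain wildcard characters.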



1 Answer

Waiting for an expert to answer.

...