Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm new to Machine Learning and Tensorflow, since I don't know python so I decide to use there javascript version (maybe more like a wrapper).

The problem is I tried to build a model that process the Natural Language. So the first step is tokenizer the text in order to feed the data to model. I did a lot research, but most of them are using python version of tensorflow that use method like: tf.keras.preprocessing.text.Tokenizer which I can't find similar in tensorflow.js. I'm stuck in this step and don't know how can I transfer text to vector that can feed to model. Please help :)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
394 views
Welcome To Ask or Share your Answers For Others

1 Answer

To transform text to vectors, there are lots of ways to do it, all depending on the use case. The most intuitive one, is the one using the term frequency, i.e , given the vocabulary of the corpus (all the words possible), all text document will be represented as a vector where each entry represents the occurrence of the word in text document.

With this vocabulary :

["machine", "learning", "is", "a", "new", "field", "in", "computer", "science"]

the following text:

["machine", "is", "a", "field", "machine", "is", "is"] 

will be transformed as this vector:

[2, 0, 3, 1, 0, 1, 0, 0, 0] 

One of the disadvantage of this technique is that there might be lots of 0 in the vector which has the same size as the vocabulary of the corpus. That is why there are others techniques. However the bag of words is often referred to. And there is a slight different version of it using tf.idf

const vocabulary = ["machine", "learning", "is", "a", "new", "field", "in", "computer", "science"]
const text = ["machine", "is", "a", "field", "machine", "is", "is"] 
const parse = (t) => vocabulary.map((w, i) => t.reduce((a, b) => b === w ? ++a : a , 0))
console.log(parse(text))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...