Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am working on a new PyTorch model that takes sequential data as input and needs to output just a single value, which I will then evaluate with a binary cross-entropy loss as the probability of 1 versus 0.

To be more concrete, let's say my sequence has 1000 time steps and only 2 dimensions, like a 2-dimensional sine wave, so the data shape would be 1000 x 2.

I have done something like this before using an RNN, for which there is a lot of content online. Because of the recurrent structure of the RNN, we just look at the final output of the RNN after it has processed the sequence. The final-step output is then a single vector (2 dimensions if the hidden size is 2), and we can apply a linear layer to convert 2 -> 1 dimension, et voilà, it's done.
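For reference, the RNN approach described above could look roughly like this. It is only a minimal sketch: the class name `LastStepRNNClassifier`, the choice of a GRU, the hidden size of 2, and the batch of 4 random sequences are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LastStepRNNClassifier(nn.Module):
    """Hypothetical example: run a GRU, keep only the final time step."""

    def __init__(self, input_dim=2, hidden_dim=2):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # 2 -> 1

    def forward(self, x):
        out, _ = self.rnn(x)                 # (batch, seq_len, hidden_dim)
        last = out[:, -1, :]                 # (batch, hidden_dim): final step only
        return self.head(last).squeeze(-1)   # (batch,) raw logits

torch.manual_seed(0)
model = LastStepRNNClassifier()
x = torch.randn(4, 1000, 2)                  # 4 made-up sequences of shape 1000 x 2
labels = torch.tensor([0.0, 1.0, 1.0, 0.0])  # made-up binary targets
logits = model(x)
# BCEWithLogitsLoss applies the sigmoid internally, so the model returns logits.
loss = nn.BCEWithLogitsLoss()(logits, labels)
```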

MY PROBLEM:

What I am attempting to do now is use not a recurrent network but an encoder with attention (a Transformer). The output of the encoder is still 1000 steps long, with whatever my embedding dimension is, let's say 8, so it has shape 1000 x 8. My issue is that I need to convert this output to a single value, to which I can apply the binary cross-entropy function. I am not finding an obvious way to do this.
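To make the shape issue concrete, here is a small sketch of what the encoder returns. The linear input projection, the 2 attention heads, the feed-forward width of 32, and the batch size of 4 are assumptions; positional encoding is left out to keep it short.

```python
import torch
import torch.nn as nn

embed = nn.Linear(2, 8)  # assumed projection from 2-D inputs to the 8-D model dimension
layer = nn.TransformerEncoderLayer(
    d_model=8, nhead=2, dim_feedforward=32, batch_first=True
)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(4, 1000, 2)          # 4 made-up sequences of shape 1000 x 2
with torch.no_grad():
    encoded = encoder(embed(x))      # one 8-D vector per time step

print(encoded.shape)                 # shape is (4, 1000, 8): still 1000 steps long
```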

IDEAS:

Traditionally with this kind of sequential model, the encoder feeds into a decoder, and the decoder can then output a variable-length sequence (this is used for language translation problems). My problem is different in that I don't want to output another sequence, just a single value. Maybe I need to adapt the decoder so that this works? The decoder usually takes a target value as well as the output from the encoder as input, and the output from the decoder then has the same shape as this target value. One idea would be to use the traditional decoder and give it a length-1 target; I would then get a length-1 output and could use a traditional linear layer to convert this to my desired output. However, this doesn't seem entirely logical, because I am really not interested in outputting a sequence, just 1 value.
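A rough sketch of that length-1 target idea, in case it helps to see it. The class name `SingleQueryDecoderHead` is made up, and using a learnable vector as the length-1 target is an assumption; the encoder output is replaced by a random stand-in tensor.

```python
import torch
import torch.nn as nn

class SingleQueryDecoderHead(nn.Module):
    """Hypothetical example: a decoder fed a length-1 target."""

    def __init__(self, d_model=8, nhead=2):
        super().__init__()
        # Assumption: the length-1 target is a learnable vector, not real target data.
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        layer = nn.TransformerDecoderLayer(
            d_model, nhead, dim_feedforward=32, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, 1)

    def forward(self, memory):
        tgt = self.query.expand(memory.size(0), -1, -1)  # (batch, 1, d_model)
        out = self.decoder(tgt, memory)                  # (batch, 1, d_model)
        return self.head(out[:, 0, :]).squeeze(-1)       # (batch,) raw logits

memory = torch.randn(4, 1000, 8)     # stand-in for the 1000 x 8 encoder output
logits = SingleQueryDecoderHead()(memory)
```

The decoder output has the same length as the target, so a length-1 target gives exactly one vector per sequence for the linear layer.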

Anyways just looking for some more ideas from the community, if you have any. Thanks!



1 Answer

I think this paper does what you wanted :) (Probably not the first paper that does this but it is the one that I recently read)

  1. Prepend an extra token to your sequence. The token can have a learnable embedding.
  2. After the transformer, discard (or do not compute) the outputs at the other positions. Take only the output from the first position and transform it into the target you need.
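The two steps above can be sketched in PyTorch as follows. This is only an illustration under assumptions, not the paper's implementation: the class name `ClsTokenClassifier`, the linear input projection, the layer sizes, and the random data are all made up, and positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class ClsTokenClassifier(nn.Module):
    """Hypothetical example: prepend a learnable token, read out position 0."""

    def __init__(self, input_dim=2, d_model=8, nhead=2, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(input_dim, d_model)
        # Step 1: the extra token with a learnable embedding.
        self.cls_token = nn.Parameter(torch.randn(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=32, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        tokens = self.embed(x)                               # (batch, seq_len, d_model)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)  # (batch, 1, d_model)
        tokens = torch.cat([cls, tokens], dim=1)             # (batch, seq_len + 1, d_model)
        encoded = self.encoder(tokens)
        # Step 2: keep only the first position and map it to a single logit.
        return self.head(encoded[:, 0, :]).squeeze(-1)       # (batch,)

model = ClsTokenClassifier()
x = torch.randn(4, 1000, 2)                  # 4 made-up sequences of shape 1000 x 2
labels = torch.tensor([0.0, 1.0, 1.0, 0.0])  # made-up binary targets
logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits, labels)
loss.backward()                              # gradients reach the extra token too
```

Because the extra token attends to every time step, its output can act as a summary of the whole sequence, which is what the linear layer then turns into one value.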

Image taken from the paper: (image not preserved)

