I am currently reading the "Network in Network" paper.

In the paper, it is stated that:

"the cross channel parametric pooling layer is also equivalent to convolution layer with 1x1 convolution kernel. "

My first question is: what exactly does "cross channel parametric pooling layer" mean? Is it just a fully connected layer?

And why is a cross channel parametric pooling layer the same as a 1x1 convolution kernel?

I would be thankful if you could answer both mathematically and with examples.

Please help me~

Question from: https://stackoverflow.com/questions/65937119/why-is-1x1-conv-same-as-fully-connected-layer


1 Answer

I haven't read the paper, but I have a fair idea of what this is. First of all:

How is a 1x1 convolution like a fully connected layer?

So we have a feature map with dimensions (C, H, W), where C = number of channels, H = height, W = width. I'll call positions in (H, W) "pixels". A 1x1 convolution consists of C' kernels (C' being the number of output channels of the convolution), each with shape (C, 1, 1). If we consider any pixel in the input feature map, we can apply a single (C, 1, 1) kernel to it to produce a (1, 1, 1) output. Applying C' different kernels results in a (C', 1, 1) output. This is equivalent to applying a single fully connected layer to one pixel of the input feature map. Have a look at the following diagram to understand the action of a 1x1 convolution on a single pixel of the input feature map:

[Diagram: a 1x1 convolution applied to a single pixel of the input feature map, producing one output value per output channel]

The different colors represent different kernels of the convolution, corresponding to different output channels. You can see now how the kernels effectively comprise the weights of a single fully connected layer.
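
Put mathematically, for a pixel with channel vector x in R^C, output channel c' of the 1x1 convolution is y_{c'} = sum_c w_{c',c} x_c + b_{c'}, which is exactly a fully connected layer mapping C inputs to C' outputs. Here is a minimal PyTorch sketch (my own illustration, not code from the paper) that checks this numerically by copying the weights of a 1x1 conv into an nn.Linear layer and applying it pixel by pixel:

```python
import torch
import torch.nn as nn

C, C_out, H, W = 3, 5, 4, 4          # small sizes chosen arbitrarily for the demo
x = torch.randn(1, C, H, W)          # a batch of one (C, H, W) feature map

conv = nn.Conv2d(C, C_out, kernel_size=1, bias=True)

# Copy the conv weights into a fully connected layer:
# a (C_out, C, 1, 1) conv kernel is just a (C_out, C) weight matrix.
fc = nn.Linear(C, C_out, bias=True)
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(C_out, C))
    fc.bias.copy_(conv.bias)

out_conv = conv(x)                                    # shape (1, C_out, H, W)

# Apply the FC layer independently to every pixel's channel vector.
pixels = x.permute(0, 2, 3, 1).reshape(-1, C)         # shape (H*W, C)
out_fc = fc(pixels).reshape(1, H, W, C_out).permute(0, 3, 1, 2)

print(torch.allclose(out_conv, out_fc, atol=1e-6))    # True
```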

What is cross channel parametric pooling?

This is where I'm going to make a guess I'm 90% certain of (not 100%, because I didn't read the paper). It is just an extension of the logic above from individual pixels to whole feature maps. You're applying a cross-channel aggregation mechanism. The mechanism is parametric because it's not a simple mean, sum, or max; it's a parameterised weighted sum whose weights are learned. Also note that the weights are held constant across all pixels (remember, that's how convolution kernels work). So it's essentially the same as applying the weights of a single fully connected layer to the channels of a feature map in order to produce a different set of feature maps. But instead of applying the weights to individual neurons, you apply them to all the neurons of the feature map at the same time:

[Diagram: the same cross-channel weights applied at every pixel of the feature map, producing a new set of feature maps]
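
To make that concrete, here is a small sketch (again my own illustration, assuming cross channel parametric pooling is a learned weighted sum over channels with weights shared across all pixels, as described above). A cross-channel weighted sum written with einsum gives exactly the same result as a 1x1 convolution over the whole feature map:

```python
import torch
import torch.nn.functional as F

C, C_out, H, W = 3, 5, 4, 4
x = torch.randn(C, H, W)                  # input feature maps
weights = torch.randn(C_out, C)           # one learned weight vector per output map
bias = torch.randn(C_out)

# Cross-channel parametric pooling (as I read it): every output feature map
# is a weighted sum of the input channels, using the same weights at every pixel.
pooled = torch.einsum('oc,chw->ohw', weights, x) + bias.view(C_out, 1, 1)

# The same operation expressed as a 1x1 convolution over the whole feature map.
conv_out = F.conv2d(x.unsqueeze(0), weights.view(C_out, C, 1, 1), bias).squeeze(0)

print(torch.allclose(pooled, conv_out, atol=1e-5))    # True
```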

