Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

i am currently reading " Network in Network' paper.

And in the paper, it is stated that

"the cross channel parametric pooling layer is also equivalent to convolution layer with 1x1 convolution kernel. "

My question is first of all, what is cross channel parametric pooling layer exactly mean?is it just fully connected layer?

And why is cross channel parametric pooling layer same with 1x1 convolution kernel.

It would be thankful if you answer both mathematically and with examples.

Please help me~

question from:https://stackoverflow.com/questions/65937119/why-is-1x1-conv-same-as-fully-connected-layer

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
1.1k views
Welcome To Ask or Share your Answers For Others

1 Answer

I haven't read the paper but I have a fair idea of what this is. First of all

How is a 1x1 convolution like a fully connected layer?

So we have a feature map with dims (C, H, W), where C = (number of channels), H = height, W = width. I'll call positions in (H, W) "pixels". A 1x1 convolution will consist of C' (number of output channels of the convolution) kernels each with shape (C, 1, 1). So if we consider any pixel in the input feature map, we can apply a single (C, 1, 1) kernel to it to produce a (1, 1, 1) output. Applying C' different kernels will result in a (C', 1, 1) output. This is equivalent to applying a single fully connected layer to one pixel of the input feature map. Have a look at the following diagram to understand the action of a 1x1 convolution to a single pixel of the input feature map

enter image description here

The different colors represent different kernels of the convolution, corresponding to different output channels. You can see now how the kernels effectively comprise the weights of a single fully connected layer.

What is cross channel parametric pooling?

This is where I'm going to make a guess I'm 90% certain of (not 100% because I didn't read the paper). This is just an extension of the logic above, to whole feature maps rather than individual pixels. You're applying a cross-channel aggregation mechanism. The mechanism is parametric because it's not just a simple mean or sum or max, it's actually a parameterised weighted sum. Also note that the weights are held constant across all pixels (remember, that's how convolution kernels work). So it's essentially the same as applying the weights of a single fully connected layer to channels of a feature map in order to produce a different set of feature maps. But instead of applying the weights to individual neurons, you are applying them to the all the neurons of the feature map at the same time:

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...