In the solver file, we can set a global regularization loss using the weight_decay and regularization_type options.
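As a minimal sketch, the two solver options might look like the fragment below. The value 0.0005 is an illustrative assumption, not a recommendation from this page.

```
# Solver file fragment (illustrative values only)
weight_decay: 0.0005        # global weight decay rate
regularization_type: "L2"   # "L2" is the default; "L1" is also accepted
```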
In many cases we want different weight decay rates for different layers. This can be done by setting the decay_mult option for each layer in the network definition file, where decay_mult is the multiplier on the global weight decay rate, so the actual weight decay rate applied to one layer is decay_mult*weight_decay.
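For example, the following defines a convolutional layer whose weights receive no weight decay, regardless of the options in the solver file. The single param block applies to the layer's first learnable blob (the filter weights); a second param block would be needed to set decay_mult for the biases.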
layer {
  name: "Convolution1"
  type: "Convolution"
  bottom: "data"
  top: "Convolution1"
  param {
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    pad: 0
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
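As a variation, the sketch below gives the weights and the biases different multipliers. It relies on param blocks being matched in order to the layer's learnable blobs (weights first, then biases), and the effective rates in the comments assume a solver weight_decay of 0.0005, which is a made-up value for illustration.

```
layer {
  name: "Convolution1"
  type: "Convolution"
  bottom: "data"
  top: "Convolution1"
  # First param block: filter weights.
  param {
    decay_mult: 1   # effective rate: 1 * 0.0005 = 0.0005
  }
  # Second param block: biases.
  param {
    decay_mult: 0   # effective rate: 0 * 0.0005 = 0
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
  }
}
```

Decaying the weights but not the biases is a common choice, though whether it helps depends on the model.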
See this thread for more information.