The following is an example definition for training a BatchNorm layer with channel-wise scale and bias. Typically a BatchNorm layer is inserted between convolution and rectification layers. In this example, the convolution would output the layerx blob and the rectification would receive the layerx-bn blob.
layer { bottom: 'layerx' top: 'layerx-bn' name: 'layerx-bn' type: 'BatchNorm'
  batch_norm_param {
    use_global_stats: false # calculate the mean and variance for each mini-batch
    moving_average_fraction: .999 # decay for the running averages; doesn't affect training itself
  }
  # the mean, variance, and moving-average-factor blobs are updated by the layer
  # itself rather than by gradient descent, so their learning rates are zeroed
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }}
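At test time the accumulated running statistics should be used instead of the per-batch ones. A minimal sketch of the corresponding test-phase definition, assuming the same blob names (recent BVLC Caffe defaults use_global_stats to true in the TEST phase, so the explicit setting is mostly for clarity):

layer { bottom: 'layerx' top: 'layerx-bn' name: 'layerx-bn' type: 'BatchNorm'
  batch_norm_param {
    use_global_stats: true # use the accumulated mean and variance, not the batch's
  }}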
# channel-wise scale and bias are separate
layer { bottom: 'layerx-bn' top: 'layerx-bn' name: 'layerx-bn-scale' type: 'Scale'
  scale_param {
    bias_term: true
    axis: 1 # scale separately for each channel
    num_axes: 1 # ... but not spatially (default)
    filler { type: 'constant' value: 1 } # initialize scaling to 1
    bias_filler { type: 'constant' value: 0.001 } # initialize bias
  }}
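To make the placement concrete, a hypothetical surrounding pipeline could look like the following; the convolution and ReLU definitions here (the layer names, num_output, and kernel_size) are illustrative and not part of the example above:

layer { bottom: 'data' top: 'layerx' name: 'layerx' type: 'Convolution'
  convolution_param { num_output: 64 kernel_size: 3 } }
# ... the BatchNorm and Scale layers defined above go here ...
layer { bottom: 'layerx-bn' top: 'layerx-bn' name: 'layerx-bn-relu' type: 'ReLU' }

Note that the ReLU is applied in place (its top and bottom are the same blob), which is the usual Caffe idiom for rectification.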
More information can be found in this thread.