# NFNet PyTorch Implementation
This repo contains pretrained NFNet models F0-F6 with high ImageNet accuracy from the paper *High-Performance Large-Scale Image Recognition Without Normalization*. The small models match the accuracy of an EfficientNet-B7, but train up to 8.7 times faster. The large models set a new SOTA top-1 accuracy on ImageNet.
|Model|F0|F1|F2|F3|F4|F5|F6|
|---|---|---|---|---|---|---|---|
|Top-1 accuracy Brock et al.|83.6|84.7|85.1|85.7|85.9|86.0|86.5|
|Top-1 accuracy this implementation|82.82|84.63|84.90|85.46|85.66|85.62|TBD|
```
git clone https://github.com/benjs/nfnets_pytorch.git
pip3 install -r requirements.txt
```
Download the pretrained weights from the official repository and place them in the `pretrained` folder.
```python
from pretrained import pretrained_nfnet

model_F0 = pretrained_nfnet('pretrained/F0_haiku.npz')
model_F1 = pretrained_nfnet('pretrained/F1_haiku.npz')
# ...
```
The model variant is automatically derived from the parameter count in the pretrained weights file.
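As a quick sanity check, the loaded model can be run on a dummy batch. This is a minimal sketch; the input resolution used here is illustrative and differs from the per-variant test resolutions reported in the paper:

```python
import torch
from pretrained import pretrained_nfnet

model = pretrained_nfnet('pretrained/F0_haiku.npz')
model.eval()

# Dummy ImageNet-style batch (NCHW); the resolution here is illustrative
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)

print(logits.shape)  # expected: torch.Size([1, 1000]) for the 1000 ImageNet classes
```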
```
python3 eval.py --pretrained pretrained/F0_haiku.npz --dataset path/to/imagenet/valset/
```
## Scaled weight standardization convolutions in your own model
Simply replace all your `nn.Conv2d` layers with `WSConv2D` and all your ReLU/GELU activations with `VPReLU` or `VPGELU` (variance preserving ReLU/GELU).
```python
import torch.nn as nn
from model import WSConv2D, VPReLU, VPGELU

# Simply replace your nn.Conv2d layers
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()

        self.activation = VPReLU(inplace=True)  # or VPGELU
        self.conv0 = WSConv2D(in_channels=128, out_channels=256, kernel_size=1, ...)
        # ...

    def forward(self, x):
        out = self.activation(self.conv0(x))
        # ...
```
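For context, this is roughly what a scaled weight standardized convolution and a variance preserving activation compute, following the paper. The class and parameter names below are illustrative sketches, not the repo's actual `WSConv2D`/`VPGELU`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledWSConv2d(nn.Conv2d):
    """Illustrative sketch of scaled weight standardization (not the repo's WSConv2D)."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Learnable per-output-channel gain, as described in the paper
        self.gain = nn.Parameter(torch.ones(self.out_channels, 1, 1, 1))

    def forward(self, x):
        # Standardize each filter over its fan-in (in_channels * kh * kw),
        # then rescale so activations are variance preserving at initialization
        weight = self.weight
        mean = weight.mean(dim=(1, 2, 3), keepdim=True)
        var = weight.var(dim=(1, 2, 3), keepdim=True)
        fan_in = weight[0].numel()
        weight = self.gain * (weight - mean) / torch.sqrt(var * fan_in + 1e-4)
        return F.conv2d(x, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

class VPGELUSketch(nn.Module):
    """Illustrative variance preserving GELU: a plain GELU multiplied by a fixed
    constant so the output variance stays close to 1."""
    def forward(self, x):
        return F.gelu(x) * 1.7015  # rescaling constant from the paper's reference code
```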
## SGD with adaptive gradient clipping in your own model
Simply replace your `SGD` optimizer with `SGD_AGC`:
```python
from optim import SGD_AGC

optimizer = SGD_AGC(
    named_params=model.named_parameters(),  # Pass named parameters
    lr=1e-3,
    momentum=0.9,
    clipping=0.1,  # New clipping parameter
    weight_decay=2e-5,
    nesterov=True)
```
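For reference, adaptive gradient clipping rescales a parameter's gradient whenever the unit-wise ratio of gradient norm to parameter norm exceeds the clipping threshold. Below is a minimal standalone sketch of that clipping step, not the repo's `SGD_AGC` (which integrates it into the optimizer):

```python
import torch

def adaptive_grad_clip_(parameters, clipping=0.1, eps=1e-3):
    """Rescale gradients in place so that, per output unit,
    ||grad|| <= clipping * max(||param||, eps)."""
    for p in parameters:
        if p.grad is None:
            continue
        if p.dim() > 1:
            # Unit-wise norms: one norm per output row/filter
            dims = tuple(range(1, p.dim()))
            param_norm = p.detach().norm(dim=dims, keepdim=True).clamp(min=eps)
            grad_norm = p.grad.detach().norm(dim=dims, keepdim=True)
        else:
            # Biases/gains: use the full-tensor norm
            param_norm = p.detach().norm().clamp(min=eps)
            grad_norm = p.grad.detach().norm()
        max_norm = param_norm * clipping
        # Only shrink gradients whose norm exceeds the allowed maximum
        scale = (max_norm / grad_norm.clamp(min=1e-6)).clamp(max=1.0)
        p.grad.detach().mul_(scale)

# Usage with a plain optimizer: call before optimizer.step()
# adaptive_grad_clip_(model.parameters())
```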
It is important to exclude certain layers from clipping or weight decay. The authors recommend excluding the final linear (fully connected) layer from clipping and the bias/gain parameters from weight decay:
```python
import re

for group in optimizer.param_groups:
    name = group['name']

    # Exclude from weight decay
    if len(re.findall('stem.*(bias|gain)|conv.*(bias|gain)|skip_gain', name)) > 0:
        group['weight_decay'] = 0

    # Exclude from clipping
    if name.startswith('linear'):
        group['clipping'] = None
```
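A quick way to check that the exclusions took effect is to print the affected parameter groups. This is an illustrative snippet that relies on the `name`, `weight_decay`, and `clipping` group entries used above:

```python
for group in optimizer.param_groups:
    if group['weight_decay'] == 0 or group['clipping'] is None:
        print(f"{group['name']}: weight_decay={group['weight_decay']}, clipping={group['clipping']}")
```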
## Train your own NFNet
Adjust your desired parameters in `default_config.yaml` and start training:
```
python3 train.py --dataset /path/to/imagenet/
```
There are still some parts missing for complete training from scratch:
- Multi-GPU training
- Data augmentations
- FP16 activations and gradients
The implementation is still at an early stage in terms of usability and testing. If you have an idea to improve this repo, open an issue, start a discussion, or submit a pull request.
The current development status can be seen in this project board.