Revisiting Self-Supervised Visual Representation Learning
Takeaways
- The paper presents a large empirical study comparing different self-supervised techniques and neural architectures (all CNNs; this is a 2019 paper).
- Increasing the number of filters (i.e., widening the networks) consistently improves performance. This also holds in the fully-supervised scenario, but in the self-supervised setting the effect is even more pronounced; accuracy improvements were observed even in the low-data regime (a minimal widening sketch follows this list).
- Overall, performance on the pretext task is not predictive of performance on the downstream task.
- For ResNets, downstream accuracy is highest when using the representations from the last layer (the one before the “logits layer”, which, in turn, is task-specific).
- On the downstream task, an MLP head performed only marginally better than a simple linear classifier (logistic regression); see the linear-evaluation sketch after this list.
- Training the logistic regression to convergence can take many epochs; the authors saw improvements even after 500 epochs.
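Widening sketch: a minimal, hypothetical illustration of what "increasing the number of filters" means, i.e., scaling every layer's channel count by a width multiplier. The paper widens standard architectures (ResNet/RevNet variants); the toy block, its names, and the parameter values below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class WideConvBlock(nn.Module):
    """Toy conv block whose channel count scales with a width multiplier
    (hypothetical; the paper widens full ResNet/RevNet architectures)."""
    def __init__(self, in_ch: int, base_ch: int, width_mult: float = 1.0):
        super().__init__()
        out_ch = int(base_ch * width_mult)  # more filters per layer as width_mult grows
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Wider variants (width_mult = 2, 4, ...) were the ones that consistently helped.
narrow = WideConvBlock(3, 64, width_mult=1.0)
wide = WideConvBlock(3, 64, width_mult=4.0)
print(narrow(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
print(wide(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 256, 32, 32])
```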
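Linear-evaluation sketch: a hedged illustration of the downstream protocol described above, i.e., taking the representation from the layer right before the logits layer of a frozen backbone and training a logistic regression (a single linear layer with softmax cross-entropy) on top. The ResNet-50 backbone, the random initialization standing in for self-supervised weights, the 10-class head, and the optimizer settings are all assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen backbone: use the representation right before the "logits layer".
backbone = models.resnet50(weights=None)   # in practice, load self-supervised weights here
feat_dim = backbone.fc.in_features         # 2048-dim pre-logits features
backbone.fc = nn.Identity()                # drop the task-specific logits layer
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

# Downstream head: plain logistic regression on the frozen features.
num_classes = 10                           # assumed downstream task size
head = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():                  # features come from the frozen backbone
        feats = backbone(images)
    logits = head(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch to show the shapes; per the takeaway above, this head may keep
# improving for hundreds of epochs before it fully converges.
loss = train_step(torch.randn(8, 3, 224, 224), torch.randint(0, num_classes, (8,)))
print(loss)
```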