Revisiting Self-Supervised Visual Representation Learning

Takeaways
- The paper presents a large empirical study comparing different self-supervised techniques and neural architectures (all CNNs; this is a 2019 paper).
- Increasing the number of filters consistently improves performance. The same holds in the fully-supervised setting, but in self-supervised learning the effect is even more pronounced. Accuracy improvements were observed even in the low-data regime.
- Overall, performance on the pretext task is not predictive of performance on the downstream task.
- For ResNets, downstream accuracy is higher when using the representations obtained from the last layer (the one before the “logits layer”, which, in turn, is task-specific).
- On the downstream task, an MLP performed only marginally better than a simple linear classifier (logistic regression).
- Training the logistic regression to convergence can take many epochs: they still saw improvements after 500 epochs. (See the sketch of this linear-evaluation setup after this list.)
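Below is a minimal sketch of the linear-evaluation setup the last three takeaways refer to: freeze a pretrained encoder, take the pre-logits representation, and train a logistic-regression probe on top. It assumes a PyTorch/torchvision ResNet-50 as a stand-in encoder and an ImageNet-sized label space; the paper's own implementation and checkpoints are not reproduced here.

```python
# Sketch of a linear-evaluation probe on frozen pre-logits features.
# Assumptions: torchvision ResNet-50 as the encoder, 1000 downstream classes.
import torch
import torch.nn as nn
import torchvision

# Frozen encoder: replace the task-specific "logits layer" (fc) with Identity
# so the forward pass returns the 2048-d pre-logits (pooled) representation.
encoder = torchvision.models.resnet50(weights=None)
encoder.fc = nn.Identity()
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False

# Linear classifier (logistic regression) trained on the frozen features.
num_classes = 1000  # assumption: ImageNet-style downstream task
linear_probe = nn.Linear(2048, num_classes)
optimizer = torch.optim.SGD(linear_probe.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step of the probe; the encoder stays frozen."""
    with torch.no_grad():
        features = encoder(images)        # (batch, 2048) pre-logits features
    logits = linear_probe(features)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The takeaway about slow convergence means a real run would loop train_step
# over the downstream dataset for hundreds of epochs before accuracy plateaus.
```

In practice the features can also be precomputed once and cached, since the encoder never changes; the per-step `torch.no_grad()` forward pass above just keeps the sketch self-contained.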