The original implementation uses a ResNet-101 backbone on the Cityscapes dataset.
The input image size used to train the model is 1024x512.
15 custom classes are used.
Main idea
Apply pyramid pooling to the feature maps at output stride 8 (1/8 of the input resolution), then concatenate the output of the pyramid pooling block with its input. Finally, perform bilinear upsampling by a factor of 8 to return to the input resolution.
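To make the tensor shapes in this pipeline concrete, here is a minimal pure-Python sketch. It is not the original implementation. It assumes that 1024x512 means width x height, that the backbone's last stage outputs 2048 channels (as in a standard ResNet-101), and that the pyramid uses bin sizes 1, 2, 3 and 6 with each branch reduced to 1/4 of the backbone channels, as in the PSPNet paper; none of these are stated in the notes above. The names `adaptive_avg_pool2d` and `psp_shapes` are hypothetical helpers written for this illustration.

```python
def adaptive_avg_pool2d(fmap, bins):
    """Average-pool one channel (a list of rows) down to bins x bins.

    Window i covers rows floor(i*H/bins) up to ceil((i+1)*H/bins),
    the usual adaptive-pooling rule; columns are handled the same way.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        r0 = (i * h) // bins
        r1 = -((-(i + 1) * h) // bins)  # ceiling division
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = -((-(j + 1) * w) // bins)
            window = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def psp_shapes(width=1024, height=512, output_stride=8,
               backbone_channels=2048,  # assumption: standard ResNet-101
               bins=(1, 2, 3, 6),       # assumption: PSPNet paper defaults
               num_classes=15):
    """Trace (channels, height, width) through the pyramid pooling head."""
    fh, fw = height // output_stride, width // output_stride
    branch_channels = backbone_channels // len(bins)
    # Each pooled branch is resized back to (fh, fw) before concatenation,
    # so only the channel count grows.
    concat_channels = backbone_channels + branch_channels * len(bins)
    return {
        "feature_map": (backbone_channels, fh, fw),
        "pooled": [(branch_channels, b, b) for b in bins],
        "concat": (concat_channels, fh, fw),
        # A classifier maps to num_classes, then x8 bilinear upsampling.
        "logits": (num_classes, fh * output_stride, fw * output_stride),
    }


shapes = psp_shapes()
print(shapes["feature_map"])  # (2048, 64, 128)
print(shapes["concat"])       # (4096, 64, 128)
print(shapes["logits"])       # (15, 512, 1024)
```

Under these assumptions the 1024x512 input yields a 128x64 feature map, the concatenation doubles the channel count, and the final upsampling by 8 produces one score map per class at the input resolution.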