Freelancer : ashishkumar70785
video classification entry

The following images show the training, inference, and prediction files. Contact me for the actual .py files for training and inference, since this site only accepts images. Because the dataset is small, accuracy is around 75 percent overall and about 95 percent for speakers that have more audio. Training was done with a pretrained ResNet18 (larger models overfit) for 60 epochs, keeping the best weights.
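
The "train for 60 epochs and keep the best weights" step can be sketched roughly as below. This is a minimal, framework-agnostic illustration and not the actual training file: the function name `train_keep_best` and the three callables it takes are hypothetical, and it assumes "best" means the highest accuracy on a held-out validation set, which the entry does not state.

```python
import copy


def train_keep_best(train_one_epoch, evaluate, get_weights, num_epochs=60):
    """Train for num_epochs and return the weights from the best epoch.

    All three callables are hypothetical placeholders:
      train_one_epoch(epoch) runs one pass over the training data,
      evaluate() returns an accuracy (assumed here to be validation accuracy),
      get_weights() returns the model's current weights.
    """
    best_acc = float("-inf")
    best_weights = None
    best_epoch = -1
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        acc = evaluate()
        if acc > best_acc:
            best_acc = acc
            best_epoch = epoch
            # Deep-copy so later epochs do not overwrite the saved snapshot.
            best_weights = copy.deepcopy(get_weights())
    return best_weights, best_acc, best_epoch
```

With PyTorch, `get_weights` could be the model's `state_dict` method, and the returned snapshot could be restored with `load_state_dict` before running inference.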

