Optimized frame detection technique in vehicle accident using deep learning

  • Mardin Abdullah Anwer Department of Software and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq
  • Shareef M. Shareef Department of Software and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq
  • Abbas M. Ali Department of Software and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq
Keywords: Intelligent transportation system; Object detection; video processing technique; video segmentation; Gaussian mixture model; transfer learning; deep learning; GoogleNet; AlexNet.


Video processing has become one of the most common and necessary steps in machine learning. Today, cameras are installed in many places for many purposes, including government services, and traffic police services are among the most important of these applications. A main problem with using video in machine learning applications is video duration: long videos consume processing time, paperwork, and storage space, and the high number of frames increases the computational cost. This paper proposes an algorithm that shortens real accident videos using a Gaussian mixture model (GMM). The Histogram of Oriented Gradients (HoG) is used to extract features from the video frames, and a CNN designed and trained from scratch is evaluated on two common datasets, the Stanford Dogs Dataset (SDD) and the Vehicle Make and Model Recognition Dataset (VMMRdb), in addition to a local dataset created for this research. The experimental work has two parts. In the first, applying the GMM reduced the number of frames in the dataset by nearly 51%. In the second, the accuracy and complexity obtained on these datasets were compared. The proposed CNN achieved an accuracy of 85% on the local dataset, 85% on SDD, and 86% on VMMRdb, whereas applying GoogleNet and AlexNet to the same datasets achieved 82%, 79%, 80%, 83%, 81%, and 83%, respectively.
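The frame-reduction idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the full Gaussian mixture for a single running Gaussian per pixel (the K = 1 special case of a GMM background model), uses only the Python standard library, and keeps a frame only when enough pixels deviate from the learned background. The function names, the synthetic clip, and every threshold value below are assumptions chosen for illustration.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale frame stored as rows of pixel intensities


def select_active_frames(
    frames: Sequence[Frame],
    learning_rate: float = 0.05,         # assumed background update rate
    k_sigma: float = 2.5,                # assumed match threshold (std deviations)
    min_foreground_ratio: float = 0.02,  # assumed share of changed pixels to keep a frame
    initial_variance: float = 225.0,     # assumed starting variance (15 grey levels squared)
    min_variance: float = 4.0,           # floor so the model never becomes over-confident
) -> List[int]:
    """Return the indices of frames that contain enough foreground pixels.

    Each pixel is modelled by one Gaussian (mean, variance). This is a
    simplification of a Gaussian mixture model, which keeps several
    weighted Gaussians per pixel.
    """
    if not frames:
        return []
    height, width = len(frames[0]), len(frames[0][0])
    mean = [row[:] for row in frames[0]]  # the first frame initialises the background
    var = [[initial_variance] * width for _ in range(height)]
    kept: List[int] = []
    for index, frame in enumerate(frames[1:], start=1):
        foreground = 0
        for y in range(height):
            for x in range(width):
                diff = frame[y][x] - mean[y][x]
                if diff * diff > (k_sigma ** 2) * var[y][x]:
                    # Pixel does not match the background Gaussian.
                    foreground += 1
                else:
                    # Matched pixel: move the Gaussian toward the observation.
                    mean[y][x] += learning_rate * diff
                    var[y][x] = max(
                        min_variance,
                        var[y][x] + learning_rate * (diff * diff - var[y][x]),
                    )
        if foreground / (height * width) >= min_foreground_ratio:
            kept.append(index)
    return kept


def synthetic_clip() -> List[Frame]:
    """Hypothetical 10-frame, 8x8 clip: a bright 3x3 object appears in frames 4 to 6."""
    clip: List[Frame] = []
    for index in range(10):
        frame = [[100.0] * 8 for _ in range(8)]
        if 4 <= index <= 6:
            for y in range(2, 5):
                for x in range(2, 5):
                    frame[y][x] = 250.0
        clip.append(frame)
    return clip


if __name__ == "__main__":
    print(select_active_frames(synthetic_clip()))  # prints [4, 5, 6]
```

On the synthetic clip only the three frames containing the object are kept, which mirrors the goal of discarding static frames before feature extraction and classification. A practical pipeline would more likely rely on a library mixture-model background subtractor and real video decoding, both of which are outside this sketch.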


How to Cite
Anwer, M. A., Shareef, S. M. and Ali, A. M. (2020) “Optimized frame detection technique in vehicle accident using deep learning”, Zanco Journal of Pure and Applied Sciences, 32(4), pp. 38-47. doi: 10.21271/ZJPAS.32.4.5.
Mathematics, Physics and Engineering Researches