Document Type : Research Paper
Authors
1 Faculty of Agricultural Engineering and Technology, University of Tehran, Karaj, Iran
2 Professor, Department of Agricultural Machinery Engineering, Faculty of Agricultural Engineering and Technology, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
Detection of Iranian Foods in Images Using Deep Learning
EXTENDED ABSTRACT
Recent attention has been drawn to the application of deep learning models in various domains, with a particular focus on nutritional analysis and food quality evaluation. This study explores the use of YOLO-based models, including YOLOv5, YOLOv7, and YOLOv8, for the automatic detection and segmentation of Iranian food items.
The primary aim of this study is to assess the effectiveness of several YOLO-based algorithms in detecting specific food classes commonly found in Iranian meals. YOLOv7 is employed for the detection of 22 food classes, while instance segmentation for 19 different food classes is conducted using YOLOv5, YOLOv7, and YOLOv8.
A meticulously curated dataset of Iranian food images serves as the training data for the models. To ensure the models' robustness and generalization, the dataset comprises images captured under various lighting conditions and from different viewpoints. Transfer learning strategies and hyperparameter optimization techniques are employed to enhance model precision and effectiveness.
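The robustness goal above, namely covering varied lighting conditions and viewpoints, is often supported by data augmentation. The snippet below is a minimal sketch in plain Python, not the authors' pipeline: it treats a grayscale image as a nested list of pixel intensities and generates lighting variants (brightness scaling) of the original and mirrored (viewpoint-flipped) image.

```python
# Minimal data-augmentation sketch: lighting and viewpoint variants.
# An "image" here is a nested list of grayscale pixel intensities (0-255).

def adjust_brightness(image, factor):
    """Scale pixel intensities to simulate different lighting, clipped to [0, 255]."""
    return [[min(255, max(0, round(px * factor))) for px in row] for row in image]

def flip_horizontal(image):
    """Mirror each row to simulate a different camera viewpoint."""
    return [row[::-1] for row in image]

def augment(image, brightness_factors=(0.7, 1.0, 1.3)):
    """Return brightness variants of the image and of its mirrored copy."""
    variants = []
    for view in (image, flip_horizontal(image)):
        for factor in brightness_factors:
            variants.append(adjust_brightness(view, factor))
    return variants

sample = [[100, 200], [50, 150]]
variants = augment(sample)
print(len(variants))  # 2 views x 3 brightness factors = 6 variants
```

In practice such transformations are applied on the fly during training (frameworks such as Ultralytics YOLO expose similar augmentations as hyperparameters), but the principle is the same: each labeled image yields several plausible appearance variants.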
The YOLOv7 detector was trained to recognize 22 types of Iranian food items, using deep convolutional neural networks for hierarchical feature learning from images. Although the model was trained for up to 100 epochs, its performance plateaued well before that point and remained stable thereafter, justifying the choice of 40 epochs for training. YOLOv7 achieved satisfactory results, with an average precision of 77% for food detection and a mean average precision and recall of 75.0% and 66.9%, respectively. However, accuracy was imbalanced across food classes, ranging from 50% for "Greens" and "Ketchup" to 100% for "Havij Polo" and "Kuku Sabzi." Accuracy in the weaker classes could be improved by augmenting the training dataset and fine-tuning the model; other factors, such as hyperparameter adjustments, also influence performance.
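The per-class imbalance reported above can be made concrete with precision and recall computed from true-positive (TP), false-positive (FP), and false-negative (FN) counts. The counts below are illustrative stand-ins chosen to mimic the reported 50% to 100% spread, not the study's data.

```python
# Per-class precision/recall from detection counts (illustrative numbers).
# precision = TP / (TP + FP); recall = TP / (TP + FN)

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts mimicking the reported class-wise spread.
counts = {
    "Greens":     (5, 5, 3),   # weak class: precision 0.50
    "Ketchup":    (4, 4, 2),
    "Havij Polo": (8, 0, 1),   # strong class: precision 1.00
    "Kuku Sabzi": (9, 0, 0),
}

for name, (tp, fp, fn) in counts.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

A detection counts as a true positive only when it matches a ground-truth box of the same class above an IoU threshold (commonly 0.5), which is why adding harder training examples for a weak class raises both its TP count and its precision.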
Evaluation of the three models, i.e., YOLOv5, YOLOv7, and YOLOv8, indicated that YOLOv7 outperformed the others, with an accuracy of 0.955 and a mean average precision of 94.5%, making it the best model. Fine-tuning and post-processing techniques could further improve accuracy in specific classes. In conclusion, YOLOv7 proved to be a strong and efficient method for detecting and classifying Iranian food items.
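The model comparison above reduces to averaging per-class average precision (AP) into a single mAP score per model and ranking the models by it. The sketch below uses hypothetical AP values, chosen only so that the winning model reproduces the 94.5% figure; they are not the study's measurements.

```python
# Compare models by mean average precision (mAP) over per-class AP values.
# The AP values are hypothetical placeholders, not the study's measurements.

def mean_average_precision(per_class_ap):
    """mAP is simply the arithmetic mean of per-class AP scores."""
    return sum(per_class_ap) / len(per_class_ap)

models = {
    "YOLOv5": [0.90, 0.88, 0.92],
    "YOLOv7": [0.95, 0.94, 0.945],
    "YOLOv8": [0.91, 0.93, 0.92],
}

# Rank models from best to worst by mAP and pick the winner.
ranked = sorted(models, key=lambda m: mean_average_precision(models[m]), reverse=True)
best = ranked[0]
print(best, round(mean_average_precision(models[best]), 3))  # YOLOv7 0.945
```

Because mAP averages over classes, a model can win overall while still losing on individual classes, which is why the per-class breakdown in the previous paragraph matters alongside the aggregate score.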
The research highlights the significance of accurate food detection and segmentation in Iranian cuisine, enabling applications in food quality assessment, health monitoring, and dietary analysis. Furthermore, the study emphasizes the impact of different YOLO-based models on performance metrics and their potential to enhance computer vision applications.