Detection of Iranian foods in images using deep learning

Document Type : Research Paper

Authors

1 Faculty of Agricultural Engineering and Technology, University of Tehran, Karaj, Iran

2 Professor, Department of Agricultural Machinery Engineering, Faculty of Agricultural Engineering and Technology, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract

Maintaining individual well-being depends greatly on a healthy lifestyle and a balanced diet. In this era of rapid change in lifestyle and technology, a mobile-based system can improve the identification and segmentation of food items. This article introduces a novel system that, upon receiving input images, detects and segments the food items within them. The system utilizes deep learning models based on the YOLO algorithm. Because YOLO frames detection as a single regression problem, the system can detect and categorize food items in a single pass through the network, enhancing both the accuracy and the speed of detection. YOLOv7 was employed for food detection, while YOLOv5, YOLOv7, and YOLOv8 were utilized for image segmentation. Based on the results, the accuracy, recall, and mean average precision values for YOLOv7 were 0.844, 0.924, and 0.932, respectively. Furthermore, the instance segmentation performance of YOLOv7 surpassed that of YOLOv5 and YOLOv8, with precision, recall, and mean average precision values of 0.959, 0.943, and 0.906, respectively. These findings underscore the high accuracy in detecting Iranian foods and the remarkable speed and precision in food image segmentation attainable through advanced deep learning algorithms. Consequently, this study establishes that accurate detection of Iranian foods can be accomplished with sophisticated deep learning techniques. This research focuses on promoting a healthy lifestyle in Iran through intelligent technology and novel deep learning algorithms.



EXTENDED ABSTRACT

Introduction

Recent attention has been drawn to the application of deep learning models in various domains, with a particular focus on nutritional analysis and food quality evaluation. This study explores the use of YOLO-based models, including YOLOv5, YOLOv7, and YOLOv8, for the automatic detection and segmentation of Iranian foods.

Objective:

The primary aim of this study is to assess the effectiveness of several YOLO-based algorithms in detecting specific food classes commonly found in Iranian meals. YOLOv7 is employed for the detection of 22 food classes, while instance segmentation for 19 different food classes is conducted using YOLOv5, YOLOv7, and YOLOv8.

Method:

A meticulously curated dataset of Iranian food images serves as the training data for the models. To ensure the models' robustness and generalization, the dataset comprises images captured under various lighting conditions and from different viewpoints. Transfer learning strategies and hyperparameter optimization techniques are employed to enhance model precision and effectiveness.
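The robustness strategy described above (images under varied lighting and viewpoints) is commonly complemented by synthetic augmentation at training time. Below is a minimal sketch of two such transforms, brightness jitter and horizontal flip, applied to a NumPy image array; the functions and parameter values are illustrative assumptions, not the augmentation pipeline used in the study.

```python
import numpy as np

def adjust_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities, clipping to the valid 8-bit range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def horizontal_flip(img: np.ndarray) -> np.ndarray:
    """Mirror the image left-to-right (width is the second axis)."""
    return img[:, ::-1]

# Example on a tiny 2x2 grayscale "image"
img = np.array([[100, 200], [50, 250]], dtype=np.uint8)
bright = adjust_brightness(img, 1.5)   # values above 255 are clipped
flipped = horizontal_flip(img)
```

In practice, such transforms are applied randomly per batch so the model never sees exactly the same image twice.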

Findings:

The YOLOv7 detection method was employed to detect 22 types of Iranian food items, using deep convolutional neural networks for hierarchical feature learning from images. Training was initially run for 100 epochs; performance stabilized well before the end of the run, justifying the choice of 40 epochs for training. YOLOv7 achieved satisfactory results, with an average precision of 77% for food detection, and demonstrated good performance with a mean average precision and recall of 75.0% and 66.9%, respectively. However, YOLOv7 exhibited imbalanced accuracy across food classes, ranging from 50% for "Greens" and "Ketchup" to 100% for "Havij Polo" and "Kuku Sabzi." Accuracy in specific classes can be improved by augmenting the training dataset and fine-tuning the model; other factors, such as hyperparameter adjustments, also influence performance.
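The per-class detection accuracies above depend on how predicted boxes are matched to ground truth: a prediction counts as a true positive only when its intersection-over-union (IoU) with a ground-truth box exceeds a threshold (0.5 for the common mAP@0.5 metric). A minimal sketch of the IoU computation for axis-aligned boxes, not taken from the study's code:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping by half: IoU = 50 / 150 ≈ 0.333
score = iou((0, 0, 10, 10), (5, 0, 15, 10))
```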

Evaluation of the instance segmentation models, i.e., YOLOv5, YOLOv7, and YOLOv8, indicated that YOLOv7 outperformed the others with an accuracy of 0.955 and a mean average precision of 94.5%, making it the best model. Fine-tuning and post-processing techniques could further improve accuracy in specific classes. In conclusion, YOLOv7 proved to be a strong and efficient method for detecting and segmenting Iranian food items.
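The precision, recall, and mAP figures reported above follow the standard definitions: precision and recall are computed from true/false positive and false negative counts, and mAP is the unweighted mean of per-class average precision. A minimal sketch with hypothetical counts (not the study's data):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Standard detection metrics from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mean_over_classes(per_class_ap: dict) -> float:
    """mAP is the unweighted mean of per-class average precision."""
    return sum(per_class_ap.values()) / len(per_class_ap)

# Hypothetical example: 90 correct detections, 10 spurious, 5 missed
p, r = precision_recall(tp=90, fp=10, fn=5)
m = mean_over_classes({"Havij Polo": 1.0, "Greens": 0.5})
```

Because mAP averages over classes equally, a few poorly detected classes (such as "Greens" above) pull the overall score down even when most classes score highly, which is why class-balanced augmentation helps.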

Conclusion:

The research highlights the significance of accurate food detection and segmentation in Iranian cuisines, enabling applications in food quality assessment, health monitoring, and dietary analysis. Furthermore, the study emphasizes the impact of different YOLO-based models on performance metrics and their potential to enhance computer vision applications.
