Detection of Iranian foods in images using deep learning

Document Type: Research Paper


1 Faculty of Agricultural Engineering and Technology, University of Tehran, Karaj, Iran

2 Professor, Department of Agricultural Machinery Engineering, Faculty of Agricultural Engineering and Technology, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran


Maintaining individual well-being is greatly influenced by a healthy lifestyle and a balanced diet. In this era of rapid change in lifestyle and technology, the identification and segmentation of food items can be improved by a mobile-based system. This article introduces a novel system that detects and segments the food items in input images. The system utilizes deep learning models built on the YOLO algorithm. By incorporating simple regression-based methods, the system detects and categorizes food items in a single pass through the network, improving both accuracy and speed of detection. YOLOv7 was employed for food detection, while YOLOv5, YOLOv7, and YOLOv8 were utilized for image segmentation. Based on the results, the accuracy, recall, and average precision values for YOLOv7 were 0.844, 0.924, and 0.932, respectively. Furthermore, the instance segmentation performance of YOLOv7 surpassed that of YOLOv5 and YOLOv8, with precision, recall, and mean average precision values of 0.959, 0.943, and 0.906, respectively. These findings underscore the high accuracy in detecting Iranian foods and the remarkable speed and precision in food image segmentation attainable through advanced deep learning algorithms. Consequently, this study establishes that accurate detection of Iranian foods can be accomplished with sophisticated deep learning techniques. This research promotes a healthy lifestyle in Iran through intelligent technology and novel deep learning algorithms.







Recent attention has been drawn to the application of deep learning models in various domains, with a particular focus on nutritional analysis and food quality evaluation. This study explores the use of YOLO-based models, including YOLOv5, YOLOv7, and YOLOv8, for the automatic detection and segmentation of Iranian cuisine.


The primary aim of this study is to assess the effectiveness of several YOLO-based algorithms in detecting specific food classes commonly found in Iranian meals. YOLOv7 is employed for the detection of 22 food classes, while instance segmentation for 19 different food classes is conducted using YOLOv5, YOLOv7, and YOLOv8.
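Both detection and instance segmentation are evaluated by matching predictions to ground truth via intersection-over-union (IoU), the overlap criterion behind the precision and mAP figures reported in this study. A minimal pure-Python sketch (the corner-coordinate box format is an illustrative assumption, not a detail taken from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes,
    each given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold such as 0.5.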


A meticulously curated dataset of Iranian food images serves as the training data for the models. To ensure the models' robustness and generalization, the dataset comprises images captured under various lighting conditions and from different viewpoints. Transfer learning strategies and hyperparameter optimization techniques are employed to enhance model precision and effectiveness.
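One common way to enlarge such a dataset is geometric augmentation; when an image is flipped horizontally, for instance, every bounding-box annotation must be flipped with it. A minimal sketch of the coordinate transform (the corner-coordinate box format is assumed for illustration and is not taken from the paper):

```python
def hflip_box(box, img_w):
    """Mirror a (x_min, y_min, x_max, y_max) box across the vertical
    axis of an image of width img_w; y-coordinates are unchanged."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa
    return (img_w - x_max, y_min, img_w - x_min, y_max)
```

Applying the flip twice returns the original annotation, a quick sanity check for any augmentation transform.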


The YOLOv7 detection method was employed to detect 22 types of Iranian food items, using deep convolutional neural networks for hierarchical feature learning from images. Performance remained stable after training for 100 epochs, justifying the choice of 40 epochs for training. YOLOv7 achieved satisfactory results, with an average precision of 77% for food detection, and demonstrated good performance with a mean average precision of 75.0% and a recall of 66.9%. However, YOLOv7 exhibited imbalanced accuracy across food classes, ranging from 50% for "Greens" and "Ketchup" to 100% for "Havij Polo" and "Kuku Sabzi." Accuracy in specific classes can be improved by augmenting the training dataset and fine-tuning the model; other factors, such as hyperparameter adjustments, can also influence performance.
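The precision and recall values quoted throughout follow the standard detection definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). A small illustrative computation (the counts below are hypothetical, not the paper's confusion-matrix values):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive,
    and false-negative counts for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts for one food class:
# 8 correct detections, 2 spurious ones, 2 missed instances
p, r = precision_recall(8, 2, 2)  # -> (0.8, 0.8)
```

Class-wise imbalance like the 50%-to-100% spread reported above shows up directly in these per-class counts.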

Evaluation of the classification models, i.e., YOLOv5, YOLOv7, and YOLOv8, indicated that YOLOv7 outperformed the others with an accuracy of 0.955. YOLOv7 also showed a mean average precision of 94.5%, making it the best-performing model; fine-tuning and post-processing techniques could further improve accuracy in specific classes. In conclusion, YOLOv7 proved to be a strong and efficient method for detecting and classifying Iranian food items.
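The mean average precision used to compare these models averages, over classes, the area under each class's precision-recall curve. A compact sketch of per-class AP with all-point interpolation (the ranked detections and ground-truth count are invented purely for illustration):

```python
def average_precision(scored_hits, n_gt):
    """AP for one class. scored_hits is a list of (confidence,
    is_true_positive) detections; n_gt is the number of ground-truth
    instances of the class."""
    hits = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) walking down the ranked list
    for _, is_tp in hits:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_gt, tp / (tp + fp)))
    # Make precision monotonically non-increasing from the right,
    # then integrate precision over the recall steps
    best = 0.0
    for i in range(len(points) - 1, -1, -1):
        best = max(best, points[i][1])
        points[i] = (points[i][0], best)
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

Mean average precision is then the mean of this quantity over all food classes, here 22 for detection and 19 for segmentation.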


The research highlights the significance of accurate food detection and segmentation in Iranian cuisines, enabling applications in food quality assessment, health monitoring, and dietary analysis. Furthermore, the study emphasizes the impact of different YOLO-based models on performance metrics and their potential to enhance computer vision applications.

    Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2281.

    Afzaal, U., Bhattarai, B., Pandeya, Y. R., & Lee, J. (2021). An Instance Segmentation Model for Strawberry Diseases Based on Mask R-CNN. Sensors, 21(19), 6565.

    Ando, Y., Ege, T., Cho, J., & Yanai, K. (2019). DepthCalorieCam: A mobile application for volume-based food calorie estimation using depth cameras. MADiMa 2019 - Proceedings of the 5th International Workshop on Multimedia Assisted Dietary Management, Co-Located with MM 2019, 76–81.

    Aslan, S., Ciocca, G., Mazzini, D., & Schettini, R. (2020). Benchmarking algorithms for food localization and semantic segmentation. International Journal of Machine Learning and Cybernetics, 11(12), 2827–2847.

    Ciocca, G., Napoletano, P., & Schettini, R. (2015). Food recognition and leftover estimation for daily diet monitoring. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9281, 334–341.

    Dehaerne, E., Dey, B., Halder, S., & De Gendt, S. (2023). Optimizing YOLOv7 for semiconductor defect detection. In Metrology, Inspection, and Process Control XXXVII, 12496, 635–642.

    Dehais, J., Anthimopoulos, M., & Mougiakakou, S. (2015). Dish detection and segmentation for dietary assessment on smartphones. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9281, 433–440.

    Dutta, A., & Zisserman, A. (2019). The VIA annotation software for images, audio and video. In MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia (pp. 2276–2279). Association for Computing Machinery, Inc.

    Jocher, G., Chaurasia, A., & Qiu, J. (2020). YOLO by Ultralytics. Accessed: February 2023.

    Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation (pp. 580–587).

    He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, 2017-October, 2980–2988.

    He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8691 LNCS(PART 3), 346–361.

    Islam, A., Chowdhury, T., Hossain, M., Nahid, N., & Rifat, A. I. (2022). An Automatic System for Identifying and Categorizing Tribal Clothing Based on Convolutional Neural Networks. 4th International Conference on Emerging Research in Electronics, Computer Science and Technology, ICERECT 2022.

    Jamnekar, R.V., Keole, R.R., Mohod, S.W., Mahore, T.R., Pande, S. (2023). Food Classification Using Deep Learning Algorithm. International Conference on Innovative Computing and Communications. Lecture Notes in Networks and Systems, vol 492. Springer, Singapore.

    Jiang, L., Qiu, B., Liu, X., Huang, C., & Lin, K. (2020). DeepFood: Food Image Analysis and Dietary Assessment via Deep Model. IEEE Access, 8, 47477–47489.

    Jiang, S., Min, W., Liu, L., & Luo, Z. (2020). Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition. IEEE Transactions on Image Processing, 29, 265–276.

    Kaur, P., Sikka, K., Wang, W., Belongie, S., & Divakaran, A. (2019). FoodX-251: A Dataset for Fine-grained Food Classification.

    Kaur, R., Kumar, R., & Gupta, M. (2023). Deep neural network for food image classification and nutrient identification: A systematic review. Reviews in Endocrine and Metabolic Disorders, 1–21.

    Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25.

    Li, X., Wang, W., Wu, L., Chen, S., Hu, X., Li, J., Tang, J., & Yang, J. (2020). Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. Advances in Neural Information Processing Systems, 2020-December.

    Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature Pyramid Networks for Object Detection (pp. 2117–2125).

    Lo, F. P. W., Sun, Y., Qiu, J., & Lo, B. P. L. (2020). Point2Volume: A vision-based dietary assessment approach using view synthesis. IEEE Transactions on Industrial Informatics, 16(1), 577–586.

    Lu, Y., Stathopoulou, T., Vasiloglou, M. F., Christodoulidis, S., Stanga, Z., & Mougiakakou, S. (2021). An Artificial Intelligence-Based System to Assess Nutrient Intake for Hospitalised Patients. IEEE Transactions on Multimedia, 23, 1136–1147.

    Lu, Y., Stathopoulou, T., Vasiloglou, M. F., Pinault, L. F., Kiley, C., Spanakis, E. K., & Mougiakakou, S. (2020). goFOODTM: An artificial intelligence system for dietary assessment. Sensors (Switzerland), 20(15), 1–18.

    MMYOLO Contributors. (2023). YOLOv8 by MMYOLO. Accessed: May 13, 2023.

    Mahmoodi‐Eshkaftaki, M., Haghighi, A., & Houshyar, E. (2020). Land suitability evaluation using image processing based on determination of soil texture–structure and soil features. Soil Use and Management, 36(3), 482-493.

    Mao, R., He, J., Shao, Z., Yarlagadda, S. K., & Zhu, F. (2021). Visual Aware Hierarchy Based Food Recognition. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12665 LNCS, 571–598.

    Mariappan, A., Bosch, M., Zhu, F., Boushey, C. J., Kerr, D. A., Ebert, D. S., & Delp, E. J. (2009). Personal dietary assessment using mobile devices. Proceedings of SPIE, 7246, 294–305. https://doi.org/10.1117/12.813556

    Matsuda, Y., & Yanai, K. (2012). Multiple-food recognition considering co-occurrence employing manifold ranking. Proceedings of the 21st International Conference on Pattern Recognition, 2017–2020.

    MMYOLO Contributors. (2022). YOLOv7 by MMYOLO, configs/yolov7. Accessed: May 13, 2023.

    Mumuni, A., & Mumuni, F. (2022). Data augmentation: A comprehensive survey of modern approaches. Array, 100258.

    Nivedhitha, P., Anurithi, P., Meenashree, S. S., & Pooja Kumari, M. (2022). Food Nutrition and Calories Analysis Using YOLO. 2022 1st International Conference on Computational Science and Technology, ICCST 2022 - Proceedings, 382–386.

    Okamoto, K., & Yanai, K. (2016). An automatic calorie estimation system of food images on a smartphone. MADiMa 2016 - Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, Co-Located with ACM Multimedia 2016, 63–70.

    Pallathadka, H., Jawarneh, M., Sammy, F., Garchar, V., Sanchez, D. T., & Naved, M. (2022). A Review of Using Artificial Intelligence and Machine Learning in Food and Agriculture Industry. In 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE) (pp. 2215-2218). IEEE.

    Pouladzadeh, P., Kuhad, P., Peddi, S. V. B., Yassine, A., & Shirmohammadi, S. (2016). Food calorie measurement using deep learning neural network. Conference Record - IEEE Instrumentation and Measurement Technology Conference, 2016-July.

    Pouladzadeh, P., & Shirmohammadi, S. (2017). Mobile Multi-Food Recognition Using Deep Learning. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s).

    Qiu, J., Lo, F. P. W., & Lo, B. (2019). Assessing individual dietary intake in food sharing scenarios with a 360 camera and deep learning. 2019 IEEE 16th International Conference on Wearable and Implantable Body Sensor Networks, BSN 2019 - Proceedings.

    Rajayogi, J. R., Manjunath, G., & Shobha, G. (2019). Indian Food Image Classification with Transfer Learning. CSITSS 2019 - 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution, Proceedings.

    Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 779–788.

    Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Advances in Neural Information Processing Systems, 28.

    Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149.

    Sezer, A., & Altan, A. (2021). Detection of solder paste defects with an optimization‐based deep learning model using image processing techniques. Soldering and Surface Mount Technology, 33(5), 291–298.

    Sezer, A., & Altan, A. (2021). Optimization of deep learning model parameters in classification of solder paste defects.

    Shima, R., Yunan, H., Fukuda, O., Okumura, H., Arai, K., & Bu, N. (2018). Object classification with deep convolutional neural network using spatial information. ICIIBMS 2017 - 2nd International Conference on Intelligent Informatics and Biomedical Sciences, 2018-January, 135–139.

    Sun, C., Zhan, W., She, J., & Zhang, Y. (2020). Object Detection from the Video Taken by Drone via Convolutional Neural Networks. Mathematical Problems in Engineering, 2020.

    Sun, J., Radecka, K., & Zilic, Z. (2019). FoodTracker: A Real-time Food Detection Mobile Application by Deep Convolutional Neural Networks.

    Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 2818–2826.

    Taheri-Garavand, A., Nasiri, A., & Banan, A. (2021). Deep Learning Algorithm Development for Intelligent Detection and Classification of Carp Species. Biosystem Engineering of Iran, 52(3), 391–407. [In Persian]

    Tahir, G. A., & Loo, C. K. (2021). A Comprehensive Survey of Image-Based Food Recognition and Volume Estimation Methods for Dietary Assessment. Healthcare 2021, Page 1676, 9(12), 1676.

    Tan, M., & Le, Q. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (pp. 6105–6114). PMLR.

    Tan, M., Pang, R., & Le, Q. V. (2020). EfficientDet: Scalable and Efficient Object Detection (pp. 10781–10790).

    Wang, C. Y., Mark Liao, H. Y., Wu, Y. H., Chen, P. Y., Hsieh, J. W., & Yeh, I. H. (2020). CSPNet: A new backbone that can enhance learning capability of CNN. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2020-June, 1571–1580.

    Wang, C.-Y., Bochkovskiy, A., & Liao, H.-Y. M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors.

    Wang, C.-Y., Liao, H.-Y. M., & Yeh, I.-H. (2022). Designing Network Design Strategies Through Gradient Path Analysis.

    Weng, W., & Zhu, X. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. IEEE Access, 9, 16591–16603.

    Xu, R., Lin, H., Lu, K., Cao, L., & Liu, Y. (2021). A forest fire detection system based on ensemble learning. Forests, 12(2), 1–17.

    Yumang, A. N., Banguilan, D. E. S., & Veneracion, C. K. S. (2021). Raspberry PI based Food Recognition for Visually Impaired using YOLO Algorithm. 2021 5th International Conference on Communication and Information Systems, ICCIS 2021, 165–169.

    Zhao, H., Xu, D., Lawal, O. M., Lu, X., Ren, R., Wang, X., & Zhang, S. (2023). Jujube Fruit Instance Segmentation Based on Yolov8 Method.

    Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., & Ren, D. (2019). Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, 12993–13000.

    Zhou, X., Wang, D., & Krähenbühl, P. (2019). Objects as Points.

    Zhu, F., Mariappan, A., Boushey, C. J., Kerr, D., Lutes, K. D., Ebert, D. S., & Delp, E. J. (2008). Technology-assisted dietary assessment. Proceedings of SPIE, 6814, 276–285. https://doi.org/10.1117/12.778616