Predicting Sugarcane Yield Using Sentinel-2 Vegetation Indices, K-Means Clustering, and K-Nearest Neighbors (KNN) Regression

Document Type: Research Paper

Authors

Department of Biosystems Engineering, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Ahvaz, Iran

Abstract

Sugarcane is a globally important crop, serving as a primary source of sugar and a vital feedstock for biofuels. Accurate yield prediction for strategic crops such as sugarcane plays a key role in optimal resource management and food security. This research aims to develop a robust and interpretable model for estimating sugarcane yield before harvest by combining satellite imagery with machine learning techniques. First, key vegetation indices were extracted from Sentinel-2 satellite image time series, and engineered features such as water use efficiency and fertilizer use efficiency were created to enrich the dataset. Next, the K-means algorithm was used to cluster the fields into four distinct groups based on their agronomic and spectral characteristics. Finally, a K-nearest neighbors (KNN) regression model was trained to predict yield. On the test dataset, the KNN model achieved a coefficient of determination (R²) of 0.8706 and a root mean square error (RMSE) of 7.80 tons per hectare. Feature importance analysis showed that the engineered variables, especially water productivity, were the main predictors of yield. These findings suggest that integrating satellite data with a simple yet effective KNN model provides a practical and transparent decision-support tool for precision agriculture.
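The abstract summarizes a three-stage pipeline (vegetation indices, K-means grouping, KNN regression) without giving an implementation. The pure-Python sketch below illustrates those stages under stated assumptions. NDVI stands in as the example index, using the standard (NIR - Red) / (NIR + Red) form with Sentinel-2 bands B8 and B4. Features are z-score standardized, and the four K-means groups are used to restrict each test field's neighbour search to its own cluster. That cluster-restricted search, the helper names (`fit_scaler`, `clustered_knn_predict`, etc.), k = 5 neighbours, and the synthetic data are all hypothetical and not taken from the paper. The engineered water and fertilizer efficiency features are omitted because their exact definitions are not given here.

```python
import math
import random


def ndvi(nir, red):
    """NDVI from near-infrared (Sentinel-2 band B8) and red (band B4) reflectance."""
    total = nir + red
    return (nir - red) / total if total else 0.0  # assumption: 0.0 for empty pixels


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (e.g. t/ha)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_scaler(rows):
    """Z-score scaler fitted on training rows; returns a function that scales one row."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column gets std 1.0 so scaling never divides by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, stds)]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep an empty cluster's centroid where it was.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    # Final labels always match the returned centroids.
    return centroids, [nearest(p, centroids) for p in points]


def knn_predict(train_X, train_y, query, k=5):
    """KNN regression: mean target of the k nearest training rows."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return sum(train_y[i] for i in order) / len(order)


def clustered_knn_predict(centroids, labels, train_X, train_y, query, k=5):
    """Hypothetical: search neighbours only among training fields in the query's cluster."""
    c = nearest(query, centroids)
    # Fall back to all training rows if the query's cluster has no members.
    idx = [i for i, lab in enumerate(labels) if lab == c] or list(range(len(train_X)))
    return knn_predict([train_X[i] for i in idx], [train_y[i] for i in idx], query, k)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy fields, NOT the paper's data: [NDVI, hypothetical water input] -> yield.
    X = []
    for _ in range(200):
        nir, red = rng.uniform(0.3, 0.5), rng.uniform(0.03, 0.12)
        X.append([ndvi(nir, red), rng.uniform(15.0, 30.0)])
    y = [100 * v + 0.8 * w + rng.gauss(0, 3) for v, w in X]
    scale = fit_scaler(X[:160])  # fit scaling on the training split only
    train_X, train_y = [scale(r) for r in X[:160]], y[:160]
    test_X, test_y = [scale(r) for r in X[160:]], y[160:]
    centroids, labels = kmeans(train_X, k=4)  # four groups, as in the abstract
    preds = [clustered_knn_predict(centroids, labels, train_X, train_y, q) for q in test_X]
    print(f"R2 = {r2(test_y, preds):.3f}, RMSE = {rmse(test_y, preds):.2f}")
```

Standardizing before clustering matters because both K-means and KNN rely on Euclidean distance. Without it, a feature with a large numeric range (here, the water input) would dominate both steps. The printed R² and RMSE come from synthetic data and say nothing about the study's reported values.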

Articles in Press, Accepted Manuscript
Available Online from 30 October 2025
  • Receive Date: 27 September 2025
  • Revise Date: 18 October 2025
  • Accept Date: 30 October 2025