Abstract: To address the limited data-processing scale and poor clustering performance of existing clustering methods, a big data clustering method based on an optimized X-means algorithm is proposed, supported by the Hadoop platform. Big data samples are collected using the architecture and functions of the Hadoop platform, and the initial sample data are preprocessed through missing-value compensation, noise filtering, and normalization. Clustering centers are then selected, features are extracted from the clustering-center data and from all other data samples, and the feature similarity between each data sample and the clustering centers is calculated. With the similarity measurement results as the clustering criterion, the optimized X-means algorithm determines the category of each data sample, completing the clustering of the big data. Clustering effectiveness experiments show that, under both experimental conditions, the recall and precision of the optimized method improved by 4.75% and 4.5%, respectively, compared with traditional clustering methods. The optimized clustering method also achieved higher data utilization.
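The preprocessing and similarity-based assignment steps summarized above can be illustrated with a minimal in-memory sketch. It is not the proposed method itself: it runs on a single machine without Hadoop, it omits the X-means step that splits clusters, and every specific choice in it is an assumption made for illustration (column-mean imputation, a z-score noise threshold `z_max`, min-max normalization, and a similarity of the form 1 / (1 + Euclidean distance)). All function names are hypothetical.

```python
import math


def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else float(r[j]) for j in range(n_cols)]
            for r in rows]


def filter_noise(rows, z_max=3.0):
    """Drop rows holding a value more than z_max std devs from its column mean."""
    n_cols = len(rows[0])
    stats = []
    for j in range(n_cols):
        col = [r[j] for r in rows]
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        stats.append((mu, sd))
    return [r for r in rows
            if all(sd == 0 or abs(v - mu) / sd <= z_max
                   for v, (mu, sd) in zip(r, stats))]


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(n_cols)]
            for r in rows]


def similarity(a, b):
    """Assumed similarity measure: 1 / (1 + Euclidean distance)."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def assign_to_centers(rows, centers):
    """Label each sample with the index of its most similar clustering center."""
    return [max(range(len(centers)), key=lambda k: similarity(r, centers[k]))
            for r in rows]


def preprocess(rows):
    """Missing-value compensation, then noise filtering, then normalization."""
    return min_max_normalize(filter_noise(impute_missing(rows)))


# Toy data (invented for this example): four samples, one missing value.
samples = [[1.0, 2.0], [None, 4.0], [3.0, 6.0], [5.0, 10.0]]
clean = preprocess(samples)
labels = assign_to_centers(clean, centers=[[0.0, 0.0], [1.0, 1.0]])
```

In a Hadoop setting, the per-sample similarity computation would naturally map onto a map phase and the center update onto a reduce phase, but that mapping is likewise an assumption and is not shown here.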