Inputs to the K-means clustering algorithm

The inputs to the K-means clustering algorithm are the number of clusters K and a data set of n data objects.

K-means clustering is a widely used unsupervised clustering algorithm. It compares the input data objects by their features and partitions them into K "clusters", so that objects in the same cluster are more similar to each other than to objects in other clusters.

I. K-means clustering algorithm process

1. Initialization: choose the number of clusters K and pick an initial center point for each cluster.

2. Assignment: assign each data point to the center point closest to it. All data points assigned to the same center point belong to the same cluster.

3. Update: compute the centroid (mean) of each cluster and replace the old center point with it.

4. Repeat: repeat steps 2 and 3 until the cluster assignments no longer change or the maximum number of iterations is reached, then stop.
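The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, the use of Euclidean distance, the choice of random data points as initial centers, and the rule that an empty cluster keeps its old center are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on an (n, d) array; returns (centers, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once the assignment no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean of its cluster.
        # Assumption: an empty cluster simply keeps its previous center.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two well-separated groups of three points each.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(data, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialization.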

II. How the K-means clustering algorithm works

The algorithm begins by randomly selecting K points from the data set as the initial cluster centers. It then computes the distance from each sample to every cluster center and assigns the sample to the cluster whose center is closest. Next, it takes the mean of the data objects in each newly formed cluster as that cluster's new center. If the cluster centers do not change between two consecutive iterations, the reassignment of samples is finished and the clustering criterion function has converged.
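The convergence described above can be observed by tracking the criterion function over the iterations. The sketch below assumes the usual choice of criterion, the sum of squared Euclidean distances from each point to its assigned center; the helper names `criterion` and `one_iteration` and the toy data are made up for this illustration. For brevity the initial centers are fixed by hand rather than drawn at random.

```python
import numpy as np

def criterion(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return float(((points - centers[labels]) ** 2).sum())

def one_iteration(points, centers):
    """One assignment step followed by one center update."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = points[labels == j]
        if len(members) > 0:  # an empty cluster keeps its old center
            new_centers[j] = members.mean(axis=0)
    return new_centers, labels

points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                   [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
# Deliberately poor start: both initial centers lie in the same group.
centers = points[[0, 1]].copy()

history = []  # criterion value after each iteration
for _ in range(10):
    new_centers, labels = one_iteration(points, centers)
    history.append(criterion(points, new_centers, labels))
    if np.allclose(new_centers, centers):
        break  # centers unchanged between two consecutive iterations
    centers = new_centers

print(history)
```

The recorded values never increase from one iteration to the next, and the loop stops as soon as two consecutive iterations produce the same centers.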

A characteristic of this algorithm is that every iteration checks whether each sample is assigned to the correct cluster, that is, the cluster whose center is nearest to it. A sample that is not is reassigned. Once all samples have been checked, the cluster centers are recomputed and the next iteration begins. If an iteration finds that every sample is already correctly assigned, nothing is reassigned and the cluster centers do not change, which indicates that the algorithm has converged, so it stops.

III. Advantages and disadvantages of the K-means clustering algorithm

The advantages of the K-means clustering algorithm are its clear structure, simple idea, straightforward implementation, and results that are easy to interpret. On data with well-separated clusters it can also group the data very accurately. The algorithm has disadvantages as well. The number of clusters K must be specified in advance, and a value of K that is too large or too small degrades the clustering result. In addition, K-means assumes that the data points have a strong cluster structure: if the data contain noise, or the boundaries between groups of points are blurry, the quality of the result suffers considerably.
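The sensitivity to noise follows directly from the update step: a cluster center is a mean, and a mean is pulled toward extreme values. The small sketch below uses made-up numbers to show how far a single noisy point can move a center if it ends up in the cluster.

```python
import numpy as np

# A tight cluster of four points (illustrative values).
cluster = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]])
# One noisy point far away from the cluster.
outlier = np.array([[50.0, 50.0]])

# Center of the clean cluster.
center_clean = cluster.mean(axis=0)
# Center when the noisy point is assigned to the same cluster.
center_noisy = np.vstack([cluster, outlier]).mean(axis=0)

# Distance by which the single noisy point moved the center.
shift = np.linalg.norm(center_noisy - center_clean)
print(center_clean, center_noisy, shift)
```

Here one point out of five moves the center from inside the cluster to a position well outside it, which in a full run would also change which points are assigned to that center.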

IV. Applications of K-means clustering

K-means clustering has a wide range of applications, including data mining, image processing, and machine learning. It can be used to extract valuable information from raw data and to filter out irrelevant data, which provides a sound basis for further analysis. Because each iteration only needs the distance from every data point to each of the K centers, the algorithm remains practical on large data sets, and summarizing the data by its clusters can make later processing faster and more reliable.