What are the advantages and disadvantages of Euclidean distance discrimination, Mahalanobis distance discrimination, and Fisher discrimination?
Summarized as follows:

1. Euclidean distance, also known as the Euclidean metric, is the most common definition of distance: the true distance between two points in M-dimensional space. In two-dimensional and three-dimensional space it is simply the straight-line distance between two points.
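For concreteness, here is a minimal NumPy sketch of this definition; the two points are made-up examples and NumPy is just one convenient way to compute it:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 6.0, 3.0])

    # Euclidean distance: square root of the sum of squared coordinate differences.
    d = np.sqrt(np.sum((x - y) ** 2))
    assert np.isclose(d, np.linalg.norm(x - y))  # equivalent one-liner
    print(d)  # 5.0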

Disadvantages: for most statistical problems the Euclidean distance is unsatisfactory. Every coordinate contributes equally to it, but when the coordinates are measured values they usually fluctuate by different amounts. A more reasonable approach is to weight the coordinates, so that coordinates with large fluctuations receive smaller weight coefficients than those with small fluctuations; this gives rise to various weighted distances (a small numerical sketch follows below).

Moreover, when the components are quantities of different kinds, the size of the "distance" depends on the units of the indicators: differences in different attributes of a sample (i.e. indicators or variables) are treated as equivalent, which sometimes fails to meet practical requirements, and the influence of the population's variability on the distance is not taken into account.
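A small numerical sketch of the weighting remedy mentioned above (the data and the 1/variance weights are illustrative assumptions, not a prescribed scheme):

    import numpy as np

    # Toy sample: the first coordinate fluctuates far more than the second.
    X = np.array([[10.0, 1.0],
                  [50.0, 1.2],
                  [90.0, 0.9],
                  [30.0, 1.1]])
    w = 1.0 / X.var(axis=0)        # large variance -> small weight

    a, b = X[0], X[1]
    d_plain    = np.sqrt(np.sum((a - b) ** 2))      # unit-dependent
    d_weighted = np.sqrt(np.sum(w * (a - b) ** 2))  # down-weights the noisy coordinate
    print(d_plain, d_weighted)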

2. Mahalanobis distance was proposed by the Indian statistician Mahalanobis and represents a covariance-based distance between data. For random variables with covariance matrix Σ: if the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance; if the covariance matrix is diagonal, it is also called the normalized (standardized) Euclidean distance.

It is an effective way to measure the similarity between two unknown sample sets. For a multivariate vector x drawn from a population with mean μ and covariance matrix Σ, the squared Mahalanobis distance between the sample and the population is D_M²(x) = (x − μ)′ Σ⁻¹ (x − μ). In most cases the Mahalanobis distance can be computed without trouble, but the computation can be unstable, and the source of the instability is the covariance matrix; this is also the biggest difference between the Mahalanobis distance and the Euclidean distance.
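The formula can be applied directly; below is a minimal sketch with made-up data, estimating μ and Σ from a toy sample (scipy.spatial.distance.mahalanobis offers the same computation given the inverse covariance):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))        # toy population sample: 200 observations, 3 variables
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)      # 3 x 3 covariance matrix

    x = np.array([1.0, -0.5, 2.0])
    diff = x - mu
    d2 = diff @ np.linalg.inv(Sigma) @ diff   # squared Mahalanobis distance
    print(np.sqrt(d2))

    # With the identity matrix in place of Sigma, it reduces to the Euclidean distance.
    print(np.sqrt(diff @ np.eye(3) @ diff), np.linalg.norm(diff))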

Advantages: it is not affected by the dimensions (units) of the variables; the Mahalanobis distance between two points is independent of the measurement units of the original data. It takes the relationships between features into account (for example, a measurement of height carries some information about weight, because the two are correlated) and is scale-invariant, that is, independent of the measurement scale (a numerical check appears below). The Mahalanobis distance between two points computed from standardized data is the same as that computed from centered data (the original data minus the mean). The Mahalanobis distance can also remove the interference caused by correlation between variables.

Disadvantages: it exaggerates the role of variables with small variation, and because it depends on a possibly unstable covariance matrix, the computation of the Mahalanobis distance is not always stable.
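The scale-invariance claim above can be checked numerically; in this sketch (made-up data), one coordinate is rescaled, the covariance matrix is re-estimated from the rescaled data, and the Mahalanobis distance is unchanged while the Euclidean distance is not:

    import numpy as np

    def mahalanobis(a, b, data):
        VI = np.linalg.inv(np.cov(data, rowvar=False))
        diff = a - b
        return np.sqrt(diff @ VI @ diff)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 2))
    X[:, 1] = 0.6 * X[:, 0] + 0.3 * X[:, 1]   # correlated features

    a, b = X[0], X[1]
    scale = np.array([100.0, 1.0])            # e.g. metres -> centimetres on the first axis
    Xs, a_s, b_s = X * scale, a * scale, b * scale

    print(np.linalg.norm(a - b), np.linalg.norm(a_s - b_s))   # Euclidean: changes
    print(mahalanobis(a, b, X), mahalanobis(a_s, b_s, Xs))    # Mahalanobis: (numerically) the same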

Comparison between Mahalanobis distance and Euclidean distance:

1. The Mahalanobis distance is computed from the whole population sample, as can be seen from the role of the covariance matrix explained above. That is, if the same two samples are placed in two different populations, the Mahalanobis distance between them is usually different, unless the covariance matrices of the two populations happen to be identical (see the sketch after this comparison);

2. Computing the Mahalanobis distance requires that the number of samples in the population be larger than the dimension of the samples; otherwise the sample covariance matrix is singular and its inverse does not exist. In that case the Euclidean distance can be used instead.
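Both points can be illustrated with a short sketch (all data made up): the same pair of samples gets different Mahalanobis distances under different population covariance matrices, and with fewer samples than dimensions the sample covariance matrix is singular, so its inverse does not exist:

    import numpy as np

    def mahal(a, b, Sigma):
        diff = a - b
        return np.sqrt(diff @ np.linalg.inv(Sigma) @ diff)

    a = np.array([1.0, 0.0])
    b = np.array([0.0, 1.0])
    Sigma1 = np.array([[1.0, 0.0], [0.0, 1.0]])   # population 1
    Sigma2 = np.array([[1.0, 0.8], [0.8, 1.0]])   # population 2
    print(mahal(a, b, Sigma1), mahal(a, b, Sigma2))   # about 1.41 vs 3.16

    # Two samples in three dimensions: the sample covariance has rank 1, hence no inverse.
    X_small = np.array([[1.0, 2.0, 3.0],
                        [2.0, 1.0, 0.0]])
    S = np.cov(X_small, rowvar=False)
    print(np.linalg.matrix_rank(S))                   # 1 < 3, so inv(S) is undefined
    print(np.linalg.norm(X_small[0] - X_small[1]))    # fall back to the Euclidean distance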