How to identify whether there are abnormalities in the measurement data?
1. Overview: In a set of measurement data, a value that deviates very far from the average is called a "suspicious value". If a statistical method, such as the Grubbs method, shows that the suspicious value should be removed from the set and excluded from the calculation of the average, that value is called an "outlier" (gross error). This article introduces how to use the Grubbs method to determine whether a suspicious value is an outlier.

2. Measurement data: For example, measurements were taken 10 times (n=10), and the following data were obtained: 8.2, 5.4, 14.0, 7.3, 4.7, 9.0, 6.5, 10.1, 7.7, 6.0.

3. Sort the data in ascending order: 4.7, 5.4, 6.0, 6.5, 7.3, 7.7, 8.2, 9.0, 10.1, 14.0. The suspicious value, if there is one, must be either the minimum or the maximum.

4. Calculate the mean x- and the standard deviation s: x- = 7.89, s = 2.704. Here s is the sample standard deviation (divisor n - 1). All 10 values must be included in the calculation.
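As a quick check of step 4, the mean and standard deviation can be reproduced with Python's standard library. This is only a minimal sketch; the variable names are illustrative, and it assumes s is the sample standard deviation (divisor n - 1), which is the convention that yields 2.704 for this data.

```python
import statistics

# Measurement data from step 2 (n = 10).
data = [8.2, 5.4, 14.0, 7.3, 4.7, 9.0, 6.5, 10.1, 7.7, 6.0]

mean = statistics.mean(data)  # arithmetic mean x-
s = statistics.stdev(data)    # sample standard deviation, divides by n - 1

print(f"mean = {mean:.2f}, s = {s:.3f}")  # mean = 7.89, s = 2.704
```

Using `statistics.pstdev` (divisor n) instead would give a smaller value and would not match the figure used in the following steps.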

5. Calculate the deviation: the difference between the mean and the minimum is 7.89 - 4.7 = 3.19; the difference between the maximum and the mean is 14.0 - 7.89 = 6.11.

6. Determine the suspicious value: the difference between the maximum and the mean, 6.11, is greater than the difference between the mean and the minimum, 3.19, so the maximum value 14.0 is considered suspicious.

7. Calculate the value of Gi: Gi = (xi - x-)/s, where i is the serial number of the suspicious value in the sorted data, here 10. So G10 = (x10 - x-)/s = (14.0 - 7.89)/2.704 = 2.260. Since x10 - x- is the residual and s is the standard deviation, G10 can be regarded as the ratio of the residual to the standard deviation.
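Steps 5 to 7 can be sketched in a few lines of Python. The names below are illustrative, and the sketch assumes G is taken as the absolute deviation divided by s, so that a suspicious minimum would also give a positive value.

```python
import statistics

# Sorted measurement data from step 3.
data = sorted([8.2, 5.4, 14.0, 7.3, 4.7, 9.0, 6.5, 10.1, 7.7, 6.0])

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1)

# Step 5: deviations of the two extreme values from the mean.
dev_min = mean - data[0]
dev_max = data[-1] - mean

# Step 6: the extreme value farther from the mean is the suspicious one.
if dev_max >= dev_min:
    suspect, deviation = data[-1], dev_max
else:
    suspect, deviation = data[0], dev_min

# Step 7: ratio of the residual to the standard deviation.
g = deviation / s

print(f"suspect = {suspect}, G = {g:.3f}")  # suspect = 14.0, G = 2.260
```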

8. Next, compare the calculated value Gi with the critical value GP(n) given in the Grubbs table. If Gi is greater than GP(n), the measured value is judged to be abnormal and can be eliminated. Note that the critical value GP(n) depends on two parameters: the detection level α (related to the confidence probability P) and the number of measurements n (related to the degrees of freedom f).

9. Determine the detection level α: if the requirements are strict, α can be set smaller, for example α = 0.01, which gives a confidence probability P = 1 - α = 0.99. If the requirements are less strict, α can be set larger, for example α = 0.10, that is, P = 0.90. The usual choice is α = 0.05, P = 0.95.

10. Look up the critical value in the Grubbs table: using the selected P value (0.95 here) and the number of measurements n (10 here), find the entry where the corresponding row and column intersect. This gives the critical value G95(10) = 2.176.
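The table lookup can be represented as a small dictionary. This is a sketch under assumptions: only G95(10) = 2.176 comes from this article. The other entries are the values commonly published for the one-sided Grubbs table at P = 0.95 and should be checked against whichever table you actually use. The function name `critical_value` is hypothetical.

```python
# One-sided Grubbs critical values G95(n) for P = 0.95 (alpha = 0.05).
# Only the n = 10 entry is quoted in the article; the rest are assumed from
# the commonly published table and should be verified before use.
G95 = {
    3: 1.153, 4: 1.463, 5: 1.672, 6: 1.822,
    7: 1.938, 8: 2.032, 9: 2.110, 10: 2.176,
}

def critical_value(n: int) -> float:
    """Return G95(n), or raise if n is outside this small excerpt."""
    if n not in G95:
        raise ValueError(f"no critical value stored for n = {n}")
    return G95[n]

print(critical_value(10))  # 2.176
```

A different confidence probability P needs a different column of the table, so in practice one dictionary per P value would be required.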

11. Compare the calculated value Gi with the critical value G95(10): Gi = 2.260 and G95(10) = 2.176, so Gi > G95(10).

12. Determine whether it is an outlier: because Gi > G95(10), the measured value 14.0 can be judged to be an outlier and is removed from the 10 measurements.
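The decision in steps 11 and 12 amounts to a single comparison followed by removing the value. A minimal sketch, using the numbers from the steps above and illustrative variable names:

```python
# Values taken from the worked example above.
data = [8.2, 5.4, 14.0, 7.3, 4.7, 9.0, 6.5, 10.1, 7.7, 6.0]
suspect = 14.0   # suspicious value from step 6
g = 2.260        # calculated G10 from step 7
g_crit = 2.176   # critical value G95(10) from step 10

remaining = list(data)
is_outlier = g > g_crit
if is_outlier:
    # Remove a single occurrence so equal, legitimate readings are kept.
    remaining.remove(suspect)

print(is_outlier, len(remaining))  # True 9
```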

13. Check the remaining data: repeat the above steps on the remaining 9 values. If the calculated Gi > G95(9), that value is also an outlier and is eliminated; if Gi