Recall the schematic from last time of the development from OLS and GLS, through SI, to the MME:
The MME (mixed model equations) are Henderson's transformation of the mixed model: his idea was to combine OLS and SI, and from that combination obtain the MME.
Process:
First conclusion: OLS tends to solve for a larger u than SI does.
This means we have to divide by a larger value when solving for u, which is done by adding a positive-definite matrix, denoted D, to Z'Z:
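The display itself is missing here; a minimal way to write the contrast, assuming the usual single-trait notation, is

$$\hat{u}_{\text{OLS}} = (Z'Z)^{-1} Z'(y - X\hat{\beta}), \qquad \hat{u}_{\text{penalized}} = (Z'Z + D)^{-1} Z'(y - X\hat{\beta}),$$

so dividing by the larger quantity Z'Z + D shrinks the solution toward zero.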
i.e., the original estimator is penalized (shrunk toward zero).
In the earlier example of averaging the milk yield of a bull's daughters: instead of dividing by the number of daughters alone, we divide by the number of daughters plus a hypothetical value D.
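The formula is not shown in the notes; the textbook sire-index version of this idea (the numbers below are made up purely for illustration) predicts the bull's transmitting ability as

$$\hat{u} = \frac{n}{n+k}\left(\bar{y}_{\text{daughters}} - \bar{y}_{\text{contemporaries}}\right), \qquad k = \frac{4 - h^2}{h^2}.$$

For example, with $h^2 = 0.25$ we get $k = 15$; a bull with $n = 10$ daughters averaging +200 kg above their contemporaries is predicted at $\frac{10}{25}\times 200 = +80$ kg, rather than the raw +200 kg an unpenalized average would give. The hypothetical value added to the number of daughters plays exactly the role of D.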
Second conclusion:
Explanation: in SI, the covariance of u and y involves AZ', and intuitively D must be proportional to A^{-1}, because the whole block Z'Z + D gets inverted.
But the rigorous proof of the following equation was only given about 20 years later by Prof. Searle (his book Variance Components contains a detailed proof):
So the best linear unbiased estimator (β, fixed effects) and the best linear unbiased predictor (u, random effects) are given by the following:
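The equation itself is missing from the notes; the standard GLS/BLUP forms to which the MME solutions were proved equivalent are

$$\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y, \qquad \hat{u} = GZ'V^{-1}(y - X\hat{\beta}), \qquad V = ZGZ' + R.$$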
Henderson rewrote the above equation as:
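(The rewritten form does not appear in the notes; Henderson's standard mixed model equations are:)

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$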
where R = Var(e) and G = Var(u), both assumed known.
This is equivalent to SI when β is known:
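(The display is missing; the standard SI/BLP expression with β treated as known is $\hat{u} = GZ'V^{-1}(y - X\beta)$, with $V = ZGZ' + R$ as above.)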
Later there will be a multiple-trait model: it makes it easy to include records with missing values, and it also makes it easy to analyze the correlations among multiple traits.
The same matrix equation:
although the distinction between the two (fixed versus random effects) is not always clear-cut.
Fixed effects: the mean of the levels: E(y) = Xβ
Random effects: variance-covariance matrix of random factors: Var(y) = ZGZ' + R
Best linear unbiased estimation = BLUE (β, fixed effects) and best linear unbiased prediction = BLUP (u, random effects)
The prediction part (the difference between BLUE and BLUP) is inherited from SI.
But in SI it is only BLP (best linear prediction), which lacks the "U" (unbiasedness).
This can be obtained in a different way:
(2) Using SI again to obtain u:
Each û has a prediction error, and there are two ways to evaluate it, accuracy or reliability:
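The definitions are not written out in the notes; the standard ones (single-trait form) are

$$\mathrm{PEV}(\hat{u}_i) = \mathrm{Var}(u_i - \hat{u}_i) = C^{uu}_{ii}\,\sigma^2_e, \qquad r^2_i = 1 - \frac{\mathrm{PEV}_i}{\sigma^2_a}, \qquad r_i = \sqrt{r^2_i} = \mathrm{corr}(u_i, \hat{u}_i),$$

where $C^{uu}_{ii}$ is the corresponding diagonal element of the u-block of the inverted MME coefficient matrix, $r^2$ is the reliability, and $r$ the accuracy.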
X and Z are usually sparse matrices (i.e., they contain many zeros).
X_i and Z_i are the rows associated with observation i, so:
As in an ordinary (single-trait) model:
each observation's contribution is written out separately:
and then summed together at the end:
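The summation the notes refer to is presumably the standard record-by-record accumulation:

$$X'R^{-1}X = \sum_i X_i' R_i^{-1} X_i, \qquad X'R^{-1}Z = \sum_i X_i' R_i^{-1} Z_i, \qquad X'R^{-1}y = \sum_i X_i' R_i^{-1} y_i,$$

and likewise for the $Z'R^{-1}Z$ and $Z'R^{-1}y$ blocks.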
Bringing in R^{-1}: if there is no residual covariance, R is diagonal and R^{-1} is simply the element-wise inverse of its diagonal.
With multiple traits, X'X and the other blocks are built as Kronecker products with the inverse of the residual covariance matrix among traits:
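Assuming records are ordered traits-within-animal with no missing traits (an assumption for this sketch, with $R_0$ the residual covariance among traits and $G_0$ the genetic covariance among traits), the usual multi-trait structure is

$$R = I \otimes R_0 \;\Rightarrow\; R^{-1} = I \otimes R_0^{-1}, \qquad G = A \otimes G_0 \;\Rightarrow\; G^{-1} = A^{-1} \otimes G_0^{-1},$$

so each record's contribution to, e.g., the fixed-effect block has the form $X_i' R_0^{-1} X_i$.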
Expanding the above into the MME: the least-squares (LS) part for the random effects is built the same way as for the fixed effects, and then G^{-1} is added to the coefficient matrix C:
Simplifying the formula further:
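The simplified form is not shown in the notes; for a single trait with $R = I\sigma^2_e$ and $G = A\sigma^2_a$, multiplying through by $\sigma^2_e$ gives the usual

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\lambda \end{bmatrix}\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}, \qquad \lambda = \frac{\sigma^2_e}{\sigma^2_a},$$

which also makes the earlier intuition explicit: $D = A^{-1}\lambda$.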
As seen earlier, building the MME means accumulating a contribution from every record of every trait, so the work grows with the number of records and the number of traits.
So for big data, an accelerated approach is needed:
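The notes do not say which acceleration is meant; one common approach (shown here only as a sketch with made-up toy numbers, not as the notes' method) is to assemble the coefficient matrix in sparse form and solve the system iteratively, e.g. with conjugate gradients, instead of ever forming an explicit inverse:

    # Sketch: sparse assembly + iterative solve of a toy single-trait MME.
    # All data are invented for illustration; lambda = sigma_e^2 / sigma_a^2.
    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import cg

    X = sparse.csr_matrix(np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]]))
    Z = sparse.identity(4, format="csr")      # one record per animal
    Ainv = sparse.identity(4, format="csr")   # pretend the animals are unrelated
    y = np.array([10., 12., 9., 11.])
    lam = 2.0

    # assemble the sparse MME coefficient matrix and right-hand side
    C = sparse.bmat([[X.T @ X, X.T @ Z],
                     [Z.T @ X, Z.T @ Z + lam * Ainv]], format="csr")
    rhs = np.concatenate([X.T @ y, Z.T @ y])

    sol, info = cg(C, rhs, atol=1e-10)        # conjugate-gradient solve, no inverse formed
    b_hat, u_hat = sol[:2], sol[2:]
    print(b_hat, u_hat)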
Historically, SI has been very successful, and it is still used in breeding today.
But SI is biased when estimating EBVs, because it assumes the means it corrects for are known, and in practice those "known" means are themselves biased estimates.
Furthermore, the genetic level can differ among contemporary populations.
In dairy cattle this SI-based method is called the contemporary comparison (CC). It was used in dairy cattle breeding in the 1950s-1960s to distinguish good bulls from bad ones, and it quickly proved successful.
CC model: y = Xt + Za + e, where (as these symbols are usually used) t is the contemporary-group effect and a is the additive genetic effect (breeding value).
But for cows that have already been through selection, this phenotypic deviation accumulates over time and becomes harder and harder to estimate correctly, and some animals are even underestimated.
This requires us to modify it:
First: calculate the deviation and the EBV jointly, so that this error does not arise.
Second: modified CC (MCC), an iterative calculation:
(1) Calculation of EBV from deviation
(2) Adjustment of the deviation using the contemporaries' EBVs
(3) return to (1)
(4) Repeat until the result is stable (i.e., converges); the converged result is the same as BLUP.
How to do it (a minimal sketch follows below):
Correct the y part when computing t:
Iterate until convergence:
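The notes do not spell out the algebra; here is a toy sketch of my own (not the historical MCC formulas) of the alternating scheme for y = Xt + Za + e. With the shrinkage penalty lam*A^{-1} included, the alternation is a block Gauss-Seidel sweep over the MME, so it converges to the same solution as BLUP, as stated in (4) above:

    # Toy alternating scheme: (1) EBVs from deviations, (2) re-estimate group
    # effects from EBV-corrected records, (3)-(4) repeat until convergence.
    # All numbers are made up for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n_groups, n_animals = 3, 12
    X = np.zeros((n_animals, n_groups))
    X[np.arange(n_animals), np.arange(n_animals) % n_groups] = 1.0  # contemporary groups
    Z = np.eye(n_animals)                                           # one record per animal
    Ainv = np.eye(n_animals)                                        # pretend animals are unrelated
    lam = 2.0                                                       # sigma_e^2 / sigma_a^2
    y = rng.normal(10.0, 2.0, size=n_animals)

    t = np.linalg.solve(X.T @ X, X.T @ y)   # start from raw contemporary-group means
    a = np.zeros(n_animals)
    for it in range(200):
        # (1) EBVs from the deviations y - X t (shrunken, SI-style)
        a = np.linalg.solve(Z.T @ Z + lam * Ainv, Z.T @ (y - X @ t))
        # (2) correct the records by the current EBVs, re-estimate group effects t
        t_new = np.linalg.solve(X.T @ X, X.T @ (y - Z @ a))
        # (3)-(4) go back to (1) until t stops changing
        if np.max(np.abs(t_new - t)) < 1e-10:
            t = t_new
            break
        t = t_new

    print("iterations:", it + 1)
    print("group effects t:", t.round(3))
    print("EBVs a:", a.round(3))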
Overall diagram:
Today I happened to come across a nice article about BLUP; here is the link:
/p/43395772