1 Hat Matrix

1.1 From Observed to Fitted Values

The OLS estimator is the $p \times 1$ vector

$$b = (X^{T}X)^{-1}X^{T}y.$$

The predicted values $\hat{y}$ can then be written as

$$\hat{y} = Xb = X(X^{T}X)^{-1}X^{T}y =: Hy,$$

where $H := X(X^{T}X)^{-1}X^{T}$ is an $n \times n$ matrix which "puts the hat on $y$". The same matrix is often written $P$: the least-squares fitted values are $\hat{y} = X\hat{\beta} = X(X^{T}X)^{-1}X^{T}y = XC^{-1}X^{T}y = Py$, with $C = X^{T}X$, and $P$ is a projection matrix. The estimated coefficients are $\hat{\beta} = (X^{T}X)^{-1}X^{T}y$. (Note that $(X^{T}X)^{-1}X^{T}$ is the pseudoinverse of $X$.)

The formula for the vector of residuals $r$ can also be expressed compactly using the projection matrix:

$$r = y - \hat{y} = y - Py = (I - P)y = My, \qquad M \equiv I - P.$$

Recall that $P$ is the projection onto the linear space spanned by the columns of the matrix $X$.

Hat matrix properties:
1. The hat matrix is symmetric, $H^{T} = H$.
2. The hat matrix is idempotent, i.e. $HH = H$.

Since our model will usually contain a constant term, one of the columns in the $X$ matrix will contain only ones. In that case every row of $H$ sums to one. Let $H = [r_1\ r_2\ \dots\ r_n]^{T}$, where $r_i$ is the $i$th row of $H$; then $r_i \mathbf{1} = 1$ (a scalar) for each $i$, that is, $H\mathbf{1} = \mathbf{1}$. This holds because $HX = X$ and $\mathbf{1}$ is a column of $X$.

$H$ plays an important role in regression diagnostics. GDF is defined to be the sum of the sensitivities of each fitted value $\hat{Y}_i$ to perturbations in its corresponding output $Y_i$; for a linear fit $\hat{Y} = HY$ that sensitivity is the diagonal element $h_{ii}$.

When working with $X$, it can be useful to partition the matrix: for example, if there are large blocks of zeros in a matrix, or blocks that look like an identity matrix, partition the matrix accordingly.
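The definitions above can be checked numerically. The following is a minimal sketch, assuming a small synthetic data set and a full-column-rank design matrix; the helper name `hat_matrix` and the simulated coefficients are illustrative choices, not part of the source.

```python
import numpy as np

def hat_matrix(X: np.ndarray) -> np.ndarray:
    """Return H = X (X^T X)^{-1} X^T for a full-column-rank design matrix X."""
    # Solve (X^T X) Z = X^T rather than forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(0)           # synthetic data, for illustration only
n = 8
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])     # constant term: a column of ones
y = 1.5 + 2.0 * x + rng.normal(scale=0.1, size=n)

H = hat_matrix(X)
b = np.linalg.solve(X.T @ X, X.T @ y)    # OLS estimator b = (X^T X)^{-1} X^T y
y_hat = H @ y                            # fitted values y_hat = H y
M = np.eye(n) - H                        # residual maker M = I - H
residuals = M @ y

print("H y equals X b:       ", np.allclose(y_hat, X @ b))
print("H is symmetric:       ", np.allclose(H, H.T))
print("H is idempotent:      ", np.allclose(H @ H, H))
print("rows of H sum to one: ", np.allclose(H @ np.ones(n), np.ones(n)))
```

Forming the full $n \times n$ matrix is only sensible for small $n$; it is done here to make the properties visible.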
Each point of the data set tries to pull the ordinary least squares (OLS) line towards itself. In the illustrating plot, three of the data points (the smallest $x$ value, an $x$ value near the mean, and the largest $x$ value) are labeled with their corresponding leverages.

The aim of regression analysis is to explain $Y$ in terms of $X$ through a functional relationship like $Y_i = f(X_i, \ast)$. The linear model can be written as

$$y = X\beta + \varepsilon.$$

The $n \times 1$ vector of ordinary predicted values of the response variable is $\hat{y} = Hy$, where the $n \times n$ prediction or hat matrix $H$ is given by

$$H = X(X^{T}X)^{-1}X^{T}.$$

Equivalently, $\hat{Y} = Xb = X(X^{T}X)^{-1}X^{T}Y = HY$. $H$ is called the "hat matrix" since it transforms $y$ into $\hat{y}$ (pronounced "y-hat"). It describes the influence each response value has on each fitted value.

The residual vector is given by $e = (I_n - H)y$ with the variance-covariance matrix $V = (I_n - H)\sigma^{2}$, where $I_n$ is the identity matrix of order $n$. Writing $M = I_n - H$, we have $e = My$. Since $M$ is symmetric and also has the property $MX = 0$, it follows that

$$X^{T}e = X^{T}My = (MX)^{T}y = 0.$$

Estimated covariance matrix of $b$: the vector $b$ is a linear combination of the elements of $Y$, so when the errors are uncorrelated with common variance $\sigma^{2}$ its covariance matrix is $\sigma^{2}(X^{T}X)^{-1}$.

A symmetric idempotent matrix such as $H$ is called a perpendicular projection matrix: the fitted vector $Hy$ is the point in the column space of $X$ from which the line to $y$ is orthogonal to that column space. The eigenvalues of $H$ are all either 0 or 1. As an exercise, prove that if $A$ is idempotent, then $\det(A)$ is equal to either 0 or 1.

The ANOVA hat matrix is not a projection matrix, but it shares many of the same geometric properties as its parametric counterpart.
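The eigenvalue claim and the orthogonality relation $X^{T}e = 0$ can be illustrated with a short numerical check. This is a sketch under the assumption of a randomly generated, full-column-rank design matrix; the sizes `n` and `p` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)           # synthetic data, for illustration only
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
e = (np.eye(n) - H) @ y                  # residual vector e = (I - H) y

# H is symmetric, so eigvalsh applies and returns real eigenvalues.
eigvals = np.linalg.eigvalsh(H)
near_zero = np.isclose(eigvals, 0.0, atol=1e-8)
near_one = np.isclose(eigvals, 1.0, atol=1e-8)

print("all eigenvalues are 0 or 1:", bool(np.all(near_zero | near_one)))
print("number of unit eigenvalues:", int(near_one.sum()), "(p =", p, ")")
print("X^T e is zero:             ", np.allclose(X.T @ e, 0.0))
```

The number of unit eigenvalues equals the rank of $X$, which is also the trace of $H$.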
In statistics, the projection matrix $P$, sometimes also called the influence matrix or hat matrix $H$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). $H$ is central in regression analysis. The variable $Y$ is generally referred to as the response variable.

The element in the $i$th row and $j$th column of $P$ is equal to the covariance between the $j$th response value and the $i$th fitted value, divided by the variance of the former:

$$p_{ij} = \frac{\operatorname{Cov}[\hat{y}_i, y_j]}{\operatorname{Var}[y_j]}.$$

The residuals, like the fitted values $\hat{Y}_i$, can be expressed as linear combinations of the response observations: $r = y - \hat{y} = (I - P)y$ (Frank Wood, fwood@stat.columbia.edu, Linear Regression Models, Lecture 11, Slide 22). Therefore, the covariance matrix of the residuals $r$, by error propagation, equals

$$\Sigma_r = (I - P)^{T}\,\Sigma\,(I - P),$$

where $\Sigma$ is the covariance matrix of the error vector. For the case of linear models with independent and identically distributed errors in which $\Sigma = \sigma^{2}I$, this reduces to

$$\Sigma_r = (I - P)\,\sigma^{2}.$$

The diagonal elements of the projection matrix are the leverages $h_{ii}$, which describe the influence each response value has on the fitted value for that same observation. They satisfy

$$\sum_{i=1}^{n} h_{ii} = p \quad\Rightarrow\quad \bar{h} = \frac{1}{n}\sum_{i=1}^{n} h_{ii} = \frac{p}{n},$$

where $p$ is the number of coefficients in the regression model, and $n$ is the number of observations. To show it, note that $\operatorname{tr}(H) = \operatorname{tr}\big(X(X^{T}X)^{-1}X^{T}\big) = \operatorname{tr}\big((X^{T}X)^{-1}X^{T}X\big) = \operatorname{tr}(I_p) = p$. (Similarly, the effective degrees of freedom of a spline model is estimated by the trace of the projection matrix $S$, where $\hat{Y} = SY$.)

In the variance argument that follows, $q^{T}\hat{\beta}$ is a scalar and $k^{T}y$ is a scalar, so the two cross terms in the variance of their difference are equal and combine into twice the covariance.
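The leverage identities above can be sketched as follows. The predictor values are made up, and computing the leverages from a thin QR factorization (rather than from the full matrix $H$) is a design choice of this example, not something the source prescribes.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])   # made-up predictor values
n = x.size
X = np.column_stack([np.ones(n), x])            # simple regression with a constant term
p = X.shape[1]

# Thin QR: X = Q R with orthonormal columns in Q, so H = Q Q^T
# and h_ii is the squared norm of row i of Q. H itself is never formed.
Q, _ = np.linalg.qr(X)
leverages = np.sum(Q**2, axis=1)

# Closed form for simple linear regression: h_ii = 1/n + (x_i - xbar)^2 / Sxx.
centered = x - x.mean()
closed_form = 1.0 / n + centered**2 / np.sum(centered**2)

print("leverages:        ", np.round(leverages, 3))
print("sum of leverages: ", leverages.sum(), "vs p =", p)
print("mean leverage:    ", leverages.mean(), "vs p/n =", p / n)
print("index of largest: ", int(np.argmax(leverages)))
```

The point farthest from the mean of `x` carries the largest leverage, which matches the picture of each point pulling the fitted line towards itself.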
By the definition of variance,

$$\operatorname{Var}\!\big(q^{T}\hat{\beta} - k^{T}y\big) = \operatorname{Var}\!\big(q^{T}\hat{\beta}\big) + \operatorname{Var}\!\big(k^{T}y\big) - 2\operatorname{Cov}\!\big(q^{T}\hat{\beta},\, k^{T}y\big).$$

When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are the least-squares estimate

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y,$$

so the fitted values are $\hat{y} = X\hat{\beta} = X(X^{T}X)^{-1}X^{T}y = Py$. Therefore, the projection matrix (and hat matrix) is given by

$$P \equiv X(X^{T}X)^{-1}X^{T}.$$

We call this the "hat matrix" because it turns $Y$'s into $\hat{Y}$'s. The matrix $M \equiv I - P$ is sometimes referred to as the residual maker matrix.

Properties of the $P$ matrix:
• $P$ depends only on $X$, not on $y$.
• The hat matrix is symmetric.
• The hat matrix is idempotent, i.e. $PP = P$. (An idempotent matrix $M$ is a matrix such that $M^{2} = M$.)
• If $Pv = \lambda v$ for some $v \neq 0$, then $\lambda v = Pv = P^{2}v = \lambda^{2}v$. So $\lambda^{2} = \lambda$ and hence $\lambda \in \{0, 1\}$.
• The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation. [3][4]

The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is $\Psi$. Then

$$\hat{\beta}_{\mathrm{GLS}} = (X^{T}\Psi^{-1}X)^{-1}X^{T}\Psi^{-1}y, \qquad H = X(X^{T}\Psi^{-1}X)^{-1}X^{T}\Psi^{-1},$$

and again $H^{2} = H$, though now $H$ is no longer symmetric.

If the design matrix is partitioned by columns as $X = [A\ \ B]$, the projection matrix can be decomposed as $P_X = P_A + P_{M_A B}$, where $P_A = A(A^{T}A)^{-1}A^{T}$ and $M_A = I - P_A$. One can use this partition to compute the hat matrix of $X$ without explicitly forming the matrix $A$.

Many types of models and techniques are subject to this formulation. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.
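For the generalized case with error covariance $\Psi$, the standard generalized least squares hat matrix is $H = X(X^{T}\Psi^{-1}X)^{-1}X^{T}\Psi^{-1}$. The sketch below checks that it is idempotent but not symmetric; the diagonal `Psi` and the small design matrix are assumptions chosen only to make the effect visible.

```python
import numpy as np

x = np.arange(5, dtype=float)
n = x.size
X = np.column_stack([np.ones(n), x])
Psi = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed error covariance (unequal variances)

# Psi^{-1} X via a linear solve; Psi is symmetric, so (Psi^{-1} X)^T = X^T Psi^{-1}.
PsiInv_X = np.linalg.solve(Psi, X)
H_gls = X @ np.linalg.solve(X.T @ PsiInv_X, PsiInv_X.T)

# Ordinary hat matrix, for comparison.
H_ols = X @ np.linalg.solve(X.T @ X, X.T)

print("GLS hat matrix idempotent:", np.allclose(H_gls @ H_gls, H_gls))
print("GLS hat matrix symmetric: ", np.allclose(H_gls, H_gls.T))
print("GLS hat matrix fixes X:   ", np.allclose(H_gls @ X, X))
print("OLS hat matrix symmetric: ", np.allclose(H_ols, H_ols.T))
```

With `Psi` proportional to the identity, the two matrices coincide and symmetry is recovered.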
Useful algebraic properties. The transpose of a sum is the sum of transposes, $(A + B)^{T} = A^{T} + B^{T}$. The matrix $Z^{T}Z$ is symmetric, and so therefore is $(Z^{T}Z)^{-1}$. For every $n \times n$ matrix $A$, the determinant of $A$ equals the product of its eigenvalues. Matrix operations on block matrices can be carried out by treating the blocks as matrix entries.

To obtain the least-squares estimate, we can take the first derivative of the objective function in matrix form. The covariance matrix of the resulting estimator is

$$\operatorname{Cov}(\hat{\beta}) = \sigma^{2}(X^{T}X)^{-1}.$$

Recall that $M = I - P$, where $P$ is the projection onto the linear space spanned by the columns of the matrix $X$. The matrix $M$ is symmetric ($M^{T} = M$) and idempotent ($M^{2} = M$). In some derivations we may need different $P$ matrices that depend on different sets of variables.

Since our model will usually contain a constant term, one of the columns in the $X$ matrix will contain only ones. This column should be treated exactly the same as any other column in the $X$ matrix. For such a model, $H\mathbf{1} = \mathbf{1}$ (exercise: show this for the multiple linear regression case, $p - 1 > 1$).

Properties of the leverages: $0 \le h_{ii} \le 1$ (can you show this?), and the minimum value of $h_{ii}$ is $1/n$ for a model with a constant term.
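The lower bound $h_{ii} \ge 1/n$ for a model with a constant term can be seen from the block decomposition $P_X = P_A + P_{M_A B}$ with $A$ taken as the column of ones: $P_A$ contributes exactly $1/n$ to every diagonal element, and the second term is a projection, so its diagonal is non-negative. The following is a minimal sketch on synthetic regressors; the helper `proj` is a hypothetical name introduced for the example.

```python
import numpy as np

def proj(Z: np.ndarray) -> np.ndarray:
    """Orthogonal projection onto the column space of Z (full column rank assumed)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

rng = np.random.default_rng(2)           # synthetic data, for illustration only
n = 12
A = np.ones((n, 1))                      # the constant term
B = rng.normal(size=(n, 2))              # remaining regressors
X = np.hstack([A, B])

P_X = proj(X)
M_A = np.eye(n) - proj(A)                # M_A @ B subtracts column means from B
P_blocks = proj(A) + proj(M_A @ B)       # P_X = P_A + P_{M_A B}

leverages = np.diag(P_X)

print("decomposition matches P_X:", np.allclose(P_X, P_blocks))
print("diagonal of P_A is 1/n:   ", np.allclose(np.diag(proj(A)), 1.0 / n))
print("smallest leverage:", leverages.min(), ">= 1/n =", 1.0 / n)
print("largest leverage: ", leverages.max(), "<= 1")
```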