
PCA example in R

· Data mining

Hi! Today I will show what kind of output we can expect from principal component analysis (PCA) in R.

First I’d like to load some essential packages. Then I load the example dataset: the Big Five, a public dataset. It covers five personality factors (extraversion, neuroticism, agreeableness, conscientiousness, and openness to experience), with 10 items for each factor, and each item is scored from 1 to 5.


Let’s take a quick look at the data and draw box plots for each of the variables.
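Here is a minimal sketch of that quick look in base R. The post does not name the data file, so the block simulates a stand-in data frame with the same shape (50 items, 10 per trait, scored 1 to 5). The name b5, the item labels such as E1, and the sample size are all assumptions made for illustration.

```r
# Simulated stand-in for the Big Five data (NOT the real dataset):
# 5 traits x 10 items, each item scored 1 to 5.
set.seed(1)
n <- 500                                   # assumed sample size
traits <- c("E", "N", "A", "C", "O")       # hypothetical trait prefixes
latent <- matrix(rnorm(n * 5), n, 5)       # one latent score per trait
raw <- latent[, rep(1:5, each = 10)] + matrix(rnorm(n * 50), n, 50)
b5 <- as.data.frame(pmin(pmax(round(3 + raw), 1), 5))
names(b5) <- paste0(rep(traits, each = 10), 1:10)

# Quick look: dimensions, first rows, and one box plot per item.
dim(b5)
head(b5[, 1:6])
boxplot(b5, las = 2, cex.axis = 0.6, main = "Item scores (simulated)")
```

With the real data you would read the file into b5 instead of simulating it; the last three lines stay the same.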


I prefer to use the principal function from the psych package, although there are other options such as prcomp or princomp, which are built into R.

First, I run a very basic analysis without rotation. I ask for five principal components, one for each of the five factors, and get the loadings, that is, the correlations between each item and each component. In the output, h2 is the communality, u2 is the uniqueness, and the last column, com, is a measure of complexity. The first 10 variables, which have to do with extraversion, have strong associations with the first principal component, and those associations drop off in the other four components.
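The unrotated step might look like the sketch below. It assumes the psych package is installed and again uses simulated stand-in data, so the numbers will not match the post. To show what the summary columns mean, the block recomputes them from the loadings rather than reading them off the printed output; the complexity formula is Hofmann's index, which the psych documentation describes.

```r
library(psych)  # provides principal(); assumed to be installed

# Simulated stand-in data (NOT the real dataset): 50 items scored 1 to 5.
set.seed(1)
n <- 500
raw <- matrix(rnorm(n * 5), n, 5)[, rep(1:5, each = 10)] +
  matrix(rnorm(n * 50), n, 50)
b5 <- as.data.frame(pmin(pmax(round(3 + raw), 1), 5))
names(b5) <- paste0(rep(c("E", "N", "A", "C", "O"), each = 10), 1:10)

# Five principal components, no rotation.
pc_unrotated <- principal(b5, nfactors = 5, rotate = "none")

# Loadings: correlation of each item with each component.
L <- unclass(pc_unrotated$loadings)

# Recompute the summary columns from the loadings.
h2  <- rowSums(L^2)                    # communality: variance shared with the components
u2  <- 1 - h2                          # uniqueness: what is left over
com <- rowSums(L^2)^2 / rowSums(L^4)   # complexity: 1 means the item loads on one component
head(round(cbind(L, h2 = h2, u2 = u2, com = com), 2), 10)
```

Printing pc_unrotated directly gives the full table with the same h2, u2 and com columns.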

 


We can also see how much of the variance in the total data set these components account for. The first component accounts for 11% of the variance in the total data set, and the five components together account for a cumulative variance of about 46 percent.
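To see where these percentages come from, here is a small base R sketch on the same kind of simulated stand-in data (so the figures will differ from the 11% and 46% above). For standardized items, each component's share of the variance is its eigenvalue divided by the number of items, because the eigenvalues of a correlation matrix sum to the number of variables.

```r
# Simulated stand-in data (NOT the real dataset): 50 items scored 1 to 5.
set.seed(1)
n <- 500
raw <- matrix(rnorm(n * 5), n, 5)[, rep(1:5, each = 10)] +
  matrix(rnorm(n * 50), n, 50)
b5 <- as.data.frame(pmin(pmax(round(3 + raw), 1), 5))

# Eigenvalues of the item correlation matrix, largest first.
eig <- eigen(cor(b5))$values

# Share of total variance per component, and the running total.
prop_var <- eig / ncol(b5)
cum_var  <- cumsum(prop_var)
round(rbind(proportion = prop_var, cumulative = cum_var)[, 1:5], 3)
```

The first five columns of this table correspond to the proportion and cumulative variance rows that principal prints for a five-component solution without rotation.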


Second, a small change: adding a rotation, which turns the principal components into rotated components that may be easier to interpret. The output is similar to the first result, but the numbers are slightly different, and because this is an oblique rotation, the components are allowed to be correlated with each other. You can see how each component separates some of the variables from the others, which is the overall goal of principal component analysis.
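A sketch of the rotated step, again assuming psych is installed and using simulated stand-in data. The post does not say which oblique rotation was used; "promax" here is an assumption, chosen as one of the oblique options that principal accepts.

```r
library(psych)  # provides principal(); assumed to be installed

# Simulated stand-in data (NOT the real dataset): 50 items scored 1 to 5.
set.seed(1)
n <- 500
raw <- matrix(rnorm(n * 5), n, 5)[, rep(1:5, each = 10)] +
  matrix(rnorm(n * 50), n, 50)
b5 <- as.data.frame(pmin(pmax(round(3 + raw), 1), 5))
names(b5) <- paste0(rep(c("E", "N", "A", "C", "O"), each = 10), 1:10)

# Five components with an oblique rotation ("promax" is an assumed choice).
pc_oblique <- principal(b5, nfactors = 5, rotate = "promax")

# Hide small loadings so the item-to-component pattern is easier to read.
print(pc_oblique$loadings, cutoff = 0.3)

# An oblique rotation lets components correlate; check the component scores.
score_cor <- cor(pc_oblique$scores)
round(score_cor, 2)
```

In this simulation the traits are generated independently, so the score correlations should come out small; real questionnaire data may behave differently. Another oblique option is rotate = "oblimin", which additionally relies on the GPArotation package.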
