Let's look at what PCA does on a 2-dimensional dataset. In this example, we do not reduce the number of features. Reducing the number of features makes sense for high-dimensional data, because it lowers the computational cost of training and makes the data easier to visualize and model.

We can see that much of the information in the data has been preserved, and we could now train an ML model that classifies the data points according to the three species.

I will now summarize the most important concepts. When we multiply a matrix with a vector, the vector gets transformed linearly. This linear transformation is a mixture of rotating and scaling the vector. The vectors which only get scaled and not rotated are called eigenvectors. The factor by which they get scaled is the corresponding eigenvalue.

Principal components are the axes along which our data shows the most variation. The first principal component explains the largest part of the observed variation, the second principal component the second-largest part, and so on. The principal components are the eigenvectors of the covariance matrix, and the first principal component corresponds to the eigenvector with the largest eigenvalue.

Principal component analysis is a technique to reduce the number of features in our dataset. It consists of the following processing steps: compute the principal components, keep as many new features as we specified, and discard the rest. This way, we keep the features which explain the most variation in the data.

The Iris dataset and license can be found under:
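The eigenvector and eigenvalue idea can be checked numerically in a few lines. This is a minimal NumPy illustration; the matrix `A` is an arbitrary example chosen for this sketch, not anything from the post:

```python
import numpy as np

# An example symmetric matrix (symmetric, like a covariance matrix,
# so its eigenvalues are real and its eigenvectors are orthogonal).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)

# Each eigenvector is only scaled by A, never rotated:
# A @ v equals lambda * v for its eigenvalue lambda.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

print(eigenvalues)  # [1. 3.]
```

`np.linalg.eigh` is used here because the matrix is symmetric; for a general square matrix one would use `np.linalg.eig` instead.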
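The processing steps summarized above (center the data, take the eigenvectors of the covariance matrix, sort them by eigenvalue, project) fit in a few lines of NumPy. This is a sketch rather than the post's own code: it uses synthetic correlated 2-D points as a stand-in for the two Iris features, since the post's data loading is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data as a stand-in for two Iris features:
# correlated Gaussian points, so there is a clear dominant axis.
X = rng.multivariate_normal(mean=[5.8, 3.0],
                            cov=[[0.7, 0.4], [0.4, 0.5]],
                            size=150)

# 1. Center the data; PCA operates on mean-centered features.
Xc = X - X.mean(axis=0)

# 2. The principal components are the eigenvectors of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 3. Sort by descending eigenvalue: the first principal component
#    corresponds to the largest eigenvalue and explains the most variation.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Project the data onto the principal axes. We keep both components
#    here, so, as in the example above, no features are discarded.
X_pca = Xc @ eigenvectors

# Share of the total variance explained by each component.
explained = eigenvalues / eigenvalues.sum()
```

To actually reduce the dimensionality, one would keep only the first few columns of `X_pca`; the `explained` ratios tell us how much variation each kept component accounts for.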