It is well known that higher-level features can represent the abstract semantics of the original data. We propose a multiple-scale combined deep learning network that learns a set of high-level feature representations at each stage of a convolutional neural network for face recognition, named the Multiscaled Principal Component Analysis (PCA) Network (MS-PCANet). Our model differs from traditional deep learning networks in two main ways. First, we obtain the fixed filter kernels by learning the principal components of image patches via PCA, nonlinearly process the convolutional outputs with simple binary hashing, and pool the results with a spatial pyramid pooling method. Second, the output features of several stages are fed to the classifier. Combining feature representations from multiple stages provides the classifier with multiscale features, since features in later stages are more global and invariant than those in early stages. The MS-PCANet feature therefore compactly encodes both holistic abstract information and local specific information. Extensive experimental results show that MS-PCANet can efficiently extract high-level feature representations and outperforms state-of-the-art face/expression recognition methods on benchmark face-related datasets of multiple modalities.
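The per-stage pipeline described above (filter kernels learned as principal components of image patches, convolution, binary hashing of the responses, and histogram-based pooling) can be sketched in NumPy as follows. The patch size, number of filters, and the simple block histogram used as a stand-in for spatial pyramid pooling are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def learn_pca_filters(images, patch_size=5, n_filters=4):
    """Learn convolution kernels as leading principal components of patches."""
    k = patch_size
    patches = []
    for img in images:
        H, W = img.shape
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())     # remove patch mean
    X = np.asarray(patches)                      # (num_patches, k*k)
    cov = X.T @ X / len(X)                       # patch covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalue order
    # Leading eigenvectors become the fixed filter kernels
    return eigvecs[:, ::-1][:, :n_filters].T.reshape(n_filters, k, k)

def convolve_valid(img, f):
    """Plain 'valid' cross-correlation of one image with one kernel."""
    k = f.shape[0]
    H, W = img.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * f)
    return out

def stage_features(img, filters):
    """Binary-hash the filter responses, then pool with a block histogram."""
    maps = [convolve_valid(img, f) for f in filters]
    code = np.zeros_like(maps[0], dtype=int)
    for b, m in enumerate(maps):
        code += (m > 0).astype(int) << b         # one bit per filter response
    n_codes = 2 ** len(filters)
    hist, _ = np.histogram(code, bins=n_codes, range=(0, n_codes))
    return hist / hist.sum()                     # normalized histogram feature
```

In the full model, such stages would be cascaded and the pooled features of several stages concatenated before the classifier, so the final descriptor carries both early-stage local detail and later-stage global structure.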