We consider the problem of human pose estimation in still images. Most previous work predicts the pose directly, using either local deformable models or a global mixture representation in the pose space. We argue that pose estimation can instead be divided into stages, and we propose a new two-stage framework. The pre-estimation stage consists of three steps: upper body detection, model category estimation for the upper body, and full model selection for pose estimation. For upper body detection, we propose a new method based on pairwise scores of the upper body. The estimation stage addresses the wide variety of human poses and activities, for which we propose the upper-body-based multiple mixture parts (MMP) model. This model not only joins different mixture models together, but can also analyze activities with complex kinematic structures. We compare the MMP model with state-of-the-art methods, and the experimental results demonstrate its effectiveness.