Scene text recognition has gained significant attention in the computer vision community. Character detection and recognition are the promise of text recognition and affect the overall performance to a large extent. We proposed a good initialization model for scene character recognition from cropped text regions. We use constrained character’s body structures with deformable part-based models to detect and recognize characters in various backgrounds. The character’s body structures are achieved by an unsupervised discriminative clustering approach followed by a statistical model and a self-build minimum spanning tree model. Our method utilizes part appearance and location information, and combines character detection and recognition in cropped text region together. The evaluation results on the benchmark datasets demonstrate that our proposed scheme outperforms the state-of-the-art methods both on scene character recognition and word recognition aspects.