In response to the current challenges of poor real-time performance and low accuracy in facial expression recognition in classroom environments, this paper proposes a classroom facial expression recognition model based on YOLOv4. The original backbone feature extraction network is replaced with DenseNet121, and an SE attention mechanism is incorporated into the model to enhance its feature extraction and representation capabilities. EIoU is used for the localization loss to improve localization accuracy on small targets. The improved model enables real-time prediction of multiple facial expressions. Validation on a self-built dataset shows that the improved model achieves a 16.33% increase in mAP compared with other models, with an AP of 96.51% for facial expressions that commonly occur in the classroom. The model reaches a video inference speed of up to 25 fps and maintains good recognition performance in online teaching scenarios, indicating a degree of generalization ability.
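To make the two modifications named in the abstract more concrete, the following is a minimal sketch, assuming a PyTorch implementation: an SE channel-attention block of the kind inserted into the backbone, and an EIoU localization loss. The class and function names (SEBlock, eiou_loss), the reduction ratio, and the (x1, y1, x2, y2) box format are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using globally pooled statistics."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze (global average pool) -> excitation weights
        return x * w.view(b, c, 1, 1)     # rescale each feature channel


def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    # Intersection over union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Center-distance penalty plus separate width and height penalties (the EIoU terms)
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    center = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (cw ** 2 + ch ** 2 + eps)
    dw = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2 / (cw ** 2 + eps)
    dh = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2 / (ch ** 2 + eps)

    return (1 - iou + center + dw + dh).mean()
```

The SE block can be appended after a DenseNet121 stage output so that channel responses relevant to facial features are amplified; the EIoU loss replaces the plain IoU-based box regression term, adding explicit width and height penalties that help small-face localization.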