Convolutional neural networks (ConvNets) coupled with long short term memory (LSTM) networks have been recently shown to be effective for video classification as they combine the automatic feature extraction capabilities of a neural network with additional memory in the temporal domain. This paper shows how multiview fusion can be applied to such a ConvNet LSTM architecture. Two different fusion techniques are presented. The system is first evaluated in the context of a driver activity recognition system using data collected in a multicamera driving simulator. These results show significant improvement in accuracy with multiview fusion and also show that deep learning performs better than a traditional approach using spatiotemporal features even without requiring any background subtraction. The system is also validated on another publicly available multiview action recognition dataset that has 12 action classes and 8 camera views.