Title: A refined non-driving activity classification using a two-stream convolutional neural network
Authors: Yang, Lichao; Yang, Tingyu; Liu, Haochen; Shan, Xiaocai; Brighton, James; Skrypchuk, Lee; Mouzakitis, Alexandros; Zhao, Yifan
Date deposited: 2020-07-23
Date issued: 2020-06-29
Citation: Yang L, Yang T, Liu H, et al. (2021) A refined non-driving activity classification using a two-stream convolutional neural network. IEEE Sensors Journal, Volume 21, Number 14, July 2021, pp. 15574-15583
ISSN: 1530-437X
DOI: https://doi.org/10.1109/JSEN.2020.3005810
URI: https://dspace.lib.cranfield.ac.uk/handle/1826/15586
Type: Article
Language: en
License: Attribution-NonCommercial 4.0 International (http://creativecommons.org/licenses/by-nc/4.0/)
Keywords: 2-stream CNN; optical flow; Level 3 automation; NDA classification

Abstract: Monitoring the driver's status is of great importance for achieving an intelligent and safe take-over transition in Level 3 automated vehicles. We present a camera-based system that recognises non-driving activities (NDAs), which may lead to different cognitive capabilities for take-over, based on a fusion of spatial and temporal information. The region of interest (ROI) is automatically selected based on the extracted masks of the driver and the object/device the driver is interacting with. The RGB image of the ROI (the spatial stream) and its associated current and historical optical-flow frames (the temporal stream) are then fed into a two-stream convolutional neural network (CNN) for NDA classification. This approach identifies not only the object/device but also the interaction mode between the object and the driver, which enables a refined NDA classification. In this paper, we evaluated the performance of classifying 10 NDAs, combining two types of devices (tablet and phone) and five types of tasks (emailing, reading, watching videos, web browsing and gaming), for 10 participants. Results show that the proposed system improves the average classification accuracy from 61.0% when using a single spatial stream to 90.5% with the proposed two-stream approach.
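To illustrate the two-stream architecture the abstract describes, below is a minimal PyTorch sketch: an RGB spatial stream over the ROI crop and a temporal stream over stacked optical-flow frames, fused at the score level. The ResNet-18 backbones, the number of stacked flow frames, and the averaging fusion are illustrative assumptions, not the configuration reported in the paper.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoStreamNDAClassifier(nn.Module):
    """Sketch of a two-stream CNN for NDA classification.

    Assumptions (not from the paper): ResNet-18 backbones,
    10 stacked flow frames, late fusion by score averaging.
    """

    def __init__(self, num_classes: int = 10, flow_frames: int = 10):
        super().__init__()
        # Spatial stream: standard 3-channel RGB ROI crop.
        self.spatial = resnet18(num_classes=num_classes)
        # Temporal stream: 2 channels (x/y displacement) per flow frame,
        # so the first conv layer is widened to accept 2 * flow_frames channels.
        self.temporal = resnet18(num_classes=num_classes)
        self.temporal.conv1 = nn.Conv2d(
            2 * flow_frames, 64, kernel_size=7, stride=2, padding=3, bias=False
        )

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb:  (B, 3, H, W) ROI crop of the driver and the object/device
        # flow: (B, 2 * flow_frames, H, W) current + historical optical flow
        spatial_scores = self.spatial(rgb)
        temporal_scores = self.temporal(flow)
        # Late fusion: average the class scores of the two streams.
        return (spatial_scores + temporal_scores) / 2

# Usage with dummy inputs (batch of 4, 224x224 crops, 10 flow frames):
model = TwoStreamNDAClassifier(num_classes=10, flow_frames=10)
rgb = torch.randn(4, 3, 224, 224)
flow = torch.randn(4, 20, 224, 224)
logits = model(rgb, flow)  # shape (4, 10), one score per NDA class

Score-level averaging is one common fusion choice for two-stream networks; feature-level fusion before the classifier head is an equally plausible reading of "a fusion of spatial and temporal information".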