Web robot detection using supervised learning algorithms

dc.contributor.advisor: He, Hongmei
dc.contributor.advisor: Starr, Andrew
dc.contributor.author: Chen, Hanlin
dc.date.accessioned: 2023-09-28T10:45:48Z
dc.date.available: 2023-09-28T10:45:48Z
dc.date.issued: 2020-06
dc.description.abstract: Web robots, or Web crawlers, have become the main source of Web traffic. Although some bots, such as search engine crawlers, are benign, others can carry out DDoS attacks, posing a serious threat to websites. The project aims to develop an offline system that can effectively detect malicious web robots, which supports network traffic cleaning and also helps improve the network security of IoT systems and services. A comprehensive literature review covering 2010-2019 was conducted to identify the research gap. The key contributions of the research are: 1) it provided a systematic methodology to address the web robot detection problem based on a log file from an industrial company; 2) it provided a feature engineering approach that overcomes the curse of dimensionality; 3) it made substantial progress in the accuracy of offline web robot detection through a holistic study of three types of machine learning techniques based on real data from industry. Three algorithms, based on the Keras sequential model, random forest, and SVM, were developed in Python on the TensorFlow 2.0 platform to distinguish web robots from human visitors. Experimental results suggested that random forest obtained the best performance in accuracy and training time...[cont.] (en_UK)
dc.description.coursename: Manufacturing (en_UK)
dc.identifier.uri: https://dspace.lib.cranfield.ac.uk/handle/1826/20293
dc.language.iso: en (en_UK)
dc.publisher: Cranfield University (en_UK)
dc.publisher.department: SATM (en_UK)
dc.rights: © Cranfield University, 2020. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder. (en_UK)
dc.subject: Web robot (en_UK)
dc.subject: Web crawler (en_UK)
dc.subject: Random forest (en_UK)
dc.subject: Sequential model (en_UK)
dc.subject: SVM (en_UK)
dc.subject: Feature importance (en_UK)
dc.subject: TensorFlow 2.0 (en_UK)
dc.title: Web robot detection using supervised learning algorithms (en_UK)
dc.type: Thesis or dissertation (en_UK)
dc.type.qualificationlevel: Masters (en_UK)
dc.type.qualificationname: MRes (en_UK)
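
The abstract mentions feature engineering on a web server log file but this record does not describe the features used. As a rough illustration only, the sketch below assumes a combined-format access log and groups requests by IP address and user agent to derive a few per-session features. The log pattern, the sample lines, the function name `session_features`, and the feature names are all hypothetical and are not taken from the thesis.

```python
import re
from collections import defaultdict

# Assumed input: Apache/NGINX "combined" access log lines (an assumption,
# the record does not state the log format used in the thesis).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up sample lines using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2019:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2019:13:55:37 +0000] "HEAD /missing HTTP/1.1" 404 0 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2019:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0"',
]


def session_features(lines):
    """Group requests by (ip, user agent) and compute simple session features."""
    sessions = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not parse
        sessions[(match["ip"], match["agent"])].append(match)

    features = {}
    for key, requests in sessions.items():
        total = len(requests)
        features[key] = {
            # Hypothetical features, chosen for illustration only.
            "requests": total,
            "head_ratio": sum(r["method"] == "HEAD" for r in requests) / total,
            "error_ratio": sum(r["status"].startswith("4") for r in requests) / total,
            "empty_referrer_ratio": sum(r["referrer"] in ("", "-") for r in requests) / total,
            "robots_txt": any(r["path"] == "/robots.txt" for r in requests),
        }
    return features


if __name__ == "__main__":
    for session, values in session_features(SAMPLE_LOG).items():
        print(session, values)
```

Feature rows of this kind could then be passed to a supervised classifier such as the random forest or SVM named in the abstract. Grouping by IP address and user agent is a simplification, since a fuller treatment would also split sessions on inactivity gaps.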

Files

Original bundle
Name: Chen_H_2020.pdf
Size: 1.68 MB
Format: Adobe Portable Document Format