Intelligent Recognition Method of Web Application Categories Based on Multi-layer Simhash Algorithm
Author
Abstract

Traditional Web application category recognition relies on fingerprint rule matching, whose rules are difficult to extract and whose coverage is limited. Many improved identification methods extract fingerprints semi-automatically according to predefined rules and then identify Web application categories with clustering or classification algorithms. However, they still depend on fingerprint rules and human intervention, and their classification step has a time complexity too high for large volumes of data. This paper proposes a Multi-layer Simhash Algorithm and combines it with DBSCAN clustering to identify Web application categories intelligently, pioneering fully automated fingerprint identification of Web applications. The method can discover unknown Web applications and predict the categories of unknown applications, which removes both the need to extract fingerprint rules and the dependence on manual work. First, the TF-IDF algorithm extracts keywords and their weights from Web page text. Next, the Multi-layer Simhash Algorithm transforms these feature words and weights into a binary characteristic hash value. Finally, the Hamming distance between the hash value of the input Web page and the characteristic hash value of a known category is compared with the radius of that base class, which determines the category of the input Web application. Experimental results show that the accuracy of Web application category recognition and prediction exceeds 97% and 93%, respectively.
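The pipeline summarized in the abstract (TF-IDF weighting, Simhash fingerprinting, and a Hamming-distance test against a category radius) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses classic single-layer Charikar Simhash because the multi-layer construction is not described here, and it assumes a 64-bit fingerprint, MD5-derived term hashes, and a smoothed IDF. The helpers `class_profile` (bitwise-majority center, radius equal to the largest member distance) and `classify` are hypothetical stand-ins for the category hashes and radii that the paper obtains through DBSCAN clustering.

```python
import hashlib
import math
import re
from collections import Counter

BITS = 64  # assumed fingerprint width; not stated in the abstract


def tokenize(text):
    # Naive lowercase word tokenizer; a real system would also strip HTML markup.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document (TF times smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    result = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        result.append({
            term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, c in counts.items()
        })
    return result


def term_hash(term):
    # Stable 64-bit term hash built from the first 8 bytes of MD5 (assumption).
    return int.from_bytes(hashlib.md5(term.encode("utf-8")).digest()[:8], "big")


def simhash(weights):
    """Classic single-layer Simhash; NOT the paper's multi-layer variant."""
    acc = [0.0] * BITS
    for term, w in weights.items():
        h = term_hash(term)
        for i in range(BITS):
            # Add the weight where the term's bit is 1, subtract it where it is 0.
            acc[i] += w if (h >> i) & 1 else -w
    # Keep a 1 bit wherever the weighted vote is positive.
    return sum(1 << i for i in range(BITS) if acc[i] > 0)


def hamming(a, b):
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def class_profile(hashes):
    """Hypothetical: bitwise-majority center plus max member distance as radius."""
    center = 0
    for i in range(BITS):
        if sum((h >> i) & 1 for h in hashes) * 2 > len(hashes):
            center |= 1 << i
    radius = max(hamming(center, h) for h in hashes)
    return center, radius


def classify(page_hash, classes):
    """Hypothetical: classes maps name -> (center_hash, radius).

    Returns the nearest class whose radius covers the page, or None (unknown).
    """
    best = None
    for name, (center, radius) in classes.items():
        d = hamming(page_hash, center)
        if d <= radius and (best is None or d < best[1]):
            best = (name, d)
    return best[0] if best else None
```

A `None` result corresponds to the abstract's case of an unknown Web application, one whose fingerprint lies outside the radius of every known category. With an even number of members, the majority vote in `class_profile` resolves ties to 0. This is one arbitrary choice among several reasonable ones.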

Year of Publication
2022
Date Published
Dec
Publisher
IEEE
Conference Location
Wuhan, China
ISBN Number
978-1-66549-425-0
URL
https://ieeexplore.ieee.org/document/10063638/
DOI
10.1109/TrustCom56396.2022.00073