Deep CAPTCHA Recognition Using Encapsulated Preprocessing and Heterogeneous Datasets
Author
Abstract

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is an important security technique designed to deter bots from abusing software systems, which has broader applications in cyberspace. CAPTCHAs come in a variety of forms, including the deciphering of obfuscated text, transcribing of audio messages, and tracking mouse movement, among others. This paper focuses on using deep learning techniques to recognize text-based CAPTCHAs. In particular, our work focuses on generating training datasets using different CAPTCHA schemes, along with a pre-processing technique allowing for character-based recognition. We have encapsulated the CRABI (CAPTCHA Recognition with Attached Binary Images) framework to give an image multiple labels for improvement in feature extraction. Using real-world datasets, performance evaluations are conducted to validate the efficacy of our proposed approach on several neural network architectures (e.g., custom CNN architecture, VGG16, ResNet50, and MobileNet). The experimental results confirm that over 90% accuracy can be achieved on most models.

Year of Publication
2022
Conference Name
IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
Google Scholar | BibTeX