--> https://github.com/Voice-Phishing-Prevention-Project
ㅤAs the article shows, voice phishing techniques are evolving by the day. Voice phishing prevention apps are already active on smartphones, but they have a significant limitation: they remain relatively rare for internet telephony and landline calls.
ㅤWe therefore aim to develop a system that analyzes conversation content in real time to detect voice phishing and clearly alerts the groups most vulnerable to it.
ㅤFurthermore, existing voice phishing prevention methods are designed primarily for non-disabled people, which limits their accessibility and usability for people with visual, auditory, cognitive, and other disabilities.
ㅤWe therefore also aim to develop a system that makes it easier for people with disabilities to benefit from voice phishing prevention technology.
ㅤMost services currently on the market focus on detecting phishing on smartphones, which makes it difficult for elderly people who are not familiar with smartphones to detect voice phishing.
ㅤAdditionally, while non-disabled people can use these services without inconvenience, there is no dedicated assistive device for people with disabilities, which makes the services difficult for them to use.
➡️ For these reasons, this project targets landline phones rather than smartphones. The service is designed around this target so that phishing can be detected easily and accurately even on ordinary landline phones.
Additionally, so that users with disabilities can clearly recognize an alert, phishing warnings are conveyed through both voice and text.
1️⃣ When the recipient starts a call with the caller, AWS Transcribe collects the conversation in real time and converts it to text (speech-to-text, STT).
The resulting text is then preprocessed for the natural language processing (NLP) model and stored in a database, which the phishing alert feature draws on.
2️⃣ Using the stored conversation content, a pre-trained and customized NLP model performs binary classification: phishing or not phishing.
Finally, the result is communicated to the user through text and a light signal indicating whether phishing is present.
1️⃣ Start recording voice at the beginning of the call.
2️⃣ Save the recorded data on the laptop at the end of the call.
3️⃣ Upload the MP3 file to Google Colab from the laptop.
4️⃣ Transfer the file from Google Colab to the AWS S3 bucket.
5️⃣ Perform speech-to-text (STT) on the file stored in the S3 bucket.
6️⃣ Save the STT results in a .json file.
7️⃣ Transmit the saved JSON file to the model.
8️⃣ Use the model to classify whether the call is phishing.
9️⃣ Save the derived classification results to the AWS S3 bucket.
🔟➖🅰️ On the Raspberry Pi, read the result stored in the S3 bucket.
🔟➖🅱️ In the responsive Flask web app, read the result stored in the S3 bucket.
1️⃣1️⃣➖🅰️ Alert the user by flashing an LED according to the result.
1️⃣1️⃣➖🅱️ Alert the user with a pop-up window in the web app according to the result.
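Steps 4 to 6 of the pipeline above can be sketched with boto3 roughly as follows. This is a minimal sketch under assumptions, not the project's code: the function name `transcribe_recording`, the bucket and key names, the `ko-KR` language code, and the `transcripts/` output prefix are illustrative choices. The boto3 clients are passed in rather than created inside the function so the flow can be exercised with stand-ins.

```python
import json
import time


def transcribe_recording(s3, transcribe, local_path, bucket, audio_key, job_name,
                         poll_seconds=5.0, timeout_seconds=900.0):
    """Upload a call recording, transcribe it with AWS Transcribe, return the text.

    s3 and transcribe are expected to be boto3 clients, e.g.
    boto3.client("s3") and boto3.client("transcribe").
    """
    # Step 4: copy the MP3 from the notebook's local disk into the S3 bucket.
    s3.upload_file(local_path, bucket, audio_key)

    # Step 5: start a batch STT job on the uploaded object (Korean audio assumed).
    output_key = f"transcripts/{job_name}.json"  # hypothetical output location
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{audio_key}"},
        MediaFormat="mp3",
        LanguageCode="ko-KR",
        OutputBucketName=bucket,
        OutputKey=output_key,
    )

    # Poll the job status until it completes, fails, or the timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if status == "COMPLETED":
            break
        if status == "FAILED":
            raise RuntimeError(f"transcription job {job_name!r} failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"transcription job {job_name!r} did not finish in time")
        time.sleep(poll_seconds)

    # Step 6: Transcribe has written its JSON result to output_key; read it back
    # and return the full transcript text.
    body = s3.get_object(Bucket=bucket, Key=output_key)["Body"].read()
    return json.loads(body)["results"]["transcripts"][0]["transcript"]
```

Polling with a timeout keeps the notebook from hanging on a stuck job; in the real pipeline the returned text (or the JSON file itself) is what gets passed to the model in step 7.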
1️⃣ Send the collected voice file to AWS Transcribe.
2️⃣ Perform speech-to-text (STT) with AWS Transcribe and preprocess the resulting text.
3️⃣ Using the selected model, perform binary classification of phishing status on the preprocessed text.
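Step 2 above, turning the Transcribe output into clean text, might look like the sketch below. The JSON path `results.transcripts[*].transcript` is where AWS Transcribe batch jobs store the full transcript. The cleanup rules in `preprocess` are hypothetical, since the README does not list the actual preprocessing; both function names are illustrative.

```python
import json
import re


def extract_transcript(transcribe_output: str) -> str:
    """Pull the full transcript out of an AWS Transcribe output document."""
    doc = json.loads(transcribe_output)
    # Batch jobs store the complete text under results.transcripts[*].transcript;
    # word-level timings live separately under results.items.
    return " ".join(t["transcript"] for t in doc["results"]["transcripts"])


def preprocess(text: str) -> str:
    """Hypothetical cleanup: keep Hangul syllables, Latin letters, digits and
    basic sentence punctuation, then collapse runs of whitespace."""
    text = re.sub(r"[^0-9A-Za-z가-힣.?!\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

Keeping sentence-final punctuation is deliberate in this sketch: later stages that split the dialogue into sentences can still find the boundaries.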
ㅤIn the model performance comparison, KoBIGBIRD scored higher on both evaluation metrics, accuracy and F1-score. In a further inference test on a new test set of 10 normal and 10 phishing samples, both models correctly classified 19 of the 20 samples.
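To make the two evaluation metrics concrete, here is a small sketch that computes accuracy and phishing-class F1 from label lists. The labeling convention (1 = phishing, 0 = normal), the function name, and the example predictions are hypothetical; the README does not say which test sample was misclassified.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy over all samples and F1 for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# 10 normal (0) and 10 phishing (1) samples, matching the size of the inference test.
y_true = [0] * 10 + [1] * 10
# Two hypothetical ways to get 19 of 20 right:
missed_phishing = [0] * 10 + [0] + [1] * 9  # one phishing call classified as normal
false_alarm = [1] + [0] * 9 + [1] * 10      # one normal call flagged as phishing
```

Both error patterns give an accuracy of 19/20 = 0.95, but F1 differs: 18/19 ≈ 0.947 when a phishing call is missed and 20/21 ≈ 0.952 when a normal call is flagged. This is why accuracy and F1 are worth reporting together.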
➡️ Consequently, the KoBIGBIRD model was selected for voice phishing detection, and the rest of the solution was built on it.
⏩ In this project, we combined and customized the KoBIGBIRD, R-BERT, and KR-BERT models.
KoBIGBIRD is a Transformer-based model developed for Korean natural language processing. It handles much longer sequences than conventional BERT: up to 4096 tokens, eight times BERT's limit of 512.
KR-BERT is a BERT-based model for Korean natural language processing. Pre-trained specifically on Korean text, it learns sentence- and word-level representations and performs well across Korean NLP tasks, including semantic interpretation and sentence structure analysis.
R-BERT is a BERT-based model for relation classification, which infers the relationship between entities from their context. By incorporating entity information alongside the sentence representation, it improves performance on information extraction and relation inference tasks.
ㅤOur custom model is inspired by R-BERT and built on the KoBIGBIRD architecture. In the relation extraction task, R-BERT improves its performance by using not only the CLS token but also the embedding vectors of entity1 and entity2.
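The R-BERT head described above can be sketched in NumPy as follows: average the hidden states over each entity span, pass the CLS vector and the two entity vectors through tanh-plus-linear blocks (the two entities share weights), concatenate, and classify. This is an illustrative forward pass with caller-supplied weights, not the project's code or R-BERT's original PyTorch implementation; the function names and the `params` layout are assumptions.

```python
import numpy as np


def entity_average(hidden, mask):
    """Average the hidden states of the tokens covered by one entity mask."""
    mask = mask.astype(hidden.dtype)
    return (mask @ hidden) / mask.sum()


def fc_block(x, weight, bias):
    """R-BERT-style fully connected block: tanh activation, then a linear map."""
    return np.tanh(x) @ weight + bias


def rbert_logits(hidden, e1_mask, e2_mask, params):
    """Relation logits from the CLS vector plus the two averaged entity vectors.

    hidden: (seq_len, d) final-layer hidden states; e1_mask / e2_mask: (seq_len,)
    0/1 arrays marking each entity span; params: dict of (weight, bias) pairs.
    """
    h_cls = fc_block(hidden[0], *params["cls"])
    # Both entities go through the same FC weights.
    h_e1 = fc_block(entity_average(hidden, e1_mask), *params["entity"])
    h_e2 = fc_block(entity_average(hidden, e2_mask), *params["entity"])
    weight, bias = params["classifier"]
    return np.concatenate([h_cls, h_e1, h_e2]) @ weight + bias
```

The key idea carried over to this project is the last line: the classifier sees a concatenation of several pooled views of the sequence rather than the CLS vector alone.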
ㅤSimilarly, our customized model uses not only the CLS token but also the entire dialogue, the extracted morphemes and keywords, and their respective embedding vectors during training.
ㅤFurthermore, by combining the CLS tokens of KR-BERT and KoBIGBIRD, our model aims to integrate the different features of both models, leveraging their respective strengths and compensating for their weaknesses.
ㅤConcretely, the model combines the CLS-token embeddings from KR-BERT and KoBIGBIRD.
ㅤNext, the representations of the entire dialogue and of the extracted keywords and morphemes are split into separate vectors using an index that marks where the sentence ends. The full-dialogue vector lets the model capture the context of the conversation, while the keyword and morpheme vector lets it learn the most important parts of the conversation.
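One way to realize this split is sketched below, assuming the dialogue and the extracted keywords/morphemes are encoded as a text pair, which BERT-style tokenizers lay out as `[CLS] dialogue [SEP] keywords [SEP]`. Under that assumption the "end of sentence" index is the position of the first `[SEP]`. The layout, the function name `split_and_pool`, and the use of mean pooling are all hypothetical; the README does not show the project's code.

```python
import numpy as np


def split_and_pool(hidden, end_index, seq_len):
    """Split one sequence's hidden states at end_index and mean-pool each part.

    Hypothetical input layout:
      position 0                          -> [CLS]
      positions 1 .. end_index-1          -> full dialogue tokens
      position end_index                  -> separator ending the dialogue
      positions end_index+1 .. seq_len-2  -> extracted keywords/morphemes
      position seq_len-1                  -> final separator; anything after is padding
    """
    dialogue_vec = hidden[1:end_index].mean(axis=0)
    keyword_vec = hidden[end_index + 1:seq_len - 1].mean(axis=0)
    return dialogue_vec, keyword_vec
```

Here `seq_len` would typically be the number of non-padding tokens (the sum of the attention mask), so padding positions never enter the averages.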
ㅤFinally, the FCLayer class adjusts the dimensions of each vector, and the results are combined into a single vector. The label classifier takes this vector as input and predicts the final class, i.e., whether the call is a phishing attempt.
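The fusion step can be sketched in NumPy as below. It assumes an R-BERT-style `FCLayer` (tanh, then linear; dropout omitted because this only runs inference), four input vectors (KR-BERT CLS, KoBIGBIRD CLS, full-dialogue vector, keyword/morpheme vector), a 768-dimensional hidden size for both encoders, a 256-dimensional projection, and the label order [normal, phishing]. All of these sizes, the `PhishingHead` name, and the random untrained weights are assumptions for illustration, not the project's trained model.

```python
import numpy as np


class FCLayer:
    """NumPy stand-in for an R-BERT-style FCLayer: optional tanh, then linear."""

    def __init__(self, input_dim, output_dim, rng, use_activation=True):
        self.weight = rng.normal(scale=0.02, size=(input_dim, output_dim))
        self.bias = np.zeros(output_dim)
        self.use_activation = use_activation

    def __call__(self, x):
        if self.use_activation:
            x = np.tanh(x)
        return x @ self.weight + self.bias


class PhishingHead:
    """Hypothetical fusion head: project four vectors, concatenate, classify."""

    def __init__(self, kr_dim=768, kb_dim=768, proj_dim=256, num_labels=2, seed=0):
        rng = np.random.default_rng(seed)
        self.kr_cls_fc = FCLayer(kr_dim, proj_dim, rng)    # KR-BERT CLS
        self.kb_cls_fc = FCLayer(kb_dim, proj_dim, rng)    # KoBIGBIRD CLS
        self.dialogue_fc = FCLayer(kb_dim, proj_dim, rng)  # full-dialogue vector
        self.keyword_fc = FCLayer(kb_dim, proj_dim, rng)   # keyword/morpheme vector
        self.label_classifier = FCLayer(4 * proj_dim, num_labels, rng,
                                        use_activation=False)

    def __call__(self, kr_cls, kb_cls, dialogue_vec, keyword_vec):
        combined = np.concatenate([
            self.kr_cls_fc(kr_cls),
            self.kb_cls_fc(kb_cls),
            self.dialogue_fc(dialogue_vec),
            self.keyword_fc(keyword_vec),
        ])                                    # shape: (4 * proj_dim,)
        logits = self.label_classifier(combined)
        exp = np.exp(logits - logits.max())   # numerically stable softmax
        return exp / exp.sum()                # assumed order: [P(normal), P(phishing)]
```

Projecting every vector to the same size before concatenating keeps any single source from dominating the classifier input just because it is wider.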
ㅤBy combining KR-BERT's language understanding with KoBIGBIRD's ability to handle long texts, the model incorporates a wider range of features from the input text. It is expected to benefit from KoBIGBIRD's strength on long dialogues and KR-BERT's strength in capturing subtle meanings and expressions within sentences.
👉ㅤThe data collected in this project is unstructured text, categorized into phishing and non-phishing data. To address class imbalance, only the phishing data was augmented.
➖ Loan fraud type: 185 instances
➖ Financial fraud type: 227 instances
➖ Financial/Insurance, Transfer, Withdrawal, Loan Service Type: 48,476 instances
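The README does not say which augmentation technique was used. The sketch below substitutes EDA-style random word swap as a stand-in to show the idea of augmenting only the minority (phishing) class: new samples are generated from the phishing texts alone, and the normal data is left as collected. The function names, `target_count`, and the number of swaps are hypothetical.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """EDA-style random swap: exchange two random token positions n_swaps times."""
    tokens = list(tokens)
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def augment_minority(samples, target_count, n_swaps=1, seed=0):
    """Grow the phishing (minority) texts to target_count with random-swap copies.

    The original samples are kept unchanged at the front of the returned list.
    """
    rng = random.Random(seed)
    augmented = list(samples)
    while len(augmented) < target_count:
        source = rng.choice(samples)
        augmented.append(" ".join(random_swap(source.split(), n_swaps, rng)))
    return augmented
```

If the two fraud types listed above make up the phishing class, there are 185 + 227 = 412 phishing samples against 48,476 others, roughly 118 to 1, which is why only the phishing side is augmented. The target count the project actually used is not stated.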