Abstract
In machine learning-based detection models, detection accuracy tends to increase with the quantity and quality of the training dataset. The performance of a machine learning-based SSH detection model is affected by the size of the training dataset and the ratio of target classes. However, it is difficult to collect a sufficiently large and diverse training dataset from an actual network environment within a short period. Even when many training samples are collected, preparing the training dataset through data classification takes considerable effort and time. To overcome these limitations, we generate sophisticated samples using the WGAN-GP algorithm and present a method for selecting samples by comparing generator loss. The synthetic training dataset augmented with the generated samples improves the performance of the SSH detection model. Furthermore, we add new features that capture differences in inter-packet arrival time. By applying the synthetic dataset and the inter-packet arrival time features, the enhanced SSH detection model reduces false positives and achieves an F1-score of 0.999.
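To illustrate the inter-packet arrival time features mentioned in the abstract, the following is a minimal sketch of how such features could be derived from the packet timestamps of one session. The abstract does not state which statistics the authors use, so the function name `inter_arrival_features`, the feature names, and the choice of minimum, maximum, mean, and standard deviation are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def inter_arrival_features(timestamps):
    """Summarize the gaps between consecutive packets of one session.

    `timestamps` holds packet arrival times in seconds. The feature names
    and the statistics chosen here are hypothetical, not taken from the paper.
    """
    ordered = sorted(timestamps)
    # Gap between each packet and the one that follows it.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        # A session with fewer than two packets has no inter-arrival time.
        return {"iat_min": 0.0, "iat_max": 0.0, "iat_mean": 0.0, "iat_std": 0.0}
    return {
        "iat_min": min(gaps),
        "iat_max": max(gaps),
        "iat_mean": mean(gaps),
        "iat_std": pstdev(gaps),  # population standard deviation of the gaps
    }


# Example: three packets arriving at 0.0 s, 1.0 s, and 3.0 s.
features = inter_arrival_features([0.0, 1.0, 3.0])
```

A feature dictionary like this could be appended to each session's existing feature vector before training a classifier; whether the paper does it this way is not stated in the abstract.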
Original language | English |
---|---|
Article number | 102672 |
Journal | Computers and Security |
Volume | 116 |
Publication status | Published - 2022 May |
Keywords
- GAN
- Generator loss
- Inter-packet arrival time
- PCA
- Random forest
- Session-based data
- SSH detection
- WGAN-GP
ASJC Scopus subject areas
- Computer Science (all)
- Law