At the Evidence-Based Cybersecurity Research Laboratory, I led a project that scraped and analyzed data from illicit forums to study underground activity. The project combined web scraping with natural language processing (NLP) to characterize forum dynamics and the illicit activities discussed there.

Key Contributions:

  • Data Collection: Scraped data from various illicit forums, collecting diverse data types including text and images.
  • NLP Techniques: Preprocessed the collected text with stemming, lemmatization, and stop-word removal, then analyzed it with word clouds, topic modeling, and named entity recognition.
  • Forum Analysis:
    • Forum Longevity: Calculated the age of the forums to understand their longevity and sustainability.
    • Forum Category: Classified forums into categories such as fraud and advertisement based on the content and discussions.
    • Market Attractiveness: Identified buzzwords and trending topics within the forums to assess their market attractiveness and activity levels.
  • Insights Gained: The analysis helped researchers understand the dynamics and trends of illicit activity within underground markets.
  • Impact: Gave the lab a comprehensive analysis of illicit forums, contributing to its understanding of cybersecurity threats and market behaviors.
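
As a rough illustration of the forum analysis steps listed above, the following minimal sketch shows how stop-word removal, buzzword counting, and a forum-age calculation might fit together. It is not the project's actual code: the function names, the tiny stop-word list, and the sample posts are all hypothetical, and it uses only the Python standard library rather than a dedicated NLP toolkit.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "for", "to", "of", "in", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_buzzwords(posts, n=3):
    """Return the n most frequent non-stop-word tokens across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts.most_common(n)


def forum_age_days(first_post, last_post):
    """Forum longevity, taken here as days between the first and last observed post."""
    return (last_post - first_post).days


# Made-up posts and dates, for illustration only.
posts = [
    "Selling fresh accounts, escrow accepted",
    "Looking for accounts with escrow",
    "New advertisement: accounts in bulk",
]

if __name__ == "__main__":
    print(top_buzzwords(posts, n=2))
    print(forum_age_days(date(2020, 1, 1), date(2021, 1, 1)))
```

Raw term frequency is only one possible proxy for a buzzword. Stemming or lemmatization would normally run before counting, so that variants such as "account" and "accounts" are merged.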

This project demonstrates my ability to combine web scraping and NLP to analyze complex, unstructured data sources in support of cybersecurity research.