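Design and Implementation of a Threat Intelligence Crawling and Processing System Based on the Scrapy Framework

Summary

With the development of security technology, network security defense measures have become an important asset, and how to quickly and effectively extract and analyze existing threat intelligence data is currently a research hotspot. For the massive security data on open-source threat websites, blogs, and similar sources, a web crawler designed with the Scrapy framework extracts the data and stores it as the underlying data required by a knowledge graph. With Scrapy, a dedicated spider file for each data source crawls the required information in a targeted way, and the information is then parsed, integrated, and stored in a database, providing a basis for research in multiple directions. The goal of security defense is not only to block an attack before it arrives, but also to prevent it from achieving its ultimate objective.

This graduation project centers on the design and implementation of a system for crawling and parsing data from open-source threat intelligence websites. The system uses the Scrapy crawler framework to crawl publicly available data from open-source threat intelligence websites, then parses the data and stores it in a database. Flask_admin + nginx and pyecharts are used to display the crawled data and to provide the corresponding query and export functions. The system consists of a crawler module, a data parsing module, and a data display module, and it largely meets the data requirements of an APT knowledge graph.

Keywords: APT knowledge, Scrapy, crawler technology

Abstract

With the development of security technology, network security defense measures have become an important asset. How to quickly and effectively extract and analyze existing threat intelligence data is currently a research hotspot. For massive security data from open-source threat websites and blogs, we use the Scrapy framework to design web crawlers that extract the data and store it as the underlying data needed by a knowledge graph. With Scrapy, a spider file for each data source crawls the required information in a targeted way, and the information is parsed, integrated, and stored in the database, so that multi-directional research can be carried out on this basis. The goal of security is not only to block an attack before it arrives, but also to prevent it from achieving its ultimate goal. This graduation project will focus...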