Crawler and Data Analysis of National College Entrance Examination Data Based on the Python Scrapy Framework

Summary

This thesis first introduces the development background of Python crawlers and explains data analysis with Python in detail, then discusses the design goal of combining the two. The front end of this work uses Python with the Scrapy framework to crawl national college entrance examination data and store it in a MySQL database and in Excel, where the data are processed and analyzed as a whole. The data for this project are obtained by crawling a website. Data processing is done with the Pandas module. The data analysis consists mainly of data visualization and data mining, implemented with the third-party Python libraries Matplotlib and pyecharts, a linear regression equation, and Lasso regression prediction from Python machine learning.

Keywords: Python crawler; MySQL database; Scrapy; data analysis; data mining; Excel; Matplotlib; pyecharts; linear regression; Lasso regression prediction; Python machine learning

Abstract

This thesis begins by introducing the background of Python crawler development in detail and explaining data analysis with Python, and then discusses the design goal of combining Python crawling with Python data analysis. The front end of this work is based on the Python Scrapy framework: Python crawls the national college entrance examination data and stores it in a MySQL database and in Excel, where the data are processed and analyzed as a whole. The data for this project are obtained by crawling a website. Data processing is carried out with the Pandas module. The data analysis is based mainly on data visualization and data mining, which use the third-party Python libraries Matplotlib and pyecharts, a linear regression equation, and Lasso regression pr...