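Research and Application of a Visual Question Answering System Based on VGG and LSTM Networks

Abstract

With the development of the Internet, the amount of data available to people has grown exponentially, and the knowledge that can be extracted from that data has grown with it, giving artificial intelligence research and applications renewed vitality. As AI applications have continued to develop, research on visual question answering (VQA) has emerged in recent years and has become a popular topic in applied artificial intelligence. VQA is a multi-domain, interdisciplinary task: it takes an image and a free-form, open-ended natural language question about that image as input, and produces a natural language answer as output [1]. Put simply, VQA is question answering about a given image.

Drawing on the current state of VQA research and on deep learning theory, this design studies a visual question answering system built on a VGG+LSTM network: a VGG network extracts features from the image, and an LSTM network extracts features from the question and generates the features of the system's output answer. The complex artificial intelligence system is ultimately reduced to a multi-class classification problem, so that a question about an image is posed as a natural language sentence and answered with a single natural language word.

The main innovation of this design is the multimodal fusion of two areas of deep learning, computer vision and natural language processing [2], together with casting the system's output as a classification problem, which achieves one-question, one-answer interaction with an image.

Keywords: VQA; visual question answering; VGG network; LSTM network; deep learning; artificial intelligence

Research and Application of Visual Question Answering System Based on VGG and LSTM Network

ABSTRACT

With the development of the Internet, the amount of data available to human beings has increased exponentially, and the knowledge we can obtain from that data has also increased greatly. The research and application of artificial intelligence have been revitalized once again. Along with the continuous development of artificial intelligence applications, research on Visual Question Answering (VQA) has appeared in recent years and has developed into a hot spot. A VQA task is a multi-domain, interdisciplinary task, with a picture and a free-form, open-ended natural language question about the picture as input and a generated natural language answer as output. Briefly, VQA is a question-and-answer session on a given picture. This design combines...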
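As a rough illustration of the pipeline the abstract describes (image features, question features, multimodal fusion, then classification over single-word answers), the following minimal sketch uses only the Python standard library. It is not the thesis implementation: the random vectors stand in for VGG and LSTM outputs, element-wise multiplication is an assumed fusion choice that the abstract does not specify, and the names `fuse`, `linear`, `softmax`, `answer`, `DIM`, and `ANSWERS` are all hypothetical.

```python
import math
import random


def fuse(image_feat, question_feat):
    """Fuse two equal-length feature vectors by element-wise product.

    Assumption: the abstract only says the modalities are fused; the
    element-wise product is one common choice, used here for illustration.
    """
    if len(image_feat) != len(question_feat):
        raise ValueError("feature vectors must have the same length")
    return [a * b for a, b in zip(image_feat, question_feat)]


def linear(x, weights, bias):
    """Compute one logit per answer class: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer(image_feat, question_feat, weights, bias, vocab):
    """Return the most probable single-word answer and all class probabilities."""
    probs = softmax(linear(fuse(image_feat, question_feat), weights, bias))
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs


# Toy stand-ins (hypothetical): in a real system image_feat would come from a
# VGG network and question_feat from an LSTM, with weights learned in training.
rng = random.Random(0)
DIM = 8
ANSWERS = ["yes", "no", "red", "two"]

image_feat = [rng.uniform(-1, 1) for _ in range(DIM)]
question_feat = [rng.uniform(-1, 1) for _ in range(DIM)]
weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in ANSWERS]
bias = [0.0] * len(ANSWERS)

word, probs = answer(image_feat, question_feat, weights, bias, ANSWERS)
print(word, [round(p, 3) for p in probs])
```

Because the answer is chosen from a fixed vocabulary, the final step is ordinary multi-class classification: the system outputs one probability per candidate word and returns the word with the highest probability.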