International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 08 Issue: 05 | May 2021 www.irjet.net p-ISSN: 2395-0072
ASK IMAGE: A Chatbot which Answers Questions on Image Captions
Varun Vinod¹, Bhoyar Rohit Avinash², Jeetan Rajesh³, Kale Gauresh Atmaram⁴ and Prof. Dhiraj Amin⁵
¹⁻⁵Department of Computer Engineering, Pillai College of Engineering, Navi Mumbai, India - 410206
Abstract - Caption generation is a challenging artificial intelligence problem in which a textual description must be produced for a
given photograph. It requires both techniques from computer vision, to understand the content of the image, and a language model
from the field of natural language processing, to turn that understanding of the image into words in the right order. A single
end-to-end model can be defined to predict a caption directly from a photograph, instead of requiring sophisticated data preparation
or a pipeline of specifically designed models. Conversational AI use cases are diverse: they include customer support, e-commerce,
controlling IoT devices, enterprise, productivity and much more. In simple terms, these use cases involve a user asking a specific
question (intent) and the conversational experience (the chatbot) responding to the question by making calls to a backend system
such as a CRM, a database or an API. Some of these use cases can be enriched by allowing the user to upload an image. In such
cases, the conversational experience should take an action based on what exactly is in that image. In this project, we develop a
photo captioning deep learning model and incorporate the COCO and SQuAD datasets to provide rich and dynamic ML-based responses to
user-provided image inputs.
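The end-to-end model described above generates a caption one word at a time. The following is a minimal sketch of greedy decoding, the simplest inference loop for such a model, under the assumption that a trained decoder is available. The `next_word_probs` function, its `BIGRAMS` table, and the label-score dictionary that stands in for CNN image features are all hypothetical placeholders. They are not the Ask Image model.

```python
from typing import Dict, List

# Hypothetical toy "decoder": for each previous word, a distribution over the
# next word. A trained CNN-RNN model would compute this from its hidden state.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "<start>": {"a": 1.0},
    "a": {"dog": 0.5, "cat": 0.5},
    "dog": {"on": 0.6, "<end>": 0.4},
    "cat": {"on": 0.6, "<end>": 0.4},
    "on": {"the": 1.0},
    "the": {"grass": 0.5, "sofa": 0.5},
    "grass": {"<end>": 1.0},
    "sofa": {"<end>": 1.0},
}


def next_word_probs(features: Dict[str, float], prefix: List[str]) -> Dict[str, float]:
    """Toy conditional distribution P(next word | image, prefix)."""
    scores = dict(BIGRAMS.get(prefix[-1], {"<end>": 1.0}))
    # Toy image conditioning: boost words the (hypothetical) encoder detected.
    for word in scores:
        scores[word] *= 1.0 + features.get(word, 0.0)
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


def greedy_caption(features: Dict[str, float], max_len: int = 10) -> str:
    """Pick the most probable word at each step until <end> or max_len words."""
    prefix = ["<start>"]
    for _ in range(max_len):
        probs = next_word_probs(features, prefix)
        word = max(probs, key=probs.__getitem__)
        if word == "<end>":
            break
        prefix.append(word)
    return " ".join(prefix[1:])


# Stand-in for encoder output: detected labels with confidence scores.
print(greedy_caption({"cat": 0.9, "sofa": 0.8}))  # a cat on the sofa
```

Greedy search commits to the single best word at each step. Practical captioning systems often use beam search instead, which keeps the k best partial captions.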
Key Words: Natural Language Processing, Caption Generation, Convolutional Neural Networks, Recurrent Neural
Networks, Question Processing.
1. Introduction
The main goal of Ask Image is to generate captions by processing the input image and integrating them with a chatbot. Caption
generation is an interesting artificial intelligence problem in which a descriptive sentence is generated for a given image. It
involves both computer vision techniques, to understand the content of the image, and a language model from the field of natural
language processing, to turn that understanding into correctly ordered words. Image captioning has various applications, such as
recommendations in editing applications, use in virtual assistants, image indexing, assistance for visually impaired persons,
social media, and several other natural language processing applications. When people read an article or a short passage from a
book, the best way to check how well they have understood it is to try to summarize it or to answer questions about the part they
have read. Therefore, to mimic this reading process, most QA systems aim to extract important information from a provided article
or short passage to answer the given questions.
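As a rough illustration of extracting information from a passage to answer a question, the sketch below uses a simple word-overlap heuristic. It splits the passage (for example, a set of generated captions) into sentences and returns the sentence that shares the most content words with the question. This heuristic is an assumption made for illustration. Systems trained on SQuAD instead predict the start and end of an answer span with a neural model. The `STOPWORDS` list and the sample captions are hypothetical.

```python
import re
from typing import List

# Hypothetical, deliberately small stop-word list.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "who", "where",
             "which", "in", "on", "of", "do", "does"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer(question: str, passage: str) -> str:
    """Return the passage sentence with the largest content-word overlap."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    if not sentences:
        return ""
    question_words = set(tokenize(question)) - STOPWORDS

    def overlap(sentence: str) -> int:
        return len(question_words & set(tokenize(sentence)))

    # max() keeps the first sentence among ties.
    return max(sentences, key=overlap)


captions = ("A man is riding a horse on the beach. "
            "Two dogs are playing with a frisbee. "
            "A red bus is parked near the station.")
print(answer("What are the dogs playing with?", captions))
# Two dogs are playing with a frisbee.
```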
2. Literature Survey
2.1 Enriching Conversation Context in Retrieval-based Chatbots.
This technique is demonstrated by Amir Vakili and Azadeh Shakery from the University of Tehran. The work focuses on retrieval-based
chatbots, which, like most sequence-pair matching tasks, can be divided into Cross-encoders, which perform word matching over the
pair, and Bi-encoders, which encode the two sequences of the pair separately. The paper describes the development of a sequence
matching architecture that uses the entire training set as a makeshift knowledge base during inference. It studies retrieval-based
systems, which select a response from candidates retrieved from chat logs according to how well each matches the current
conversation context, as opposed to generative systems, which synthesise new sentences based on the context. Detailed experiments
show that this architecture can further improve Bi-encoder performance while still maintaining a relatively high inference speed.
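The difference between the two encoder families can be sketched with toy scoring functions. This sketch is an assumption for illustration and does not reproduce the authors' architecture. A bag-of-words vector stands in for a neural encoder. The bi-encoder encodes the context and each candidate separately, so candidate vectors can be precomputed once. The cross-encoder must examine each (context, response) pair jointly, which it does here by word matching over the pair. The candidate responses are hypothetical.

```python
import math
from collections import Counter
from typing import List


def encode(text: str) -> Counter:
    """Toy encoder: bag-of-words counts (stand-in for a neural sentence encoder)."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


class BiEncoderRanker:
    """Encodes candidates once, offline; each query costs one context encoding."""

    def __init__(self, candidates: List[str]) -> None:
        self.candidates = candidates
        self.index = [encode(c) for c in candidates]  # precomputed, context-independent

    def rank(self, context: str) -> List[str]:
        ctx = encode(context)  # the context is encoded separately from the candidates
        scores = [cosine(ctx, vec) for vec in self.index]
        order = sorted(range(len(self.candidates)), key=lambda i: scores[i], reverse=True)
        return [self.candidates[i] for i in order]


def cross_encoder_score(context: str, response: str) -> float:
    """Toy cross-encoder: scores the pair jointly by word matching across it."""
    ctx_words = context.lower().split()
    resp_words = set(response.lower().split())
    matched = sum(1 for word in ctx_words if word in resp_words)
    return matched / len(ctx_words) if ctx_words else 0.0


def cross_encoder_rank(context: str, candidates: List[str]) -> List[str]:
    # Every candidate must be re-scored together with the context at query time.
    return sorted(candidates, key=lambda c: cross_encoder_score(context, c), reverse=True)


candidates = ["the weather is sunny today",
              "i like pizza with cheese",
              "my order has not arrived yet"]
print(BiEncoderRanker(candidates).rank("is the weather sunny")[0])  # the weather is sunny today
print(cross_encoder_rank("where is my order", candidates)[0])       # my order has not arrived yet
```

The design trade-off shows up in the method signatures. `BiEncoderRanker` pays the candidate-encoding cost once in its constructor. `cross_encoder_rank` recomputes a joint score for every pair on every query, which is usually more accurate with real models but slower at inference time.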
2.2 Survey on Automatic Image Caption Generation.
This survey of image caption generation is by Shuang Bai and Shan An. It explains in detail how image captioning connects the
research communities of computer vision and natural language processing. The paper surveys advances in image captioning research
according to the technique adopted and classifies image captioning approaches into different categories. Representative methods
in each category are summarized, and their strengths and limitations are
© 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1292
discussed. The initial methods discussed are mainly retrieval-based and template-based. Neural-network-based methods, which give
state-of-the-art results, are also discussed; they are further divided into subcategories based on the specific framework they
use, and each subcategory is discussed in detail. State-of-the-art methods are then compared on benchmark datasets, followed by a
discussion of future research directions.
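To make the retrieval-based family concrete, the sketch below captions a query image by copying the caption of its most similar training image under cosine similarity. The three-dimensional feature vectors and captions are made up for illustration. A real system would use CNN features over a large captioned dataset. This is a generic nearest-neighbour baseline, not a specific method from the survey.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical training set: (image feature vector, human-written caption).
# Real features would come from a CNN; these 3-d vectors are invented.
TRAINING_SET: List[Tuple[List[float], str]] = [
    ([0.9, 0.1, 0.0], "a dog running on the grass"),
    ([0.1, 0.9, 0.1], "a plate of food on a table"),
    ([0.0, 0.2, 0.9], "a red bus parked on the street"),
]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_caption(query: Sequence[float]) -> str:
    """Return the caption of the training image most similar to the query."""
    _, best_caption = max(TRAINING_SET, key=lambda pair: cosine(query, pair[0]))
    return best_caption


print(retrieve_caption([0.8, 0.2, 0.1]))  # a dog running on the grass
```

Because the output is always an existing human-written caption, retrieval-based methods produce fluent sentences but cannot describe new combinations of objects. That limitation helps motivate the neural-network-based generators discussed above.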
2.3 An Intelligent Behaviour Shown by Chatbot System.
The authors Vibhor Sharma, Monika Goyal and Drishti Malik discuss how chatbots are software agents that mediate interaction
between a computer and a human in natural language: just as people use language to communicate with one another, chatbots use
natural language to communicate with human users. In this paper, two existing chatbot systems, ELIZA and ALICE, are analysed.
The authors conclude that it is easier to build bots using ALICE, because of its simple pattern matching techniques, than with
ELIZA, since ELIZA is based on rules. Finally, they discuss a proposed system that implements the ALICE chatbot as a
domain-specific chatterbot: a student information system that helps users with various queries related to students and
universities.
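ALICE's pattern matching can be illustrated with a simplified, AIML-style matcher. Each category pairs an upper-case pattern, in which `*` matches one or more words, with a response template, and the text captured by `*` is substituted for `<star/>`. The categories below are hypothetical examples for a student information system like the one described above. The matcher is also a simplification: it takes the first matching category, whereas real AIML interpreters apply their own priority rules when several patterns match.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical AIML-like categories: (pattern, template).
CATEGORIES: List[Tuple[str, str]] = [
    ("WHAT IS THE FEE FOR *", "The fee details for <star/> are available at the accounts office."),
    ("WHO IS THE HEAD OF *", "Please check the department page for the head of <star/>."),
    ("HELLO", "Hello! Ask me about students and universities."),
]


def normalize(text: str) -> str:
    """Upper-case the input and drop punctuation before matching."""
    return " ".join(re.findall(r"[A-Z0-9]+", text.upper()))


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn 'WHAT IS *' into ^WHAT IS (.+)$, so * captures the remaining words."""
    parts = ["(.+)" if token == "*" else re.escape(token) for token in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def reply(user_input: str) -> Optional[str]:
    text = normalize(user_input)
    for pattern, template in CATEGORIES:
        match = pattern_to_regex(pattern).match(text)
        if match:
            star = match.group(1).lower() if match.groups() else ""
            return template.replace("<star/>", star)
    return None  # no category matched


print(reply("What is the fee for Computer Engineering?"))
# The fee details for computer engineering are available at the accounts office.
```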
2.4 Chatbot Design: Reasoning about Design Options Using i* and Process Architecture.
The authors Zia Babar, Alexei Lapouchnian and Eric Yu discuss how software systems are often designed without considering their
social intentionality and the software process changes required to accommodate them. The paper uses chatbots as a domain example
to illustrate the complexities of designing such intentional and intelligent systems, and the resulting changes and
reconfigurations in processes. A mechanism for associating process architecture models with actor models is presented, and two
types of chatbots, retrieval-based and generative, are modelled and analysed using both process architecture and actor models.
3. System Architecture
The system architecture is given in Figure 1. Each block is described in this Section.
Fig 1: Proposed system architecture