Skip to content

Input questions as text regarding movies in the IMDB sqlite database to get information. #NLP #StanfordCoreNLP #Semantics

Notifications You must be signed in to change notification settings

seanwalker909/askIMDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

askIMDB

askIMDB uses Natural Language Processing theory and the Stanford CoreNLP to provide a question answer interface for questions that can be answered by retrieving data from the IMDB sqlite database.

Supported Types of Questions:

Is insert name here a director? Was insert name here born in ?

The Algorithm uses Stanford CoreNLP in Python to produce a parse tree that taggs the input sentance, for example:

            ROOT                 
             |                    
             SQ                  
  ___________|_________________   
 |     NP         NP           | 
 |     |      ____|_____       |  
VBZ   NNP    DT         NN     . 
 |     |     |          |      |  
 Is Kubrick  a       director  ?

Then this tree is converted into another tree of the same structure, but comprised of the nodes of custom class, Node, which stores additional information:

class Node:
    def __init__(self, children, word, pos):
        self.children = []
        self.word = word  # empty unless you're a leaf
        self.pos = pos #POS = Part of Speach 
        self.sem = "" #semantic meaning - either a SQL string or a Python lambda function
        self.rule = "" #The grammar rule of the form pos->[*children POS]

    def addChild(self, newChild):
        self.children.append(newChild)

The rule variables in the node class when traversitng this tree in order will be:

ruleKey:  ROOT->SQ, 
ruleKey:  SQ->VBZ, NP, NP, ., 
ruleKey:  VBZ->Is
ruleKey:  NP->NNP, 
ruleKey:  NNP->Kubrick
ruleKey:  NP->DT, NN, 
ruleKey:  DT->a
ruleKey:  NN->director
ruleKey:  .->?

The tree is then traversed post order and these rules are used as keys into the python dictionary called rules on line 39. The rules dictionary uses rules as keys and lambda functions as values. The algorithm traverses post order and and gets the lambda function for each corresponding tree, and then evaluates and stores the result in the sem variable, for example:

<QUESTION> Is Kubrick a director?
***********traverse_tree***********
            ROOT                 
             |                    
             SQ                  
  ___________|_________________   
 |     NP         NP           | 
 |     |      ____|_____       |  
VBZ   NNP    DT         NN     . 
 |     |     |          |      |  
 Is Kubrick  a       director  ? 

**********rules from tree***********
ruleKey: ROOT->SQ,
            ROOT                 
             |                    
             SQ                  
  ___________|_________________   
 |     NP         NP           | 
 |     |      ____|_____       |  
VBZ   NNP    DT         NN     . 
 |     |     |          |      |  
 Is Kubrick  a       director  ? 

ruleKey: SQ->VBZ,NP,NP,.,
             SQ                 
  ___________|________________   
 |     NP        NP           | 
 |     |      ___|_____       |  
VBZ   NNP    DT        NN     . 
 |     |     |         |      |  
 Is Kubrick  a      director  ? 

ruleKey: NP->NNP,
   NP  
   |    
  NNP  
   |    
Kubrick

ruleKey: NP->DT,NN,
     NP         
  ___|_____      
 DT        NN   
 |         |     
 a      director

*********custom tree traversal**********
VBZ->Is
node.sem if in dictionary:  SELECT * 
NP->NNP,
node.sem if in dictionary:  Kubrick
NN->director
node.sem if in dictionary:  <function <lambda>.<locals>.<lambda> at 0x120f89f28>
NP->DT,NN,
node.sem if in dictionary:  <function <lambda>.<locals>.<lambda> at 0x120f89f28>
SQ->VBZ,NP,NP,.,
node.sem if in dictionary:  SELECT * FROM Director, Person WHERE Director.director_id = Person.id AND Person.name LIKE '%Kubrick%';
ROOT->SQ,
node.sem if in dictionary:  SELECT * FROM Director, Person WHERE Director.director_id = Person.id AND Person.name LIKE '%Kubrick%';
<QUERY>
SELECT * FROM Director, Person WHERE Director.director_id = Person.id AND Person.name LIKE '%Kubrick%';
<ANSWER> Yes

Requirements:

Java 8 or greater python 3 pip3

download and install the following python modules: -gensim -nltk -pycorenlp -- install this with the following command: pip3 install pycorenlp pip install pytest

download the Stanford CoreNLP server: https://stanfordnlp.github.io/CoreNLP/index.html#download extract the zip and cd to the folder it creates. then inside of this new directory start the server with this command: java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000

About

Input questions as text regarding movies in the IMDB sqlite database to get information. #NLP #StanfordCoreNLP #Semantics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages