PURE_Program

How to RUN

For each data folder, we need to separate the sentences from the labels:

sed '1~3d' input_entailment.txt > sentences.txt
sed -n 'p;N;N' input_entailment.txt > labels.txt
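The two sed commands imply that input_entailment.txt is made of 3-line records: one label line followed by two sentence lines. For machines without GNU sed (the `1~3` address is a GNU extension), the same split can be sketched in Python. The function name and the example records below are illustrative assumptions, not part of the repository.

```python
from typing import List, Tuple


def split_labels_and_sentences(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split 3-line records (label, sentence, sentence) into two lists."""
    labels: List[str] = []
    sentences: List[str] = []
    for index, line in enumerate(lines):
        if index % 3 == 0:
            # Lines 1, 4, 7, ... are what `sed -n 'p;N;N'` prints.
            labels.append(line)
        else:
            # All remaining lines are what `sed '1~3d'` keeps.
            sentences.append(line)
    return labels, sentences


# Made-up records, only to show the expected layout of the input file.
example = [
    "ENTAILMENT",
    "A man is playing a guitar",
    "A man is playing an instrument",
    "NEUTRAL",
    "A dog is running",
    "A dog is chasing a ball",
]
labels, sentences = split_labels_and_sentences(example)
```

To apply it to a real file, read the file with `open(path).read().splitlines()` and write each returned list back out one item per line.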

Then parse the sentences:

./stanford-parser-2011-09-14/lexparser.sh sentences.txt > parsed.txt

Then run the MATLAB code:

cd code
echo run | matlab -nodesktop

TODO:

Parser.py: Adding options for generating data with the following cases:

ENTAILMENT vs. (NEUTRAL or CONTRADICTION) -> (done)
CONTRADICTION vs. (NEUTRAL or ENTAILMENT)
NEUTRAL vs. (CONTRADICTION or ENTAILMENT)

(Data has been generated and parsed for the items above, but parser.py needs to be more user-friendly.)

Similarity measure (scaled to one)

Generate and parse all data into their corresponding folders inside 'EntailmentData/'

Running the baseline

Adding F1 table to the final result -> (done)

We need to ask whether we can use previous RTE datasets for training. If we can, we should pretrain on those datasets.
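Two of the items above, the one-vs-rest label cases and the F1 table, can be illustrated together. The following is a minimal sketch and not the code in Parser.py or the MATLAB scripts: the function names are hypothetical, and the gold and predicted labels are made up for the example.

```python
from typing import Dict, List

LABELS = ("ENTAILMENT", "CONTRADICTION", "NEUTRAL")


def binarize(labels: List[str], positive: str) -> List[int]:
    """Map `positive` to 1 and the other two labels to 0 (one-vs-rest)."""
    if positive not in LABELS:
        raise ValueError(f"unknown label: {positive!r}")
    return [1 if label == positive else 0 for label in labels]


def binary_f1(gold: List[int], predicted: List[int]) -> float:
    """F1 of the positive class (1) for two equally long 0/1 lists."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing was matched
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_table(gold: List[str], predicted: List[str]) -> Dict[str, float]:
    """One F1 value per one-vs-rest case."""
    return {
        label: binary_f1(binarize(gold, label), binarize(predicted, label))
        for label in LABELS
    }


# Made-up labels, for illustration only.
gold = ["ENTAILMENT", "ENTAILMENT", "NEUTRAL", "CONTRADICTION"]
predicted = ["ENTAILMENT", "NEUTRAL", "NEUTRAL", "CONTRADICTION"]
table = f1_table(gold, predicted)
for label in LABELS:
    print(f"{label:<14} {table[label]:.3f}")
```

Treating each label as the positive class in turn gives one row of the table per case, so the same helper covers all three cases listed above.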

Wiki of the results

https://wiki.engr.illinois.edu/display/~khashab2/RTE+project

Data

The data is organized into folders inside 'EntailmentData/'

../data/vars.normalized.100.mat : Contains the word vectors

What is the difference between 'params.mat' and 'simMat_release.mat'?

Variables

Meaning of some important variables:

TODO:

allSNum: array of each word's index in the dictionary

allSStr: array of words

allSTree: tree structure. allSTree[i] = j means that node j is the parent of node i.

allSKids: children of each tree node.

      allSKids[i,1] is the left child of node i

      allSKids[i,2] is the right child of node i

allSOStr = {};

allSPOS = {};
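The relationship between allSTree (parent of each node) and allSKids (left and right child of each node) can be illustrated with a small Python sketch. It assumes 1-based node numbers, as in MATLAB, and uses 0 to mean "no parent" or "no child"; that sentinel, the function names, and the example tree are assumptions made for illustration and are not confirmed by the MATLAB code.

```python
from typing import List


def parents_from_kids(kids: List[List[int]]) -> List[int]:
    """Derive the parent array (like allSTree) from a kids table (like allSKids).

    Nodes are numbered from 1; 0 means "no child" in `kids` and
    "no parent" (the root) in the result. Both are assumptions.
    """
    parents = [0] * len(kids)
    for node, (left, right) in enumerate(kids, start=1):
        for child in (left, right):
            if child != 0:
                parents[child - 1] = node
    return parents


def leaves_in_order(kids: List[List[int]], node: int) -> List[int]:
    """Return the leaf nodes below `node`, from left to right."""
    left, right = kids[node - 1]
    if left == 0 and right == 0:
        return [node]
    result: List[int] = []
    for child in (left, right):
        if child != 0:
            result.extend(leaves_in_order(kids, child))
    return result


# Hypothetical binary tree for a 3-word sentence:
# leaves 1..3 are the words, node 4 joins leaves 1 and 2,
# and node 5 (the root) joins node 4 and leaf 3.
kids = [[0, 0], [0, 0], [0, 0], [1, 2], [4, 3]]
parents = parents_from_kids(kids)
root = parents.index(0) + 1
words_in_order = leaves_in_order(kids, root)
```

Here `parents[i - 1]` plays the role of allSTree[i], and `kids[i - 1]` holds the pair allSKids[i,1], allSKids[i,2].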


jeenakk/PURE_program
