GitHub

Spam filter for Quora questions

This is project no. 3

In this project the task is to filter the questions asked in quora to see if they are spam or not.

The projcct is a binary classification as such:

spam:0
not spam:1

Before we applying any deep learning algorith to it we first need to process each question to clean and remove some of the following things as such:

convert the entire sentences to lower case
remove any punctuation
white space removal
remove any new line
remove number etc,

Now the process is to create features out of word, we accomplished it using (glove embedding)[http://nlp.stanford.edu/data/glove.42B.300d.zip].

embeding_index={}

f=open('glove.42B.300d.txt',encoding='utf-8')

for line in f:
   values=line.split()
   word=values[0]
   coefs=np.asarray(values[1:],dtype='float32')
   embeding_index[word]=coefs
f.close()

And using the same embedding we are going to create weights for our network:

embedding_matrix=np.zeros((vocab_size+1,300))

for word,i in tk.word_index.items():
   embed_vector=embeding_index.get(word)
   if embed_vector is not None:
       embedding_matrix[i]=embed_vector

Finally we will use these in the network as such:

inputs=Input(name='text_input',shape=[max_len,])

embed=Embedding(vocab_size+1,
               300,
               input_length=max_len,
               mask_zero=True,
               weights=[embedding_matrix],
               trainable=False)(inputs)

x = Bidirectional(GRU(64, return_sequences=True))(embed)
x = GlobalMaxPool1D()(x)
x = Dense(16, activation="relu")(x)
x = Dropout(0.1)(x)
final_layer = Dense(1, activation="sigmoid")(x)

model=Model(inputs=inputs,outputs=final_layer)

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Project_3_with_pretrained_embedding.ipynb		Project_3_with_pretrained_embedding.ipynb
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Spam filter for Quora questions

This is project no. 3

About

Releases

Packages

Languages

DasAniruddha/DL_project3

Folders and files

Latest commit

History

Repository files navigation

Spam filter for Quora questions

This is project no. 3

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages