<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Nishant Mishra</title>
<link>https://mnishant2.github.io/</link>
<atom:link href="https://mnishant2.github.io/index.xml" rel="self" type="application/rss+xml" />
<description>Nishant Mishra</description>
<generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© NM 2021</copyright><lastBuildDate>Tue, 15 Dec 2020 13:07:25 -0500</lastBuildDate>
<image>
<url>https://mnishant2.github.io/media/dab.jpg</url>
<title>Nishant Mishra</title>
<link>https://mnishant2.github.io/</link>
</image>
<item>
<title>Generative Multimodal Learning for Reconstructing Missing Modality</title>
<link>https://mnishant2.github.io/project/multimodalvae/</link>
<pubDate>Tue, 15 Dec 2020 13:07:25 -0500</pubDate>
<guid>https://mnishant2.github.io/project/multimodalvae/</guid>
<description><p>Multimodal learning with latent space models
has the potential to learn deeper, more useful
representations that improve performance,
even in missing-modality scenarios.
In this project we leverage a latent-space-based
model to perform inference and reconstruction
under all missing-modality combinations.
We trained a
<a href="https://arxiv.org/abs/1802.05335" target="_blank" rel="noopener">Multimodal Variational Auto Encoder</a>
which uses a product-of-experts inference
network, on three different modalities consisting
of MNIST handwritten digit images in
two languages and spoken digit recordings for
our experiments. We trained the model in a
subsampled training paradigm using an ELBO
loss comprising the modality reconstruction
losses, the label cross-entropy loss, and the
Kullback-Leibler divergence for the latent distribution.
We evaluated the total
<a href="https://en.wikipedia.org/wiki/Evidence_lower_bound" target="_blank" rel="noopener">ELBO loss</a>
, individual
reconstruction losses, classification accuracy
and visual reconstruction outputs as part
of our analysis. We observed encouraging results
both in terms of successful convergence as
well as accurate reconstructions.</p>
<p>We approached the missing modality reconstruction
and classification problem using a Multimodal Variational
Autoencoder (MVAE). Our model used a tree-like graph where the
different modalities define the observation nodes. It consists
of parallel fully connected encoder and decoder networks
associated with each modality as part of a VAE, and a product-of-experts technique for late fusion of the respective
latent distribution parameters from each encoder to get a
final representation. An additional linear decoder branch
was used for label classification. Each modality has its own
inference network. This model was trained by optimizing
an evidence lower bound (ELBO) on the marginal likelihood
of the observed data, i.e. reconstructions of the modalities,
together with the classification loss.</p>
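<p>Concretely, the product-of-experts fusion multiplies the per-modality Gaussian posteriors, which has a closed form in terms of precisions. Below is a minimal, illustrative sketch; the function name, tensor shapes and the unit-Gaussian prior expert are assumptions for exposition rather than the project&rsquo;s actual code.</p>
<pre><code class="language-python">
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    # mus, logvars: lists of (batch, latent_dim) tensors, one per observed modality.
    # A unit-Gaussian prior acts as an additional expert (mu=0, var=1).
    precisions = [torch.ones_like(mus[0])]
    weighted_mus = [torch.zeros_like(mus[0])]
    for mu, logvar in zip(mus, logvars):
        prec = 1.0 / (torch.exp(logvar) + eps)
        precisions.append(prec)
        weighted_mus.append(mu * prec)
    joint_var = 1.0 / torch.stack(precisions).sum(dim=0)   # joint precision = sum of precisions
    joint_mu = torch.stack(weighted_mus).sum(dim=0) * joint_var
    return joint_mu, torch.log(joint_var + eps)
</code></pre>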
<p>We also used a sampling
based training scheme such that for each training example
containing modalities, we obtained the loss for all combinations
of modalities given to the model, this ensured the
learned model generalized to perform well in reconstructing
given any combination set of the modalities.
We used three modalities for experimentation and trained the
model on a MNIST dataset with images in two languages,
Farsi and Kannada as first two modalities and speech utterances
of the MNIST digits as the third modality.</p>
<p>The model
performed well in terms of the convergence of ELBO loss,
individual reconstruction losses, classification accuracy as
well as the final visual reconstructions of the modalities. We
also performed various analyses covering hyperparameter
tuning, reconstruction under different modality combinations,
and the disentanglement properties of the learned
representations.</p>
</description>
</item>
<item>
<title>LICENSE: CC-BY-SA</title>
<link>https://mnishant2.github.io/license/</link>
<pubDate>Sun, 13 Sep 2020 00:00:00 +0100</pubDate>
<guid>https://mnishant2.github.io/license/</guid>
<description><p>My
<a href="../">website</a>
is licensed under a
<a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="noopener">Creative Commons Attribution-ShareAlike 4.0 International License</a>
.</p>
<center>
<i class="fab fa-creative-commons fa-2x"></i><i class="fab fa-creative-commons-by fa-2x"></i><i class="fab fa-creative-commons-sa fa-2x"></i>
</center></description>
</item>
<item>
<title>Highlighter(Auto field detection)</title>
<link>https://mnishant2.github.io/project/highlighter/</link>
<pubDate>Fri, 21 Aug 2020 09:18:07 -0400</pubDate>
<guid>https://mnishant2.github.io/project/highlighter/</guid>
<description><p>This project involved building an automatic highlighter tool for highlighting and extracting specific form fields from documents for further processing, such as optical character recognition, information retrieval from handwritten documents, or semi-manual digital population of records from forms through a user interface.</p>
<p>The tool utilizes document layout detection, classical computer vision techniques such as template matching, and mathematical heuristics to create a generalizable automatic highlighting tool from only one sample of the concerned document.</p>
<p>The associated repository is designed to handle a particular bank form and is a command line highlighting tool that can be adapted or extended for other documents and interfaces.</p>
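<p>As a rough illustration of the template matching step (not the actual pipeline), one could locate a known field label in a scanned form with OpenCV; the file names and the match threshold below are placeholders.</p>
<pre><code class="language-python">
import cv2

# Locate a known field label (e.g. a cropped "Name:" label) inside a scanned form.
form = cv2.imread("form_page.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("name_label.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(form, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:  # accept only confident matches
    h, w = template.shape
    top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(form, top_left, bottom_right, 255, 2)  # highlight the detected field label
</code></pre>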
</description>
</item>
<item>
<title>CV</title>
<link>https://mnishant2.github.io/cv/</link>
<pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
<guid>https://mnishant2.github.io/cv/</guid>
<description></description>
</item>
<item>
<title>Portfolio</title>
<link>https://mnishant2.github.io/portfolio/</link>
<pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
<guid>https://mnishant2.github.io/portfolio/</guid>
<description></description>
</item>
<item>
<title>Locally Competitive Algorithms</title>
<link>https://mnishant2.github.io/talk/lca/</link>
<pubDate>Wed, 15 Jul 2020 11:00:00 -0400</pubDate>
<guid>https://mnishant2.github.io/talk/lca/</guid>
<description></description>
</item>
<item>
<title>Histopathology</title>
<link>https://mnishant2.github.io/talk/histopathology/</link>
<pubDate>Sat, 06 Jun 2020 11:00:00 -0400</pubDate>
<guid>https://mnishant2.github.io/talk/histopathology/</guid>
<description></description>
</item>
<item>
<title>Policy Gradient</title>
<link>https://mnishant2.github.io/project/policy_gradient/</link>
<pubDate>Fri, 01 May 2020 00:00:00 +0000</pubDate>
<guid>https://mnishant2.github.io/project/policy_gradient/</guid>
<description><p>This project was done as part of my final project submission for
<a href="https://www.cs.mcgill.ca/~dprecup/courses/RL/lectures.html" target="_blank" rel="noopener">COMP767: Reinforcement Learning</a>
course at McGill University</p>
<p>In recent years, significant work has been done in the field of Deep Reinforcement
Learning to solve challenging problems in many diverse domains. One such example
is policy gradient algorithms, which are ubiquitous in state-of-the-art continuous control
tasks. Policy gradient methods can be generally divided into two groups: off-policy
gradient methods, such as
<a href="https://arxiv.org/abs/1509.02971" target="_blank" rel="noopener">Deep Deterministic Policy Gradients (DDPG)</a>
,
<a href="https://arxiv.org/pdf/1802.09477.pdf" target="_blank" rel="noopener">Twin Delayed
Deep Deterministic (TD3)</a>
,
<a href="https://arxiv.org/abs/1801.01290" target="_blank" rel="noopener">Soft Actor Critic (SAC)</a>
and on-policy methods, such as
<a href="https://arxiv.org/abs/1502.05477" target="_blank" rel="noopener">Trust Region Policy Optimization (TRPO)</a>
.</p>
<p>However, despite these successes on paper, reproducing deep RL results is rarely straightforward. There are many sources of possible instability and variance, including extrinsic
factors (such as hyper-parameters and the noise functions used) and intrinsic factors (such as random
seeds and environment properties).</p>
<p>In this project, we perform two different analyses of these policy gradient methods:
(i) Reproduction and Comparison: We implement a variant of DDPG, based on the original
paper. We then attempt to reproduce the results of DDPG (our implementation) and
TD3 and compare them with the well-established methods of REINFORCE and A2C.
(ii) Hyper-Parameter Tuning: We also study the effect of various hyper-parameters (namely
network size and batch size) on the performance of these methods.</p>
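<p>For context, the REINFORCE baseline mentioned above reduces to weighting the log-probabilities of sampled actions by the returns that followed them. The sketch below is illustrative only; the network size, state/action dimensions and learning rate are assumptions.</p>
<pre><code class="language-python">
import torch
import torch.nn as nn

# Minimal REINFORCE-style update: maximise E[log pi(a|s) * return].
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, returns):
    # states: (T, 4) float tensor, actions: (T,) long tensor, returns: (T,) discounted returns-to-go
    log_probs = torch.distributions.Categorical(logits=policy(states)).log_prob(actions)
    loss = -(log_probs * returns).mean()   # gradient ascent on expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
</code></pre>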
</description>
</item>
<item>
<title>Online Learning of temporal Knowledge Graphs</title>
<link>https://mnishant2.github.io/project/online_learning/</link>
<pubDate>Tue, 14 Apr 2020 09:23:37 -0400</pubDate>
<guid>https://mnishant2.github.io/project/online_learning/</guid>
<description><p>This project was undertaken as part of the final project for
<a href="https://cs.mcgill.ca/~wlh/comp766/" target="_blank" rel="noopener">COMP 766: Graph Representation Learning</a>
course at McGill University.</p>
<p>For many computer science sub-fields, knowledge graphs (KGs) remain a constant
abstraction whose usefulness lies in their representation power. However, dynamic
environments, such as the temporal streams of social media information,
bring a greater need to incorporate additional structure into KGs.</p>
<p>In this project, we applied currently available solutions for incremental
knowledge graph embedding to several applications to test their efficiency. We also
proposed an embedding-model-agnostic framework to make these models
incremental. First, we proposed a window-based incremental learning approach
that discards the least frequently occurring facts and performs link prediction on updated triples.
Next, we presented experiments on a GCN-based model-agnostic meta-learning approach.</p>
<p>To create edge embedding vectors, we experimented with two methods:</p>
<ol>
<li>Concatenating the head and tail nodes&rsquo; 128-dimensional Node2Vec embedding vectors to create a
256-dimensional edge embedding</li>
<li>Subtracting the head embedding from the tail embedding vector to create a 128-dimensional edge
embedding vector</li>
</ol>
<p>Our best model is the window-based KG incremental learning approach, where
edge representations are calculated by subtracting the embedding vectors of the head
and tail nodes.</p>
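<p>The two edge-embedding constructions are simple vector operations; a small illustrative sketch (the entity ids and the dictionary of Node2Vec vectors are assumptions) is:</p>
<pre><code class="language-python">
import numpy as np

# node_emb: dict mapping entity id -> 128-dim Node2Vec vector (illustrative).
def edge_embedding(node_emb, head, tail, mode="subtract"):
    h, t = node_emb[head], node_emb[tail]
    if mode == "concat":
        return np.concatenate([h, t])   # 256-dimensional edge feature
    return t - h                        # 128-dimensional edge feature
</code></pre>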
<p>For the experiment, link prediction was cast as a binary classification problem, with 0 and 1 indicating that a
link is present or absent respectively, and a Random Forest model was used for training and prediction.
The dataset was divided into a training set and nine test sets serving as incremental updates, generating nine graph snapshots, with each snapshot adding new nodes and updating edges compared to the previous snapshot.</p>
<p>The second method we experimented with followed a model-agnostic meta-learning based approach
with Graph Convolutional Networks (GCN). The idea here is to learn a GCN to predict the embeddings
of new nodes given the old embeddings of their neighboring entities in the old graph, and similarly
obtain an updated representation of old entities based on the recently learned embeddings of new
entities. These two predictions are jointly iterated. This can be viewed as a learning-to-learn problem
(meta-learning).
<link rel="stylesheet" href=https://mnishant2.github.io/css/hugo-easy-gallery.css />
<div class="box" style="max-width:50%">
<figure itemprop="associatedMedia" itemscope itemtype="http://schema.org/ImageObject">
<div class="img">
<img itemprop="thumbnail" src="https://mnishant2.github.io/media/tsne-2d-40clusters.png#center" alt="tsne visualization of top 40 entity embeddings cluster"/>
</div>
<a href="../../media/tsne-2d-40clusters.png#center" itemprop="contentUrl"></a>
<figcaption>
<p>tsne visualization of top 40 entity embeddings cluster</p>
</figcaption>
</figure>
</div>
<script src="https://code.jquery.com/jquery-1.12.4.min.js" integrity="sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ=" crossorigin="anonymous"></script>
<script src=https://mnishant2.github.io/js/load-photoswipe.js></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.css" integrity="sha256-sCl5PUOGMLfFYctzDW3MtRib0ctyUvI9Qsmq2wXOeBY=" crossorigin="anonymous" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/default-skin/default-skin.min.css" integrity="sha256-BFeI1V+Vh1Rk37wswuOYn5lsTcaU96hGaI7OUVCLjPc=" crossorigin="anonymous" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.js" integrity="sha256-UplRCs9v4KXVJvVY+p+RSo5Q4ilAUXh7kpjyIP5odyc=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe-ui-default.min.js" integrity="sha256-PWHOlUzc96pMc8ThwRIXPn8yH4NOLu42RQ0b9SpnpFk=" crossorigin="anonymous"></script>
<div class="pswp" tabindex="-1" role="dialog" aria-hidden="true">
<div class="pswp__bg"></div>
<div class="pswp__scroll-wrap">
<div class="pswp__container">
<div class="pswp__item"></div>
<div class="pswp__item"></div>
<div class="pswp__item"></div>
</div>
<div class="pswp__ui pswp__ui--hidden">
<div class="pswp__top-bar">
<div class="pswp__counter"></div>
<button class="pswp__button pswp__button--close" title="Close (Esc)"></button>
<button class="pswp__button pswp__button--share" title="Share"></button>
<button class="pswp__button pswp__button--fs" title="Toggle fullscreen"></button>
<button class="pswp__button pswp__button--zoom" title="Zoom in/out"></button>
<div class="pswp__preloader">
<div class="pswp__preloader__icn">
<div class="pswp__preloader__cut">
<div class="pswp__preloader__donut"></div>
</div>
</div>
</div>
</div>
<div class="pswp__share-modal pswp__share-modal--hidden pswp__single-tap">
<div class="pswp__share-tooltip"></div>
</div>
<button class="pswp__button pswp__button--arrow--left" title="Previous (arrow left)">
</button>
<button class="pswp__button pswp__button--arrow--right" title="Next (arrow right)">
</button>
<div class="pswp__caption">
<div class="pswp__caption__center"></div>
</div>
</div>
</div>
</div>
</p>
</description>
</item>
<item>
<title>TransE</title>
<link>https://mnishant2.github.io/talk/transe/</link>
<pubDate>Wed, 22 Jan 2020 13:45:00 +0000</pubDate>
<guid>https://mnishant2.github.io/talk/transe/</guid>
<description></description>
</item>
<item>
<title>Generative Adversarial Networks: Reproducibility Study</title>
<link>https://mnishant2.github.io/project/gan/</link>
<pubDate>Sun, 15 Dec 2019 14:11:56 -0400</pubDate>
<guid>https://mnishant2.github.io/project/gan/</guid>
<description><p>In this project, the final project for
<a href="https://cs.mcgill.ca/~wlh/comp551/" target="_blank" rel="noopener">COMP551: Applied Machine Learning Course</a>
we study the 2014 paper
<a href="https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf" target="_blank" rel="noopener">Generative Adversarial Networks</a>
. We have tried to reproduce a subset of the results obtained in the paper and performed ablation studies to understand the model&rsquo;s robustness and evaluate the importance of the various model hyper-parameters. We also extended the model to include newer features in order to improve the model&rsquo;s performance on the featured datasets, by making changes to the model&rsquo;s internal structure, inspired by more recent works in the field.</p>
<p>Generative Adversarial Networks (GANs) were first described in
<a href="https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf" target="_blank" rel="noopener">this paper</a>
and are based on the
<a href="https://www.investopedia.com/terms/z/zero-sumgame.asp" target="_blank" rel="noopener">zero-sum non-cooperative game</a>
between a Discriminator (D) and a Generator (G), analysed thoroughly in the field of
<a href="https://en.wikipedia.org/wiki/Non-cooperative_game_theory" target="_blank" rel="noopener">Game Theory</a>
. The framework in which both the D and G networks are multilayer perceptrons is referred to as Adversarial Networks.</p>
<p>The provided code was implemented using the now obsolete
<a href="http://deeplearning.net/software/theano/" target="_blank" rel="noopener">Theano framework</a>
and Python 2, and hence it was difficult to reconfigure and set up on our system. Nevertheless, we managed to adapt the code and get it to execute to reproduce the results on the
<a href="http://yann.lecun.com/exdb/mnist/" target="_blank" rel="noopener">MNIST dataset</a>
but proceeded to use the much more interpretable and relevant PyTorch implementation for the ablation studies and extensions of the model. The original paper trains the presented GAN network on MNIST,
<a href="https://www.cs.toronto.edu/~kriz/cifar.html" target="_blank" rel="noopener">CIFAR-10</a>
and TFD images. However, the Toronto Faces Database (TFD) is not accessible without permission, and the provided code does not include scripts for it. Hence, we do not reproduce their results on the TFD database.</p>
<p>GANs have been known to be unstable to train, often resulting in generators that produce nonsensical outputs. We decided to put this notion to the test by tuning some of the hyperparameters involved in training the models. As part of the ablation studies, we experimented with different values for:</p>
<ul>
<li>Learning Rates: We tuned the learning rates of both Generator and Discriminator models.</li>
<li>Loss Functions: We decided to experiment with the L2 norm or
<a href="https://www.probabilitycourse.com/chapter9/9_1_5_mean_squared_error_MSE.php" target="_blank" rel="noopener">Mean Squared error</a>
loss function.</li>
<li>D_steps: the number of steps to apply to the Discriminator, i.e. the number of times the Discriminator is trained before updating the Generator. We changed it from 1 to 2 as part of our experiment (a minimal training-loop sketch illustrating this knob follows this list).</li>
</ul>
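<p>To make the D_steps knob concrete, the sketch below shows an illustrative GAN inner loop in PyTorch in which the Discriminator is updated d_steps times per Generator update; the networks, optimizers and tensor shapes are placeholders, not the code used in the study.</p>
<pre><code class="language-python">
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # assumes D outputs a single logit per image

def gan_step(G, D, opt_g, opt_d, real_batch, noise_dim, d_steps=1):
    batch = real_batch.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)
    for _ in range(d_steps):                          # discriminator updates
        fake = G(torch.randn(batch, noise_dim)).detach()
        d_loss = bce(D(real_batch), ones) + bce(D(fake), zeros)
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()
    g_loss = bce(D(G(torch.randn(batch, noise_dim))), ones)   # non-saturating generator loss
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
</code></pre>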
<p>As part of our extensions, we implemented two variants of GAN:</p>
<ul>
<li>
<a href="https://arxiv.org/abs/1511.06434" target="_blank" rel="noopener">Deep Convolutional Generative Adversarial Networks</a>
or DCGAN, a variation of GAN
in which the vanilla GAN architecture is scaled up using CNNs.</li>
<li>
<a href="https://arxiv.org/abs/1411.1784" target="_blank" rel="noopener">Conditional Generative Adversarial Networks</a>
or cGAN, which allows us to direct the generation process of the model by conditioning it on certain features, here the class labels.</li>
</ul>
</description>
</item>
<item>
<title>Incremental Knowledge Graphs</title>
<link>https://mnishant2.github.io/project/transe/</link>
<pubDate>Sun, 15 Dec 2019 00:00:00 +0000</pubDate>
<guid>https://mnishant2.github.io/project/transe/</guid>
<description><p>This project was directed towards the final course project requirement for
<a href="https://www.mcgill.ca/study/2019-2020/courses/comp-550" target="_blank" rel="noopener">COMP 550: Natural Language Processing</a>
course at McGill University.</p>
<p>Knowledge graphs (KGs) succinctly represent
real-world facts as multi-relational graphs. A
plethora of work exists in embedding the information
in KG to a continuous vector space in
order to obtain new facts and facilitate multiple
down-stream NLP tasks.</p>
<p>Despite the popularity
of the KG embedding problem, to the
best of our knowledge no existing
work handles dynamic/evolving knowledge
graphs that incorporate facts about new
entities.</p>
<p>In this project, we formulate this problem
as an incremental learning problem and
propose solutions to obtain representations for
new entities and to update the representations
of old entities that share facts with these
newer entities. The primary motive of this setup is to avoid
relearning the knowledge graph embedding altogether
with the occurrence of every new set
of facts (triplets).</p>
<p>We build our solutions with
<a href="https://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf" target="_blank" rel="noopener">TransE(Bordes et al.)</a>
as our base KG embedding model and
evaluate the learned embeddings on facts associated
with these new entities.</p>
<p>To this end, we formulated
two solutions: the first approach followed a finetuning-based
transfer-learning solution, and the
second followed a model-agnostic meta-learning
approach with Graph Convolutional Networks
(GCN). While our model-specific finetuning
approach fared well, the proposed model-independent
approach failed to learn representations for new entities.</p>
<p>We used
<a href="https://github.com/thunlp/OpenKE" target="_blank" rel="noopener">OpenKE’s</a>
implementation for setting up our model. For our
task, we made changes to the TransE model so
that it could learn the representations of the new entities. We employed the
<a href="https://www.microsoft.com/en-us/download/details.aspx?id=52312" target="_blank" rel="noopener">FB20K</a>
dataset
(
<a href="http://nlp.csai.tsinghua.edu.cn/~lzy/publications/aaai2016_dkrl.pdf" target="_blank" rel="noopener">Xie et al., 2016</a>
) for our task. In addition to
containing all the entities and relations from the
FB15K dataset, this dataset also contains new entities,
which were required for our setup. We evaluate the models on link prediction, which
aims to predict the missing h or t for a relation fact
(h, r, t).</p>
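<p>For context, TransE scores a triple (h, r, t) by how close h + r lies to t in the embedding space and is trained with a margin-based ranking loss over corrupted triples; a minimal sketch (function names and margin value are illustrative) is:</p>
<pre><code class="language-python">
import torch

def transe_score(h, r, t, p=1):
    # TransE dissimilarity: ||h + r - t||_p ; lower means the triple is more plausible.
    return torch.norm(h + r - t, p=p, dim=-1)

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    # Hinge loss pushing positive triples to score lower than corrupted ones by a margin.
    return torch.clamp(margin + pos_score - neg_score, min=0.0).mean()
</code></pre>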
</description>
</item>
<item>
<title>Image Stitching (Panorama)</title>
<link>https://mnishant2.github.io/project/image_stitching/</link>
<pubDate>Tue, 03 Dec 2019 14:11:05 -0400</pubDate>
<guid>https://mnishant2.github.io/project/image_stitching/</guid>
<description><p>This was the final assignment of
<a href="https://www.mcgill.ca/study/2018-2019/courses/comp-558" target="_blank" rel="noopener">COMP558:Fundamentals of Computer Vision</a>
course, where we had to implement an image stitching (panorama) algorithm from scratch. We were given a set of images taken by rotating the camera vertically and horizontally, and the goal was to stitch them together to form a panorama exactly as mobile devices do.</p>
<p>We used the SIFT algorithm implemented as part of
<a href="../sift">this project</a>
with certain modifications (second-order keypoint extraction) for feature extraction. Features along edges are eliminated using the eigenvalues of the Hessian matrix: weak features along edges have low
<a href="https://www.mathsisfun.com/algebra/eigenvalue.html" target="_blank" rel="noopener">eigenvalues</a>
along the edge and are therefore
suppressed. Low-contrast features are eliminated in this implementation using second-order Taylor
series based thresholding. Instead of 36-dimensional feature histograms, we now had 128-dimensional feature vectors, which are intuitively better descriptors.</p>
<p>For the extracted features, two different matching strategies viz
<a href="https://www.mathworks.com/help/vision/ref/matchfeatures.html" target="_blank" rel="noopener">matchFeatures(MATLAB function)</a>
and our own implementation of
<a href="https://www.sciencedirect.com/topics/engineering/bhattacharyya-distance" target="_blank" rel="noopener">Bhattacharyya Distance</a>
(which requires normalized histograms) were compared. We decided to proceed with matchFeatures for its relative simplicity, even though the Bhattacharyya measure was more robust and rich.</p>
<p>Using the feature matches we implemented a least squares based
<a href="http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/FISHER/RANSAC/" target="_blank" rel="noopener">Random Sample Consensus(RANSAC) algorithm</a>
to find a homography H between corresponding images that puts matched points in exact correspondence. This step is called
<a href="https://www.mathworks.com/discovery/image-registration.html" target="_blank" rel="noopener">Image Registration</a>
. The homography was found by solving a linear least-squares system of the form shown below, using
<a href="https://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm" target="_blank" rel="noopener">Singular Value Decomposition</a>
.
<link rel="stylesheet" href=https://mnishant2.github.io/css/hugo-easy-gallery.css />
<div class="box" >
<figure itemprop="associatedMedia" itemscope itemtype="http://schema.org/ImageObject">
<div class="img">
<img itemprop="thumbnail" src="https://mnishant2.github.io/media/homography.jpg#center" alt="Least Squares Estimation equation for finding Homography"/>
</div>
<a href="../../media/homography.jpg#center" itemprop="contentUrl"></a>
<figcaption>
<p>Least Squares Estimation equation for finding Homography</p>
</figcaption>
</figure>
</div>
<script src="https://code.jquery.com/jquery-1.12.4.min.js" integrity="sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ=" crossorigin="anonymous"></script>
<script src=https://mnishant2.github.io/js/load-photoswipe.js></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.css" integrity="sha256-sCl5PUOGMLfFYctzDW3MtRib0ctyUvI9Qsmq2wXOeBY=" crossorigin="anonymous" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/default-skin/default-skin.min.css" integrity="sha256-BFeI1V+Vh1Rk37wswuOYn5lsTcaU96hGaI7OUVCLjPc=" crossorigin="anonymous" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.js" integrity="sha256-UplRCs9v4KXVJvVY+p+RSo5Q4ilAUXh7kpjyIP5odyc=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe-ui-default.min.js" integrity="sha256-PWHOlUzc96pMc8ThwRIXPn8yH4NOLu42RQ0b9SpnpFk=" crossorigin="anonymous"></script>
<div class="pswp" tabindex="-1" role="dialog" aria-hidden="true">
<div class="pswp__bg"></div>
<div class="pswp__scroll-wrap">
<div class="pswp__container">
<div class="pswp__item"></div>
<div class="pswp__item"></div>
<div class="pswp__item"></div>
</div>
<div class="pswp__ui pswp__ui--hidden">
<div class="pswp__top-bar">
<div class="pswp__counter"></div>
<button class="pswp__button pswp__button--close" title="Close (Esc)"></button>
<button class="pswp__button pswp__button--share" title="Share"></button>
<button class="pswp__button pswp__button--fs" title="Toggle fullscreen"></button>
<button class="pswp__button pswp__button--zoom" title="Zoom in/out"></button>
<div class="pswp__preloader">
<div class="pswp__preloader__icn">
<div class="pswp__preloader__cut">
<div class="pswp__preloader__donut"></div>
</div>
</div>
</div>
</div>
<div class="pswp__share-modal pswp__share-modal--hidden pswp__single-tap">
<div class="pswp__share-tooltip"></div>
</div>
<button class="pswp__button pswp__button--arrow--left" title="Previous (arrow left)">
</button>
<button class="pswp__button pswp__button--arrow--right" title="Next (arrow right)">
</button>
<div class="pswp__caption">
<div class="pswp__caption__center"></div>
</div>
</div>
</div>
</div>
To solve this equation we need just 4 matches, so in our RANSAC algorithm we select 4 random matches at each iteration to estimate a homography and then, using the homography matrix, we find a consensus set, i.e. the matches in the two images that agree with the computed homography, measured using
<a href="https://mathworld.wolfram.com/Distance.html" target="_blank" rel="noopener">Euclidean Distance</a>
. We calculate the distance between the transformed point of each match (using H) and the corresponding actual match and threshold it at 0.5 to filter inliers.</p>
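<p>The sketch below illustrates this registration step: a direct linear transform homography estimate from four matches via SVD, wrapped in a RANSAC loop with a Euclidean inlier threshold. It is an illustrative NumPy reimplementation rather than the original implementation, and the iteration count is an assumption.</p>
<pre><code class="language-python">
import numpy as np

def estimate_homography(src, dst):
    # Direct Linear Transform: build the 2n x 9 system and take its null space via SVD.
    # src, dst: (n, 2) arrays of matched points, n >= 4.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=1000, thresh=0.5):
    best_inliers, best_H = np.zeros(len(src), dtype=bool), None
    src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous source points
    for _ in range(iters):
        idx = np.random.choice(len(src), 4, replace=False)
        H = estimate_homography(src[idx], dst[idx])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]              # back to Cartesian coordinates
        inliers = np.linalg.norm(proj - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_H = inliers, H
    return best_H, best_inliers
</code></pre>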
<p>Following the sequential image registration we use the matched
features from consecutive images to learn geometric transformations between them in order to
project them into a panoramic image. This process is called
<a href="https://www.mathworks.com/help/vision/examples/feature-based-panoramic-image-stitching.html" target="_blank" rel="noopener">Image Stitching</a>
. In order to perform
image stitching, an empty panorama canvas is created; the images are then aligned and blended based
on the learned homographies, after which they are warped onto the canvas.
<div class="box" >
<figure itemprop="associatedMedia" itemscope itemtype="http://schema.org/ImageObject">
<div class="img">
<img itemprop="thumbnail" src="https://mnishant2.github.io/media/stitch3.jpg#center" alt="Result of our Image stitching algorithm on Real images taken from my OnePlus phone"/>
</div>
<a href="../../media/stitch3.jpg#center" itemprop="contentUrl"></a>
<figcaption>
<p>Result of our Image stitching algorithm on Real images taken from my OnePlus phone</p>
</figcaption>
</figure>
</div>
</p>
</description>
</item>
<item>
<title>Modified MNIST [Kaggle]</title>
<link>https://mnishant2.github.io/project/modified_mnist/</link>
<pubDate>Thu, 14 Nov 2019 14:11:29 -0400</pubDate>
<guid>https://mnishant2.github.io/project/modified_mnist/</guid>
<description><p>This was a
<a href="https://www.kaggle.com/c/modified-mnist" target="_blank" rel="noopener">competition hosted on Kaggle</a>
and was a miniproject for the
<a href="https://cs.mcgill.ca/~wlh/comp551/" target="_blank" rel="noopener">COMP 551: Applied Machine Learning</a>
Course.
We analyze different machine learning models to process a modified version of the MNIST dataset and develop a supervised classification model that can predict the digit with the largest numeric value present in an image.</p>
<p>We analyze Images from a modified version of the
<a href="http://yann.lecun.com/exdb/mnist/" target="_blank" rel="noopener">MNIST dataset (Yann Le Cunn, 2001)</a>
. MNIST is a dataset that contains handwritten digits from 0-9, and the goal is to classify which digit is present in an image. The given dataset contains 50,000 modified MNIST images. The images are grayscale and of size 128x128. Each image contains three randomly sampled MNIST-style digits on a custom grayscale background, each at various positions and orientations. The task was to train a model to identify the digit with the highest numerical value in each image.</p>
<p>We experimented with numerous models in different configurations for this task. The models chosen were primarily complex pretrained neural network models, such as ResNets, VGGNets and
<a href="">EfficientNets</a>
. After fine-tuning the best-performing models&rsquo; hyper-parameters, we used various data augmentation techniques to further boost the classification accuracy, including affine transformation
mappings, scale-space blurring, contrast changes and perspective transforms. By doing so, we were able to achieve a higher accuracy on the test set than before data augmentation (a sketch of such an augmentation pipeline follows the figure below).
<link rel="stylesheet" href=https://mnishant2.github.io/css/hugo-easy-gallery.css />
<div class="box" >
<figure itemprop="associatedMedia" itemscope itemtype="http://schema.org/ImageObject">
<div class="img">
<img itemprop="thumbnail" src="https://mnishant2.github.io/media/modified_mnist2.jpg#center" />
</div>
<a href="../../media/modified_mnist2.jpg#center" itemprop="contentUrl"></a>
</figure>
</div>
<script src="https://code.jquery.com/jquery-1.12.4.min.js" integrity="sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ=" crossorigin="anonymous"></script>
<script src=https://mnishant2.github.io/js/load-photoswipe.js></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.css" integrity="sha256-sCl5PUOGMLfFYctzDW3MtRib0ctyUvI9Qsmq2wXOeBY=" crossorigin="anonymous" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/default-skin/default-skin.min.css" integrity="sha256-BFeI1V+Vh1Rk37wswuOYn5lsTcaU96hGaI7OUVCLjPc=" crossorigin="anonymous" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.js" integrity="sha256-UplRCs9v4KXVJvVY+p+RSo5Q4ilAUXh7kpjyIP5odyc=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe-ui-default.min.js" integrity="sha256-PWHOlUzc96pMc8ThwRIXPn8yH4NOLu42RQ0b9SpnpFk=" crossorigin="anonymous"></script>
<div class="pswp" tabindex="-1" role="dialog" aria-hidden="true">
<div class="pswp__bg"></div>
<div class="pswp__scroll-wrap">
<div class="pswp__container">
<div class="pswp__item"></div>
<div class="pswp__item"></div>
<div class="pswp__item"></div>
</div>
<div class="pswp__ui pswp__ui--hidden">
<div class="pswp__top-bar">
<div class="pswp__counter"></div>
<button class="pswp__button pswp__button--close" title="Close (Esc)"></button>
<button class="pswp__button pswp__button--share" title="Share"></button>
<button class="pswp__button pswp__button--fs" title="Toggle fullscreen"></button>
<button class="pswp__button pswp__button--zoom" title="Zoom in/out"></button>
<div class="pswp__preloader">
<div class="pswp__preloader__icn">
<div class="pswp__preloader__cut">
<div class="pswp__preloader__donut"></div>
</div>
</div>
</div>
</div>
<div class="pswp__share-modal pswp__share-modal--hidden pswp__single-tap">
<div class="pswp__share-tooltip"></div>
</div>
<button class="pswp__button pswp__button--arrow--left" title="Previous (arrow left)">
</button>
<button class="pswp__button pswp__button--arrow--right" title="Next (arrow right)">
</button>
<div class="pswp__caption">
<div class="pswp__caption__center"></div>
</div>
</div>
</div>
</div>
</p>
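<p>An augmentation pipeline mirroring the techniques listed above could look like the following torchvision sketch; the specific parameter values are assumptions, not the ones used in the project.</p>
<pre><code class="language-python">
from torchvision import transforms

# Illustrative augmentation pipeline: affine mappings, blurring, contrast changes,
# perspective transforms (applied to PIL images).
augment = transforms.Compose([
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.ColorJitter(contrast=0.3),
    transforms.RandomPerspective(distortion_scale=0.3, p=0.5),
    transforms.ToTensor(),
])
</code></pre>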
<p>The final fine-tuned model was able to achieve an accuracy of 99% on the validation data, and an accuracy of 99.166% on the test data in the public leaderboard of the competition. We finished
<a href="https://www.kaggle.com/c/modified-mnist/leaderboard" target="_blank" rel="noopener">2nd and 4th out of 105 teams(Group 30) on the public and the private leaderboards</a>
of the competition respectively.</p>
</description>
</item>
<item>
<title>SIFT</title>
<link>https://mnishant2.github.io/project/sift/</link>
<pubDate>Mon, 11 Nov 2019 14:11:11 -0400</pubDate>
<guid>https://mnishant2.github.io/project/sift/</guid>
<description><p>In this project, which was essentially an assignment in
<a href="https://www.mcgill.ca/study/2018-2019/courses/comp-558" target="_blank" rel="noopener">COMP558:Fundamentals of Computer Vision</a>
course, I implemented the
<a href="http://www.scholarpedia.org/article/Scale_Invariant_Feature_Transform" target="_blank" rel="noopener">Scale Invariant Feature Transform(SIFT)</a>
algorithm from scratch. SIFT is a traditional computer vision feature extraction technique. SIFT features are scale-, translation- and rotation-invariant.</p>
<p>SIFT is a highly involved algorithm, and thus implementing it from scratch is an arduous task. At an abstract level, the SIFT algorithm can be described in the following steps:</p>
<ul>
<li>
<p><strong>Find Scale Space Extrema</strong>: We constructed the
<a href="https://www.sciencedirect.com/topics/engineering/laplacian-pyramid" target="_blank" rel="noopener">Laplacian(Difference of Gaussian) pyramid</a>
for the given image and, using this pyramid, we
found local extrema at each level of the Laplacian pyramid by taking a local area and
comparing the intensities in that local region at the same scale as well as at the
adjacent (next and previous) levels of the pyramid. Two local
neighbourhood sizes (3x3 and 5x5) were tried.</p>
</li>
<li>
<p><strong>Keypoint Localization</strong>: The first step generated a large number of keypoints, not all of which were useful. Corner cases and low-contrast keypoints were discarded, and a threshold was specified in order to select only strong extrema. A
<a href="https://mathworld.wolfram.com/TaylorSeries.html" target="_blank" rel="noopener">Taylor series expansion</a>
of scale space was performed to get a more accurate estimate of each extremum, and those falling below the threshold were discarded.</p>
</li>
<li>
<p><strong>Gradient Calculation</strong>: For each detected keypoint, a square neighborhood (17x17 in our case) was taken around it at its respective scale. Intensity gradients and orientations were calculated for the given neighborhood. A
<a href="https://homepages.inf.ed.ac.uk/rbf/HIPR2/gsmooth.htm" target="_blank" rel="noopener">Gaussian mask</a>
of the same size as our neighborhood was used as a weighting mask over the gradient magnitude matrix.</p>
</li>
<li>
<p><strong>SIFT Feature Descriptors</strong>: SIFT feature descriptors are created by taking
<a href="https://www.analyticsvidhya.com/blog/2019/09/feature-engineering-images-introduction-hog-feature-descriptor/" target="_blank" rel="noopener">histograms of gradients orientations</a>
for each keypoint neighborhoods. Orientations are divided into bins of various ranges(36 bins of 10 deg in our case), and for each gradient falling in a bin the gradient magnitude value is added to that particular bin. Once we have the histogram we find the orientation with the highest weighted value. Its the principle orientation and the desriptors(orientation vectors) are shifted counterclockwise such that principle orientation becomes the first bin. This lends SIFT features their rotational invariance.</p>
</li>
</ul>
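<p>The descriptor step above boils down to a magnitude-weighted orientation histogram that is rotated so its dominant bin comes first; an illustrative sketch (array names and shapes are assumptions) is:</p>
<pre><code class="language-python">
import numpy as np

def orientation_descriptor(magnitude, orientation, n_bins=36):
    # magnitude, orientation: arrays over the keypoint neighborhood (orientation in degrees, 0-360).
    bins = (orientation.astype(int) % 360) // (360 // n_bins)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())   # accumulate magnitude-weighted votes
    principal = int(np.argmax(hist))                   # dominant orientation bin
    return np.roll(hist, -principal)                   # shift so the principal bin comes first
</code></pre>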
<p>Once we had the SIFT descriptors, I transformed the image, calculated SIFT vectors for the original and transformed images, and matched them using a brute-force algorithm, i.e.
<a href="https://www.sciencedirect.com/topics/engineering/bhattacharyya-distance" target="_blank" rel="noopener">Bhattacharyya Distance</a>
and visualised the matches above a certain threshold (as in the figure above) to test the robustness of the SIFT algorithm.</p>
</description>
</item>
<item>
<title>Reddit Comment Classification [Kaggle]</title>
<link>https://mnishant2.github.io/project/reddit_comment/</link>
<pubDate>Mon, 21 Oct 2019 14:11:20 -0400</pubDate>
<guid>https://mnishant2.github.io/project/reddit_comment/</guid>
<description><p>This was a
<a href="https://www.kaggle.com/c/reddit-comment-classification-comp-551" target="_blank" rel="noopener">competition hosted on Kaggle</a>
and was a miniproject for the
<a href="https://cs.mcgill.ca/~wlh/comp551/" target="_blank" rel="noopener">COMP 551: Applied Machine Learning</a>
Course.</p>
<p>We analyze text from the website Reddit, and develop a multilabel
classification model to predict which subreddit (group) a
queried comment came from. Reddit is an online forum, where
people discuss various topics from sports to cartoons, technology
and video-games. The dataset is a list of comments from 20
different subreddits (groups/topics). This problem can be formulated
as a type of
<a href="https://towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17" target="_blank" rel="noopener">Sentiment analysis</a>
problem, which is quite
well-known in the Natural Language Processing (NLP) literature.
Sentiment analysis is a computational approach toward
identifying opinion, sentiment, and subjectivity in text.</p>
<p>For this dataset, we implemented a Bernoulli Naive Bayes
classifier, trained and tested it against the dataset. We also analyzed
various models for improving the classification accuracy,
including Support Vector Machines, Logistic Regression,
k-Nearest Neighbours, the Ensemble method of Stacking and
a Deep Learning model
<a href="https://arxiv.org/abs/1801.06146" target="_blank" rel="noopener">ULMFiT (J.Howard and S.Ruder,
2018)</a>
. We also tried using the
<a href="https://github.com/flairNLP/flair" target="_blank" rel="noopener">FlairNLP library</a>
concatenating several combinations of embeddings such as
<a href="https://alanakbik.github.io/papers/coling2018.pdf" target="_blank" rel="noopener">FlairEmbeddings</a>
+
<a href="https://arxiv.org/abs/1810.04805" target="_blank" rel="noopener">BERT</a>
to get text features for classification.</p>
<p>We compare the accuracy of these models for different Feature
extraction methods, namely
<a href="http://www.tfidf.com/" target="_blank" rel="noopener">Term Frequency-Inverse document
frequency (TF-IDF)</a>
, Binary and Non-Binary Count Vectorizer.
We also analyze the performance gain/loss after applying
Dimensionality reduction methods on the dataset. In particular,
we explore the
<a href="https://en.wikipedia.org/wiki/Principal_component_analysis#:~:text=Principal%20component%20analysis%20%28PCA%29%20is,components%20and%20ignoring%20the%20rest." target="_blank" rel="noopener">Principle Component Analysis (PCA)</a>
inspired
method of
<a href="https://www.sciencedirect.com/topics/computer-science/latent-semantic-analysis" target="_blank" rel="noopener">Latent Semantic Analysis (LSA)</a>
.</p>
<p>We observed that the best results were obtained by stacking various combinations
of the models described above. For the final submission, we
used an ensemble classifier with &rsquo;soft&rsquo; voting by stacking SVM,
Naive Bayes and Logistic Regression at their optimum parameter
settings, which gave an accuracy of 57.97% on our validation
data and 58.011% on the Kaggle public leaderboard. Adding ULMFiT
to the stack and using a logistic regression on top as a meta-classifier
further bolstered the accuracy to 60.1%. We finished
<a href="https://www.kaggle.com/c/reddit-comment-classification-comp-551/leaderboard" target="_blank" rel="noopener">10th and 8th out of 105 teams (Group 60) on the public and the private leaderboards</a>
of the competition respectively.</p>
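<p>An illustrative scikit-learn sketch of the TF-IDF features plus soft-voting ensemble described above is shown below; the hyper-parameters and variable names are placeholders, not the tuned settings used for the submission.</p>
<pre><code class="language-python">
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Soft-voting ensemble over TF-IDF features: SVM + Bernoulli Naive Bayes + Logistic Regression.
ensemble = make_pipeline(
    TfidfVectorizer(max_features=50000, ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("svm", SVC(kernel="linear", probability=True)),  # probability=True enables soft voting
            ("nb", BernoulliNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",
    ),
)
# ensemble.fit(train_comments, train_subreddits)
# predictions = ensemble.predict(test_comments)
</code></pre>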
</description>
</item>
<item>
<title>Generic Extraction Module (G.E.M)</title>
<link>https://mnishant2.github.io/project/gem/</link>
<pubDate>Tue, 21 May 2019 09:20:41 -0400</pubDate>
<guid>https://mnishant2.github.io/project/gem/</guid>
<description><p>The project at Signzy involved training a generalizable model for information retrieval from the OCR output of Indian ID cards. We used both character-level embeddings and word-level embeddings (
<a href="https://allennlp.org/elmo" target="_blank" rel="noopener">ELMO</a>
) in a stacked manner for language modelling before passing the concatenated embeddings to a bidirectional Long Short-Term Memory (BiLSTM) network with Conditional Random Field modelling on the LSTM output (
<a href="https://arxiv.org/abs/1508.01991" target="_blank" rel="noopener">Huang et al.</a>
) for final classification.</p>
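<p>With Flair, a stacked character-plus-ELMo embedding feeding a BiLSTM-CRF sequence tagger can be assembled roughly as below. This is an illustrative sketch: the data paths, column map and tag type are placeholders, and the exact Flair API differs slightly between versions.</p>
<pre><code class="language-python">
from flair.datasets import ColumnCorpus
from flair.embeddings import CharacterEmbeddings, ELMoEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load a CoNLL-style corpus (placeholder paths and columns).
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"})
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

# Stack character-level and ELMo embeddings, then train a BiLSTM with a CRF on top.
embeddings = StackedEmbeddings([CharacterEmbeddings(), ELMoEmbeddings()])
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type="ner",
                        use_crf=True)           # CRF layer on the BiLSTM outputs

ModelTrainer(tagger, corpus).train("output/", max_epochs=10)
</code></pre>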
<p>The model was trained on a large corpus of OCR text outputs obtained from our own proprietary ID card dataset for extracting non-trivial information such as names, dates, numbers and addresses from any card. The training was done in a way that ensured the embeddings were also fine-tuned. The
<a href="https://github.com/flairNLP/flair" target="_blank" rel="noopener">FlairNLP library</a>
was used to create the preprocessing, text embedding, training and postprocessing pipeline, and training was performed using the PyTorch framework. Multiple combinations of embeddings, including FlairEmbeddings (
<a href="https://www.aclweb.org/anthology/C18-1139/" target="_blank" rel="noopener">Contextualized string embeddings for sequence labelling</a>
), BERT, CharacterEmbeddings, ELMo and XLNet, were benchmarked before settling on the final pair based on accuracy, compute and efficiency considerations.</p>
<p>Not only did the model perform admirably well on unseen text from ID types that were part of the training data, irrespective of variations in OCR output and image layout, but it also generalised well to out-of-sample ID types when fine-tuned with just 1-5 samples of these cards.</p>
<p>The idea behind this was to build a generic, flexible information retrieval engine that is pretrained to extract important information from the OCR output of all ID cards without being specifically trained on them or having seen them, and without any rule-based processing, and that can be easily fine-tuned on a very small number of samples of any new card type for optimum performance. This was made into a REST API as a plug-and-play product so that clients can fine-tune the model on their samples and then use it out of the box to extract information from IDs. Performance was measured using precision and recall figures.</p>
</description>
</item>
<item>
<title>Performance Evaluation of Neural Networks for Speaker Recognition</title>
<link>https://mnishant2.github.io/publication/mishra-performance-2019/</link>
<pubDate>Fri, 01 Feb 2019 00:00:00 +0000</pubDate>
<guid>https://mnishant2.github.io/publication/mishra-performance-2019/</guid>
<description></description>
</item>
<item>
<title>Image Quality Assessment</title>
<link>https://mnishant2.github.io/project/iqa/</link>
<pubDate>Mon, 21 Jan 2019 00:00:00 +0000</pubDate>
<guid>https://mnishant2.github.io/project/iqa/</guid>
<description><p>Many of the vision-based applications or APIs meant for information retrieval and data verification, such as text extraction or face recognition, need a minimum image quality for efficient processing and adequate performance. Hence it becomes imperative to implement an image quality assessment layer before proceeding with further processing. This ensures smooth application of the vision algorithms, reliable performance and an overall time reduction, by reducing redundant computation on poor quality images and preventing multiple requests and passes through the algorithm.</p>
<p>This additional filter helps by ensuring that only optimal quality images are passed on and poor quality images are screened out at the client/user stage itself, saving the user time and the server unnecessary processing, and ensuring higher throughput and efficiency.</p>
<p>We implemented one such pipeline using an ensemble of models that qualitatively analysed images and produced a quantitative measure of image quality, which could then be used as a threshold for deciding whether an image is sent for downstream processing or the user is asked to repeat the request with a better quality image. This quantitative score allows the threshold to be tailored to different tasks and users.</p>
<p>The model detects blur in an image (
<a href="../blurnet">BlurNet</a>
), the brightness of the image (a
<a href="https://www.researchgate.net/figure/ResNet-18-Architecture_tbl1_322476121" target="_blank" rel="noopener">ResNet-18</a>
model trained for binary classification, i.e. dark vs. bright) and the text readability (based on the performance of text detection and OCR algorithms, along with other filtering and morphological operations on the image to estimate the textual region), and a meta layer combines their individual outputs to provide a final cumulative Image Quality Score.</p>
<p>The final meta-learner was trained with the outputs of the individual models as input and the average image quality score assigned to each image by annotators as the target. The annotation was done by assigning each image to at least five random users and asking them to score the image out of 10 on the three parameters, i.e. blur, brightness and readability, solely at their personal discretion. These scores were then fed into a weighting formula to generate a cumulative score, and the cumulative scores from all the annotators for each image were averaged to produce the final ground-truth score for the image.</p>
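<p>Schematically, the meta-learner is a regressor from the three sub-model outputs to the averaged annotator score; a tiny illustrative sketch with synthetic numbers (not the actual model or data) is:</p>
<pre><code class="language-python">
import numpy as np
from sklearn.linear_model import LinearRegression

# One row per image: blur, brightness, readability sub-scores (synthetic values).
sub_scores = np.array([[0.8, 0.9, 0.7],
                       [0.2, 0.5, 0.3],
                       [0.6, 0.4, 0.9]])
annotator_avg = np.array([8.1, 3.5, 6.8])      # mean of per-annotator scores out of 10

meta = LinearRegression().fit(sub_scores, annotator_avg)
quality_score = meta.predict(np.array([[0.7, 0.8, 0.6]]))[0]   # cumulative image quality score
</code></pre>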
<p>Clients receive both the final score and the outputs of each individual model, along with a short description of the image quality based on the score, for analysis.</p>
</description>
</item>
<item>
<title>Dory OCR</title>
<link>https://mnishant2.github.io/project/dory-ocr/</link>
<pubDate>Fri, 05 Oct 2018 00:34:01 -0400</pubDate>
<guid>https://mnishant2.github.io/project/dory-ocr/</guid>
<description></description>
</item>
<item>
<title>Sign Language Classification [Bachelor Project]</title>
<link>https://mnishant2.github.io/project/sign_language/</link>
<pubDate>Thu, 10 May 2018 14:12:20 -0400</pubDate>
<guid>https://mnishant2.github.io/project/sign_language/</guid>
<description><p>This was our undergrad final project, where we set out to implement a speech to sign language interconversion system; more specifically, a Hindi speech to Indian Sign Language interconversion system. The speech to sign language subsystem was essentially a derivative of our
<a href="../speech_recognition">speech recognition project</a>
with detected speech being mapped to corresponding sign language visuals in real time.
Here I shall be discussing our Indian Sign Language detection subsystem. Initially, as a proof of concept, we used a dataset of 7000 2D images of Indian Sign Language and a modified VGGNet for classification, reaching 99% accuracy. But using 2D data was impractical for building a real-time, realistic sign language recognition system. To accommodate the more complex backgrounds we could come across in everyday situations, instead of the simple backgrounds of the 2D dataset, and also to account for occlusion and the various angles arising from Indian Sign Language being two-handed, we decided to use a
<a href="https://en.wikipedia.org/wiki/Kinect" target="_blank" rel="noopener">kinect sensor</a>
and hence an RGB-D dataset, to leverage the depth information rendered by the Kinect.</p>
<p>We collected RGB-D data for 48 different Indian signs. These include both RGB and depth images of digits, alphabets and a few common words. The dataset comprises around 36 images per word in our vocabulary, contributed by 18 different people. We trained a
<a href="https://towardsdatascience.com/gaussian-mixture-models-explained-6986aaf5a95" target="_blank" rel="noopener">Multivariate Gaussian Mixture Model(GMM)</a>
on the
<a href="https://www.lifewire.com/what-is-hsv-in-design-1078068" target="_blank" rel="noopener">HSV</a>
pixel values of the data to segment the skin region and intensify the skin pixel areas in the RGB-D images (a sketch of this step follows the figure below).
<link rel="stylesheet" href=https://mnishant2.github.io/css/hugo-easy-gallery.css />
<div class="box" >
<figure itemprop="associatedMedia" itemscope itemtype="http://schema.org/ImageObject">
<div class="img">
<img itemprop="thumbnail" src="https://mnishant2.github.io/media/sign2.jpg#center" alt="Skin segmentation using Multivariate GMM"/>
</div>
<a href="../../media/sign2.jpg#center" itemprop="contentUrl"></a>
<figcaption>
<p>Skin segmentation using Multivariate GMM</p>
</figcaption>
</figure>
</div>
<script src="https://code.jquery.com/jquery-1.12.4.min.js" integrity="sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ=" crossorigin="anonymous"></script>
<script src=https://mnishant2.github.io/js/load-photoswipe.js></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.css" integrity="sha256-sCl5PUOGMLfFYctzDW3MtRib0ctyUvI9Qsmq2wXOeBY=" crossorigin="anonymous" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/default-skin/default-skin.min.css" integrity="sha256-BFeI1V+Vh1Rk37wswuOYn5lsTcaU96hGaI7OUVCLjPc=" crossorigin="anonymous" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe.min.js" integrity="sha256-UplRCs9v4KXVJvVY+p+RSo5Q4ilAUXh7kpjyIP5odyc=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/photoswipe/4.1.1/photoswipe-ui-default.min.js" integrity="sha256-PWHOlUzc96pMc8ThwRIXPn8yH4NOLu42RQ0b9SpnpFk=" crossorigin="anonymous"></script>
<div class="pswp" tabindex="-1" role="dialog" aria-hidden="true">
<div class="pswp__bg"></div>
<div class="pswp__scroll-wrap">
<div class="pswp__container">
<div class="pswp__item"></div>
<div class="pswp__item"></div>
<div class="pswp__item"></div>
</div>
<div class="pswp__ui pswp__ui--hidden">
<div class="pswp__top-bar">
<div class="pswp__counter"></div>
<button class="pswp__button pswp__button--close" title="Close (Esc)"></button>
<button class="pswp__button pswp__button--share" title="Share"></button>
<button class="pswp__button pswp__button--fs" title="Toggle fullscreen"></button>
<button class="pswp__button pswp__button--zoom" title="Zoom in/out"></button>
<div class="pswp__preloader">
<div class="pswp__preloader__icn">
<div class="pswp__preloader__cut">
<div class="pswp__preloader__donut"></div>
</div>
</div>
</div>
</div>
<div class="pswp__share-modal pswp__share-modal--hidden pswp__single-tap">
<div class="pswp__share-tooltip"></div>
</div>
<button class="pswp__button pswp__button--arrow--left" title="Previous (arrow left)">
</button>
<button class="pswp__button pswp__button--arrow--right" title="Next (arrow right)">
</button>
<div class="pswp__caption">
<div class="pswp__caption__center"></div>
</div>
</div>
</div>
</div>
</p>
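<p>The skin segmentation step described above can be sketched as fitting a GMM to labelled HSV skin pixels and thresholding per-pixel likelihoods; the file names, component count and threshold below are assumptions for illustration.</p>
<pre><code class="language-python">
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a multivariate GMM to HSV values of labelled skin pixels (placeholder sample file).
skin_pixels = np.load("skin_hsv_samples.npy")          # (N, 3) HSV values of skin pixels
gmm = GaussianMixture(n_components=4, covariance_type="full").fit(skin_pixels)

image = cv2.imread("sign_frame.png")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).reshape(-1, 3).astype(np.float64)
log_likelihood = gmm.score_samples(hsv)                # per-pixel skin log-likelihood
mask = (log_likelihood > -15.0).reshape(image.shape[:2])    # threshold is a placeholder
segmented = image * mask[..., None].astype(np.uint8)   # keep/intensify skin areas only
</code></pre>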
<p>Since the per-class data was too small for training a robust model, we performed significant data augmentation (blurring, affine transforms, colour adjustments) to multiply the data before training. Once we had the data, we adopted two different paradigms. In the first method we stacked the RGB and depth images vertically before passing them to a ResNet-50 classifier for training. This method reached a validation accuracy of 71%.
<div class="box" >
<figure itemprop="associatedMedia" itemscope itemtype="http://schema.org/ImageObject">
<div class="img">
<img itemprop="thumbnail" src="https://mnishant2.github.io/media/sign3.jpg#center" alt="Data sample along with Augmentation"/>
</div>
<a href="../../media/sign3.jpg#center" itemprop="contentUrl"></a>
<figcaption>
<p>Data sample along with Augmentation</p>
</figcaption>
</figure>
</div>
</p>
<p>The second approach involved using a
<a href="http://vis-www.cs.umass.edu/bcnn/" target="_blank" rel="noopener">Bilinear CNN</a>
system, with two parallel ResNet architectures for the RGB and depth images separately, followed by bilinear pooling of the features they output before being passed to subsequent dense layers (a sketch of bilinear pooling follows below). This approach performed better, with a validation accuracy of 79%, although it was computationally more expensive. Finally, we passed the output of the sign language detection system through
<a href="https://cloud.google.com/text-to-speech" target="_blank" rel="noopener">Google&rsquo;s text to speech(TTS)</a>
generation API to get the final speech output.</p>
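<p>For reference, the bilinear pooling used in the second approach forms sum-pooled outer products of the two feature maps, followed by a signed square root and l2 normalisation; an illustrative PyTorch sketch is:</p>
<pre><code class="language-python">
import torch

def bilinear_pool(feat_a, feat_b):
    # feat_a, feat_b: (batch, channels, H, W) feature maps from the two parallel CNNs.
    b, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    a = feat_a.reshape(b, c1, h * w)
    bb = feat_b.reshape(b, c2, h * w)
    outer = torch.bmm(a, bb.transpose(1, 2)) / (h * w)   # sum-pooled outer products
    outer = outer.reshape(b, c1 * c2)
    outer = torch.sign(outer) * torch.sqrt(torch.abs(outer) + 1e-10)  # signed square root
    return torch.nn.functional.normalize(outer, dim=1)   # l2 normalisation
</code></pre>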
</description>
</item>
<item>
<title>Cropnet</title>
<link>https://mnishant2.github.io/project/cropnet/</link>
<pubDate>Thu, 12 Apr 2018 00:00:00 +0000</pubDate>
<guid>https://mnishant2.github.io/project/cropnet/</guid>
<description><p>As the name suggests, this project involved training a model to crop documents out of a background. Essentially this can be classified as a segmentation task, which would need massive data annotation with a mask on the foreground object to be used for supervised segmentation training.</p>
<p>We decided to cast this as a regression-based problem, where we annotated only the four corner points of the foreground object as our training labels and then used them to train a regression model with 8 continuous-valued outputs (the {x, y} coordinates of all four corners). Once we had these points, we applied a perspective transform to warp the object into a rectangular space for the final cropped output.</p>
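<p>The crop step after corner regression can be sketched with OpenCV&rsquo;s perspective transform; the corner values, image path and output size below are placeholders for illustration.</p>
<pre><code class="language-python">
import cv2
import numpy as np

# Given four predicted corners (clockwise from top-left), warp the document onto a rectangle.
corners = np.float32([[112, 80], [520, 95], [505, 400], [98, 385]])   # predicted {x, y} corners
out_w, out_h = 480, 300
target = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])

image = cv2.imread("photo_with_id_card.jpg")
H = cv2.getPerspectiveTransform(corners, target)        # 3x3 perspective transform
cropped = cv2.warpPerspective(image, H, (out_w, out_h)) # final rectangular crop
</code></pre>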
<p>For training we used a custom aggregated and crowdsourced dataset of ID cards and other documents in various background settings. We implemented our own
<a href="https://github.com/mnishant2/NMAnnotation-tool" target="_blank" rel="noopener">annotation tool</a>
for the above-mentioned ground-truth annotation. In order to ensure variance, we used both natural camera-taken images as well as synthetically generated data, created by superimposing already available cropped samples on random backgrounds at different positions, scales and orientations.</p>
<p>In addition, we implemented massive data augmentation, operating simultaneously on the image and the annotated keypoints, in order to further multiply our training data. Some of the augmentation techniques used were blurring, rotation, scaling, grayscale conversion, color adjustments, dropout, noise addition, etc. We used the
<a href="https://imgaug.readthedocs.io/en/latest/" target="_blank" rel="noopener">imgaug library</a>
for the whole augmentation pipeline.</p>
<p>Annotation, synthetic data generation, and augmentation were all done in such a way that the sequence of the four corner points with respect to the object remained the same, in order to ensure spatial and rotational invariance during training and prediction. The upper-left point of the foreground object was always the first label, followed by the others in clockwise order.</p>
<p>Once we had sufficient annotated and augmented data, we trained the regression models. We experimented and benchmarked a number of different algorithms and learning paradigms. We benchmarked
<a href="https://arxiv.org/abs/1512.03385" target="_blank" rel="noopener">ResNet</a>
,
<a href="https://arxiv.org/abs/1602.07360" target="_blank" rel="noopener">Squeezenet</a>
,
<a href="https://arxiv.org/abs/1409.1556" target="_blank" rel="noopener">VGGNet</a>
,
<a href="https://arxiv.org/abs/1707.01083" target="_blank" rel="noopener">Shufflenet</a>