TODO

The milestones for releases are held as GitHub issues now.

Ideas

Face detection option 2:

https://code.google.com/p/jviolajones/

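import java.util.List;

import detection.Detector;

// Load the Haar cascade definition, then run face detection on an image file.
String fileName = "yourfile.jpg";
Detector detector = Detector.create("haarcascade_frontalface_default.xml");
List res = detector.getFaces(fileName, 1.2f, 1.1f, .05f, 2, true);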

Image Similarity Ideas

Notes

Similarity measures

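Two approaches to matching ssdeep hashes: N-gram matching and fuzzy search. Both seem to work rather well; the open question is which performs better at scale. The first query below uses the N-gram fields and the second uses the fuzzy (~) operator on the plain hash field.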

http://localhost:8080/discovery/select?rows=20&q.op=OR&fl=*,score&q=ssdeep_hash_ngram_bs_96%3Ar9G3voQkYXUgT97rm1GWnhNZL0%2BoQVpWRIE4PoZ5QbWjW5WiIj7Y7cXyTuWFFcyj+OR+ssdeep_hash_ngram_bs_192%3ArapkEUgpag%2BHtE4Pbhc24s3&wt=json&indent=true

http://localhost:8080/discovery/select?rows=20&q.op=OR&fl=*,score&q=ssdeep_hash_bs_192%3ArapkEUgpag%2BHtE4Pbhc24s3~&wt=json&indent=true

NOTE that the N-Gram approach may also be useful for spotting similar binaries.
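
As a rough sketch of how the two query styles above differ, the following builds both kinds of select URL from a field name and a hash chunk. The class and method names (SimilarityQueries, selectUrl, ngramQuery, fuzzyQuery) are hypothetical and not part of this project. The sketch assumes the field names and the /discovery endpoint seen in the example URLs, and it does not escape Lucene query-syntax characters inside the hash.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of this project) for building similarity queries.
class SimilarityQueries {

    // Wrap a raw query string in the select parameters used in the notes.
    static String selectUrl(String base, String q) {
        return base + "/select?rows=20&q.op=OR&fl=" + enc("*,score")
                + "&q=" + enc(q) + "&wt=json&indent=true";
    }

    // N-gram approach: one clause per block-size field, OR-ed together.
    // Assumption: the n-gram splitting itself is done by the field's analyser.
    static String ngramQuery(Map<String, String> chunkByField) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (Map.Entry<String, String> e : chunkByField.entrySet()) {
            clauses.add(e.getKey() + ":" + e.getValue());
        }
        return clauses.toString();
    }

    // Fuzzy approach: a single term with the fuzzy operator (~) appended.
    static String fuzzyQuery(String field, String chunk) {
        return field + ":" + chunk + "~";
    }

    // URL-encode one parameter value.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/discovery";

        Map<String, String> chunks = new LinkedHashMap<>();
        chunks.put("ssdeep_hash_ngram_bs_192", "rapkEUgpag+HtE4Pbhc24s3");
        System.out.println(selectUrl(base, ngramQuery(chunks)));

        System.out.println(selectUrl(base,
                fuzzyQuery("ssdeep_hash_bs_192", "rapkEUgpag+HtE4Pbhc24s3")));
    }
}
```

Keeping the query construction separate from the URL wrapping should make it easier to run the same hashes through both approaches when comparing them at scale.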

Geospatial Queries

http://192.168.45.10:8983/solr/aadda-discovery/select?q=*%3A*&wt=json&indent=true&fq={!geofilt}&sfield=locations&pt=51,0&d=20&sort=geodist()%20asc&fl=*,_dist_:geodist()
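
The parameters in the geospatial query above can be assembled programmatically. This is a minimal sketch only: the GeoQuery class and geofiltUrl method are hypothetical names, and it assumes the same locations field and Solr core URL as the example. It filters to documents within a given distance of a point, sorts by distance and returns the distance as a pseudo-field.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of this project) for building a geofilt query URL.
class GeoQuery {

    // coreUrl:    e.g. http://192.168.45.10:8983/solr/aadda-discovery
    // sfield:     the spatial field to filter on (assumed to be "locations")
    // lat, lon:   centre point of the search
    // distanceKm: search radius; Solr's d parameter is in kilometres by default
    static String geofiltUrl(String coreUrl, String sfield,
                             double lat, double lon, double distanceKm) {
        return coreUrl + "/select"
                + "?q=" + enc("*:*")
                + "&wt=json&indent=true"
                + "&fq=" + enc("{!geofilt}")
                + "&sfield=" + enc(sfield)
                + "&pt=" + enc(lat + "," + lon)
                + "&d=" + enc(String.valueOf(distanceKm))
                + "&sort=" + enc("geodist() asc")
                + "&fl=" + enc("*,_dist_:geodist()");
    }

    // URL-encode one parameter value.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(geofiltUrl(
                "http://192.168.45.10:8983/solr/aadda-discovery",
                "locations", 51, 0, 20));
    }
}
```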

Named Entity extraction

Stanford Named Entity Recognizer (NER) appears to be a sound option, although using it would mean relicensing this project as GPL. It offers several recogniser models, distinguished by the number of entity classes they detect:

3 class: Location, Person, Organization
4 class: Location, Person, Organization, Misc
7 class: Time, Location, Organization, Person, Money, Percent, Date