Address extraction code is impossible to work on #128

slinkp · 2012-09-28T20:08:19Z

The regular expression that ebdata.nlp.addresses uses to find addresses is actually a 100-line regex into which a 130-line regex is inserted 11 times. The final regex is over 1500 lines long.

It is almost impossible to debug, fix, or extend this regex.
We need to re-think the address extraction approach completely.
I am investigating whether there is existing natural-language processing work we can leverage.
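
To illustrate the kind of composition described above, here is a minimal sketch. The patterns `STREET_NAME` and `ADDRESS_TEMPLATE` are hypothetical stand-ins invented for this example; they are not the actual patterns in ebdata.nlp.addresses. The sketch only shows how interpolating one verbose sub-pattern into a template several times multiplies the size of the compiled regex, and the line estimate is rough arithmetic based on the figures quoted above.

```python
import re

# Hypothetical sub-pattern (NOT the real ebdata one): a run of capitalized words.
STREET_NAME = r"""
    (?:[A-Z][a-z]+)          # a capitalized word
    (?:\s+[A-Z][a-z]+)*      # optional further capitalized words
"""

# Hypothetical outer template: the sub-pattern is interpolated three times.
ADDRESS_TEMPLATE = r"""
    \b\d{1,5}\s+             # house number
    %(street)s               # first insertion
    \s+(?:St|Ave|Blvd|Rd)\b  # street suffix
    |
    \b%(street)s             # second insertion
    \s+(?:and|at|&)\s+       # intersection connector
    %(street)s               # third insertion
"""

# Every insertion copies the whole sub-pattern, comments included.
full_pattern = ADDRESS_TEMPLATE % {"street": STREET_NAME}
address_re = re.compile(full_pattern, re.VERBOSE)

# Rough size estimate using the figures from this ticket:
# a 100-line template plus 11 copies of a 130-line sub-pattern.
approx_lines = 100 + 11 * 130

block_match = address_re.search("Meet at 123 Main St today")
intersection_match = address_re.search("Corner of State and Lake")
```

Even in this toy version, a fix to the sub-pattern changes the behavior of every place it is inserted, which is one reason the full-size regex is so hard to debug.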

slinkp · 2012-09-28T20:08:21Z

Ticket imported from Trac:
http://developer.openblockproject.org/ticket/128
Reported by: slinkp

This was referenced Sep 28, 2012

nlp address parser doesn't recognize prefixes #284

Open

nlp address parser doesn't recognize highways #285

Open
