DeepDive 0.0.3-alpha.1
Pre-release
zifeishan released this 26 May 04:50
Changelog for version 0.0.3-alpha.1 (05/25/2014)
- Updated example walkthrough and `spouse_example` code.
- Added Python utility `ddlib` for text manipulation (requires exporting `PYTHONPATH`; see its pydoc for usage).
- Added utility script `util/extractor_input_writer.py` to sample extractor inputs.
- Updated `nlp_extractor` format (uses `sentence_offset` and a textual `sentence_id`).
- Cleaned up unused datastore code.
- Updated templates.
- Bug fixes.
Changelog for version 0.0.3-alpha (05/07/2014)
- Non-backward-compatible syntax change: Developers must include an `id` column of type `bigint` in any table containing variables, but they MUST NOT use this column anywhere. This column is reserved for learning and inference, and all of its values will be erased and reassigned in the grounding phase.
- Non-backward-compatible functionality change: DeepDive is no longer responsible for any automatic assignment of sequential variable IDs. You may use `examples/spouse_example/scripts/fill_sequence.sh` for this task.
- Updated dependency requirement: requires JDK 7 or higher.
- Supported four new types of extractors. See documentation for details:
- Even faster factor graph grounding and serialization using better optimized SQL.
- The previous default Java sampler is no longer supported. The C++ sampler is now the default sampler.
- New configuration supported: `pipeline.relearn_from` to skip extraction and grounding and only perform learning and inference with a previous version. Useful for tuning sampler arguments.
- New configuration supported: `inference.skip_learning` to use weights learned in the last execution.
- New configuration supported: `inference.weight_table` to fix factor weights in a table and skip learning. The table is specified by factor description and weights. It can hold results from one execution of DeepDive, manually assigned weights, or a combination of both. It is useful for learning once and reusing the learned model for later inference tasks.
- Supported manual holdout by a holdout query.
- Updated `spouse_example` with implementations in different styles of extractors.
- The `nlp_extractor` example has changed table requirements and usage. See HERE.
- In the `db.default` configuration, users should define `dbname`, `host`, `port` and `user`. If these are not defined, the system will by default use the environment variables `DBNAME`, `PGHOST`, `PGPORT` and `PGUSER` respectively.
- Fixed all examples.
- Updated documentation.
- Print SQL query execution plans for extractor inputs.
- Skip grounding, learning and inference if no factors are active.
- If using Greenplum, users should add a `DISTRIBUTED BY` clause to all `CREATE TABLE` commands. Do not use the variable id as the distribution key. Do not use a distribution key that is not initially assigned.
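
The `id` column requirement and the Greenplum guidance above interact when you write table definitions, so a small sketch may help. The Python helper below is not part of DeepDive. The function name `create_variable_table_sql`, its parameters, and the example table and column names are hypothetical, chosen only for illustration. It builds a `CREATE TABLE` statement that always includes the reserved `id bigint` column and, for Greenplum, a `DISTRIBUTED BY` clause on a column other than `id`.

```python
def create_variable_table_sql(table, columns, distribution_key, greenplum=True):
    """Build a CREATE TABLE statement for a table that contains variables.

    columns: list of (name, sql_type) pairs, NOT including the reserved id column.
    distribution_key: name of one of the declared columns (used only for Greenplum).
    """
    names = [name for name, _ in columns]
    if "id" in names:
        raise ValueError("'id' is reserved; it is added automatically as bigint")
    if distribution_key == "id":
        raise ValueError("the variable id must not be the distribution key")
    if distribution_key not in names:
        raise ValueError("distribution key must be one of the declared columns")

    # Reserved column: the application never reads or writes it;
    # its values are erased and reassigned during grounding.
    col_defs = ["%s %s" % (name, sql_type) for name, sql_type in columns]
    col_defs.append("id bigint")

    sql = "CREATE TABLE %s (\n  %s\n)" % (table, ",\n  ".join(col_defs))
    if greenplum:
        # Pick a column that is populated when rows are first inserted;
        # this helper cannot check that, so it is left to the caller.
        sql += "\nDISTRIBUTED BY (%s)" % distribution_key
    return sql + ";"


# Hypothetical table and column names, for illustration only.
example_sql = create_variable_table_sql(
    "has_spouse",
    [("relation_id", "text"), ("person1_id", "text"), ("is_true", "boolean")],
    distribution_key="relation_id",
)
print(example_sql)
```

Passing `greenplum=False` omits the `DISTRIBUTED BY` clause, which is an assumption about how one might reuse the same definition on plain PostgreSQL. The helper only validates column names; whether the chosen distribution key is actually assigned at insert time still has to be ensured by the extractor that fills the table.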