Releases: strangetom/ingredient-parser
1.3.2
1.3.1
Warning
This version requires pint >=0.24.4
General
- Support Python 3.13. Requires pint >= 0.24.4.
1.3.0
Processing
- Various minor improvements to feature generation.
- Add PREPARED_INGREDIENT flag to IngredientAmount objects. This indicates whether the amount refers to the prepared ingredient (PREPARED_INGREDIENT=True) or the unprepared ingredient (PREPARED_INGREDIENT=False).
- Add starting_index attribute to IngredientText objects, indicating the index of the token that starts the IngredientText.
- Improve detection of composite amounts in sentences.
- Add quantity_fractions keyword argument to parse_ingredient. When True, the quantity and quantity_max fields of IngredientAmount objects will be fractions.Fraction objects instead of floats. This allows fractions such as 1/3 to be represented exactly. The default is quantity_fractions=False, where quantities are floats as before. For example:

    >>> parse_ingredient("1 1/3 cups flour").amount[0]
    IngredientAmount(
        quantity=1.333,
        quantity_max=1.333,
        unit=<Unit('cup')>,
        text='1 1/3 cups',
        ...
    )
    >>> parse_ingredient("1 1/3 cups flour", quantity_fractions=True).amount[0]
    IngredientAmount(
        quantity=Fraction(4, 3),
        quantity_max=Fraction(4, 3),
        unit=<Unit('cup')>,
        text='1 1/3 cups',
        ...
    )
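To illustrate why exact fractions can matter for the quantity_fractions option, here is a minimal sketch that uses only Python's standard fractions module. It does not call ingredient-parser; the variable names are illustrative, and the three-decimal rounding simply mirrors the float shown in the example output above.

```python
from fractions import Fraction

# 1 1/3 cups as a float rounded to three decimal places (1.333).
as_float = round(1 + 1 / 3, 3)

# The same quantity held as an exact fraction: Fraction(4, 3).
as_fraction = Fraction(1) + Fraction(1, 3)

# Tripling the recipe: the rounded float no longer gives exactly 4 cups,
# while the Fraction does.
tripled_float = as_float * 3
tripled_fraction = as_fraction * 3

print(tripled_float)
print(tripled_fraction)
```

Exact fractions are mostly useful when quantities are scaled or summed later on; for display alone, floats are usually sufficient.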
Model
- Addition of new dataset: tastecooking. This is a relatively small dataset, but includes a number of unique abbreviations for units and sizes.
1.2.0
General
- New optional keyword argument to extract foundation foods from the ingredient name. Foundation foods are the fundamental item of food, excluding any qualifiers or descriptive adjectives, e.g. for the name "organic cucumber", the foundation food is "cucumber". See https://ingredient-parser.readthedocs.io/en/latest/guide/foundation.html for additional details.
- Some minor post-processing fixes.
1.1.2
Require NLTK >= 3.9.1, due to change in their resources format.
1.1.1
Revert the upgrade to NLTK 3.8.2 after 3.8.2 was removed from PyPI.
1.1.0
General
Require NLTK >= 3.8.2 due to change in POS tagger weights format.
Model
- Include new token features, which help improve performance:
- Word shape (e.g. cheese -> xxxxxx; Cheese -> Xxxxxx)
- N-gram (n=3, 4, 5) prefixes and suffixes of tokens
- Add 15,000 new sentences to training data from AllRecipes. This dataset includes lots of branded ingredients, which the existing datasets were quite light on.
- Tweaks to the model hyperparameters have yielded a model that is ~25% smaller, but with better performance than the previous model.
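The word shape and n-gram affix features listed above can be sketched roughly as follows. This is an illustration only, not the package's actual feature code: the function names, the feature key names, and the choice to skip affixes longer than the token are all assumptions.

```python
import re


def word_shape(token: str) -> str:
    """Map uppercase letters to 'X' and lowercase letters to 'x'.

    Other characters are left unchanged (an assumption; the release
    notes only show examples with letters).
    """
    shape = re.sub(r"[A-Z]", "X", token)
    return re.sub(r"[a-z]", "x", shape)


def affix_features(token: str, sizes=(3, 4, 5)) -> dict[str, str]:
    """Return prefixes and suffixes of length 3, 4 and 5.

    Hypothetical helper: affixes longer than the token are skipped.
    """
    features = {}
    for n in sizes:
        if len(token) >= n:
            features[f"prefix_{n}"] = token[:n]
            features[f"suffix_{n}"] = token[-n:]
    return features


print(word_shape("cheese"))   # xxxxxx
print(word_shape("Cheese"))   # Xxxxxx
print(affix_features("cheese"))
```

Features like these let a model generalise across tokens it has not seen, since capitalisation patterns and common endings are shared between many words.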
Processing
- Change processing of numbers written as words (e.g. 'one', 'two'). If the token is labelled as QTY, then the number will be converted to a digit (i.e. 'one' -> 1) or collapsed into a range (i.e. 'one or two' -> 1-2), otherwise the token is left unchanged.
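As a rough sketch of the label-dependent conversion described above, the following replaces a number word with a digit only when its label is QTY. The NUMBER_WORDS table, the function name and the example labels are hypothetical, and collapsing 'one or two' into a range is left out for brevity.

```python
# Hypothetical lookup table; the real package may cover more words.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def convert_number_words(tokens: list[str], labels: list[str]) -> list[str]:
    """Convert number words to digits, but only for tokens labelled QTY."""
    converted = []
    for token, label in zip(tokens, labels):
        if label == "QTY" and token.lower() in NUMBER_WORDS:
            converted.append(str(NUMBER_WORDS[token.lower()]))
        else:
            # Tokens with any other label are left unchanged.
            converted.append(token)
    return converted


tokens = ["one", "onion", ",", "cut", "in", "two"]
labels = ["QTY", "NAME", "PUNC", "PREP", "PREP", "PREP"]  # illustrative labels
print(convert_number_words(tokens, labels))
```

Here the leading 'one' becomes '1' because it is a quantity, while the 'two' in "cut in two" is preserved because it is part of the preparation instructions.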
1.0.1
Warning
This version requires NLTK >=3.8.2
NLTK 3.8.2 changes the file format (from pickle to json) of the weights used by the part of speech tagger used in this project, to address some security concerns. This patch updates the NLTK resource checks performed when ingredient-parser is imported to check for the new json files, and downloads them if they are not present.
1.0.0
1.0
General
- Improve performance when tagging multiple sentences. For large numbers of sentences (>1000), the performance improvement is ~100x.
Processing
- Extend support for composite amounts that have the form e.g. "1 cup plus 1 tablespoon" or "1 cup minus 1 tablespoon". Previously the phrase "plus/minus 1 tablespoon" would be returned in the comment. Now the whole phrase is captured as a CompositeAmount object.
- Fix cases where the incorrect pint.Unit would be returned, caused by pint interpreting the unit as something else e.g. "pinch" -> "pico-inch".
0.1.0-beta11
General
- Refactor package structure to make it more suitable for expansion to other languages.
  Note: There aren't any plans to support other languages yet.
Model
- Reduce duplication in training data
- Introduce PURPOSE label for tokens that describe the purpose of the ingredient, such as "for the dressing" and "for garnish".
- Replace quantities with "!num" when determining the features for tokens so that the model doesn't need to learn all possible values quantities can take. This results in a small reduction in model size.
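The "!num" substitution described above might look something like the sketch below. The regular expression and function name are assumptions made for illustration; the package's own rules for what counts as a quantity may differ.

```python
import re

# Assumed pattern: integers, decimals and simple fractions such as 1/2.
NUMBER_PATTERN = re.compile(r"\d+(?:[./]\d+)?")


def feature_token(token: str) -> str:
    """Return "!num" for numeric tokens, otherwise the token itself."""
    return "!num" if NUMBER_PATTERN.fullmatch(token) else token


print([feature_token(t) for t in ["2.5", "cups", "flour"]])
print([feature_token(t) for t in ["1/2", "tsp", "salt"]])
```

Because every numeric token maps to the same placeholder, the model only needs to learn one feature value for quantities instead of one per distinct number.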
Processing
- Various bug fixes to post-processing of tokens with labels NAME, COMMENT, PREP, PURPOSE, SIZE to correct punctuation and confidence calculations.
- Modification of tokeniser to split full stops from the end of tokens. This helps the model avoid treating "token." and "token" as different cases to learn.
- Add fallback functionality to parse_ingredient for cases where none of the tokens are labelled as NAME. This will select name as the token with the highest confidence of being labelled NAME, even though a different label has a higher confidence for that token. This can be disabled by setting expect_name_in_output=False in parse_ingredient.
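The fallback name selection described above can be sketched as picking the token whose NAME confidence is largest. This is not the package's implementation: the data layout (one dict of label confidences per token), the function name and the example scores are all hypothetical.

```python
def fallback_name_index(confidences: list[dict[str, float]]) -> int:
    """Return the index of the token most likely to be a NAME.

    `confidences` holds, for each token, a mapping from label to the
    model's confidence in that label (hypothetical structure).
    """
    return max(
        range(len(confidences)),
        key=lambda i: confidences[i].get("NAME", 0.0),
    )


tokens = ["2", "tbsp", "chopped"]
confidences = [
    {"QTY": 0.98, "NAME": 0.01},
    {"UNIT": 0.95, "NAME": 0.03},
    {"PREP": 0.55, "NAME": 0.40},  # PREP wins, but NAME is highest here
]

index = fallback_name_index(confidences)
print(tokens[index])
```

In this made-up example no token is labelled NAME outright, so the fallback picks "chopped" because its NAME confidence is the highest of the three.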