
Releases: strangetom/ingredient-parser

1.3.2

06 Dec 07:29

Processing

  • Fix bug that allowed fractions in the intermediate form (e.g. #1$2) to appear in the name, prep, comment, size and purpose fields of the ParsedIngredient output.

1.3.1

29 Nov 14:51

Warning

This version requires pint >=0.24.4

General

  • Support Python 3.13. Requires pint >= 0.24.4.

1.3.0

06 Nov 19:42

Processing

  • Various minor improvements to feature generation.

  • Add PREPARED_INGREDIENT flag to IngredientAmount objects. This indicates whether the amount refers to the prepared ingredient (PREPARED_INGREDIENT=True) or the unprepared ingredient (PREPARED_INGREDIENT=False).

  • Add starting_index attribute to IngredientText objects, indicating the index of the token that starts the IngredientText.

  • Improve detection of composite amounts in sentences.

  • Add quantity_fractions keyword argument to parse_ingredient. When True, the quantity and quantity_max fields of IngredientAmount objects will be fractions.Fraction objects instead of floats. This allows fractions such as 1/3 to be represented exactly. The default is quantity_fractions=False, which returns quantities as floats, as before. For example:

    >>> parse_ingredient("1 1/3 cups flour").amount[0]
    IngredientAmount(
        quantity=1.333,
        quantity_max=1.333,
        unit=<Unit('cup')>, 
        text='1 1/3 cups', 
        ...
    )
    >>> parse_ingredient("1 1/3 cups flour", quantity_fractions=True).amount[0]
    IngredientAmount(
        quantity=Fraction(4, 3),
        quantity_max=Fraction(4, 3),
        unit=<Unit('cup')>,
        text='1 1/3 cups',
        ...
    )
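The benefit of exact fractions can be illustrated with the standard library alone. The following is a minimal sketch that does not call ingredient-parser; the variable names are illustrative only.

```python
from fractions import Fraction

# "1 1/3" stored as a float is only a binary approximation of four thirds.
as_float = 1 + 1 / 3

# Fraction keeps the numerator and denominator exactly.
as_fraction = 1 + Fraction(1, 3)

print(as_fraction)                        # 4/3
print(as_fraction * 3 == 4)               # True
print(Fraction(as_float) == as_fraction)  # False: the float is not exactly 4/3
```

Exact values are mainly useful when quantities are scaled or summed, since rounding error does not accumulate.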

Model

  • Addition of new dataset: tastecooking. This is a relatively small dataset, but includes a number of unique abbreviations for units and sizes.

1.2.0

29 Sep 12:20

General

  • New optional keyword argument to extract foundation foods from the ingredient name. A foundation food is the fundamental food item, excluding any qualifiers or descriptive adjectives, e.g. for the name organic cucumber, the foundation food is cucumber.

    See https://ingredient-parser.readthedocs.io/en/latest/guide/foundation.html for additional details.

  • Some minor post processing fixes.

1.1.2

23 Aug 13:47

Require NLTK >= 3.9.1, due to a change in its resources format.

1.1.1

16 Aug 05:58

Revert upgrade to NLTK 3.8.2 after 3.8.2 was removed from PyPI.

1.1.0

15 Aug 15:37

General

Require NLTK >= 3.8.2 due to a change in the POS tagger weights format.

Model

  • Include new token features, which help improve performance:
    • Word shape (e.g. cheese -> xxxxxx; Cheese -> Xxxxxx)
    • N-gram (n=3, 4, 5) prefixes and suffixes of tokens
  • Add 15,000 new sentences to training data from AllRecipes. This dataset includes lots of branded ingredients, which the existing datasets were quite light on.
  • Tweaks to the model hyperparameters have yielded a model that is ~25% smaller, but with better performance than the previous model.
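The word shape and n-gram features listed above can be sketched as follows. This is an illustrative approximation, not the library's implementation: the function names, the feature keys, and the choice to skip n-grams for tokens no longer than n are all assumptions.

```python
import re


def word_shape(token: str) -> str:
    """Replace uppercase letters with 'X' and lowercase letters with 'x'."""
    shape = re.sub(r"[A-Z]", "X", token)
    return re.sub(r"[a-z]", "x", shape)


def ngram_affixes(token: str, sizes=(3, 4, 5)) -> dict:
    """Return prefix and suffix n-grams of the token for each n in sizes."""
    features = {}
    for n in sizes:
        # Assumption: only emit an n-gram when the token is longer than n.
        if len(token) > n:
            features[f"prefix_{n}"] = token[:n]
            features[f"suffix_{n}"] = token[-n:]
    return features


print(word_shape("cheese"))   # xxxxxx
print(word_shape("Cheese"))   # Xxxxxx
print(ngram_affixes("cheese"))
```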

Processing

  • Change processing of numbers written as words (e.g. 'one', 'two'). If the token is labelled as QTY, then the number will be converted to a digit (e.g. 'one' -> 1) or collapsed into a range (e.g. 'one or two' -> 1-2), otherwise the token is left unchanged.
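The label-dependent behaviour described above might look roughly like the sketch below. It is a simplified illustration, assuming a small hypothetical NUMBER_WORDS lookup and parallel lists of tokens and labels; the library's actual post-processing is not shown in these notes.

```python
# Hypothetical lookup table, truncated for brevity.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}


def replace_number_words(tokens, labels):
    """Convert number words labelled QTY to digits, collapsing 'X or Y' into 'X-Y'."""
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        is_qty_word = labels[i] == "QTY" and word in NUMBER_WORDS
        # Collapse "<number> or <number>" into a range when both ends are QTY.
        if (
            is_qty_word
            and i + 2 < len(tokens)
            and tokens[i + 1].lower() == "or"
            and labels[i + 2] == "QTY"
            and tokens[i + 2].lower() in NUMBER_WORDS
        ):
            out.append(f"{NUMBER_WORDS[word]}-{NUMBER_WORDS[tokens[i + 2].lower()]}")
            i += 3
        elif is_qty_word:
            out.append(NUMBER_WORDS[word])
            i += 1
        else:
            # Tokens not labelled QTY are left unchanged, even if they are number words.
            out.append(tokens[i])
            i += 1
    return out


print(replace_number_words(["one", "or", "two", "onions"], ["QTY", "QTY", "QTY", "NAME"]))
print(replace_number_words(["one", "pot", "stock"], ["NAME", "NAME", "NAME"]))
```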

1.0.1

10 Aug 20:34

Warning

This version requires NLTK >=3.8.2

NLTK 3.8.2 changes the file format (from pickle to json) of the weights used by the part of speech tagger used in this project, to address some security concerns. This patch updates the NLTK resource checks performed when ingredient-parser is imported to check for the new json files, and downloads them if they are not present.


1.0.0

17 Jun 15:44

1.0

General

  • Improve performance when tagging multiple sentences. For large numbers of sentences (>1000), the performance improvement is ~100x.

Processing

  • Extend support for composite amounts of the form 1 cup plus 1 tablespoon or 1 cup minus 1 tablespoon. Previously the phrase plus/minus 1 tablespoon would be returned in the comment. Now the whole phrase is captured as a CompositeAmount object.
  • Fix cases where the incorrect pint.Unit would be returned, caused by pint interpreting the unit as something else e.g. "pinch" -> "pico-inch".

0.1.0-beta11

27 May 16:43
Pre-release

General

  • Refactor package structure to make it more suitable for expansion to other languages.

    Note: There aren't any plans to support other languages yet.

Model

  • Reduce duplication in training data
  • Introduce PURPOSE label for tokens that describe the purpose of the ingredient, such as for the dressing and for garnish.
  • Replace quantities with "!num" when determining the features for tokens so that the model doesn't need to learn all possible values quantities can take. This results in a small reduction in model size.
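The "!num" substitution can be sketched as below. The regular expression and the function name are assumptions made for illustration; the library's own definition of a numeric token (including its intermediate fraction form) may differ.

```python
import re

# Assumption: a numeric token is an integer, a decimal, or a simple a/b fraction.
NUMERIC = re.compile(r"^\d+(?:[./]\d+)?$")


def feature_token(token: str) -> str:
    """Return '!num' for numeric tokens so every quantity shares one feature value."""
    return "!num" if NUMERIC.match(token) else token


print([feature_token(t) for t in ["2", "1.5", "1/3", "cups", "flour"]])
# ['!num', '!num', '!num', 'cups', 'flour']
```

Because every quantity maps to the same feature value, the model no longer needs a separate weight for each distinct number.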

Processing

  • Various bug fixes to post-processing of tokens with labels NAME, COMMENT, PREP, PURPOSE, SIZE to correct punctuation and confidence calculations.
  • Modification of tokeniser to split full stops from the end of tokens. This helps the model avoid treating "token." and "token" as different cases to learn.
  • Add fallback functionality to parse_ingredient for cases where none of the tokens are labelled as NAME. The fallback selects as the name the token with the highest confidence of being labelled NAME, even though a different label has a higher confidence for that token. This can be disabled by setting expect_name_in_output=False in parse_ingredient.
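The NAME fallback can be pictured with the sketch below. It is a simplified stand-in, not the library's code: select_name and its arguments are hypothetical, and name_scores is assumed to hold each token's confidence of being labelled NAME.

```python
def select_name(tokens, labels, name_scores, expect_name_in_output=True):
    """Return the NAME tokens, falling back to the most NAME-like token if none exist."""
    name_tokens = [tok for tok, label in zip(tokens, labels) if label == "NAME"]
    if name_tokens or not expect_name_in_output:
        return " ".join(name_tokens)
    # Fallback: pick the token whose NAME confidence is highest, even though
    # the model assigned it a different label.
    best = max(range(len(tokens)), key=lambda i: name_scores[i])
    return tokens[best]


tokens = ["2", "tbsp", "chopped"]
labels = ["QTY", "UNIT", "PREP"]
name_scores = [0.01, 0.05, 0.40]

print(select_name(tokens, labels, name_scores))                               # chopped
print(repr(select_name(tokens, labels, name_scores, expect_name_in_output=False)))  # ''
```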