Consider simplifying language? #13

Open
NeilGirdhar opened this issue May 24, 2024 · 0 comments

NeilGirdhar commented May 24, 2024

Cool project!

I think there are some simplifications that would lead to a nicer language.

Consider recasting the main example as:

from humre import compile, optional, one_or_more, exactly, either, between, DIGIT
leading_sign = either('+', '-')
number_with_commas = (between(1, 3, DIGIT), one_or_more((',', exactly(3, DIGIT))))
number_without_commas = one_or_more(DIGIT)
whole_number = either(number_with_commas, number_without_commas)
fractional_number = ('.', one_or_more(DIGIT))
compile((optional(leading_sign), whole_number, optional(fractional_number)))

rather than

from humre import *
compile(
    # optional negative or positive sign:
    optional(noncap_group(either(PLUS_SIGN, '-'))),
    # whole number section:
    noncap_group(either(
        # number with commas:
        noncap_group(between(1, 3, DIGIT), one_or_more(noncap_group(',', exactly(3, DIGIT)))),
        # number without commas:
        one_or_more(DIGIT)
    )),
    # fractional number section (optional)
    optional(noncap_group(PERIOD, one_or_more(DIGIT)))
    )

The main change I propose is to broaden the general type of a component from str to some abstract component type Part, and then to process these into strings in humre.compile.

This allows removing the noncap_group markers and going back to plain string literals in place of PLUS_SIGN and PERIOD, since humre.compile would do the escaping. That way the writer can forget how regular expressions are written!
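
To make the escaping point concrete, here is a minimal sketch of how plain literals and tuples could be turned into a pattern at compile time. The names to_pattern and compile_parts are hypothetical and not part of humre; this only assumes that strings are literal text and tuples mean "these parts in sequence".

```python
import re


# Hypothetical helper (not humre API): turn nested literals into a regex string.
def to_pattern(part) -> str:
    if isinstance(part, str):
        # Literal text: escape regex metacharacters such as '+' and '.'.
        return re.escape(part)
    if isinstance(part, tuple):
        # A tuple is a sequence of parts: concatenate their patterns.
        return ''.join(to_pattern(p) for p in part)
    raise TypeError(f'unsupported part: {part!r}')


# Hypothetical stand-in for a compile function that accepts parts.
def compile_parts(*parts) -> re.Pattern:
    return re.compile(to_pattern(parts))


fraction = ('.', '5')
pattern = compile_parts('+', fraction)
print(pattern.pattern)
```

The caller writes '+' and '.' directly, and the escaping happens in one place.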

Also, I think it is a lot more "human readable" to use variable names rather than nested comments.

What do you think?


In general, I suggest replacing the group functions with a single function:

from __future__ import annotations
from re import escape
from typing import TypeAlias


class Part:
    def __init__(self, x: Part | str):
        self.escaped = x.escaped if isinstance(x, Part) else escape(x)

PartLike: TypeAlias = Part | str | tuple['PartLike', ...]

def group(*xs: PartLike,
          minimum: None | int = None,
          maximum: None | int = None,
          capture: bool = False
          ) -> Part:
    if len(xs) == 0:
        return Part('')
    if len(xs) == 1 and minimum is maximum is None and not capture:
        x = xs[0]
        if isinstance(x, tuple):
            return group(*x)
        return Part(x)
    parts = (group(x) for x in xs)
    inner = ''.join(part.escaped for part in parts)
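    # produce group from the part strings, taking into account flags.
    # Build the result directly: inner is already escaped, so passing the
    # wrapped string through Part(...) would escape the parentheses as well.
    result = Part('')
    result.escaped = '(' + inner + ')'
    return result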

etc.
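
For what the "etc." might look like, here is a slightly fuller sketch of a single group function that honours minimum, maximum and capture. It is one possible design under my own assumptions, not humre's implementation: the atomic flag, the literal and to_part helpers, and the choice to always emit {m,n} quantifiers are all hypothetical. Alternation (either) is left out, since it would need its own grouping rule.

```python
from __future__ import annotations

import re


class Part:
    """A regex fragment that is already safe to embed in a pattern."""

    def __init__(self, pattern: str, atomic: bool = False):
        self.pattern = pattern
        # atomic: a quantifier can be appended without wrapping in a group.
        self.atomic = atomic


DIGIT = Part(r'\d', atomic=True)


def literal(text: str) -> Part:
    # A single escaped character can take a quantifier directly.
    return Part(re.escape(text), atomic=len(text) == 1)


def to_part(x) -> Part:
    if isinstance(x, Part):
        return x
    if isinstance(x, str):
        return literal(x)
    if isinstance(x, tuple):
        parts = [to_part(item) for item in x]
        if len(parts) == 1:
            return parts[0]
        return Part(''.join(p.pattern for p in parts))
    raise TypeError(f'unsupported part: {x!r}')


def group(*xs, minimum: int | None = None, maximum: int | None = None,
          capture: bool = False) -> Part:
    inner = to_part(xs)
    if capture:
        inner = Part('(' + inner.pattern + ')', atomic=True)
    if minimum is None and maximum is None:
        return inner
    if not inner.atomic:
        # Only add a non-capturing group when the quantifier needs one.
        inner = Part('(?:' + inner.pattern + ')', atomic=True)
    low = 0 if minimum is None else minimum
    if maximum is not None and maximum == low:
        quantifier = '{%d}' % low
    else:
        high = '' if maximum is None else str(maximum)
        quantifier = '{%d,%s}' % (low, high)
    return Part(inner.pattern + quantifier)


number_with_commas = group(
    group(DIGIT, minimum=1, maximum=3),
    group(',', group(DIGIT, minimum=3, maximum=3), minimum=1),
)
print(number_with_commas.pattern)
```

With something like this, optional, one_or_more, exactly and between could each be a thin wrapper that calls group with the right minimum and maximum, and the writer never has to spell out a non-capturing group.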

@NeilGirdhar NeilGirdhar changed the title Simplify example? Consider simplifying language? May 24, 2024