Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(frontend): adding a use-case for Levenshtein distance #902

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

bcm-at-zama
Copy link
Contributor

Adding a use-case for Levenshtein distance

closes #https://github.com/zama-ai/concrete-internal/issues/750

@cla-bot cla-bot bot added the cla-signed label Jun 19, 2024
@bcm-at-zama bcm-at-zama marked this pull request as draft June 19, 2024 12:41
@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch 3 times, most recently from f61fa55 to 122de8c Compare June 19, 2024 13:49
@bcm-at-zama
Copy link
Contributor Author

Works quite well but fail from time to time:

Computations in FHE

    Computing Levenshtein between strings '' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'a' - OK in 0.01 seconds
    Computing Levenshtein between strings 'b' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'a' and 'a' - OK in 10.76 seconds
    Computing Levenshtein between strings 'a' and 'b' - OK in 11.03 seconds
    Computing Levenshtein between strings '' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'a' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'tv' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'gag' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'wywd' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'oezbr' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'ctuugd' - OK in 0.01 seconds
    Computing Levenshtein between strings 'n' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'o' and 'g' - OK in 11.78 seconds
    Computing Levenshtein between strings 'p' and 'sl' - OK in 21.41 seconds
    Computing Levenshtein between strings 'r' and 'qbd' - OK in 30.90 seconds
    Computing Levenshtein between strings 't' and 'vbej' - OK in 43.54 seconds
    Computing Levenshtein between strings 'b' and 'srvxs' - OK in 52.11 seconds

    Computing Levenshtein between strings 'n' and 'fkuftz' - OK in 63.67 seconds
    Computing Levenshtein between strings 'ey' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'kd' and 'f' - OK in 20.81 seconds
    Computing Levenshtein between strings 'fv' and 'xh' - OK in 60.51 seconds
    Computing Levenshtein between strings 'mv' and 'dnr' - OK in 120.93 seconds
    Computing Levenshtein between strings 'db' and 'msvl' - OK in 199.30 seconds

    Computing Levenshtein between strings 'hn' and 'whoql'Traceback (most recent call last):
  File "frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py", line 189, in <module>
    assert l1_fhe == l1_clear, f"    {l1_fhe=} and {l1_clear=} are different"
AssertionError:     l1_fhe=5 and l1_clear=4 are different

I have p_error=10**-8 which is already quite low

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch 2 times, most recently from c9b3afb to d6720e5 Compare June 20, 2024 09:04
@bcm-at-zama
Copy link
Contributor Author

@bcm-at-zama
Copy link
Contributor Author

Hey @umut-sahin , would you have a look, to see if we can make it faster, please?

@bcm-at-zama bcm-at-zama requested a review from umut-sahin June 24, 2024 08:39
@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch 3 times, most recently from 0ee22ef to e2dceee Compare June 28, 2024 16:39
@bcm-at-zama
Copy link
Contributor Author

Now I have added a system for alphabet: eg, one can do

python frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py --autotest --alphabet ACTG --show_mlir

and you'll see it generates random string of A, C, T and G.

@bcm-at-zama
Copy link
Contributor Author

It's very easy to add a new alphabet

@bcm-at-zama
Copy link
Contributor Author

And there is a new option, to make perks on all alphabets:

python frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py --autoperf

Typical performances for alphabet ACTG, with string of maximal length:

    Computing Levenshtein between strings 'AACG' and 'GATT' - OK in 5.24 seconds
    Computing Levenshtein between strings 'ACCG' and 'GTGG' - OK in 4.71 seconds
    Computing Levenshtein between strings 'TTTC' and 'TTTA' - OK in 4.87 seconds

Typical performances for alphabet string, with string of maximal length:

    Computing Levenshtein between strings 'skon' and 'iisi' - OK in 14.67 seconds
    Computing Levenshtein between strings 'qukm' and 'vufu' - OK in 15.01 seconds
    Computing Levenshtein between strings 'afoe' and 'kbwh' - OK in 14.62 seconds

Typical performances for alphabet STRING, with string of maximal length:

    Computing Levenshtein between strings 'RPJX' and 'MLZU' - OK in 14.33 seconds
    Computing Levenshtein between strings 'GYAQ' and 'IEWC' - OK in 13.73 seconds
    Computing Levenshtein between strings 'GEUC' and 'CLUI' - OK in 15.11 seconds

Typical performances for alphabet StRiNg, with string of maximal length:

    Computing Levenshtein between strings 'oXcM' and 'Igjh' - OK in 30.11 seconds
    Computing Levenshtein between strings 'pgBk' and 'GuOp' - OK in 28.20 seconds
    Computing Levenshtein between strings 'jScn' and 'yRRN' - OK in 30.81 seconds

Successful end

@bcm-at-zama bcm-at-zama changed the title docs(frontend): adding a use-case for Levenshtein distance [WIP] docs(frontend): adding a use-case for Levenshtein distance Jun 28, 2024
@bcm-at-zama bcm-at-zama requested review from umut-sahin and aPere3 June 28, 2024 17:07
@bcm-at-zama
Copy link
Contributor Author

What do you think guys? When you like the code, I add a small .md and measure that on AWS

@bcm-at-zama bcm-at-zama marked this pull request as ready for review June 28, 2024 17:07
@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch from 63c858e to 1247eda Compare July 1, 2024 10:47
@bcm-at-zama
Copy link
Contributor Author

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch from 1247eda to 2771fed Compare July 1, 2024 11:01
@bcm-at-zama
Copy link
Contributor Author

No more blocked by 794

Copy link
Member

@BourgerieQuentin BourgerieQuentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment feel free to include or not, thanks.

@BourgerieQuentin
Copy link
Member

@bcm-at-zama I updated the code with a constant function fell free to keep it or not, still bad as we need the "encrypt function" on server side, but is more a lack of the API that we need the client to export clear value

@bcm-at-zama
Copy link
Contributor Author

Thx for 380a523 @BourgerieQuentin . I'll reapply it by hand sorry, because too much big conflicts with my coming Alphabet things.

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch from 380a523 to 88e579e Compare July 5, 2024 15:48
Copy link
Contributor Author

@bcm-at-zama bcm-at-zama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done an Alphabet class as requested, even if I am not a big fan. We may let some functions outside of the class, tell me what you want @umut-sahin

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch from c0198fe to ac87b6e Compare July 5, 2024 16:11
@bcm-at-zama
Copy link
Contributor Author

@bcm-at-zama bcm-at-zama requested a review from umut-sahin July 5, 2024 16:27
@bcm-at-zama
Copy link
Contributor Author

@umut-sahin : hopefully it's the final review!

@bcm-at-zama bcm-at-zama requested a review from umut-sahin July 9, 2024 16:00
Copy link
Contributor

@umut-sahin umut-sahin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things, but it's good to merge otherwise!

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch from ca6e860 to 817341f Compare July 10, 2024 11:59
@bcm-at-zama
Copy link
Contributor Author

Rebasing

@bcm-at-zama
Copy link
Contributor Author

@umut-sahin , perf have been done on hpc, could you tell me it's fine for you? Then I squash and merge!

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch from 43350ce to 9682493 Compare July 17, 2024 09:32
@bcm-at-zama bcm-at-zama merged commit 92ee970 into main Jul 17, 2024
30 of 33 checks passed
@bcm-at-zama bcm-at-zama deleted the levenshtein_distance_750 branch July 17, 2024 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants