Adds HAETAE #313

Merged
merged 11 commits into from
Jan 7, 2024

mmoeller23
Contributor

This commit implements the post-quantum signature scheme HAETAE from

The stack strategy can be selected in config.h by setting STACK_STRATEGY
to the appropriate value (run "make clean" after the change).

  • 0 or undefined: Optimized for speed (default).
  • 1: Disable buffers for the polynomials of the verification key in
    crypto_sign_keypair() and crypto_sign(). This reduces speed,
    as the key needs to be recomputed after each rejection.
  • 2: In addition to 1, sample the hyperball in multiple passes, such
    that some intermediate values are computed on demand, rather
    than being buffered. This roughly doubles the runtime of crypto_sign().
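
As a minimal sketch of how such a compile-time switch can be wired up (only `STACK_STRATEGY` and `config.h` come from the description above; the derived macros and helper functions are hypothetical names chosen for illustration, not the actual code in this pull request):

```c
/* Hypothetical excerpt in the style of config.h.
 * 0 or undefined: speed, 1: no verification-key buffers,
 * 2: additionally sample the hyperball in multiple passes. */
#ifndef STACK_STRATEGY
#define STACK_STRATEGY 0
#endif

#if STACK_STRATEGY < 0 || STACK_STRATEGY > 2
#error "STACK_STRATEGY must be 0, 1 or 2"
#endif

/* Illustrative feature switches derived from the strategy (assumed names). */
#define BUFFER_VERIFICATION_KEY (STACK_STRATEGY == 0)
#define MULTIPASS_HYPERBALL     (STACK_STRATEGY >= 2)

/* Returns 1 if the verification-key polynomials are kept in a buffer. */
static int buffers_verification_key(void) { return BUFFER_VERIFICATION_KEY; }

/* Returns 1 if the hyperball is sampled in multiple passes. */
static int samples_hyperball_multipass(void) { return MULTIPASS_HYPERBALL; }
```

Because the selection happens in the preprocessor, object files built with a different value are stale, which is why a "make clean" is needed after changing it.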

M4F version corresponds to reference version of 2023-10-21.
…ministically

* Move challenge seed generation from crypto_sign() to poly_challenge().

* Sample the random byte b deterministically inside of
  polyfixveclk_sample_hyperball(). It is used to:
  * determine the sign in hyperball sampling (bit mask 0x01)
  * reject with 50% odds in the overlap region (bit mask 0x02)

* M4F version corresponds to reference version of 2023-11-20.
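
The two uses of the byte b listed above can be sketched as follows. This is an illustration only: the function names are hypothetical, b is passed in as an argument instead of being derived from a seed, and the polarity of each bit (which value means "negative" or "reject") is an assumption that the commit message does not specify.

```c
#include <stdint.h>

/* Bit 0 of b (mask 0x01) selects the sign.
 * Assumed polarity: bit clear keeps x, bit set negates it.
 * Branch-free: mask is either 0 or -1 (all ones). */
static int32_t apply_sign_bit(int32_t x, uint8_t b) {
    int32_t mask = -(int32_t)(b & 0x01);
    return (x ^ mask) - mask;
}

/* Bit 1 of b (mask 0x02) is the coin flip for the overlap region.
 * Assumed polarity: returns 1 (reject) if the bit is set, 0 otherwise,
 * so a uniformly random b rejects with probability 1/2. */
static int reject_in_overlap(uint8_t b) {
    return (b & 0x02) >> 1;
}
```

Using two distinct bits of the same byte keeps the sign choice and the rejection decision independent of each other.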
This implementation offers different stack strategies:
* 0: Optimized for speed.
* 1: Does not buffer the polynomials of the verification
     key in crypto_sign_keypair() and crypto_sign_signature(),
     thus reducing stack usage at the cost of some speed.
* 2: In addition to 1, the hyperballs are sampled in multiple
     passes in crypto_sign_signature(), which reduces the stack
     usage for temporary variables. This roughly doubles the
     execution time of crypto_sign_signature().
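
The time/memory trade-off behind strategy 2 can be illustrated with a toy example. This is not the HAETAE sampler: the generator, the function names and the computed quantity are stand-ins. It only shows why regenerating values from a seed in a second pass removes a buffer at the cost of roughly twice the sampling work.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy deterministic generator (xorshift32), standing in for a seeded sampler. */
static uint32_t next_sample(uint32_t *state) {
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s & 0xFF; /* small values keep the sums below far from overflow */
}

/* Buffered variant: samples are generated once, but n words are kept in buf. */
static uint64_t scaled_sum_buffered(uint32_t seed, uint32_t *buf, size_t n) {
    uint32_t st = seed;
    uint64_t sq = 0, acc = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = next_sample(&st);
        sq += (uint64_t)buf[i] * buf[i]; /* a global quantity needed first */
    }
    for (size_t i = 0; i < n; i++)
        acc += buf[i] * sq;              /* second use reads the buffer */
    return acc;
}

/* Multi-pass variant: no buffer; the second pass regenerates every sample. */
static uint64_t scaled_sum_multipass(uint32_t seed, size_t n) {
    uint32_t st = seed;
    uint64_t sq = 0, acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t v = next_sample(&st);
        sq += v * v;
    }
    st = seed;                           /* restart the generator */
    for (size_t i = 0; i < n; i++)
        acc += next_sample(&st) * sq;
    return acc;
}
```

Both variants return the same value for the same seed; the second needs no storage proportional to n but calls the generator twice as often.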
The clean implementation is only minimally changed from the
reference implementation to conform with the PQM4 API.

The clean implementation would run out of memory for HAETAE3
and HAETAE5 and is therefore not added for those modes.
This commit implements the post-quantum signature scheme HAETAE from
https://eprint.iacr.org/2023/624
https://kpqc.cryptolab.co.kr/haetae

The stack strategy can be chosen in config.h by setting STACK_STRATEGY
to the appropriate value (run "make clean" when changing it).
* 0 or undefined: Optimized for speed (default).
* 1:              Disable buffers for the polynomials of the verification
                  key in crypto_sign_keypair() and crypto_sign(). This
                  reduces speed, as the key needs to be recomputed after
                  each rejection.
* 2:              In addition to 1, sample the hyperball in multiple passes,
                  such that some intermediate values are computed on demand,
                  rather than being buffered. This roughly doubles the
                  runtime of crypto_sign().

The scheme HAETAE2 contains a reference implementation, which has been
renamed from "clean" in previous commits to "ref". The reference
implementation would run out of memory for schemes HAETAE3 and HAETAE5 and
is therefore not included for these schemes.
This commit implements the post-quantum signature scheme HAETAE from
https://eprint.iacr.org/2023/624
https://kpqc.cryptolab.co.kr/haetae

The stack strategy can be selected in config.h by setting STACK_STRATEGY
to the appropriate value (run "make clean" after the change).
* 0 or undefined: Optimized for speed (default).
* 1:              Disable buffers for the polynomials of the verification
                  key in crypto_sign_keypair() and crypto_sign(). This
                  reduces speed, as the key needs to be recomputed after
                  each rejection.
* 2:              In addition to 1, sample the hyperball in multiple passes,
                  such that some intermediate values are computed on demand,
                  rather than being buffered. This roughly doubles the
                  runtime of crypto_sign().
@rpls
Contributor

rpls commented Nov 23, 2023

Thanks! I'll test it this week. Btw., any chance you also have a suitable pure C implementation for mupq?

@rpls
Contributor

rpls commented Nov 24, 2023

Tests pass, but for Testvector-test we need a pure C implementation in mupq.

@mmoeller23
Contributor Author

> Tests pass, but for Testvector-test we need a pure C implementation in mupq.

The reference implementation is included in the immediately preceding commit b48968e in the HAETAE2 directory. I did not include it, as the reference implementation only works for HAETAE2 on the embedded system; for HAETAE3 and HAETAE5 it runs fine on the host, but runs out of memory on the embedded system.

In that commit, haetae2 works fine with the Testvector-test. If you copy the ref subdirectory to the haetae3 and haetae5 directories, respectively, and adjust HAETAE_MODE in the config.h files, the host implementation will produce the proper test vectors. However, when running on the test board, the reference implementation will run out of memory and not return in these modes.

How do we proceed from here? Does the constellation outlined above work for you, or do we need pure C implementations for all modes that are able to run on the limited resources of the embedded system? In the latter case, the code will have to deviate substantially from the reference implementation and will be closer to the M4F version.

@markuskrausz

The memory was only a limitation on the stm32f4discovery for HAETAE3 and 5, right?
With the nucleo-l4r5zi and its 640 KB of RAM, this should not be an issue.

The Testvector-test probably runs on the host anyway?!

stack usage (keypair/sign/verify):
* haetae2: 26152 / 83128 / 29856
Add slightly modified reference implementations to haetae2,
haetae3 and haetae5 with lower stack memory footprint than
the original reference implementation. This enables the
test vector comparison for all schemes.

CAVEAT: This commit modifies the following PQM4 core files
* ldscripts/stm32f4discovery.ld
* ldscripts/stm32f4discovery_fullram.ld
* mk/stm32f4discovery.mk
The two linker scripts are modified as recommended in
[issue 310](mupq#310 (comment)).
The makefile is modified to use the full RAM for the m4f and ref
implementations of scheme haetae5, as they would otherwise run out
of memory, similar to dilithium5.

The stack memory footprint was reduced by:
* Storing A1 using uint16 instead of int32, halving
  its footprint
* Grouping some vectors inside `crypto_sign_signature()`, whose
  lifetimes do not overlap, into unions.

The modification is light enough to easily verify consistency
with the reference implementation.
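
The two stack-saving techniques listed above can be sketched in isolation. This is an illustration under stated assumptions, not the code from this commit: the type names, `POLY_N` and the helper are hypothetical, and the narrowing is only valid if every stored coefficient fits in 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY_N 256 /* illustrative polynomial length (assumption) */

typedef struct { int32_t  coeffs[POLY_N]; } poly32;
typedef struct { uint16_t coeffs[POLY_N]; } poly16; /* half the size of poly32 */

/* Narrow a polynomial for storage; assumes all coefficients lie in [0, 65535]. */
static void poly_narrow(poly16 *out, const poly32 *in) {
    for (size_t i = 0; i < POLY_N; i++)
        out->coeffs[i] = (uint16_t)in->coeffs[i];
}

/* Two temporaries that are never live at the same time can share storage. */
typedef union {
    poly32 early; /* used only in the first part of the function */
    poly32 late;  /* used only after `early` is no longer needed */
} shared_tmp;

/* For comparison: the same temporaries as separate variables. */
typedef struct {
    poly32 early;
    poly32 late;
} separate_tmp;
```

With these definitions, `sizeof(shared_tmp)` equals `sizeof(poly32)`, while `sizeof(separate_tmp)` is twice that. The compiler does not check that the union members are used at disjoint times, so the non-overlap has to be verified by reading the code.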
Add slightly modified reference implementations to haetae2,
haetae3 and haetae5, labeled as `ref`, with lower stack
memory footprint than the original reference implementation.
This enables running testvectors.py for all schemes.

CAVEAT: This commit modifies the following PQM4 core files
* ldscripts/stm32f4discovery.ld
* ldscripts/stm32f4discovery_fullram.ld
* mk/stm32f4discovery.mk
The two linker scripts are modified as recommended in
[issue 310](mupq#310 (comment)).
The makefile is modified to use the full RAM for the m4f and ref
implementations of scheme haetae5, as they would otherwise run out
of memory, similar to dilithium5.

The stack memory footprint was reduced by:
* Storing A1 using uint16 instead of int32, halving
  its footprint
* Grouping some vectors inside `crypto_sign_signature()`, whose
  lifetimes do not overlap, into unions.

The modification is light enough to easily verify consistency
with the reference implementation.
@mmoeller23
Contributor Author

I have added slightly modified reference implementations to all schemes; testvectors.py works now.

CAVEAT: Commit f7aedf0 includes modifications to PQM4 core files that are required to make this work.

  1. Applied the patch from issue 310 to
    • ldscripts/stm32f4discovery.ld
    • ldscripts/stm32f4discovery_fullram.ld
  2. Patched mk/stm32f4discovery.mk for haetae5 to use the full RAM model for both implementations, just like dilithium5.

@rpls
Contributor

rpls commented Dec 1, 2023

Could you add the ref implementations in mupq instead? All portable pure C stuff should go there.

The pure C reference implementations were removed from this
pull request. A corresponding pull request in MUPQ/MUPQ
has been initiated:
mupq/mupq#131
@mmoeller23
Contributor Author

I have removed the pure C reference implementation from this pull request and initiated a new pull request at mupq

mupq/mupq#131

@mmoeller23
Contributor Author

Please do not pull this at the moment

@mmoeller23
Contributor Author

> Please do not pull this at the moment

All good, you can pull again.

@markuskrausz

This implements the HAETAE specification 2.0 and corresponds to the M4 implementation discussed in the 2.0 specification document.

@rpls rpls merged commit 4ad3ef6 into mupq:master Jan 7, 2024
@rpls rpls mentioned this pull request Jan 17, 2024