Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field operations for small prime moduli #1982

Open
atrieu opened this issue Nov 5, 2024 · 4 comments
Open

Field operations for small prime moduli #1982

atrieu opened this issue Nov 5, 2024 · 4 comments

Comments

@atrieu
Copy link
Contributor

atrieu commented Nov 5, 2024

I'm trying to use fiat-crypto to synthesize (bedrock2) C code for NTT operations as used in some PQ lattice-based cryptography, where the prime modulus fits in one word.
How difficult would it be to output "call-by-value" field operations, i.e., without the load/store opening/ending in functions ?
I tried looking into how to modify the pipeline, though I have a bit of a hard time understanding, so I would appreciate some help to get started.

@JasonGross
Copy link
Collaborator

The relevant entry-point for the bedrock2 pipeline is

let out := translate_func
e innames inlengths insizes outnames outsizes in
let f : string*func := (("internal_" ++ name)%string, fst out) in
=> inl ((header ++ ["static"] ++ bedrock_func_to_lines f)
++ ["/* NOTE: The following wrapper function is not covered by Coq proofs */"
; (((if inline then "__inline__ " else "") ++ (if static then "static " else "") ++ "void " ++ name ++ "(")
++ (String.concat ", " (C.to_retarg_list prefix outdata ++ C.to_arg_list_for_each_lhs_of_arrow prefix indata))
++ ") {")
; " " ++ (wrap_call indata outdata f insizes outsizes) ++ ";"
; "}"
; ""
]%string%list,
. As far as making translate_func avoid load/store when we only have one limb, I think @jadephilipoom or @andres-erbsen is better-placed to answer about the difficulty, since I haven't touched the bedrock2 code much.
There seems to be some documentation in https://github.com/mit-plv/fiat-crypto/blob/master/src/Bedrock/Field/README.md. translate_func itself is implemented at
(* Translates functions in 3 steps:
1) load any lists from the arguments
2) call translate_cmd to translate the function body and get the
return values
3) store the return values in the designated variables (with
lists being written into memory)
The reason for the load/translate/store pattern is so that, for
the inductive proof of translate_cmd, there is no need to reason
about the memory. Since fiat-crypto doesn't do any list
manipulations in the middle of functions, but only uses lists in
arguments/return values, it's a convenient formalization. *)
Definition translate_func {t}
(e : API.Expr t)
(* argument variables *)
(argnames : type.for_each_lhs_of_arrow ltype t)
(* lengths of argument lists *)
(lengths : type.for_each_lhs_of_arrow list_lengths t)
(* integer sizes of argument lists *)
(argsizes : type.for_each_lhs_of_arrow access_sizes t)
(* return variables *)
(rets : base_ltype (type.final_codomain t))
(* integer sizes of return lists *)
(retsizes : base_access_sizes (type.final_codomain t))
: list string * list string * cmd (* bedrock function *)
* list_lengths (type.base (type.final_codomain t)) (* output list lengths *) :=
(* load arguments *)
let load_args_out := load_arguments 0%nat argnames lengths argsizes in
let nextn := fst (fst load_args_out) in
let args := snd (fst load_args_out) in
let load_args_cmd := snd load_args_out in
(* translate *)
let out := translate_func' (e _) nextn args in
(* store return values *)
let store_rets := store_return_values (snd (fst out)) rets retsizes in
(* make new arguments for pointers to returned lists *)
let part := extract_listnames rets in
let out_ptrs := flatten_listonly_base_ltype (fst part) in
let innames := out_ptrs ++ flatten_argnames argnames in
let outnames := flatten_listexcl_base_ltype (snd part) in
(* assemble executable body: load arguments, function body, store rets *)
let body := cmd.seq (cmd.seq load_args_cmd (snd out)) (snd store_rets) in
(* assemble func (arg varnames, return varnames, executable body),
also add in output list lengths *)
(innames, outnames, body, fst store_rets).

Adjusting the wrapper code shouldn't be too hard; it's at

Definition wrap_call
{width BW word mem locals ext_spec varname_gen error}
`{parameters_sentinel : @parameters width BW word mem locals ext_spec varname_gen error}
{t}
(indata : type.for_each_lhs_of_arrow var_data t)
(outdata : base_var_data (type.final_codomain t))
(f : string*func)
(insizes : type.for_each_lhs_of_arrow access_sizes t)
(outsizes : base_access_sizes (type.final_codomain t))
: string
:= let part := extract_list_var_data outdata in
let out_ptrs := flatten_listonly_base_var_data (fst part) in
let innames := out_ptrs ++ flatten_argnames_of_var_data indata in
let outnames := flatten_listexcl_base_var_data (snd part) in
let '(name, (args, rets, _body)) := f in
(* slightly modified from https://github.com/mit-plv/bedrock2/blob/13365e8113131601a60dd6b6ffddc5a0b92aed58/bedrock2/src/bedrock2/ToCString.v#L151-L154 *)
let '(precall, all_args)
:= match rets, outnames with
| nil, _ | _, nil => (name, outnames ++ innames)
| cons _ _, cons r0 _
=> let r0 := List.last outnames r0 in
let outnames' := List.removelast outnames in
(("*" ++ print_name r0 ++ " = " ++ print_cast r0 ++ name)%string, outnames' ++ innames)
end%list in
((precall ++ "(")
++ (String.concat
", "
(List.map
(** TODO: Is there a better way to do this, e.g., pulling from insizes? *)
(fun n => "(uintptr_t)" ++ print_name n)
all_args))
++ ")")%string.

If you want to play with this all using tactics like cbv and see it interactively, there's some debugging examples in https://github.com/mit-plv/fiat-crypto/blob/master/src/StandaloneDebuggingExamples.v, https://github.com/mit-plv/fiat-crypto/blob/master/src/CompilersTestCases.v, https://github.com/mit-plv/fiat-crypto/blob/master/src/SlowPrimeSynthesisExamples.v

@atrieu
Copy link
Contributor Author

atrieu commented Nov 7, 2024

Thank you for the pointers!
I'm not sure I understand whether there is a difference in using the Stringification stuff, vs using the C pretty printer from bedrock?

I was just planning to use the pretty printer, since I will be using Rupicola to synthesize the NTT code after I manage to teach it how to compile arrays of field elements.

To make sure I understand the translation pipeline:

  • The files in PushButtonSynthesis/ transform the Gallina code in Arithmetic/ into some API.Expr t (this kinda looks like magic to me for now)
  • The API.Expr t stuff is later consumed by translate_func to produce a bedrock func.

So I would need to modify/copy translate_func so that it avoids the load/stores when the number of limbs is 1, which I tried to do by removing some fluff around the call to translate_func', but this is harder to make Coq typecheck than I expected :/

@jadephilipoom
Copy link
Collaborator

The existing translate_func should work the way you want if you present the field element as an integer instead of a list; it assumes that lists are "arrays in memory" and that integers are "registers". If I were you, I'd try to approach it by adjusting Arithmetic/ or maybe writing a pass to transform the API.Expr t expression itself. The former is probably easier, since it sounds like you have some new Gallina definitions anyway -- if you make these operate on Z instead of list Z, everything should work as expected.

@JasonGross
Copy link
Collaborator

I think you want to add a reification of encode (?) (the inverse of eval, I believe) a la

Derive reified_one_gen
SuchThat (is_reification_of reified_one_gen onemod)
As reified_one_gen_correct.
Proof. Time cache_reify (). Time Qed.
#[global]
Hint Extern 1 (_ = _) => apply_cached_reification onemod (proj1 reified_one_gen_correct) : reify_cache_gen.
#[global]
Hint Immediate (proj2 reified_one_gen_correct) : wf_gen_cache.
#[global]
Hint Rewrite (proj1 reified_one_gen_correct) : interp_gen_cache.
Local Opaque reified_one_gen. (* needed for making [autorewrite] not take a very long time *)
or
Derive reified_eval_gen
SuchThat (is_reification_of reified_eval_gen evalmod)
As reified_eval_gen_correct.
Proof. Time cache_reify (). Time Qed.
#[global]
Hint Extern 1 (_ = _) => apply_cached_reification evalmod (proj1 reified_eval_gen_correct) : reify_cache_gen.
#[global]
Hint Immediate (proj2 reified_eval_gen_correct) : wf_gen_cache.
#[global]
Hint Rewrite (proj1 reified_eval_gen_correct) : interp_gen_cache.
Local Opaque reified_eval_gen. (* needed for making [autorewrite] not take a very long time *)
cf also
Derive reified_encode_gen
SuchThat (is_reification_of reified_encode_gen encodemod)
As reified_encode_gen_correct.
Proof.
Time cache_reify ().
(* we would do something faster, but it breaks extraction COQBUG(https://github.com/coq/coq/issues/7954) *)

Then we want versions of all the operations that compose with encode on the input side and eval on the output side (possibly we want to figure out how to work around coq/coq#7954 so the reification cache isn't two times slower; probably we can do this just by adding reduction at

instantiate (1:=ltac:(let r := Reify rv in
refine (r @ reified_op_gen)%Expr))
), and we can emit these operations in, e.g., https://github.com/mit-plv/fiat-crypto/blob/master/src/PushButtonSynthesis/WordByWordMontgomery.v in cases where the number of limbs is 1 (possibly gated by some top-level flag). The bedrock2 code can presumably do something similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants