CIP-???? | Modules in UPLC #946

Open
wants to merge 2 commits into
base: master

@rjmh rjmh commented Dec 10, 2024

Cardano scripts are limited in complexity by the fact that each script must be supplied in one transaction, whether the script is supplied in the same transaction in which it is used, or pre-loaded onto the chain for use as a reference script. This limits script code size, which in turn limits the use of libraries in scripts, and ultimately limits the sophistication of Cardano apps, compared to competing blockchains. It is the aspect of Cardano that script developers complain about most.

This CIP addresses this problem directly, by allowing reference inputs to supply 'modules', which can be used from other scripts (including other modules), thus allowing the code of a script to be spread across many reference inputs. The 'main specification' requires no changes to UPLC, TPLC, PIR or Plinth; only a 'dependency resolution' step before scripts are run. Many variations are described for better performance, including some requiring changes to the CEK machine itself.

Higher performance variations will be more expensive to implement; the final choice of variations should take implementation cost into account, and (in some cases) may require extensive benchmarking.


(latest revision rendered from branch)

@rphair rphair added the Category: Plutus Proposals belonging to the 'Plutus' category. label Dec 10, 2024
@rphair rphair changed the title Draft CIP on an extension to add modules to UPLC CIP-???? | Modules in UPLC Dec 10, 2024
Collaborator

rphair commented Dec 10, 2024

Thanks @rjmh - I'll change the review status to Draft (as formerly reflected in the title). Please let us know when you think it's ready for review; we can then mark it Triage for introduction at the following CIP meeting and start tagging more Plutus representatives to go over it (@zliu41 @MicroProofs @michele-nuzzi you may be interested in an advance look).

@rphair rphair marked this pull request as draft December 10, 2024 15:55
Author

rjmh commented Dec 10, 2024 via email

Contributor

zliu41 commented Dec 10, 2024

Yes @rphair this is ready for review

@rphair rphair marked this pull request as ready for review December 10, 2024 17:10
@rphair rphair added the State: Triage Applied to new PR after editor cleanup on GitHub, pending CIP meeting introduction. label Dec 10, 2024
Comment on lines +1321 to +1330
The motivation for these fees is to deter DDoS attacks based on
supplying very large Plutus scripts that are costly to deserialize,
but run fast and so incur low execution unit fees. While these fees
are likely to be reasonable for moderate use of the module system, in
the longer term they could become prohibitive for more complex
applications. It may be necessary to revisit this design decision in
the future. To be successful, the DDoS defence just needs fees to
become *sufficiently* expensive per byte as the total size of
reference scripts grows; they do not need to grow without bound. So
there is scope for rethinking here.

It may be necessary to revisit this design decision in the future.

I don't think this can be left for "future work". I really think it should be updated if necessary when this CIP gets implemented. The reason for this is I don't think DApps should be treated as standalone applications. I think the following example perfectly exemplifies why:

Right now, stablecoins are not fungible with one another, despite all of them effectively being the US dollar. You can't repay a loan in DJED using USDM. If DApps were composable, you could compose a DEX with the lending/borrowing DApp to convert the USDM to DJED in the same transaction where you make the loan payment. DApp composability makes stablecoins fungible!

This isn't possible on account style blockchains because each DApp is individually too expensive. On Cardano, you can compose 10 different DApps in the same transaction. I think this module approach would be huge, but only if it doesn't interfere with DApp composability. AFAIU that means lazy loading is 100% a requirement and users should be able to compose 4-5 DApps in a single transaction even with this module approach. Otherwise, this CIP could end up seriously handicapping the potential of Cardano's DeFi.

I was personally frustrated when I saw there was a hard cap on the reference script size; if people want to pay up to fit more DApps into the transaction, let them! I'm fine with the cost being exponential after a certain point (ideally after 4-5 DApps in the transaction), but the hard limit doesn't make sense to me as long as the user pays for it. The ADR linked to doesn't give any justification for the hard limit aside from "further increase the resilience". This CIP could easily exacerbate the issues with the reference script fee calculation.
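
To make the fee shape under discussion concrete, here is a minimal sketch of a tiered per-byte fee in which the price per byte is multiplied by a constant factor for each successive tier of bytes. The name `tieredFee` and every parameter are illustrative assumptions; this is not the ledger's implementation, and no actual protocol parameter values are implied.

```haskell
-- Hypothetical sketch of a tiered reference-script fee (not the ledger's code).
-- Each successive tier of `tierSize` bytes costs `multiplier` times more
-- per byte than the previous tier. Assumes tierSize > 0.
tieredFee
  :: Rational  -- price per byte in the first tier (assumed parameter)
  -> Rational  -- growth factor between tiers (assumed parameter)
  -> Integer   -- tier size in bytes (assumed parameter)
  -> Integer   -- total reference-script bytes used by the transaction
  -> Rational
tieredFee basePrice multiplier tierSize = go basePrice
  where
    go price remaining
      | remaining <= 0        = 0
      | remaining <= tierSize = price * fromInteger remaining
      | otherwise             = price * fromInteger tierSize
                                + go (price * multiplier) (remaining - tierSize)
```

With a growth factor above 1 the total grows exponentially in the number of tiers. Capping `price` at some maximum inside `go` would give a fee that becomes expensive per byte without growing without bound, which is the alternative the quoted CIP text hints at.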

Author

I agree it's going to be necessary. I just don't think it's a prerequisite... so modules should not be held up waiting for this. They'll be useful even without a change to reference script fees--just not as useful. I realise there are other factors to consider in fee-setting, but adding modules should raise the priority of fixing those fees considerably.

for use as a reference script. This limits script code size, which in
turn limits the use of libraries in scripts, and ultimately limits the
sophistication of Cardano apps, compared to competing blockchains. It
is the aspect of Cardano that script developers complain about most.
Member

It is the aspect of Cardano that script developers complain about most.

Seems a bit arbitrary as a statement 😅 ... I have seldom heard people complaining about that. Rather, people complain about the script size which they often max out in their on-chain scripts without even bringing in dependencies.

See also:

Author

@rjmh rjmh Dec 12, 2024

Thanks--I took this from a meeting, but the claim seems to be exaggerated. I will weaken the language. Sounds like you agree that complaints about the script size limit are common though.

Contributor

I disagree here. Prior to the introduction of reference scripts, complaints about size were common. Now, with reference scripts and the withdraw-zero trick (or other forwarding-logic scripts), script size is not really an issue. In fact, most dApps happily accept increased script size in exchange for reduced ex-units (more aggressive inlining, manual recursion unrolling, lookup tables).

I do agree that regardless of whether or not script size restraints are still a pain point, modules are still valuable.

the others provide supporting code of one sort or another. Thus the
software engineering benefits of a module system are already
available; other languages compiled to UPLC could provide a module
system in a similar way. The *disadvantage* of this approach is that
Member

Thus the software engineering benefits of a module system are already
available; other languages compiled to UPLC could provide a module
system in a similar way.

I don't think there's a single Plutus language framework today that doesn't support modules.

Although for all those languages, the concept of modules exists at compile-time only, whereas I believe this CIP is about bringing this concept at runtime to have dynamic resolution. Perhaps a parallel/analogy with statically linked vs dynamically linked dependencies is worth highlighting to make that clearer? Today, every module is very much statically bundled with scripts unless work is explicitly done to split them in separate validators.

(edit: I have now read the sections further down and I see that (1) this point is indeed made and (2) the approach suggested in this CIP is still closer to static linking done by the ledger prior to execution -- so, semi-dynamic 😅 ?).

Contributor

Yeah, the choice of terminology can be a bit confusing and could be made more precise. The term "static/dynamic linking" is being used to refer to two different things:

  • You are saying: static linking = status quo where each script is a monolith, (semi-)dynamic linking = what this CIP proposes
  • whereas there's a subsection "Static vs Dynamic Linking" in the CIP, where static linking = a module specifies its dependency hashes, and dynamic linking = it doesn't specify them.

Author

I will make it clear that many languages already support modules, not just Plutus/Haskell. But with the limitation that all the code ends up in one script, and so is subject to the script size limit.

lookupArg (ScriptArg hash) = do
script <- lookup hash preimages
go script
```
Member

Hmm. This suggests that the module resolution either happens at compile time (which would void the benefits of having modules to begin with) or is actually done by the ledger itself when executing scripts. My understanding leans towards the latter, which leads to the follow-up question: are you suggesting that the ledger becomes aware of script dependencies? And if so, by what means should a transaction communicate this intent to the ledger?

At the moment, scripts are fundamentally already parameterized by a single parameter (two or three in PlutusV1 & PlutusV2); a validator has a signature that's roughly Data -> Validator. So I don't find it completely unreasonable to ask the ledger to now also apply some dependencies to the scripts in addition to the datum/redeemer & script context. Though it's unclear at this point how to signal that and how it would be costed (will keep reading 👀).

Contributor

are you suggesting that the ledger becomes aware of script dependencies? And if so, by what means should a transaction communicate this intent to the ledger?

Yes. A serialised script is deserialised into either a complete script with no dependencies, or a script plus a list of dependencies. In the latter case the ledger will need to retrieve those dependencies and link them together to form a complete script.

Author

Exactly. I clarified that this happens during phase 2 verification, and that scripts on the chain are represented in this form, with dependencies just in the form of hashes.
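
The resolution step described in this thread can be sketched as follows. This is only an illustration under simplifying assumptions: `ScriptHash`, `Code`, `Script` and `resolve` are hypothetical stand-ins (hashes are plain strings, code is an opaque label), not the CIP's or the ledger's actual types.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-ins for the real ledger/Plutus types.
type ScriptHash = String

data Code
  = Body String        -- an opaque code fragment
  | Apply Code [Code]  -- code applied to its resolved dependencies
  deriving (Eq, Show)

-- A script as stored on chain: code plus the hashes of its dependencies.
data Script = Script Code [ScriptHash]

-- Resolve a script against the modules supplied by the transaction.
-- Returns Nothing if any (transitive) dependency is missing.
resolve :: Map.Map ScriptHash Script -> Script -> Maybe Code
resolve preimages = go
  where
    go (Script code [])   = Just code
    go (Script code deps) = Apply code <$> traverse lookupDep deps
    lookupDep h = Map.lookup h preimages >>= go
```

Because dependencies are referenced by hash, a module cannot (in practice) depend on itself, so the recursion terminates. This eager version fails on any missing module; it does not model the lazy-loading variation.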

Comment on lines +235 to +238
The goal of this variation is to eliminate the cost of evaluating
scripts, by converting them directly to values. Since UPLC runs on the
CEK machine, this means converting them directly into the `CekValue` type,
*without* any CEK machine execution. To make this possible, the syntax
Member

I'd argue that it doesn't eliminate the cost of evaluating scripts; rather, it becomes someone else's problem 😄! That someone being the ledger/node, which now indirectly has to do more (un-budgeted) work for free. I believe one of the fundamental design choices of Plutus was to have most of the decoding/conversion operations happen as part of the CEK evaluation so that they can be properly costed and paid for.

Otherwise, I'd argue that instead of providing Data arguments to scripts, we might as well provide pre-computed sums-of-products. But that means the cost of decoding the script context is no longer paid for by execution units, so it has to be accounted for through different means.

Member

(To be clear, I am not against the idea! It seems like a reasonable ask to me, but I recall past conversations with the Plutus core team about it and why it is generally not deemed a viable option.)

Contributor

This is an inexpensive operation that takes in the worst case linear time (and in some variants it is probably always constant time), so I think it's reasonable to consider it covered by the reference script fee, which is already an over-estimation of the script deserialization cost.

Author

Right, it's linear time in the size of the top-level of scripts--one traversal over the code which need not descend inside values at the top level of a module. So reasonable to cover it from the reference script fee.

Comment on lines +338 to +339
transitions. The conversion can be done *once* for a whole
transaction, sharing the cost between several scripts if they share
Member

The conversion can be done once for a whole transaction

That's a good point, and it also strengthens the idea that more of these transformations would be better off happening in the ledger as pre-processing instead of directly within the CEK evaluation.

Although in that particular case, it probably depends on the redeemer value too. If we assume a partial resolution like what you mention in Lazy Loading, then the traversal could likely yield different applications for the same script depending on which redeemer is being used. For the same inputs, though, this is certainly a reasonable expectation. It's unclear to me whether there would be many "cache hits" in practice.

Another important point that supports this thought is how developers often end up structuring their scripts: they factor similar chunks of logic out into validator purposes that execute only once per transaction. So a typical structure we see on-chain is a set of trivial spending validators that defer their validation to a single withdraw validator, forcing a 0-Ada withdrawal on a registered stake credential. Since validators have access to the entire transaction script context, it's always possible to have the validator guarding the 0-Ada withdrawal execute and validate each input in a single pass, rather than re-doing work for every single input.

See for details: https://github.com/Anastasia-Labs/design-patterns/blob/main/stake-validator/STAKE-VALIDATOR.md#stake-validator-design-pattern

Contributor

@zliu41 zliu41 Dec 11, 2024

Different redeemers may indeed result in different modules being required to be present - but I don't think this poses any problem, does it?

Your second point I think is the same as the "Merkelized Validators" discussed in the related work.

Contributor

@KtorZ

The design patterns repo has a separate readme specifically for the withdraw-zero trick:

https://github.com/Anastasia-Labs/design-patterns/blob/main/stake-validator/STAKE-VALIDATOR-TRICK.md

Author

There was a section on "Merkelized validators" that discusses this; I have added links to the stake-validator trick directly to that section. I also made the discussion there a little more explicit: it's a great trick for sharing work between validators, which is useful with-or-without the modules discussed in this CIP--so it's not replaced by this CIP; but as a way of implementing modules it is intricate and unsatisfactory.

Re "cache hits", they will occur when different modules in the dependency tree depend in turn on the same module. So a module containing basic definitions for an application, and used in many parts of it, would fall into that category. So would a commonly-used library that many modules (in the same application) might depend on. I'm expecting to see quite a lot of this.

Where 'lazy loading' is concerned, note that it is the particular transaction that decides which dependencies to supply. Yes indeed, the dependencies needed will vary depending on the redeemer value. That's what we want to take advantage of--that in a particular transaction, we know what the redeemer value is, and so we can decide to omit modules that are not going to be needed. Dangling pointers ftw! (As long as they're not going to be used).
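
The two ideas in this reply, sharing a module's value between all of its dependants and tolerating unsupplied modules as long as they are never used, can be illustrated with a small lazy linker. This is a toy model only: modules evaluate to `Integer` rather than to CEK values, and `Module`, `link` and `example` are hypothetical names introduced for the sketch.

```haskell
import qualified Data.Map as Map

type ScriptHash = String

-- A toy module: given the values of its dependencies, it produces a value.
data Module = Module [ScriptHash] ([Integer] -> Integer)

-- Link all supplied modules once per transaction.
-- The result map is lazy in its values, so each module value is computed
-- at most once and then shared by every module that depends on it.
-- A dependency that was not supplied becomes a thunk that fails only if forced.
link :: Map.Map ScriptHash Module -> Map.Map ScriptHash Integer
link supplied = linked
  where
    linked = Map.map build supplied
    build (Module deps f) = f (map valueOf deps)
    valueOf h = Map.findWithDefault (error ("missing module: " ++ h)) h linked

example :: Map.Map ScriptHash Integer
example = link (Map.fromList
  [ ("base", Module [] (const 10))
  , ("a",    Module ["base"] ((+ 1) . sum))             -- forces its dependency
  , ("b",    Module ["base", "absent"] (sum . take 1))  -- never touches "absent"
  ])
```

Here `"b"` names a dependency that the transaction did not supply, yet looking up `"b"` succeeds because that dependency is never forced: the dangling pointer is harmless as long as it is not used.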

Comment on lines +414 to +421
using the SoP extension (CIP-85) as `constr 0 x1...xn`, but the only
way to select the `i`th component is using
```
case t of (constr 0 x1...xi...xn) -> xi
```
which takes time linear in the size of the tuple to execute, because
all `n` components need to be extracted from the tuple and passed to
the case branch (represented by a function).
Member

Couldn't such tuples also be represented as pairs of pairs, bringing this cost down to log2(size) steps?

Contributor

Yes, that would be log n case terms: cheaper in terms of execution units (at least for long tuples) but bigger in script size.

Author

Logarithmic is better than linear, but it's also the cost of accessing variables in the environment (which is logarithmic in the size of the environment). So the advantage of putting the module exports into one tuple instead of bunging them all into the environment would disappear. Much better to bite the bullet and put in explicit projections, getting constant time access.
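
As a rough illustration of the pairs-of-pairs encoding discussed above, the sketch below builds a balanced nesting and counts how many pair projections a selection takes. `Nested`, `fromList` and `project` are hypothetical names; the step count is only a proxy for cost and says nothing about actual CEK machine costing.

```haskell
-- A tuple encoded as nested pairs (a balanced binary tree).
-- The Int in Pair is the number of leaves in the left component.
data Nested a = Leaf a | Pair Int (Nested a) (Nested a)

-- Build a balanced nesting from a non-empty list of components.
fromList :: [a] -> Nested a
fromList []  = error "fromList: empty tuple"
fromList [x] = Leaf x
fromList xs  = Pair n (fromList l) (fromList r)
  where
    n      = length xs `div` 2
    (l, r) = splitAt n xs

-- Select component i (0-based, assumed in range), also returning the
-- number of pair projections performed along the way.
project :: Int -> Nested a -> (a, Int)
project _ (Leaf x) = (x, 0)
project i (Pair n l r)
  | i < n     = step (project i l)
  | otherwise = step (project (i - n) r)
  where
    step (x, k) = (x, k + 1)
```

For a tuple of 8 components every selection takes 3 projections, versus passing all 8 components to a case branch. As the reply notes, this is the same order as a variable lookup in the environment, so only explicit constant-time projections would give a real advantage.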

Comment on lines +158 to +161
Currently, the definition of “script” used by the ledger is (approximately):
```
newtype Script = Script ShortByteString
```
Member

I think it's worth mentioning that we cannot actually publish arbitrary CEK Terms as scripts, but only UPLC Programs (which are Terms wrapped with versioning metadata).

The ledger enforces that all published scripts (in reference or witness) have this Program envelope. So it might be worth defining a new type of envelope for Modules. This would also make it possible to distinguish modules on-chain from actual validator scripts, which may be handy should we need to apply further restrictions from the ledger regarding those (since, as outlined below, it is incumbent upon the ledger to manage those dependencies and pre-process them on behalf of validators).

Contributor

I don't think this CIP is proposing publishing CEK terms as scripts. As to distinguishing validators vs. modules, the Script data type defined in "Subvariation: Unboxed modules" allows for it.

Author

Right, the CEK values exist only during phase 2 validation; they are never stored on the chain. And as Ziyang says, the 'unboxed modules' subvariation does distinguish module scripts from validators, primarily because (in that variation) they are subject to different syntactic restrictions. So if the deserializer is going to check those, then it needs to know what kind of script it is deserializing.

the `Script` type accordingly
```
data Script = ValidatorScript CompiledCode [ScriptArg]
| ModuleScript CompiledCode [ScriptArg]
Member

Ah! This seems to echo my previous comment about making a distinction (a distinction which must carry over to the serialisation to be of any use, IMO).

Comment on lines +659 to +664
Currently each script on-chain is tagged with a specific ledger language version - V1, V2, V3 or native script - and this version tag is a component of the script hash.
A logical approach, therefore, is to continue doing so for module scripts, and require that a validator script and all modules it references must use the same ledger language version; failure to do so leads to a phase-1 error.

A different approach is to distinguish between validator scripts and module scripts by applying version tags only to validator scripts.
Module scripts are untagged and can be linked to any validator script.
This makes module scripts more reusable, which is advantageous because in most cases, a UPLC program has the same semantics regardless of the ledger language version.
Member

I am not sure that the second approach is sound; because the version not only defines the interface to the validator, but also:

  • Which Plutus builtins are actually available
  • The semantics of some of those builtins
  • The costing functions of those builtins

For example, in Plutus V1/V2, cons_bytestring(256, bytes) is equivalent to cons_bytestring(0, bytes) (the runtime performs a free modulo 256), but in PlutusV3, it results in an out-of-bounds error. That's the case for a few other builtins which have subtle semantic changes. (Technically, the semantics are bound to the Program version, 1.0.0 vs 1.1.0, but this is tightly coupled to the language version and I am taking a slight shortcut here.)
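
The difference described here can be mimicked in a few lines. This is a hedged model of the two behaviours as stated in this comment, using a list of `Word8` in place of a bytestring; `consWrapping` and `consChecked` are hypothetical names, not Plutus builtins.

```haskell
import Data.Word (Word8)

-- Wrapping behaviour, as described above for Plutus V1/V2:
-- the integer is reduced modulo 256 before being prepended.
consWrapping :: Integer -> [Word8] -> [Word8]
consWrapping n bytes = fromInteger (n `mod` 256) : bytes

-- Checked behaviour, as described above for Plutus V3:
-- an integer outside 0..255 is an error.
consChecked :: Integer -> [Word8] -> Either String [Word8]
consChecked n bytes
  | n >= 0 && n <= 255 = Right (fromInteger n : bytes)
  | otherwise          = Left "byte out of bounds"
```

A module that relies on the wrapping behaviour would compute a different result, or fail outright, when linked into a validator that runs under the checked semantics.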

So I'd argue that to keep everyone's life easier, enforcing the same "language version" across modules and validators is a fairly reasonable ask.

Contributor

Yes, this is the point I made in the next paragraph. I think we'll most likely go with the first approach, i.e., tagged modules.

Author

I prefer that option too--allowing different language versions here would impose a constraint on all future language versions, which feels error-prone and uncomfortable.
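
The tagged-modules rule preferred in this thread amounts to a simple check. The sketch below is an assumption-laden illustration: the local `Language` type and `checkVersions` are hypothetical, and the real check would be a phase-1 ledger rule rather than a standalone function.

```haskell
-- Local stand-in for the ledger language version tag.
data Language = PlutusV1 | PlutusV2 | PlutusV3
  deriving (Eq, Show)

-- Every module a validator links against must carry the validator's own
-- language version. On failure, return the mismatching module versions.
checkVersions :: Language -> [Language] -> Either [Language] ()
checkVersions validator modules =
  case filter (/= validator) modules of
    []         -> Right ()
    mismatches -> Left mismatches
```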

Contributor

Are the semantic changes of builtin functions all documented in the changelog or anywhere?

Comment on lines +792 to +797
Note that, on Ethereum, a proxy contract can be updated without
changing its contract address---thanks to mutable state. On Cardano, a
script address *is* the hash of its code; of course, changing the code
will change the script address. It is very hard to see how that could
possibly be changed without a fundamental redesign of Cardano. So the
methods discussed below are different in nature from the Ethereum one:
Contributor

The exact same thing is true on Cardano. You can easily create proxy contracts that can be updated without changing their contract address.

mkProxyContract :: ClosedTerm (PAsData PCurrencySymbol :--> PScriptContext :--> PUnit)
mkProxyContract  = plam $ \protocolParamsCS ctx -> P.do
  ctxF <- pletFields @'["txInfo", "redeemer", "scriptInfo"] ctx
  infoF <- pletFields @'["inputs", "referenceInputs", "outputs", "signatories", "wdrl"] ctxF.txInfo

  referenceInputs <- plet $ pfromData infoF.referenceInputs

  -- Extract protocol parameter UTxO
  ptraceInfo "Extracting protocol parameter UTxO"
  let paramUTxO =
        pfield @"resolved" #$
          pmustFind @PBuiltinList
            # plam (\txIn ->
                    let resolvedIn = pfield @"resolved" # txIn
                    in phasDataCS # protocolParamsCS # (pfield @"value" # resolvedIn)
                  )
            # referenceInputs

  POutputDatum ((pfield @"outputDatum" #) -> paramDat') <- pmatch $ pfield @"datum" # paramUTxO
  forwardToScriptHash <- plet $ punsafeCoerce @_ @_ @(PAsData PByteString) (pto paramDat')

  let invokedScripts =
        pmap @PBuiltinList
          # plam (\wdrlPair ->
                    let cred = pfstBuiltin # wdrlPair
                    in punsafeCoerce @_ @_ @(PAsData PByteString) $ phead #$ psndBuiltin #$ pasConstr # pforgetData cred
                )
          # pto (pfromData infoF.wdrl)
  pif (pelem # forwardToScriptHash # invokedScripts) (pconstant ()) perror 

The above script is a proxy contract which is parameterized by a state token (an NFT) which authenticates a UTxO that contains the script hash that this proxy forwards validation to (via the withdraw-zero trick). If that UTxO lives at a user's wallet, they can update the proxy contract by spending it back to the same address and changing the datum to be a different script hash. If the UTxO lives at a script, then the script logic will validate any update.

That being said, I would suggest that this section on upgradability be removed altogether.
DApp upgradability is already a security nightmare; it’s very hard to support it without completely sacrificing decentralization. You need to use an on-chain governance protocol, like Agora, except these protocols are very experimental on Cardano, so much so that even the creators of Agora do not use it for governance of their protocol.

I think the advice in the CIP regarding how upgradability can be achieved is quite dangerous, given how many exploits compromised “upgrade keys” have led to on Ethereum / Solana, and it is generally out of scope of this proposal.

7 participants