diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 35a022c..f7094a1 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.11.0","generation_timestamp":"2024-10-12T19:40:43","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.11.0","generation_timestamp":"2024-10-14T19:34:58","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/api/index.html b/dev/api/index.html index fb7bf5a..04c38ae 100644 --- a/dev/api/index.html +++ b/dev/api/index.html @@ -1,5 +1,6 @@ -API · TaylorDiff.jl

API

API for TaylorDiff.

TaylorDiff.TaylorArrayType
TaylorArray{T, N, A, P}

Representation of Taylor polynomials in array mode.

Fields

  • value::A: zeroth order coefficient
  • partials::NTuple{P, A}: i-th element of this stores the i-th derivative
source
TaylorDiff.TaylorScalarType
TaylorScalar{T, P}

Representation of Taylor polynomials.

Fields

  • value::T: zeroth order coefficient
  • partials::NTuple{P, T}: i-th element of this stores the i-th derivative
source
TaylorDiff.TaylorScalarMethod
TaylorScalar{P}(value::T, seed::T)

Convenience function: construct a Taylor polynomial with zeroth and first order coefficient, acting as a seed.

source
TaylorDiff.TaylorScalarMethod
TaylorScalar{P}(value::T) where {T, P}

Convenience function: construct a Taylor polynomial with zeroth order coefficient.

source
TaylorDiff.can_taylorizeMethod
TaylorDiff.can_taylorize(V::Type)

Determines whether the type V is allowed as the scalar type in a Dual. By default, only <:Real types are allowed.

source
TaylorDiff.derivativeFunction
derivative(f, x, l, ::Val{P})
-derivative(f!, y, x, l, ::Val{P})

Computes P-th directional derivative of f w.r.t. vector x in direction l.

source
TaylorDiff.derivative!Function
derivative!(result, f, x, l, ::Val{P})
-derivative!(result, f!, y, x, l, ::Val{P})

In-place derivative calculation APIs. result is expected to be pre-allocated and have the same shape as y.

source
TaylorDiff.derivativesFunction
derivatives(f, x, l, ::Val{P})
-derivatives(f!, y, x, l, ::Val{P})

Computes all derivatives of f at x up to order P.

source
TaylorDiff.get_term_raiserMethod

Pick a strategy for raising the derivative of a function. If the derivative is like 1 over something, raise with the division rule; otherwise, raise with the multiplication rule.

source
+API · TaylorDiff.jl

API

API for TaylorDiff.

TaylorDiff.TaylorArrayType
TaylorArray{T, N, A, P}

Representation of Taylor polynomials in array mode.

Fields

  • value::A: zeroth order coefficient
  • partials::NTuple{P, A}: i-th element of this stores the i-th derivative
source
TaylorDiff.TaylorScalarType
TaylorScalar{T, P}

Representation of Taylor polynomials.

Fields

  • value::T: zeroth order coefficient
  • partials::NTuple{P, T}: i-th element of this stores the i-th derivative
source
TaylorDiff.TaylorScalarMethod
TaylorScalar{P}(value::T, seed::T)

Convenience function: construct a Taylor polynomial with zeroth and first order coefficient, acting as a seed.

source
TaylorDiff.TaylorScalarMethod
TaylorScalar{P}(value::T) where {T, P}

Convenience function: construct a Taylor polynomial with zeroth order coefficient.

source
TaylorDiff.can_taylorizeMethod
TaylorDiff.can_taylorize(V::Type)

Determines whether the type V is allowed as the scalar type in a Dual. By default, only <:Real types are allowed.

source
TaylorDiff.derivativeFunction
derivative(f, x, ::Val{P})
+derivative(f, x, l, ::Val{P})
+derivative(f!, y, x, l, ::Val{P})

Computes P-th directional derivative of f w.r.t. vector x in direction l. If x is a Number, the direction l can be omitted.

source
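A short usage sketch based on the signatures listed above (assuming these convenience methods behave as documented; see also the Usage section on the home page):

using TaylorDiff

# Scalar input: 3rd derivative of sin at 0.5; the direction l is omitted
derivative(sin, 0.5, Val(3))

# Vector input: 2nd directional derivative of x -> sum(exp.(x)) along l = [1.0, 0.0]
derivative(x -> sum(exp.(x)), [3.0, 4.0], [1.0, 0.0], Val(2))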
TaylorDiff.derivative!Function
derivative!(result, f, x, l, ::Val{P})
+derivative!(result, f!, y, x, l, ::Val{P})

In-place derivative calculation APIs. result is expected to be pre-allocated and have the same shape as y.

source
TaylorDiff.derivativesFunction
derivatives(f, x, l, ::Val{P})
+derivatives(f!, y, x, l, ::Val{P})

Computes all derivatives of f at x up to order P.

source
TaylorDiff.get_term_raiserMethod

Pick a strategy for raising the derivative of a function. If the derivative is like 1 over something, raise with the division rule; otherwise, raise with the multiplication rule.

source
TaylorDiff.@immutableMacro
immutable(def)

Transform a function definition to a @generated function.

  1. Allocations are removed by replacing the output with scalar variables;
  2. Loops are unrolled;
  3. Indices are modified to use 1-based indexing;
source
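A hand-written illustration of the kind of rewrite described above (this is not the macro's actual output, just the before/after shape of the transformation):

# Before: allocates an output array and loops over it
function conv3(x, y)
    out = zeros(3)
    for k in 1:3, i in 1:k
        out[k] += x[i] * y[k - i + 1]
    end
    return out
end

# After (conceptually): the allocation is replaced by scalar variables and the loops are unrolled
function conv3_unrolled(x, y)
    o1 = x[1] * y[1]
    o2 = x[1] * y[2] + x[2] * y[1]
    o3 = x[1] * y[3] + x[2] * y[2] + x[3] * y[1]
    return (o1, o2, o3)
end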
diff --git a/dev/index.html b/dev/index.html index 8aee8f0..8c18b7e 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,5 +1,5 @@ -Home · TaylorDiff.jl

TaylorDiff.jl

TaylorDiff.jl is an automatic differentiation (AD) package for efficient and composable higher-order derivatives, implemented with operator-overloading on Taylor polynomials.

Disclaimer: this project is still in early alpha stage, and APIs can change any time in the future. Discussions and potential use cases are extremely welcome!

Features

TaylorDiff.jl is designed with the following goals in mind:

  • Linear scaling with the order of differentiation (while naively composing first-order differentiation would result in exponential scaling)
  • Same performance as ForwardDiff.jl on first and second order, so there is no penalty as a drop-in replacement
  • Capable of calculating exact derivatives in physical models with ODEs and PDEs
  • Composable with other AD systems like Zygote.jl, so that the above models evaluated with TaylorDiff can be further optimized with gradient-based optimization techniques

TaylorDiff.jl is fast! See our dedicated benchmarks page for comparison with other packages in various tasks.

Installation

] add TaylorDiff

Usage

using TaylorDiff
+Home · TaylorDiff.jl

TaylorDiff.jl

TaylorDiff.jl is an automatic differentiation (AD) package for efficient and composable higher-order derivatives, implemented with operator-overloading on Taylor polynomials.

Disclaimer: this project is still in early alpha stage, and APIs can change any time in the future. Discussions and potential use cases are extremely welcome!

Features

TaylorDiff.jl is designed with the following goals in mind:

  • Linear scaling with the order of differentiation (while naively composing first-order differentiation would result in exponential scaling)
  • Same performance as ForwardDiff.jl on first and second order, so there is no penalty as a drop-in replacement
  • Capable of calculating exact derivatives in physical models with ODEs and PDEs
  • Composable with other AD systems like Zygote.jl, so that the above models evaluated with TaylorDiff can be further optimized with gradient-based optimization techniques

TaylorDiff.jl is fast! See our dedicated benchmarks page for comparison with other packages in various tasks.

Installation

] add TaylorDiff

Usage

using TaylorDiff
 
 x = 0.1
 derivative(sin, x, 10) # scalar derivative
@@ -11,4 +11,4 @@
   publisher = {GitHub},
   journal = {GitHub repository},
   howpublished = {\url{https://github.com/JuliaDiff/TaylorDiff.jl}}
-}
+}
diff --git a/dev/objects.inv b/dev/objects.inv index c775002..d44a6fe 100644 Binary files a/dev/objects.inv and b/dev/objects.inv differ diff --git a/dev/search_index.js b/dev/search_index.js index 5384151..14b7ac1 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"api/","page":"API","title":"API","text":"CurrentModule = TaylorDiff","category":"page"},{"location":"api/#API","page":"API","title":"API","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"API for TaylorDiff.","category":"page"},{"location":"api/","page":"API","title":"API","text":"Modules = [TaylorDiff]","category":"page"},{"location":"api/#TaylorDiff.TaylorArray","page":"API","title":"TaylorDiff.TaylorArray","text":"TaylorArray{T, N, A, P}\n\nRepresentation of Taylor polynomials in array mode.\n\nFields\n\nvalue::A: zeroth order coefficient\npartials::NTuple{P, A}: i-th element of this stores the i-th derivative\n\n\n\n\n\n","category":"type"},{"location":"api/#TaylorDiff.TaylorScalar","page":"API","title":"TaylorDiff.TaylorScalar","text":"TaylorScalar{T, P}\n\nRepresentation of Taylor polynomials.\n\nFields\n\nvalue::T: zeroth order coefficient\npartials::NTuple{P, T}: i-th element of this stores the i-th derivative\n\n\n\n\n\n","category":"type"},{"location":"api/#TaylorDiff.TaylorScalar-Union{Tuple{P}, Tuple{T}, Tuple{T, T}} where {T, P}","page":"API","title":"TaylorDiff.TaylorScalar","text":"TaylorScalar{P}(value::T, seed::T)\n\nConvenience function: construct a Taylor polynomial with zeroth and first order coefficient, acting as a seed.\n\n\n\n\n\n","category":"method"},{"location":"api/#TaylorDiff.TaylorScalar-Union{Tuple{P}, Tuple{T}} where {T, P}","page":"API","title":"TaylorDiff.TaylorScalar","text":"TaylorScalar{P}(value::T) where {T, P}\n\nConvenience function: construct a Taylor polynomial with zeroth order coefficient.\n\n\n\n\n\n","category":"method"},{"location":"api/#TaylorDiff.can_taylorize-Tuple{Type{<:Real}}","page":"API","title":"TaylorDiff.can_taylorize","text":"TaylorDiff.can_taylorize(V::Type)\n\nDetermines whether the type V is allowed as the scalar type in a Dual. By default, only <:Real types are allowed.\n\n\n\n\n\n","category":"method"},{"location":"api/#TaylorDiff.derivative","page":"API","title":"TaylorDiff.derivative","text":"derivative(f, x, l, ::Val{P})\nderivative(f!, y, x, l, ::Val{P})\n\nComputes P-th directional derivative of f w.r.t. vector x in direction l.\n\n\n\n\n\n","category":"function"},{"location":"api/#TaylorDiff.derivative!","page":"API","title":"TaylorDiff.derivative!","text":"derivative!(result, f, x, l, ::Val{P})\nderivative!(result, f!, y, x, l, ::Val{P})\n\nIn-place derivative calculation APIs. result is expected to be pre-allocated and have the same shape as y.\n\n\n\n\n\n","category":"function"},{"location":"api/#TaylorDiff.derivatives","page":"API","title":"TaylorDiff.derivatives","text":"derivatives(f, x, l, ::Val{P})\nderivatives(f!, y, x, l, ::Val{P})\n\nComputes all derivatives of f at x up to order P.\n\n\n\n\n\n","category":"function"},{"location":"api/#TaylorDiff.get_term_raiser-Tuple{Any}","page":"API","title":"TaylorDiff.get_term_raiser","text":"Pick a strategy for raising the derivative of a function. 
If the derivative is like 1 over something, raise with the division rule; otherwise, raise with the multiplication rule.\n\n\n\n\n\n","category":"method"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = TaylorDiff","category":"page"},{"location":"#TaylorDiff.jl","page":"Home","title":"TaylorDiff.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TaylorDiff.jl is an automatic differentiation (AD) package for efficient and composable higher-order derivatives, implemented with operator-overloading on Taylor polynomials.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Disclaimer: this project is still in early alpha stage, and APIs can change any time in the future. Discussions and potential use cases are extremely welcome!","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TaylorDiff.jl is designed with the following goals in head:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Linear scaling with the order of differentiation (while naively composing first-order differentiation would result in exponential scaling)\nSame performance with ForwardDiff.jl on first order and second order, so there is no penalty in drop-in replacement\nCapable for calculating exact derivatives in physical models with ODEs and PDEs\nComposable with other AD systems like Zygote.jl, so that the above models evaluated with TaylorDiff can be further optimized with gradient-based optimization techniques","category":"page"},{"location":"","page":"Home","title":"Home","text":"TaylorDiff.jl is fast! See our dedicated benchmarks page for comparison with other packages in various tasks.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"] add TaylorDiff","category":"page"},{"location":"#Usage","page":"Home","title":"Usage","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"using TaylorDiff\n\nx = 0.1\nderivative(sin, x, 10) # scalar derivative\nv, direction = [3.0, 4.0], [1.0, 0.0]\nderivative(x -> sum(exp.(x)), v, direction, 2) # directional derivative","category":"page"},{"location":"","page":"Home","title":"Home","text":"Please see our documentation for more details.","category":"page"},{"location":"#Related-Projects","page":"Home","title":"Related Projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TaylorSeries.jl: a systematic treatment of Taylor polynomials in one and several variables, but its mutating and scalar code isn't great for speed and composability with other packages\nForwardDiff.jl: well-established and robust operator-overloading based forward-mode AD, where higher-order derivatives can be achieved by nesting first-order derivatives\nDiffractor.jl: next-generation source-code transformation based forward-mode and reverse-mode AD, designed with support for higher-order derivatives in mind; but the higher-order functionality is currently only a proof-of-concept\njax.jet: an experimental (and unmaintained) implementation of Taylor-mode automatic differentiation in JAX, sharing the same underlying algorithm with this 
project","category":"page"},{"location":"#Citation","page":"Home","title":"Citation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"@software{tan2022taylordiff,\n author = {Tan, Songchen},\n title = {TaylorDiff.jl: Fast Higher-order Automatic Differentiation in Julia},\n year = {2022},\n publisher = {GitHub},\n journal = {GitHub repository},\n howpublished = {\\url{https://github.com/JuliaDiff/TaylorDiff.jl}}\n}","category":"page"}] +[{"location":"api/","page":"API","title":"API","text":"CurrentModule = TaylorDiff","category":"page"},{"location":"api/#API","page":"API","title":"API","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"API for TaylorDiff.","category":"page"},{"location":"api/","page":"API","title":"API","text":"Modules = [TaylorDiff]","category":"page"},{"location":"api/#TaylorDiff.TaylorArray","page":"API","title":"TaylorDiff.TaylorArray","text":"TaylorArray{T, N, A, P}\n\nRepresentation of Taylor polynomials in array mode.\n\nFields\n\nvalue::A: zeroth order coefficient\npartials::NTuple{P, A}: i-th element of this stores the i-th derivative\n\n\n\n\n\n","category":"type"},{"location":"api/#TaylorDiff.TaylorScalar","page":"API","title":"TaylorDiff.TaylorScalar","text":"TaylorScalar{T, P}\n\nRepresentation of Taylor polynomials.\n\nFields\n\nvalue::T: zeroth order coefficient\npartials::NTuple{P, T}: i-th element of this stores the i-th derivative\n\n\n\n\n\n","category":"type"},{"location":"api/#TaylorDiff.TaylorScalar-Union{Tuple{P}, Tuple{T}, Tuple{T, T}} where {T, P}","page":"API","title":"TaylorDiff.TaylorScalar","text":"TaylorScalar{P}(value::T, seed::T)\n\nConvenience function: construct a Taylor polynomial with zeroth and first order coefficient, acting as a seed.\n\n\n\n\n\n","category":"method"},{"location":"api/#TaylorDiff.TaylorScalar-Union{Tuple{P}, Tuple{T}} where {T, P}","page":"API","title":"TaylorDiff.TaylorScalar","text":"TaylorScalar{P}(value::T) where {T, P}\n\nConvenience function: construct a Taylor polynomial with zeroth order coefficient.\n\n\n\n\n\n","category":"method"},{"location":"api/#TaylorDiff.can_taylorize-Tuple{Type{<:Real}}","page":"API","title":"TaylorDiff.can_taylorize","text":"TaylorDiff.can_taylorize(V::Type)\n\nDetermines whether the type V is allowed as the scalar type in a Dual. By default, only <:Real types are allowed.\n\n\n\n\n\n","category":"method"},{"location":"api/#TaylorDiff.derivative","page":"API","title":"TaylorDiff.derivative","text":"derivative(f, x, ::Val{P})\nderivative(f, x, l, ::Val{P})\nderivative(f!, y, x, l, ::Val{P})\n\nComputes P-th directional derivative of f w.r.t. vector x in direction l. If x is a Number, the direction l can be omitted.\n\n\n\n\n\n","category":"function"},{"location":"api/#TaylorDiff.derivative!","page":"API","title":"TaylorDiff.derivative!","text":"derivative!(result, f, x, l, ::Val{P})\nderivative!(result, f!, y, x, l, ::Val{P})\n\nIn-place derivative calculation APIs. result is expected to be pre-allocated and have the same shape as y.\n\n\n\n\n\n","category":"function"},{"location":"api/#TaylorDiff.derivatives","page":"API","title":"TaylorDiff.derivatives","text":"derivatives(f, x, l, ::Val{P})\nderivatives(f!, y, x, l, ::Val{P})\n\nComputes all derivatives of f at x up to order P.\n\n\n\n\n\n","category":"function"},{"location":"api/#TaylorDiff.get_term_raiser-Tuple{Any}","page":"API","title":"TaylorDiff.get_term_raiser","text":"Pick a strategy for raising the derivative of a function. 
If the derivative is like 1 over something, raise with the division rule; otherwise, raise with the multiplication rule.\n\n\n\n\n\n","category":"method"},{"location":"api/#TaylorDiff.@immutable-Tuple{Any}","page":"API","title":"TaylorDiff.@immutable","text":"immutable(def)\n\nTransform a function definition to a @generated function.\n\nAllocations are removed by replacing the output with scalar variables;\nLoops are unrolled;\nIndices are modified to use 1-based indexing;\n\n\n\n\n\n","category":"macro"},{"location":"theory/","page":"Theory","title":"Theory","text":"CurrentModule = TaylorDiff","category":"page"},{"location":"theory/#Theory","page":"Theory","title":"Theory","text":"","category":"section"},{"location":"theory/","page":"Theory","title":"Theory","text":"TaylorDiff.jl is an operator-overloading based forward-mode automatic differentiation (AD) package. \"Forward-mode\" implies that the basic capability of this package is that, for function fmathbb R^ntomathbb R^m, place to evaluate derivative xinmathbb R^n and direction linmathbb R^n, we compute $ f(x),\\partial f(x)\\times v,\\partial^2f(x)\\times v\\times v,\\cdots,\\partial^pf(x)\\times v\\times\\cdots\\times v $ i.e., the function value and the directional derivative up to order p. This notation might be unfamiliar to Julia users that had experience with other AD packages, but partial f(x) is simply the jacobian J, and partial f(x)times v is simply the Jacobian-vector product (jvp). In other words, this is a simple generalization of Jacobian-vector product to Hessian-vector-vector product, and to even higher orders.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"The main advantage of doing this instead of doing p first-order Jacobian-vector products is that nesting first-order AD results in expential scaling w.r.t p, while this method, also known as Taylor mode, should be (almost) linear scaling w.r.t p. We will see the reason of this claim later.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"In order to achieve this, assuming that f is a nested function f_kcirccdotscirc f_2circ f_1, where each f_i is a basic and simple function, or called \"primitives\". We need to figure out how to propagate the derivatives through each step. In first order AD, this is achieved by the \"dual\" pair x_0+x_1varepsilon, where varepsilon^2=0, and for each primitive we make a method overload $ f(x0+x1\\varepsilon)=f(x0)+\\partial f(x0) x1\\varepsilon $ Similarly in higher-order AD, we need for each primitive a method overload for a truncated Taylor polynomial up to order p, and in this polynomial we will use t instead of varepsilon to denote the sensitivity. \"Truncated\" means t^p+1=0, similar as what we defined for dual numbers. So $ f(x0+x1t+x2t^2+\\cdots+x_pt^p)=? $ What is the math expression that we should put into the question mark? That specific expression is called the \"pushforward rule\", and we will talk about how to derive the pushforward rule below.","category":"page"},{"location":"theory/#Arithmetic-of-polynomials","page":"Theory","title":"Arithmetic of polynomials","text":"","category":"section"},{"location":"theory/","page":"Theory","title":"Theory","text":"Before deriving pushforward rules, let's first introduce several basic properties of polynomials.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"If x(t) and y(t) are both truncated Taylor polynomials, i.e. 
$ \\begin{aligned} x&=x0+x1t+\\cdots+xpt^p\\\ny&=y0+y1t+\\cdots+ypt^p \\end{aligned} $ Then it's obvious that the polynomial addition and subtraction should be $ (x\\pm y)k=xk\\pm yk $ And with some derivation we can also get the polynomial multiplication rule $ (x\\times y)k=\\sum{i=0}^kxiy{k-i} $ The polynomial division rule is less obvious, but if xy=z, then equivalently x=yz, i.e. $ \\left(\\sum{i=0}^pyit^i\\right)\\left(\\sum{i=0}^pzit^i\\right)=\\sum{i=0}^pxit^i $ if we relate the coefficient of t^k on both sides we get $ \\sum{i=0}^k ziy{k-i}=xk $ so, equivalently, $ zk=\\frac1{y0}\\left(xk-\\sum{i=0}^{k-1}ziy{k-1}\\right) $ This is a recurrence relation, which means that we can first get z0=x0/y0 and then get z_1 using z_0, and then get z_2 using z_0z_1 etc.","category":"page"},{"location":"theory/#Pushforward-rule-for-elementary-functions","page":"Theory","title":"Pushforward rule for elementary functions","text":"","category":"section"},{"location":"theory/","page":"Theory","title":"Theory","text":"Let's now consider how to derive the pushforward rule for elementary functions. We will use exp and log as two examples.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"If x(t) is a polynomial and we want to get e(t)=exp(x(t)), we can actually get that by formulating an ordinary differential equation: $ e'(t)=\\exp(x(t))x'(t);\\quad e0=\\exp(x0) $ If we expand both e and x in the equation, we will get $ \\sum{i=1}^pieit^{i-1}=\\left(\\sum{i=0}^{p-1} eit^i\\right)\\left(\\sum{i=1}^pixit^{i-1}\\right) $ relating the coefficient of t^k-1 on both sides, we get $ kek=\\sum{i=0}^{k-1}ei\\times (k-i)x{k-i} $ This is, again, a recurrence relation, so we can get e_1cdotse_p step-by-step.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"If x(t) is a polynomial and we want to get l(t)=log(x(t)), we can actually get that by formulating an ordinary differential equation: $ l'(t)=\\frac1xx'(t);\\quad l0=\\log(x0) $ If we expand both l and x in the equation, the RHS is simply polynomial divisions, and we get $ lk=\\frac1{x0}\\left(xk-\\frac1k\\sum{i=1}^{k-1}ilix{k-j}\\right) $","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"Now notice the difference between the rule for exp and log: the derivative of exponentiation is itself, so we can obtain from recurrence relation; the derivative of logarithm is 1x, an algebraic expression in x, so it can be directly computed. Similarly, we have (tan x)=1+tan^2x but (arctan x)=(1+x^2)^-1. We summarize (omitting proof) that","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"Every exp-like function (like sin, cos, tan, sinh, ...)'s derivative is somehow recursive\nEvery log-like function (like arcsin, arccos, arctan, operatornamearcsinh, ...)'s derivative is algebraic","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"So all of the elementary functions have an easy pushforward rule that can be computed within O(p^2) time. 
Note that this is an elegant and straightforward corollary from the definition of \"elementary function\" in differential algebra.","category":"page"},{"location":"theory/#Generic-pushforward-rule","page":"Theory","title":"Generic pushforward rule","text":"","category":"section"},{"location":"theory/","page":"Theory","title":"Theory","text":"For a generic f(x), if we don't bother deriving the specific recurrence rule for it, we can still automatically generate pushforward rule in the following manner. Let's denote the derivative of f w.r.t x to be d(x), then for f(t)=f(x(t)) we have $ f'(t)=d(x(t))x'(t);\\quad f(0)=f(x_0) $ when we expand f and x up to order p into this equation, we notice that only order p-1 is needed for d(x(t)). In other words, we turn a problem of finding p-th order pushforward for f, to a problem of finding p-1-th order pushforward for d, and we can recurse down to the first order. The first-order derivative expressions are captured from ChainRules.jl, which made this process fully automatic.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"This strategy is in principle equivalent to nesting first-order differentiation, which could potentially leads to exponential scaling; however, in practice there is a huge difference. This generation of pushforward rule happens at compile time, which gives the compiler a chance to check redundant expressions and optimize it down to quadratic time. Compiler has stack limits but this should work for at least up to order 100.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"In the current implementation of TaylorDiff.jl, all log-like functions' pushforward rules are generated by this strategy, since their derivatives are simple algebraic expressions; some exp-like functions, like sinh, is also generated; the most-often-used several exp-like functions are hand-written with hand-derived recurrence relations.","category":"page"},{"location":"theory/","page":"Theory","title":"Theory","text":"If you find that the code generated by this strategy is slow, please file an issue and we will look into it.","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = TaylorDiff","category":"page"},{"location":"#TaylorDiff.jl","page":"Home","title":"TaylorDiff.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TaylorDiff.jl is an automatic differentiation (AD) package for efficient and composable higher-order derivatives, implemented with operator-overloading on Taylor polynomials.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Disclaimer: this project is still in early alpha stage, and APIs can change any time in the future. 
Discussions and potential use cases are extremely welcome!","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TaylorDiff.jl is designed with the following goals in head:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Linear scaling with the order of differentiation (while naively composing first-order differentiation would result in exponential scaling)\nSame performance with ForwardDiff.jl on first order and second order, so there is no penalty in drop-in replacement\nCapable for calculating exact derivatives in physical models with ODEs and PDEs\nComposable with other AD systems like Zygote.jl, so that the above models evaluated with TaylorDiff can be further optimized with gradient-based optimization techniques","category":"page"},{"location":"","page":"Home","title":"Home","text":"TaylorDiff.jl is fast! See our dedicated benchmarks page for comparison with other packages in various tasks.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"] add TaylorDiff","category":"page"},{"location":"#Usage","page":"Home","title":"Usage","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"using TaylorDiff\n\nx = 0.1\nderivative(sin, x, 10) # scalar derivative\nv, direction = [3.0, 4.0], [1.0, 0.0]\nderivative(x -> sum(exp.(x)), v, direction, 2) # directional derivative","category":"page"},{"location":"","page":"Home","title":"Home","text":"Please see our documentation for more details.","category":"page"},{"location":"#Related-Projects","page":"Home","title":"Related Projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TaylorSeries.jl: a systematic treatment of Taylor polynomials in one and several variables, but its mutating and scalar code isn't great for speed and composability with other packages\nForwardDiff.jl: well-established and robust operator-overloading based forward-mode AD, where higher-order derivatives can be achieved by nesting first-order derivatives\nDiffractor.jl: next-generation source-code transformation based forward-mode and reverse-mode AD, designed with support for higher-order derivatives in mind; but the higher-order functionality is currently only a proof-of-concept\njax.jet: an experimental (and unmaintained) implementation of Taylor-mode automatic differentiation in JAX, sharing the same underlying algorithm with this project","category":"page"},{"location":"#Citation","page":"Home","title":"Citation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"@software{tan2022taylordiff,\n author = {Tan, Songchen},\n title = {TaylorDiff.jl: Fast Higher-order Automatic Differentiation in Julia},\n year = {2022},\n publisher = {GitHub},\n journal = {GitHub repository},\n howpublished = {\\url{https://github.com/JuliaDiff/TaylorDiff.jl}}\n}","category":"page"}] } diff --git a/dev/theory/index.html b/dev/theory/index.html new file mode 100644 index 0000000..1becd72 --- /dev/null +++ b/dev/theory/index.html @@ -0,0 +1,3 @@ + +Theory · TaylorDiff.jl

Theory

TaylorDiff.jl is an operator-overloading based forward-mode automatic differentiation (AD) package. "Forward-mode" implies that the basic capability of this package is that, given a function $f:\mathbb R^n\to\mathbb R^m$, a point $x\in\mathbb R^n$ at which to evaluate the derivative, and a direction $l\in\mathbb R^n$, we compute $ f(x),\partial f(x)\times l,\partial^2f(x)\times l\times l,\cdots,\partial^pf(x)\times l\times\cdots\times l $ i.e., the function value and the directional derivatives up to order $p$. This notation might be unfamiliar to Julia users who have experience with other AD packages, but $\partial f(x)$ is simply the Jacobian $J$, and $\partial f(x)\times l$ is simply the Jacobian-vector product (JVP). In other words, this is a simple generalization of the Jacobian-vector product to the Hessian-vector-vector product, and to even higher orders.

The main advantage of doing this instead of doing $p$ nested first-order Jacobian-vector products is that nesting first-order AD results in exponential scaling w.r.t. $p$, while this method, also known as Taylor mode, should scale (almost) linearly w.r.t. $p$. We will see the reason for this claim later.

To achieve this, assume that $f$ is a composite function $f_k\circ\cdots\circ f_2\circ f_1$, where each $f_i$ is a basic and simple function, called a "primitive". We need to figure out how to propagate the derivatives through each step. In first-order AD, this is achieved by the "dual" pair $x_0+x_1\varepsilon$, where $\varepsilon^2=0$, and for each primitive we make a method overload $ f(x_0+x_1\varepsilon)=f(x_0)+\partial f(x_0)x_1\varepsilon $ Similarly, in higher-order AD, we need for each primitive a method overload for a truncated Taylor polynomial up to order $p$, and in this polynomial we will use $t$ instead of $\varepsilon$ to denote the sensitivity. "Truncated" means $t^{p+1}=0$, similar to what we defined for dual numbers. So $ f(x_0+x_1t+x_2t^2+\cdots+x_pt^p)=? $ What is the math expression that we should put in place of the question mark? That specific expression is called the "pushforward rule", and we will talk about how to derive the pushforward rule below.
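As a concrete illustration, the TaylorScalar{P}(value, seed) constructor documented in the API section builds exactly such a seeded truncated polynomial, and calling a primitive on it applies that primitive's pushforward rule (a minimal sketch, assuming the constructor and overloads behave as documented):

using TaylorDiff: TaylorScalar

t = TaylorScalar{3}(0.5, 1.0)   # represents 0.5 + 1.0*t, truncated at order 3

# Each primitive, e.g. sin, has a method overload for TaylorScalar that returns
# another truncated polynomial carrying the higher-order information of sin at 0.5
# along this seed.
s = sin(t)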

Arithmetic of polynomials

Before deriving pushforward rules, let's first introduce several basic properties of polynomials.

If $x(t)$ and $y(t)$ are both truncated Taylor polynomials, i.e. $ \begin{aligned} x&=x_0+x_1t+\cdots+x_pt^p\\ y&=y_0+y_1t+\cdots+y_pt^p \end{aligned} $ Then it's obvious that the polynomial addition and subtraction should be $ (x\pm y)_k=x_k\pm y_k $ And with some derivation we can also get the polynomial multiplication rule $ (x\times y)_k=\sum_{i=0}^kx_iy_{k-i} $ The polynomial division rule is less obvious, but if $x/y=z$, then equivalently $x=yz$, i.e. $ \left(\sum_{i=0}^py_it^i\right)\left(\sum_{i=0}^pz_it^i\right)=\sum_{i=0}^px_it^i $ If we relate the coefficient of $t^k$ on both sides we get $ \sum_{i=0}^k z_iy_{k-i}=x_k $ so, equivalently, $ z_k=\frac1{y_0}\left(x_k-\sum_{i=0}^{k-1}z_iy_{k-i}\right) $ This is a recurrence relation, which means that we can first get $z_0=x_0/y_0$, then get $z_1$ using $z_0$, then get $z_2$ using $z_0,z_1$, etc.
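These convolution rules are easy to express on plain coefficient vectors. Below is a minimal standalone sketch (not TaylorDiff's internal code, which works on tuples and unrolls these loops), with a polynomial stored 1-based as [c_0, c_1, ..., c_p]:

poly_add(x, y) = x .+ y                       # (x ± y)_k = x_k ± y_k
poly_sub(x, y) = x .- y

# (x*y)_k = sum_{i=0}^k x_i*y_{k-i}, truncated at the same order
function poly_mul(x, y)
    p = length(x)
    [sum(x[i] * y[k - i + 1] for i in 1:k) for k in 1:p]
end

# z = x/y via the recurrence z_k = (x_k - sum_{i=0}^{k-1} z_i*y_{k-i}) / y_0
function poly_div(x, y)
    p = length(x)
    z = zeros(float(eltype(x)), p)
    z[1] = x[1] / y[1]
    for k in 2:p
        z[k] = (x[k] - sum(z[i] * y[k - i + 1] for i in 1:k-1)) / y[1]
    end
    return z
end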

Pushforward rule for elementary functions

Let's now consider how to derive the pushforward rule for elementary functions. We will use $\exp$ and $\log$ as two examples.

If $x(t)$ is a polynomial and we want to get $e(t)=\exp(x(t))$, we can actually get that by formulating an ordinary differential equation: $ e'(t)=\exp(x(t))x'(t);\quad e_0=\exp(x_0) $ If we expand both $e$ and $x$ in the equation, we will get $ \sum_{i=1}^pie_it^{i-1}=\left(\sum_{i=0}^{p-1} e_it^i\right)\left(\sum_{i=1}^pix_it^{i-1}\right) $ Relating the coefficient of $t^{k-1}$ on both sides, we get $ ke_k=\sum_{i=0}^{k-1}e_i\times (k-i)x_{k-i} $ This is, again, a recurrence relation, so we can get $e_1,\cdots,e_p$ step-by-step.
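The same recurrence in code, continuing the coefficient-vector sketch above (illustrative only, not TaylorDiff's implementation):

function poly_exp(x)                          # x = [x_0, x_1, ..., x_p]
    p = length(x)
    e = zeros(float(eltype(x)), p)
    e[1] = exp(x[1])                          # e_0 = exp(x_0)
    for k in 2:p                              # vector index k holds order k-1
        # k*e_k = sum_{i=0}^{k-1} e_i*(k-i)*x_{k-i}
        e[k] = sum(e[i] * (k - i) * x[k - i + 1] for i in 1:k-1) / (k - 1)
    end
    return e
end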

If $x(t)$ is a polynomial and we want to get $l(t)=\log(x(t))$, we can again get that by formulating an ordinary differential equation: $ l'(t)=\frac{1}{x(t)}x'(t);\quad l_0=\log(x_0) $ If we expand both $l$ and $x$ in the equation, the RHS is simply a polynomial division, and we get $ l_k=\frac1{x_0}\left(x_k-\frac1k\sum_{i=1}^{k-1}il_ix_{k-i}\right) $
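And the corresponding sketch for the $\log$ recurrence (again on plain coefficient vectors, not TaylorDiff's actual implementation):

function poly_log(x)                          # x = [x_0, x_1, ..., x_p]
    p = length(x)
    l = zeros(float(eltype(x)), p)
    l[1] = log(x[1])                          # l_0 = log(x_0)
    for k in 2:p                              # vector index k holds order k-1
        # l_k = (x_k - (1/k) * sum_{i=1}^{k-1} i*l_i*x_{k-i}) / x_0
        s = zero(l[1])
        for i in 2:k-1
            s += (i - 1) * l[i] * x[k - i + 1]
        end
        l[k] = (x[k] - s / (k - 1)) / x[1]
    end
    return l
end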


Now notice the difference between the rules for $\exp$ and $\log$: the derivative of the exponential is itself, so we can obtain it from a recurrence relation; the derivative of the logarithm is $1/x$, an algebraic expression in $x$, so it can be computed directly. Similarly, we have $(\tan x)'=1+\tan^2x$ but $(\arctan x)'=(1+x^2)^{-1}$. We summarize (omitting proof) that

  • Every $\exp$-like function (like $\sin$, $\cos$, $\tan$, $\sinh$, ...)'s derivative is somehow recursive
  • Every $\log$-like function (like $\arcsin$, $\arccos$, $\arctan$, $\operatorname{arcsinh}$, ...)'s derivative is algebraic

So all of the elementary functions have an easy pushforward rule that can be computed within $O(p^2)$ time. Note that this is an elegant and straightforward corollary from the definition of "elementary function" in differential algebra.

Generic pushforward rule

For a generic $f(x)$, if we don't bother deriving the specific recurrence rule for it, we can still automatically generate a pushforward rule in the following manner. Let's denote the derivative of $f$ w.r.t. $x$ by $d(x)$; then for $f(t)=f(x(t))$ we have $ f'(t)=d(x(t))x'(t);\quad f(0)=f(x_0) $ When we expand $f$ and $x$ up to order $p$ in this equation, we notice that only order $p-1$ is needed for $d(x(t))$. In other words, we turn the problem of finding the $p$-th order pushforward for $f$ into the problem of finding the $(p-1)$-th order pushforward for $d$, and we can recurse down to first order. The first-order derivative expressions are captured from ChainRules.jl, which makes this process fully automatic.
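As a concrete instance of this strategy (reusing the poly_mul and poly_div helpers sketched in the arithmetic section above; illustrative only, not TaylorDiff's generated code), we can raise $\arctan$ from its algebraic derivative $d(x)=(1+x^2)^{-1}$:

function poly_atan(x)                         # x = [x_0, x_1, ..., x_p]
    p = length(x)
    # expand d(x(t)) = 1/(1 + x(t)^2) to order p-1 using polynomial arithmetic
    x2 = poly_mul(x[1:p-1], x[1:p-1])
    one_plus_x2 = [1 + x2[1]; x2[2:end]]
    dx = poly_div([1; zeros(p - 2)], one_plus_x2)
    out = zeros(float(eltype(x)), p)
    out[1] = atan(x[1])
    for k in 2:p
        # k*f_k = sum_{i=0}^{k-1} (d∘x)_i * (k-i) * x_{k-i}
        out[k] = sum(dx[i] * (k - i) * x[k - i + 1] for i in 1:k-1) / (k - 1)
    end
    return out
end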

This strategy is in principle equivalent to nesting first-order differentiation, which could potentially lead to exponential scaling; however, in practice there is a huge difference. The generation of the pushforward rule happens at compile time, which gives the compiler a chance to eliminate redundant expressions and optimize the rule down to quadratic time. The compiler has stack limits, but this should work up to at least order 100.

In the current implementation of TaylorDiff.jl, all $\log$-like functions' pushforward rules are generated by this strategy, since their derivatives are simple algebraic expressions; some $\exp$-like functions, like sinh, are also generated; the several most frequently used $\exp$-like functions are hand-written with hand-derived recurrence relations.

If you find that the code generated by this strategy is slow, please file an issue and we will look into it.