🚀 Release 0.4.0 (#134)
jean-francoisreboud authored Sep 1, 2024
2 parents 424cd06 + 6f8720a commit a6ca885
Showing 197 changed files with 50,702 additions and 6,946 deletions.
34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,40 @@ All notable changes to this project will be documented in this file.

## [unreleased]

## 0.4.0 (2024-09-01)

### Features

🚀 **examples:** integrate Gemma2-2B ([#132](https://github.com/owkin/GrAIdient/pull/132))\
**layer_seq:** LLM sliding window ([#131](https://github.com/owkin/GrAIdient/pull/131))\
🚀 **examples:** 3 LLMs examples ([#130](https://github.com/owkin/GrAIdient/pull/130))\
**layer_seq:** LLM generate ([#128](https://github.com/owkin/GrAIdient/pull/128))\
**layer_seq:** MultiplySeq, SiLU & LLM test ([#127](https://github.com/owkin/GrAIdient/pull/127))\
**layer_seq:** ValueCausalSeq ([#126](https://github.com/owkin/GrAIdient/pull/126))\
**layer_seq:** QueryCausalSeq ([#125](https://github.com/owkin/GrAIdient/pull/125))\
**layer_seq:** RoPESeq ([#124](https://github.com/owkin/GrAIdient/pull/124))\
**layer_seq:** RMSNormSeq ([#123](https://github.com/owkin/GrAIdient/pull/123))\
**layer_seq:** EmbeddingSeq ([#122](https://github.com/owkin/GrAIdient/pull/122))\
🪜 **feat:** LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq ([#117](https://github.com/owkin/GrAIdient/pull/117))\
⚙️ **core:** GELU vs GELUApprox ([#113](https://github.com/owkin/GrAIdient/pull/113))\
🚀 **perf:** QuerySelf & ValueSelf ([#112](https://github.com/owkin/GrAIdient/pull/112))\
🚀 **perf:** benchmark ViT base model ([#111](https://github.com/owkin/GrAIdient/pull/111))\
⚙️ **core:** initForward,Backward model API ([#109](https://github.com/owkin/GrAIdient/pull/109))\
🪜 **layer_1d:** Dropout1D ([#108](https://github.com/owkin/GrAIdient/pull/108))\
🪜 **feat:** VQGrad, VQGradSeq ([#107](https://github.com/owkin/GrAIdient/pull/107))

### Bug Fixes

🐛 **fix:** run on Apple Silicon ([#110](https://github.com/owkin/GrAIdient/pull/110))

### Miscellaneous Tasks

📚 **docs:** LLM doc & split tests ([#129](https://github.com/owkin/GrAIdient/pull/129))\
🚀 **perf:** use half in Metal kernels ([#121](https://github.com/owkin/GrAIdient/pull/121))\
🔨 **refactor:** handle float16 along float on GPU ([#120](https://github.com/owkin/GrAIdient/pull/120))\
🚀 **perf:** copy & generate weights faster ([#119](https://github.com/owkin/GrAIdient/pull/119))\
🚀 **perf:** Convolution2D ([#118](https://github.com/owkin/GrAIdient/pull/118))

## 0.3.1 (2023-08-09)

### Bug Fixes
5 changes: 3 additions & 2 deletions Docs/Contributing/CONTRIBUTING.md
@@ -248,13 +248,14 @@ containing the commits to merge into the `main` branch.
Do not delete the "Unreleased" section title: future PRs will insert
changelog items in this section.
- Commit and push the changes.
- Squash and merge the new branch into `release_N`.
- Squash and merge the new branch into `release_N` with title \
🔧 chore: update changelog

1. Create a Pull Request for `release_N` targeting the `main` branch.

1. Review and Merge the Pull Request, change the commit
message \
🔧 chore: release X.Y.Z
🚀 Release X.Y.Z

1. Create a GitHub release X.Y.Z from `main`:
- GitHub > Releases > Draft new Release
14 changes: 13 additions & 1 deletion Docs/Examples/AutoEncoder.md
@@ -64,7 +64,19 @@ conda env remove --name graiexamples

## Steps

1. Dump the training dataset.
Each training example uses a `CIFARAutoEncoderTrainer`, which is responsible for
initializing the training dataset before the actual training takes place.

1. Train a simple auto encoder model.
1. Train a UNet-like auto encoder model.
1. Train a StyleGAN-like auto encoder model.

## Further tests

Further tests are available at
[AutoEncoderTests](../../Tests/GrAIExamples/AutoEncoderTests.swift).

The test `testTrain` compares the training of a `SimpleAutoEncoder`
in GrAIdient and in PyTorch to show that the same `loss` is computed
throughout the training.
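
As a rough illustration of what such a check involves, here is a standalone Swift sketch
that compares two loss curves by their largest relative gap. The helper, the loss values
and the tolerance are made up for the example; they are not the actual test code.

```swift
/// Compare two loss curves step by step and return the largest relative gap.
/// A small gap means both frameworks follow the same training trajectory.
func maxRelativeGap(_ lossesA: [Double], _ lossesB: [Double]) -> Double {
    precondition(lossesA.count == lossesB.count, "loss curves must have the same length")
    var maxGap = 0.0
    for (a, b) in zip(lossesA, lossesB) {
        let gap = abs(a - b) / max(abs(a), abs(b), 1e-12)
        maxGap = max(maxGap, gap)
    }
    return maxGap
}

// Hypothetical loss values dumped from a GrAIdient run and a PyTorch run.
let graiLosses = [0.532, 0.418, 0.367, 0.331]
let torchLosses = [0.532, 0.419, 0.366, 0.331]
assert(maxRelativeGap(graiLosses, torchLosses) < 1e-2, "training trajectories diverged")
```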
1 change: 1 addition & 0 deletions Docs/Examples/EXAMPLES.md
@@ -12,3 +12,4 @@ The following examples are currently available:
- [VGG](VGG.md)
- [Vision Transformer](VisionTransformer.md)
- [Auto Encoder](AutoEncoder.md)
- [LLM](LLM.md)
64 changes: 64 additions & 0 deletions Docs/Examples/LLM.md
@@ -0,0 +1,64 @@
# 🚀 LLM Example

This is the documentation for running
[LLMs](../../Tests/GrAIExamples/LLMExample.swift) on the GPU.

## Setup

This example has some `Python` dependencies. In order to run
the example, we first have to set up the environment:

```bash
conda create --name graiexamples python=3.9
conda activate graiexamples
cd Tests/GrAIExamples/Base
pip install -e .
```

Then:
- Download weights from
[MistralAI](https://docs.mistral.ai/getting-started/open_weight_models/)
(mistral-7B-Instruct-v0.3)
and / or
[Llama](https://llama.meta.com/llama-downloads/)
(llama-2-7b-chat or Meta-Llama-3-8B-Instruct)
and / or Gemma2 from [HuggingFace](https://huggingface.co/google/gemma-2-2b-it)
(Gemma-2-2b-it).
- Update `_modelPathMistral`, `_modelPathLlama2`, `_modelPathLlama3`,
`_modelPathGemma2` in the
[LLMExample](../../Tests/GrAIExamples/LLMExample.swift) file with the
previously downloaded weights.
- Optionally update `_prompt`.
- Rename `_testGenerateMistral`, `_testGenerateLlama2`, `_testGenerateLlama3`
and `_testGenerateGemma2`
into
`testGenerateMistral`, `testGenerateLlama2`, `testGenerateLlama3` and
`testGenerateGemma2`.
- Run the tests.

It is finally possible to clean the environment 🌍

```bash
conda deactivate
conda env remove --name graiexamples
```

## Steps

1. Generate text from a prompt with the Mistral 7B Instruct model.
1. Generate text from a prompt with the Llama 2 7B Chat model.
1. Generate text from a prompt with the Llama 3 8B Instruct model.
1. Generate text from a prompt with the Gemma 2 2B Instruct model.

## Further tests

Further tests are available at
[LLMExampleTests](../../Tests/GrAIExamples/LLMExampleTests.swift).
In order to run them, rename
`_testPredict1` and `_testPredict32` into `testPredict1` and `testPredict32`.

The test `testPredict1` compares the first step of generation
of a toy LLM (just one transformer block) in GrAIdient and in PyTorch.

The test `testPredict32` runs the first step of generation
of a full LLM in GrAIdient and compares it against the expected result from PyTorch.
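
Conceptually, the generation exercised by these examples boils down to a greedy decoding
loop: feed the tokens produced so far, pick the most likely next token, and stop at the
end-of-sequence token. The Swift sketch below only illustrates that idea; the `forward`
closure, the token ids and the logits are placeholders, not the GrAIdient API.

```swift
/// Greedy decoding: repeatedly pick the argmax of the model's next-token logits.
/// The `forward` closure stands in for the model and maps the current token
/// sequence to the logits of the next token.
func greedyGenerate(
    prompt: [Int],
    eosToken: Int,
    maxTokens: Int,
    forward: ([Int]) -> [Double]
) -> [Int] {
    var tokens = prompt
    for _ in 0..<maxTokens {
        let logits = forward(tokens)
        // Pick the highest-scoring token (greedy sampling).
        guard let next = logits.indices.max(by: { logits[$0] < logits[$1] }) else { break }
        tokens.append(next)
        if next == eosToken { break }
    }
    return tokens
}

// Toy usage: a fake "model" that always gives the highest logit to token id 2 (EOS).
let output = greedyGenerate(prompt: [1, 5, 7], eosToken: 2, maxTokens: 8) { _ in
    [0.1, 0.2, 3.0, 0.5]
}
print(output) // [1, 5, 7, 2]
```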
14 changes: 14 additions & 0 deletions Docs/Examples/VGG.md
@@ -91,3 +91,17 @@ conda env remove --name graiexamples
1. Train a model on the training dataset.
1. Evaluate the trained model on the testing dataset:
watch a better performance.

## Benchmarks

To benchmark the time performance of the VGG model, look at
[VGGBenchmark](../../Tests/GrAIExamples/VGGBenchmark.swift) and rename
`_test_TrainVGG` and `_test_EvalVGG` into `test_TrainVGG` and `test_EvalVGG`.

The test `test_TrainVGG` will measure the time spent training the VGG
model for 20 steps.

The test `test_EvalVGG` will measure the time spent running the VGG model
in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
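
For intuition, the timing these benchmarks report amounts to averaging the wall-clock
time of the measured step over 20 iterations. The sketch below only illustrates that
pattern; the empty step closure stands in for an actual VGG training or inference step
and is not the benchmark code itself.

```swift
import Foundation

/// Average wall-clock time per step of an arbitrary workload.
func averageStepTime(steps: Int, _ step: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<steps {
        step()
    }
    return Date().timeIntervalSince(start) / Double(steps)
}

// Placeholder workload standing in for one training or inference step.
let seconds = averageStepTime(steps: 20) {
    // model.forward(); model.backward(); model.update() would go here.
}
print("average step time: \(seconds)s")
```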
17 changes: 17 additions & 0 deletions Docs/Examples/VisionTransformer.md
@@ -86,3 +86,20 @@ conda env remove --name graiexamples

1. Dump the training dataset.
1. Train a simple Vision Transformer model.

## Benchmarks

To benchmark the time performance of the Vision Transformer model,
look at
[TransformerBenchmark](../../Tests/GrAIExamples/TransformerBenchmark.swift)
and rename
`_test_TrainTransformer` and `_test_EvalTransformer` into
`test_TrainTransformer` and `test_EvalTransformer`.

The test `test_TrainTransformer` will measure the time spent training the
VisionTransformer model for 20 steps.

The test `test_EvalTransformer` will measure the time spent running the
VisionTransformer model in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
2 changes: 1 addition & 1 deletion Package.swift
@@ -7,7 +7,7 @@ import PackageDescription
let package = Package(
name: "GrAIdient",
platforms: [
.macOS(.v10_15)
.macOS(.v13)
],
products: [
.library(
159 changes: 153 additions & 6 deletions Sources/GrAITestsUtils/Trainer.swift
@@ -69,7 +69,7 @@ extension TestError: CustomStringConvertible
///
/// - Parameter model: The model on which to select the initialization scheme.
///
func randomSelectWeightsInitializationScheme(model: Model)
public func randomSelectWeightsInitializationScheme(model: Model)
{
let choice = Int.random(in: 0...4)
switch choice {
@@ -365,6 +365,153 @@ open class FlowTrainer: Trainer
}
}

/// Pipeline that compares gradients of weights computed with Float precision against those computed with Float16 precision.
open class FlowPrecisionTrainer: Trainer
{
///
/// The two models:
/// [model to execute with Float precision, same model to execute with Float16 precision].
///
public var models: [Model] = []

/// Get the model to execute with Float precision.
public var modelFloat: Model
{
get {
return models[0]
}
}
/// Get the model to execute with Float16 precision.
public var modelFloat16: Model
{
get {
return models[1]
}
}

///
/// Create a model in the two precision contexts: Float and Float16.
///
/// - Parameter buildFct: A function that creates the different layers of the models.
///
public func build(_ buildFct: (ModelContext)->())
{
var baseModels = [BaseModel]()

let context = ModelContext(name: modelName + "Float", curID: 0)
buildFct(context)
baseModels.append(context.model)

context.model = BaseModel(name: modelName + "Float16")
buildFct(context)
baseModels.append(context.model)

var models = [Model]()
for baseModel in baseModels
{
models.append(Model(model: baseModel, modelsPrev: []))
}
self.models = models
}

/// Initialize the kernel of the models.
public func initialize()
{
for i in 0...1
{
if i == 0
{
GrAI.Precision.float = true
randomSelectWeightsInitializationScheme(model: modelFloat)
}

if i > 0
{
models[i].weights = models[i-1].weights
}

if i == 1
{
GrAI.Precision.float16 = true
}

models[i].initialize(
params: optimizerParams,
phase: .Training,
deviceID: DEVICE_ID
)
}
}

///
/// Run the test.
///
/// The goal is to compare the gradients of weights computed with Float precision with
/// the gradients of weights computed with Float16 precision.
///
/// - Parameters:
/// - setData: A function to create/set data to the model.
/// - setLoss: A function to create/set ground truth to the model.
/// - validate: A function that checks whether the relative difference is small enough.
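///
/// Example shape of a call site (an illustrative sketch, not part of this file:
/// the closure bodies and the 0.005 threshold below are placeholders):
///
///     let trainer = FlowPrecisionTrainer(...)   // built like any other Trainer
///     trainer.build { context in /* create the layers of the model */ }
///     try trainer.run(
///         setData: { _, model in /* feed inputs, return (data, batchSize) */ },
///         setLoss: { _, model in /* set the ground truth and return it */ },
///         validate: { diff in /* throw if diff > 0.005 */ }
///     )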
///
public func run<DataT, LossT>(
setData: (DataT?, Model)->(DataT, Int),
setLoss: (LossT?, Model)->(LossT),
validate: (Double) throws -> ()) throws
{
initialize()

var epoch = 0
let nbEpochsMax = 1
while epoch < nbEpochsMax
{
var numLoop = 0
while numLoop < optimizerParams.nbLoops
{
let resultsFloat: [Double]
GrAI.Precision.float = true

var (inputs, batchSize) = setData(nil, modelFloat)
modelFloat.updateKernel(batchSize: batchSize)
try! modelFloat.forward()

var gt = setLoss(nil, modelFloat)
try! modelFloat.backward()
try! modelFloat.update()

resultsFloat = getGradients(model: modelFloat)

let resultsFloat16: [Double]
GrAI.Precision.float16 = true

(inputs, batchSize) = setData(inputs, modelFloat16)
modelFloat16.updateKernel(batchSize: batchSize)
try! modelFloat16.forward()

gt = setLoss(gt, modelFloat16)
try! modelFloat16.backward()
try! modelFloat16.update()

resultsFloat16 = getGradients(model: modelFloat16)

if let gradDiff = checkFlow(resultsFloat, resultsFloat16)
{
if gradDiff.isNaN
{
fatalError("NaN")
}
try validate(gradDiff)
}

modelFloat.incStep()
modelFloat16.incStep()
numLoop += 1
}
epoch += 1
}
}
}

/// Compares gradients of weights computed in the CPU execution context against the GPU one
/// after a call to the reset API.
open class FlowResetTrainer: FlowTrainer
@@ -831,18 +978,18 @@ open class TransformTrainer: FlowTrainer
// 5. Compare results.

let diffCPU =
(lossCPUNew - lossCPURef) * (lossCPUNew - lossCPURef) /
(lossCPUNew * lossCPUNew + lossCPURef * lossCPURef)
(lossCPUNew - lossCPURef) * (lossCPUNew - lossCPURef) /
(lossCPUNew * lossCPUNew + lossCPURef * lossCPURef)
let diffGPU =
(lossGPUNew - lossGPURef) * (lossGPUNew - lossGPURef) /
(lossGPUNew * lossGPUNew + lossGPURef * lossGPURef)
(lossGPUNew - lossGPURef) * (lossGPUNew - lossGPURef) /
(lossGPUNew * lossGPUNew + lossGPURef * lossGPURef)

var warning = ""
let maxDiff = max(diffCPU, diffGPU)
let maxIndex = diffCPU < diffGPU ? "GPU" : "CPU"
if diffCPU > 0.0000001
{
warning = "Load Check Warning " + maxIndex + " : "
warning = "Transform Check Warning " + maxIndex + " : "
}
let strDump = warning + String(maxDiff)
print(strDump)