Merge pull request #8 from Dexterp37/better-vfr-train
Improve training on the AdobeVFR dataset
Dexterp37 authored Oct 11, 2023
2 parents 24e6730 + 212ec4d commit 790760b
Showing 10 changed files with 321 additions and 46 deletions.
38 changes: 32 additions & 6 deletions README.md
@@ -4,7 +4,7 @@ Recognition and doing inference with them.

## Feature highlights

* [DeepFont-like network architecture](https://arxiv.org/pdf/1507.03196v1.pdf)
* DeepFont-like network architecture. See [Z. Wang, J. Yang, H. Jin, E. Shechtman, A. Agarwala, J. Brandt and T. Huang, “DeepFont: Identify Your Font from An Image”, in Proceedings of the ACM International Conference on Multimedia (ACM MM), 2015](https://arxiv.org/abs/1507.03196)
* Configuration-based synthetic dataset generation
* Configuration-based model training via [PyTorch Lightning](https://lightning.ai/pytorch-lightning)
* Supports training and inference on Linux, MacOS and Windows.
@@ -47,7 +47,7 @@ make test

### Generating a synthetic dataset
If needed, the model can be trained on synthetic data. `fontina` provides a synthetic
dataset generator that follows part of the recommendations from the [DeepFont paper](https://arxiv.org/pdf/1507.03196v1.pdf)
dataset generator that follows part of the recommendations from the [DeepFont paper](https://arxiv.org/abs/1507.03196)
to make the synthetic data look closer to the real data. To use the generator:

1. Make a copy of `configs/sample.yaml`, e.g. `configs/mymodel.yaml`
@@ -75,7 +75,7 @@ fonts:
3. Run the generation:
```bash
python src/fontina/generate.py -c configs/mymodel.yaml -o outputs/font-images/mymodel
fontina-generate -c configs/mymodel.yaml -o outputs/font-images/mymodel
```

After this completes, there should be one directory per configured font in `outputs/font-images/mymodel`.
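
To sanity-check the output, a short script like the one below can count the generated samples per font. This is a minimal sketch, assuming the output directory used above; the `.png` extension is an assumption about the generator's output format, adjust as needed.

```python
# Minimal sketch: count generated samples per font directory.
# Assumes the output path from the command above; the ".png"
# extension is an assumption about the generator's output format.
from pathlib import Path

root = Path("outputs/font-images/mymodel")
for font_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    n_samples = len(list(font_dir.glob("*.png")))
    print(f"{font_dir.name}: {n_samples} samples")
```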
@@ -124,7 +124,7 @@ python src/fontina/train.py -c configs/mymodel.yaml
```

#### Part 2 - Supervised training
1. Open `configs/mymodel.yaml` and tweak the `training` section:
1. Open `configs/mymodel.yaml` (or create a new one!) and tweak the `training` section:

```yaml
training:
@@ -155,7 +155,7 @@ training:
2. Then run the training with:
```bash
python src/fontina/train.py -c configs/mymodel.yaml
fontina-train -c configs/mymodel.yaml
```

### **(Optional)** - Monitor performance using TensorBoard
@@ -171,5 +171,31 @@ tensorboard --logdir=lightning_logs
Once training is complete, the resulting model can be used to run inference.

```bash
python src/fontina/predict.py -w "outputs/models/mymodel-full/best_checkpoint.ckpt" -i "assets/images/test.png"
fontina-predict -n 6 -w "outputs/models/mymodel-full/best_checkpoint.ckpt" -i "assets/images/test.png"
```

## AdobeVFR Pre-trained model
The AdobeVFR dataset is currently available for download [at Dropbox, here](https://www.dropbox.com/sh/o320sowg790cxpe/AADDmdwQ08GbciWnaC20oAmna?dl=0). The license for using and distributing the dataset is available [here](https://www.dropbox.com/sh/o320sowg790cxpe/AADDmdwQ08GbciWnaC20oAmna?dl=0&preview=license.txt), which states:

> This dataset ('Licensed Material') is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications or personal experimentation.

Since the model is trained on that dataset, the same spirit and license apply: the released model can only be used for non-commercial purposes.

### How to train

1. Download the dataset to `assets/AdobeVFR`
2. Unpack `assets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new.zip` in that directory so that the `assets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new/` path exists
3. Run `fontina-train -c configs/adobe-vfr-autoencoder.yaml`. This will take a long while, but progress can be checked with TensorBoard (see the previous section) during training.
4. Change `configs/adobe-vfr.yaml` so that `scae_checkpoint_file` points to the best checkpoint from step (3); see the sketch after this list.
5. Run `fontina-train -c configs/adobe-vfr.yaml`. This will take a long while (but less than the unsupervised training round).
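
The checkpoint for step 4 can be picked by hand from `outputs/adobevfr/autoenc` (the `output_dir` of the autoencoder config), or with a small helper like this minimal sketch. It assumes the Lightning checkpoint filenames embed the validation loss, as in the commented-out example in `configs/adobe-vfr-autoencoder.yaml` (e.g. `autoenc-epoch=13-val_loss=0.0016.ckpt`); adjust the pattern to your actual filenames.

```python
# Minimal sketch: pick the autoencoder checkpoint with the lowest validation
# loss. Assumes checkpoints live under the autoencoder config's output_dir
# and that their filenames contain "val_loss=<value>".
import re
from pathlib import Path


def val_loss(path: Path) -> float:
    match = re.search(r"val_loss=(\d+(?:\.\d+)?)", path.name)
    return float(match.group(1)) if match else float("inf")


checkpoints = list(Path("outputs/adobevfr/autoenc").rglob("*.ckpt"))
best = min(checkpoints, key=val_loss)
print(f"Point scae_checkpoint_file at: {best}")
```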

### Downloading the models
While only the full model is needed, the stand-alone autoencoder model is being released as well.

* Stand-alone autoencoder model: [Google Drive](https://drive.google.com/file/d/107Ontyg2FGxOKvhE7KM7HSaJ1Wn2Merr/view?usp=sharing)
* Full model: [Google Drive](https://drive.google.com/file/d/1Fw-bjmapCXe0aCiYvOyGLmYocZDvmptK/view?usp=drive_link)

> **Note**
The pre-trained model achieves a validation loss of 0.3523, with an accuracy of 0.8855 after 14 epochs.
Unfortunately, the test performance on `VFR_real_test` is much worse, with a top-1 accuracy of 0.05.
I'm releasing the model in the hope that somebody can help me fix this 😊😅
87 changes: 87 additions & 0 deletions configs/adobe-vfr-autoencoder.yaml
@@ -0,0 +1,87 @@
---
# This section of the configuration is used to control
# the generation of the synthetic image data for the
# visual font recognition task.
fonts:
# Whether or not to enable random spacing between characters.
random_character_spacing: False

# The regular expression to use to generate the text
# in the synthetic image samples.
regex_template: '[A-Z0-9]{5,10} [A-Z0-9]{3,7}'

# The path to the directory containing background images.
# If provided, images in this directory will be used as
# background for the generated text. If omitted, images
# will have a white background.
backgrounds_path: "assets/backgrounds"

# The number of samples to generate for each font.
samples_per_font: 50

classes:
- name: Test Font
path: "assets/fonts/test/Test.ttf"
- name: Other Test Font
path: "assets/fonts/test2/Test2.ttf"

# This section controls the training configuration for the model.
training:
only_autoencoder: True

# The path to the pre-trained checkpoint to use for the
# stacked autoencoders within the DeepFont-like model. Setting
# this property skips training the SCAE.
# scae_checkpoint_file: "outputs/adobevfr/final/autoenc-epoch=13-val_loss=0.0016.ckpt"

# Whether or not to use a fixed random seed for training. Note
# that this is useful for creating reproducible runs for debugging
# purposes.
# fixed_seed: 42

# The type of data source stored in the data root.
# It's one of:
# - "raw-images": the data root contains one directory
# per font type, each having the samples coming from
# that font.
# - "adobevfr": the data root contains the AdobeVFR in
# BCF format, i.e. the 'VFR_real_test', 'VFR_syn_train'
# and 'VFR_syn_val' directories.
dataset_type: "adobevfr"

# The root directory containing the data generated from the
# synthetic image generation step.
data_root: "assets/AdobeVFR"

# The directory that will contain the model checkpoints.
output_dir: "outputs/adobevfr/autoenc"

# The number of workers to use for the data loaders. See
# the PyTorch documentation here:
# https://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader
num_workers: 12

# The size of the batch to use for training.
batch_size: 128

# The initial learning rate to use for training.
learning_rate: 0.01

epochs: 10

# The ratio to use for splitting the samples in the data
# root into train, validation and test sets.
# Note that the validation set is used for validation during
# the training cycle, while the test set, if provided, is used
# after the training phase is complete.
train_ratio: 0.8
# The following ratios are meaningful only if run_test_cycle
# is enabled.
validation_ratio: 0.1
test_ratio: 0.1

# Whether or not to use a fraction of the data to run a
# test cycle on the trained model. If this is disabled
# then only the train ratio will be used: the validation
# ratio will be automatically computed.
run_test_cycle: True
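
As an aside on the split settings in the config above, the way the ratios are consumed is easiest to see with a tiny worked example. This is an illustration only; the exact rounding behaviour of fontina's splitter is an assumption, not taken from its code.

```python
# Worked illustration of how the split ratios above are interpreted; the
# exact rounding behaviour of fontina's splitter is an assumption here.
n_samples = 10_000
train_ratio, validation_ratio, test_ratio = 0.8, 0.1, 0.1

# With run_test_cycle enabled, all three ratios are used:
n_train = int(n_samples * train_ratio)     # 8000
n_val = int(n_samples * validation_ratio)  # 1000
n_test = n_samples - n_train - n_val       # 1000 (the remainder)

# With run_test_cycle disabled, only train_ratio matters and the
# validation share is whatever is left over:
n_val_no_test = n_samples - n_train        # 2000
```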
88 changes: 88 additions & 0 deletions configs/adobe-vfr.yaml
@@ -0,0 +1,88 @@
---
# This section of the configuration is used to control
# the generation of the synthetic image data for the
# visual font recognition task.
fonts:
# Whether or not to enable random spacing between characters.
random_character_spacing: False

# The regular expression to use to generate the text
# in the synthetic image samples.
regex_template: '[A-Z0-9]{5,10} [A-Z0-9]{3,7}'

# The path to the directory containing background images.
# If provided, images in this directory will be used as
# background for the generated text. If omitted, images
# will have a white background.
backgrounds_path: "assets/backgrounds"

# The number of samples to generate for each font.
samples_per_font: 50

classes:
- name: Test Font
path: "assets/fonts/test/Test.ttf"
- name: Other Test Font
path: "assets/fonts/test2/Test2.ttf"

# This section controls the training configuration for the model.
training:
# TODO: When training the autoencoder, use the real images.
only_autoencoder: False

# The path to the pre-trained checkpoint to use for the
# stacked autoencoders within the DeepFont-like model. Setting
# this property skips training the SCAE.
scae_checkpoint_file: "outputs/adobevfr/final/v82-autoenc-epoch=10-val_loss=0.0019-val_accuracy=0.0000.ckpt"

# Whether or not to use a fixed random seed for training. Note
# that this is useful for creating reproducible runs for debugging
# purposes.
# fixed_seed: 42

# The type of data source stored in the data root.
# It's one of:
# - "raw-images": the data root contains one directory
# per font type, each having the samples coming from
# that font.
# - "adobevfr": the data root contains the AdobeVFR in
# BCF format, i.e. the 'VFR_real_test', 'VFR_syn_train'
# and 'VFR_syn_val' directories.
dataset_type: "adobevfr"

# The root directory containing the data generated from the
# synthetic image generation step.
data_root: "assets/AdobeVFR"

# The directory that will contain the model checkpoints.
output_dir: "outputs/adobevfr/full"

# The number of workers to use for the data loaders. See
# the PyTorch documentation here:
# https://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader
num_workers: 12

# The size of the batch to use for training.
batch_size: 128

# The initial learning rate to use for training.
learning_rate: 0.01

epochs: 20

# The ratio to use for splitting the samples in the data
# root into train, validation and test sets.
# Note that the validation set is used for validation during
# the training cycle, while the test set, if provided, is used
# after the training phase is complete.
train_ratio: 0.8
# The following ratios are meaningful only if run_test_cycle
# is enabled.
validation_ratio: 0.1
test_ratio: 0.1

# Whether or not to use a fraction of the data to run a
# test cycle on the trained model. If this is disabled
# then only the train ratio will be used: the validation
# ratio will be automatically computed.
run_test_cycle: True
1 change: 1 addition & 0 deletions pyproject.toml
@@ -22,6 +22,7 @@ readme = "README.md"
license = { text = "MIT" }

[project.scripts]
fontina-generate = "fontina.generate:main"
fontina-train = "fontina.train:main"
fontina-predict = "fontina.predict:main"

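
For these console scripts to resolve, each module needs a no-argument `main()` callable. The sketch below shows what such an entry point for the new `fontina-generate` script could look like; the `-c`/`-o` flags mirror the README usage, but the body is an assumption, not the actual `fontina.generate` implementation.

```python
# Hypothetical sketch of a console-script entry point matching
# `fontina-generate = "fontina.generate:main"`. The -c/-o flags mirror the
# README usage; the internals are assumed, not fontina's real code.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate a synthetic font dataset.")
    parser.add_argument("-c", "--config", required=True, help="Path to the YAML configuration.")
    parser.add_argument("-o", "--output", required=True, help="Directory for the generated images.")
    args = parser.parse_args()
    # ... load the config and run the generation step here ...
    print(f"Generating dataset from {args.config} into {args.output}")


if __name__ == "__main__":
    main()
```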
13 changes: 8 additions & 5 deletions src/fontina/adobevfr_dataset.py
@@ -1,8 +1,8 @@
import numpy as np
import cv2
import io
import numpy as np
import torch

from PIL import Image
from torch.utils.data import Dataset


@@ -26,12 +26,15 @@ def __init__(self, bcf_path: str, dataset_type: str, transform=None):

def __getitem__(self, index):
binary_image = self._get_bcf_entry_by_index(index)
pil_image = Image.open(io.BytesIO(binary_image)).convert("L")
raw_image = np.array(pil_image, dtype="uint8")
image_as_array = np.asarray(
bytearray(io.BytesIO(binary_image).read()), dtype=np.uint8
)
cv2image = cv2.imdecode(image_as_array, cv2.IMREAD_GRAYSCALE)
raw_image = np.array(cv2image, dtype="uint8")
x = self.transform(image=raw_image)["image"] if self.transform else raw_image
# We need to cast to `torch.long` to prevent errors such as
# "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'.
return x, torch.tensor(self.labels[index], dtype=torch.long)
return x, torch.as_tensor(self.labels[index], dtype=torch.long)

def __len__(self):
return len(self._bcf_offsets) - 1
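
The change above swaps PIL-based decoding for OpenCV. As a standalone illustration of the same in-memory decode step (a sketch: it uses `np.frombuffer` instead of the `bytearray`/`BytesIO` round-trip in the diff, which is equivalent for a `bytes` payload):

```python
# Sketch: decode an encoded image held in memory into a grayscale uint8
# array with OpenCV, mirroring what AdobeVFRDataset.__getitem__ does above.
import cv2
import numpy as np


def decode_grayscale(binary_image: bytes) -> np.ndarray:
    buffer = np.frombuffer(binary_image, dtype=np.uint8)
    return cv2.imdecode(buffer, cv2.IMREAD_GRAYSCALE)


# Example usage with any encoded image file read as raw bytes.
with open("assets/images/test.png", "rb") as f:
    img = decode_grayscale(f.read())
print(img.shape, img.dtype)  # (height, width) uint8
```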
30 changes: 15 additions & 15 deletions src/fontina/augmentation_utils.py
@@ -1,18 +1,18 @@
import albumentations as A
import cv2
import numpy as np
import numpy.typing as npt

from albumentations.pytorch import ToTensorV2
from albumentations.core.transforms_interface import ImageOnlyTransform
from PIL import Image


def resize_fixed_height(img: Image.Image, new_height: int = 105):
def resize_fixed_height(img: npt.NDArray[np.uint8], new_height: int = 105):
# From the paper: height is fixed to 105 pixels, width is scaled
# to keep aspect ratio.
width, height = img.size
height, width = img.shape[:2]
new_width = round(new_height * width / height)
return img.resize((new_width, new_height), Image.LANCZOS)
return cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_LANCZOS4)


def split_patches_np(img: npt.NDArray[np.uint8], step: int, drop_last: bool):
@@ -23,7 +23,7 @@ def split_patches_np(img: npt.NDArray[np.uint8], step: int, drop_last: bool):
patches.append(img[0:height, x : x + step])

# Fixup the last patch instead of dropping it because
# its width is smaller than 105. When cropping with PIL and
# its width is smaller than 105. When cropping and
# the patch is smaller than the needed area, it gets filled
# with black pixels. We should recolor them instead of discarding.
available_width = width % step
@@ -64,7 +64,11 @@ def apply(self, img, **params):

_, width = img.shape
if width <= 105:
return img
return np.append(
img,
np.full((105, 105 - width), 255, dtype="uint8"),
axis=1,
)

if not self.constrained_patches:
start_x = np.random.randint(0, width - 105)
@@ -89,9 +89,7 @@ def apply(self, img, **params):
height, width = img.shape
ratio = np.random.uniform(low=self.ratio_range[0], high=self.ratio_range[1])
new_width = round(width * ratio)
squeezed = Image.fromarray(img).resize(
(new_width, height), Image.Resampling.LANCZOS
)
squeezed = cv2.resize(img, (new_width, height), interpolation=cv2.INTER_LANCZOS4)
return np.array(squeezed)

def get_transform_init_args_names(self):
@@ -113,9 +115,7 @@ def __init__(self, squeeze_ratio, always_apply=False, p=1.0) -> None:
def apply(self, img, **params):
height, width = img.shape
new_width = round(height * self.squeeze_ratio)
squeezed = Image.fromarray(img).resize(
(new_width, height), Image.Resampling.LANCZOS
)
squeezed = cv2.resize(img, (new_width, height), interpolation=cv2.INTER_LANCZOS4)
return np.array(squeezed, dtype="uint8")

def get_transform_init_args_names(self):
@@ -132,7 +132,7 @@ def __init__(self, target_height: int, always_apply=False, p=1.0) -> None:
self.target_height = target_height

def apply(self, img, **params):
resized = resize_fixed_height(Image.fromarray(img), self.target_height)
resized = resize_fixed_height(img, self.target_height)
return np.array(resized)


@@ -198,10 +198,10 @@ def get_random_square_patch_augmentation() -> A.Compose:
)


def get_test_augmentations(r: float) -> A.Compose:
def get_test_augmentations(squeeze_ratio: float) -> A.Compose:
return A.Sequential(
[
ResizeHeight(target_height=105, always_apply=True),
Squeezing(squeeze_ratio=r, always_apply=True),
Squeezing(squeeze_ratio=squeeze_ratio, always_apply=True),
]
)
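
A small usage sketch for the renamed helper above, assuming the module path `fontina.augmentation_utils` and a grayscale input image; the squeeze ratio value is only illustrative.

```python
# Usage sketch for the test-time pipeline defined above. Assumes a grayscale
# uint8 image on disk; the squeeze ratio value is just illustrative.
import albumentations as A
import cv2

from fontina.augmentation_utils import get_test_augmentations

img = cv2.imread("assets/images/test.png", cv2.IMREAD_GRAYSCALE)

# Wrap the Sequential block in a Compose so it can be called directly.
pipeline = A.Compose([get_test_augmentations(squeeze_ratio=2.5)])
out = pipeline(image=img)["image"]

# Height is fixed to 105; width becomes round(105 * squeeze_ratio).
print(img.shape, "->", out.shape)
```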
