Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code bug while training parseq in paddleOCR #150

Open
SleepEarlyLiveLong opened this issue Oct 8, 2024 · 0 comments
Open

code bug while training parseq in paddleOCR #150

SleepEarlyLiveLong opened this issue Oct 8, 2024 · 0 comments

Comments

@SleepEarlyLiveLong
Copy link

SleepEarlyLiveLong commented Oct 8, 2024

Hi, thank you for your outstanding work! I encountered an error while trying to reproduce your work based on the PaddleOCR framework. It seems to be a bug in the code. Please take a look at the specific information below:

Here is the config file:
`
Global:
use_gpu: True
epoch_num: 100
log_smooth_window: 20
print_batch_step: 5
save_model_dir: ./output/rec/parseq_cty_v1
save_epoch_step: 3
eval_batch_step: [0, 500]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
character_dict_path: ppocr/utils/dict/parseq_dict_mixlang.txt
character_type: ch
max_text_length: 35 # 35
num_heads: 8
infer_mode: False
use_space_char: False
save_res_path: ./output/rec/predicts_parseq.txt

Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: OneCycle
max_lr: 0.0007

Architecture:
model_type: rec
algorithm: ParseQ
in_channels: 3
Transform:
Backbone:
name: ViTParseQ
img_size: [32, 128]
patch_size: [4, 8]
embed_dim: 384
depth: 12
num_heads: 6
mlp_ratio: 4
in_channels: 3
Head:
name: ParseQHead
# Architecture
max_text_length: 35
embed_dim: 384
dec_num_heads: 12
dec_mlp_ratio: 4
dec_depth: 1
# Training
perm_num: 6
perm_forward: true
perm_mirrored: true
dropout: 0.1
# Decoding mode (test)
decode_ar: true
refine_iters: 1

Loss:
name: ParseQLoss

PostProcess:
name: ParseQLabelDecode

Metric:
name: RecMetric
main_indicator: acc
is_filter: True

Train:
dataset:
name: LMDBDataSet
data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/train/synth
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- ParseQRecAug:
aug_type: 0 # or 1
- ParseQLabelEncode:
- SVTRRecResizeImg:
image_shape: [3, 32, 128]
padding: False
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: True
batch_size_per_card: 192
drop_last: True
num_workers: 4

Eval:
dataset:
name: LMDBDataSet
data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- ParseQLabelEncode: # Class handling label
- SVTRRecResizeImg:
image_shape: [3, 32, 128]
padding: False
- KeepKeys:
keep_keys: ['image', 'label', 'length']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 384
num_workers: 4

`

Here is the what 'data_dir' looks like, each folder includes two file: 'data.mdb' and 'lock.mdb', which are generated by 'python tools/create_lmdb_dataset.py /path/to/img/root /path/to/gt /path/to/save/lmdb':
image

Based on infos above, I run order "python3 tools/train.py -c configs/rec/rec_vit_parseq_cty_v1.yml" and encountered a bug at 'ppocr/modeling/heads/rec_parseq_head.py' Line 498:
image

where targets[0]:
image

targets[1]:
image

And:
image

Is there something wrong with the code? How to solve the problem? Thank you a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant