My configuration is as follows:
batch_size=384, epoch=20, val_check_interval=500, GPU: 3090; everything else uses the default configuration.
charset_train=62_mixed-case
charset_test = string.digits + string.ascii_lowercase + string.ascii_uppercase
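For reference, a minimal sketch of what the test charset above expands to, plus a check for label characters that fall outside it. The helper name `unsupported_chars` is hypothetical and is not part of the repository.

```python
import string

# The test charset described above: digits, lowercase, then uppercase letters.
charset_test = string.digits + string.ascii_lowercase + string.ascii_uppercase


def unsupported_chars(label: str, charset: str = charset_test) -> set:
    """Return the characters in `label` that are not in `charset`.

    Hypothetical helper for illustration only.
    """
    return set(label) - set(charset)


if __name__ == "__main__":
    print(len(charset_test))            # number of characters in the charset
    print(unsupported_chars("Abc123"))  # alphanumeric label, nothing unsupported
    print(unsupported_chars("a-b!"))    # punctuation falls outside the charset
```

Running such a check over the ground-truth labels is one way to confirm that a dataset really contains only digits and English letters.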
I have a few questions.
My dataset strictly follows your format. The labels contain only digits and uppercase and lowercase English letters. The dataset is split 8:1:1; details follow.
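A minimal sketch of an 8:1:1 split, assuming the samples are held in a list of (image path, label) pairs and that the three parts are train, validation, and test in that order. The function name `split_8_1_1`, the fixed seed, and the sample layout are all assumptions made for illustration. None of this comes from the repository's tooling.

```python
import random


def split_8_1_1(samples, seed=0):
    """Shuffle `samples` and split them 8:1:1 into three lists.

    Hypothetical helper: the order (train, val, test) is an assumption.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # deterministic shuffle for repeatability
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # the remainder goes to the last part
    return train, val, test


if __name__ == "__main__":
    demo = [(f"img_{i}.png", f"label{i}") for i in range(100)]
    train, val, test = split_8_1_1(demo)
    print(len(train), len(val), len(test))
```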
1. In your paper, I see that the real data and the val data do not come from the same dataset. Am I right that the data under data/val is used only for validation and does not take part in training? Based on that guess, I put my training split in the real directory, my test split in the test directory, and a different dataset in the data/val directory. For example, I trained on the D001 dataset and placed D004 under data/val, then tested on both D001 and D004. The accuracy on D001 is high, but the accuracy on D004 is very low. I don't quite understand the role of the two val sets in the data directory. Could you explain it? Thank you!
2. Can I train on all of your datasets plus my own dataset, using charset_train=62_mixed-case? My concern is that when I tried your pre-trained weights in the Hugging Face demo, the predictions for my images contained punctuation marks, even though my dataset has none. What should I do about that?
3. Does the charset used in the test have to be 32_lowercase?