loading huggingface model? #2737

xieu90 · 2023-08-05T06:44:47Z

Hi DJL community,
I'm trying to do speech-to-text. I saw that DJL can load a Hugging Face model that uses pretrained wav2vec2.
The speech recognition example uses this model:
Link 1: https://resources.djl.ai/test-models/pytorch/wav2vec2.zip

String url = "https://resources.djl.ai/test-models/pytorch/wav2vec2.zip";
Criteria<Audio, String> criteria = Criteria.builder()
        .setTypes(Audio.class, String.class)
        .optModelUrls(url)
        .optTranslatorFactory(new SpeechRecognitionTranslatorFactory())
        .optModelName("wav2vec2.ptl")
        .optEngine("PyTorch")
        .build();
Audio input = AudioFactory.newInstance().fromFile(Path.of("/home/ace/Downloads/speech.wav"));
try (ZooModel<Audio, String> model = criteria.loadModel();
        Predictor<Audio, String> predictor = model.newPredictor()) {
    String x = predictor.predict(input);
    System.out.println(x);
} catch (TranslateException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}


Now I want to swap the model URL for this one (Link 2): https://huggingface.co/nguyenvulebinh/wav2vec2-large-vi
Of course that link doesn't point to a zip file, so it failed.
My question is: how was wav2vec2.zip created? Can I do the same with the model in Link 2?
I saw that the model was downloaded several times last month, but after looking around I found nothing to download either, so I'm not sure how people use Hugging Face (I'm a noob there >.<).


I also saw this somewhere:
.optModelUrls("djl://ai.djl.huggingface.pytorch/
I tried it with the second link, but it didn't work either. If I remember correctly, the error was about the model not being found.

So any hint or direction on loading the model directly, or on converting it and then loading it, would be greatly appreciated.


Here are the dependencies in my pom.xml (there are a lot because I tried this and that):

	<!-- <dependency>
		<groupId>ai.djl</groupId>
		<artifactId>model-zoo</artifactId>
		<version>0.23.0</version>
	</dependency> -->
	<dependency>
		<groupId>ai.djl.pytorch</groupId>
		<artifactId>pytorch-model-zoo</artifactId>
		<version>0.23.0</version>
	</dependency>
	<dependency>
		<groupId>ai.djl.huggingface</groupId>
		<artifactId>tokenizers</artifactId>
		<version>0.23.0</version>
	</dependency>
	<dependency>
		<groupId>ai.djl</groupId>
		<artifactId>api</artifactId>
		<version>0.23.0</version>
	</dependency>

	<!-- DJL API -->
	<dependency>
		<groupId>ai.djl</groupId>
		<artifactId>api</artifactId>
		<version>0.23.0</version>
	</dependency>

	<!-- DJL PyTorch engine -->
	<dependency>
		<groupId>ai.djl.pytorch</groupId>
		<artifactId>pytorch-engine</artifactId>
		<version>0.23.0</version>
	</dependency>

	<!-- DJL PyTorch native library -->
	<!-- <dependency>
		<groupId>ai.djl.pytorch</groupId>
		<artifactId>pytorch-native-auto</artifactId>
		<version>1.9.0</version>
	</dependency> -->

	<!-- DJL PyTorch model zoo -->
	<dependency>
		<groupId>ai.djl.pytorch</groupId>
		<artifactId>pytorch-model-zoo</artifactId>
		<version>0.23.0</version>
	</dependency>

frankfliu · 2023-08-05T17:45:30Z

The models on the Hugging Face Hub are not TorchScript models. You need to trace the model into TorchScript or convert it to ONNX before it can be loaded with DJL. See: https://docs.djl.ai/master/docs/pytorch/how_to_convert_your_model_to_torchscript.html

frankfliu · 2023-08-05T17:47:46Z

djl://ai.djl.huggingface.pytorch/ is a model zoo of models that we manually traced from Hugging Face into TorchScript. It only covers a limited set of models.

xieu90 · 2023-08-06T06:01:23Z

Thank you, I am trying the tracing tutorial now.
I modified it a bit, hoping it will work with the model I am trying.


import torch
from transformers import AutoProcessor, AutoModelForPreTraining  # modified: imported these instead of torchvision

# An instance of your model (both lines copied from the model page).
processor = AutoProcessor.from_pretrained("nguyenvulebinh/wav2vec2-large-vi")
model = AutoModelForPreTraining.from_pretrained("nguyenvulebinh/wav2vec2-large-vi")

# Switch the model to eval mode
model.eval()

# An example input you would normally provide to your model's forward() method.
# Not sure whether I need to edit this line. I think yes, but what should it be, some example audio file?
# Leaving it as is gives an error. Shapes I tried:
#   (1, 3, 224, 224) --> RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 3, 224, 224]
#   (1, 3, 224) --> RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 3, 224]
#   (1, 224) --> RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

# Save the TorchScript model
traced_script_module.save("traced_resnet_model.pt")

Could you please give me a hint for the line example = torch.rand(1, 3, 224, 224)?

frankfliu · 2023-08-06T06:08:37Z

The example we provide is specific to image classification models. We use torch.rand(1, 3, 224, 224) as a fake input for a 224x224 RGB image (3 channels).

You need to use an input that matches your model.

Here is another example for NLP models: https://github.com/deepjavalibrary/djl/blob/master/extensions/tokenizers/src/main/python/huggingface_converter.py#L87
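
For the wav2vec2 checkpoints discussed in this thread, the model takes a raw waveform rather than an image, so the example tensor would have shape (batch, num_samples), for instance torch.rand(1, 16000) for one second of audio at an assumed 16 kHz sampling rate. The sketch below is plain Python with no torch dependency and only does the length arithmetic of the convolutional feature extractor. It assumes the default Wav2Vec2 kernel sizes and strides (conv_kernel and conv_stride in the transformers config); a given checkpoint's config.json may differ, so treat the numbers as illustrative. The helper name feature_frames is made up for this example.

```python
# Default Wav2Vec2 feature-extractor layout (assumed; check the checkpoint's config.json).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def feature_frames(num_samples: int) -> int:
    """Return how many frames the conv stack produces for num_samples audio samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        if length < kernel:
            # Same situation as the "Kernel size can't be greater than
            # actual input size" error reported for the (1, 224) input.
            raise ValueError(
                f"input length {length} is shorter than kernel size {kernel}"
            )
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    print(feature_frames(16000))  # one second of audio at the assumed 16 kHz
    try:
        feature_frames(224)
    except ValueError as err:
        print("too short:", err)
```

Under these assumptions, 16000 samples come out as 49 frames, while a 224-sample input shrinks to length 1 before reaching a kernel of size 2, which is consistent with the (1, 224) error above. An example input of about a second of audio is comfortably long enough.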

xieu90 · 2023-08-07T06:04:25Z

I have followed several guides on the net, and this is the version I have at the moment. I think it is the script version and no longer the trace version.
The audio-to-text result is correct, but I have no idea what I should pass to
torch.jit.trace_module

Using the trace version, I managed to generate a .pt file with example=input_values, but I think it was generated badly, using return_dict=False on the model or strict=False on the trace: logits wasn't there, and then some NDArray or list was null when I used it in DJL.

I also managed to generate an ONNX model, but it wasn't usable either (the same null NDArray, I think).


import soundfile as sf
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# input audio
path = "/home/ace/pythonProject/de/own/untitled.wav"

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("nguyenvulebinh/wav2vec2-base-vietnamese-250h")
model = Wav2Vec2ForCTC.from_pretrained("nguyenvulebinh/wav2vec2-base-vietnamese-250h")  # also tried: torchscript=True, return_dict=False


# code copied from the usage section on the model's page
# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch

# read the sound file
ds = map_to_array({
    "file": '/home/ace/pythonProject/de/own/untitled.wav'
})

# tokenize
input_values = processor(ds["speech"], return_tensors="pt", padding="longest").input_values  # Batch size 1

# retrieve logits
logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

print(transcription)  # correct result: audio -> text
# print(predicted_ids)
# print(logits)
# print(input_values)
# print(ds)


# Use torch.jit.trace_module to generate a torch.jit.ScriptModule via tracing.
# Also tried strict=False; main_input_name is an attribute I found in Wav2Vec2ForCTC.
converted = torch.jit.trace_module(model, {'main_input_name': input_values})

torch.jit.save(converted, "tracedModule.pt")

xieu90 · 2023-08-07T17:08:09Z

I found a tool to generate an ONNX model from Hugging Face:
https://huggingface.co/docs/transformers/serialization

optimum-cli export onnx --model nguyenvulebinh/wav2vec2-base-vietnamese-250h onnxOptimum/
Framework not specified. Using pt to export to ONNX.
/home/ace/.local/lib/python3.10/site-packages/transformers/configuration_utils.py:380: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
  warnings.warn(
Automatic task detection to automatic-speech-recognition (possible synonyms are: audio-ctc, speech2seq-lm).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using framework PyTorch: 2.0.1+cu117
/home/ace/.local/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:595: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/ace/.local/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:634: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Post-processing the exported models...
Validating ONNX model onnxOptimum/model.onnx...
        -[✓] ONNX model output names match reference model (logits)
        - Validating ONNX Model output "logits":
                -[✓] (2, 49, 110) matches (2, 49, 110)
                -[x] values not close enough, max diff: 4.9054622650146484e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 4.9054622650146484e-05.
 The exported model was saved at: onnxOptimum

So the export worked and created the onnxOptimum folder, which contains the ONNX model and also vocab.json, which holds the Vietnamese alphabet. I checked with predicted_ids: replacing the predicted IDs with the matching entries in vocab.json gave me the correct sentence from the audio file.

In the console output above I saw this: -[x] values not close enough, max diff: 4.9054622650146484e-05 (atol: 1e-05)
so I'm not sure whether the export was really 100% successful.
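
One way to read that warning: the validator compares the reference logits with the exported model's logits and flags any absolute difference above atol. A difference of about 5e-05 is small next to typical gaps between logits, so the argmax used for decoding would usually be unchanged. The sketch below shows both checks on made-up numbers (not taken from this model) with hypothetical helper names. It is an illustration of the idea, not how optimum implements its validation.

```python
def max_abs_diff(reference, exported):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(r - e) for r, e in zip(reference, exported))


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits for one frame; the "exported" copy is shifted by 4.9e-05.
reference = [-1.20, 3.40, 0.75, 2.10]
exported = [v + 4.9e-05 for v in reference]

diff = max_abs_diff(reference, exported)
within_tolerance = diff <= 1e-05  # the strict check that the export log reports as failed
same_prediction = argmax(reference) == argmax(exported)

print(f"max diff: {diff:.2e}")
print("within atol=1e-05:", within_tolerance)
print("same argmax:", same_prediction)
```

Comparing the decoded text from the PyTorch model and from the ONNX model on a few real audio files is a more direct check than the raw tolerance.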

So I zipped the folder and loaded it using DJL:

String modelUrl = "/home/ace/javaProjects/djl/src/main/resources/onnxOptimum.zip";
Criteria<Audio, String> criteria = Criteria.builder()
        .setTypes(Audio.class, String.class)
        .optModelUrls(modelUrl)
        .optTranslator(new MyTranslator())
        // .optTranslatorFactory(new SpeechRecognitionTranslatorFactory())
        // .optModelName("wav2vec2_large_vi.onnx")
        .optModelName("model.onnx")
        .optEngine("OnnxRuntime") // use OnnxRuntime engine by default
        .build();

Audio input = AudioFactory.newInstance()
        .fromFile(Path.of("/home/ace/javaProjects/djl/src/main/resources/untitled.wav"));
try (ZooModel<Audio, String> model = criteria.loadModel();
        Predictor<Audio, String> predictor = model.newPredictor()) {
    String x = predictor.predict(input);
    System.out.println(x);
} catch (TranslateException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

SpeechRecognitionTranslatorFactory creates a SpeechRecognitionTranslator, which implements NoBatchifyTranslator.
I have cloned the whole class into MyTranslator and am now trying to fix this error: ClassCastException: class [[[F cannot be cast to class [Ljava.lang.String; ([[[F and [Ljava.lang.String; are in module java.base of loader 'bootstrap'). It occurs in:

public String processOutput(TranslatorContext ctx, NDList list) throws Exception {
    return list.get(0).toStringArray()[0];
}

In Python:
logits shape: (1, 310, 110)
predicted_ids shape: (1, 310)

While debugging processOutput in DJL, I saw that the NDList holds a tensor whose shape is similar to the logits shape in Python. So I guess that instead of return list.get(0).toStringArray()[0],
it should somehow do predicted_ids = torch.argmax(logits, dim=-1), but I have no idea how to call that, or whether it is even possible in DJL.
NDArray array0 = list.get(0);
long argMax0 = array0.argMax().getLong(); // only one big number, which isn't even in vocab.json

Any idea?
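
The [[[F in the exception is a three-dimensional float array, so the model output is the logits tensor rather than text, and the decoding has to be done by hand. Calling argMax() with no axis returns a single index into the flattened array (anywhere up to 310 * 110 - 1 for the shape above), which would explain the one big number. As far as I know, DJL's NDArray also has an argMax(int axis) overload, which would be the counterpart of torch.argmax(logits, dim=-1); please verify that against the DJL version in use. After the per-frame argmax, CTC output is usually decoded by collapsing repeated IDs and dropping the blank token. The sketch below shows that logic in plain Python on a toy vocabulary. The blank ID of 0 and the "|" word delimiter are assumptions based on common Wav2Vec2 tokenizer defaults, and the function names are hypothetical; the real values come from the exported vocab.json and tokenizer config.

```python
from itertools import groupby


def argmax_last_axis(logits):
    """Per-frame argmax: (batch, frames, vocab) -> (batch, frames)."""
    return [
        [max(range(len(frame)), key=frame.__getitem__) for frame in sequence]
        for sequence in logits
    ]


def ctc_greedy_decode(ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated IDs, drop blanks, then map the remaining IDs to text."""
    collapsed = [token_id for token_id, _ in groupby(ids)]
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Toy vocabulary and logits of shape (1, 7, 4); all values are made up.
id_to_token = {0: "<pad>", 1: "|", 2: "a", 3: "b"}
logits = [[
    [0.9, 0.0, 0.1, 0.0],  # blank
    [0.1, 0.0, 0.8, 0.1],  # a
    [0.2, 0.0, 0.7, 0.1],  # a again, collapsed into the previous frame
    [0.8, 0.1, 0.1, 0.0],  # blank separates the two "a" characters
    [0.1, 0.1, 0.7, 0.1],  # a
    [0.0, 0.9, 0.1, 0.0],  # word delimiter
    [0.1, 0.0, 0.1, 0.8],  # b
]]

predicted_ids = argmax_last_axis(logits)
text = ctc_greedy_decode(predicted_ids[0], id_to_token)
print(predicted_ids)
print(text)
```

In a DJL translator, the same steps would go into processOutput: take the argmax over the last axis, read the result as a long array, run the collapse-and-filter loop, and look up each remaining ID in vocab.json.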
