Issue Description
I am currently trying to adapt the JoyTag model from this repository (https://github.com/fpgaminer/joytag) for use with onnxruntime-node. However, I'm encountering an issue where the predicted tags from the model do not match the expected results for given images.
Expected Tags
For a football match image, the expected tags based on the model should closely align with terms like short_hair, brown_hair, black_hair, photoshop_(medium), standing, male_focus, multiple_boys, shorts, dark_skin, blurry, tattoo, facial_hair, grass, dark-skinned_male, beard, ball, 6+boys, third-party_edit, 4boys, sportswear, emblem, mustache, motion_blur, photo_(medium), logo, 5boys, bald, 3d, real_life, soccer_uniform, crowd, soccer_ball, etc., encompassing various attributes and actions observable in the image.
Actual Tags
The tags actually returned are unrelated and nonsensical, and bear no resemblance to the expected tags for the image.
Code Example
Below is the main code where I handle image preprocessing and make predictions using the ONNX model:
import ort from 'onnxruntime-node';
import fs from 'fs';
import Jimp from 'Jimp';

const MODEL_PATH = '../joytag/model.onnx';
const IMAGE_PATH = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg';
const THRESHOLD = 0.4;

const topTags = fs.readFileSync(`../joytag/top_tags.txt`, 'utf8')
  .split('\n')
  .filter(line => line.trim())
  .map(line => line.trim());

async function prepareImage(imagePath, targetSize) {
  const image = await Jimp.read(imagePath);
  const { width, height } = image.bitmap;
  const maxDim = Math.max(width, height);
  const padLeft = Math.floor((maxDim - width) / 2);
  const padTop = Math.floor((maxDim - height) / 2);

  // Pad to a white square, then resize to the model's input size
  let paddedImage = new Jimp(maxDim, maxDim, 0xFFFFFFFF);
  paddedImage.composite(image, padLeft, padTop);
  if (maxDim !== targetSize) {
    paddedImage = await paddedImage.resize(targetSize, targetSize, Jimp.RESIZE_BICUBIC);
  }

  const imageTensor = new Float32Array(3 * targetSize * targetSize);
  await paddedImage.scan(0, 0, targetSize, targetSize, function (x, y, idx) {
    const pos = (y * targetSize + x) * 3;
    imageTensor[pos] = this.bitmap.data[idx + 0] / 255.0;     // R
    imageTensor[pos + 1] = this.bitmap.data[idx + 1] / 255.0; // G
    imageTensor[pos + 2] = this.bitmap.data[idx + 2] / 255.0; // B
  });

  const mean = [0.485, 0.456, 0.406];
  const std = [0.229, 0.224, 0.225];
  for (let i = 0; i < imageTensor.length; i += 3) {
    imageTensor[i] = (imageTensor[i] - mean[0]) / std[0];         // R
    imageTensor[i + 1] = (imageTensor[i + 1] - mean[1]) / std[1]; // G
    imageTensor[i + 2] = (imageTensor[i + 2] - mean[2]) / std[2]; // B
  }

  const tensor = new ort.Tensor('float32', imageTensor, [1, 3, targetSize, targetSize]);
  return tensor;
}

async function predict(imagePath) {
  const imageTensor = await prepareImage(imagePath, 448);
  console.log(imageTensor);

  const session = await ort.InferenceSession.create(MODEL_PATH);
  const feeds = { input: imageTensor };
  const results = await session.run(feeds);
  console.log(results);

  const output = results.output.data;

  // Helper function to apply the sigmoid function
  function sigmoid(x) {
    return 1 / (1 + Math.exp(-x));
  }

  // Apply sigmoid to each output score
  const probabilities = output.map(sigmoid);

  const scores = {};
  probabilities.forEach((prob, i) => {
    scores[topTags[i]] = prob;
  });

  const predictedTags = Object.keys(scores).filter(tag => scores[tag] > THRESHOLD);
  const tagString = predictedTags.join(', ');
  return { tagString, scores, predictedTags };
}

async function main() {
  const { tagString, scores, predictedTags } = await predict(IMAGE_PATH);
  console.log({ predictedTags });
}

main();
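One detail worth noting in the preprocessing above: the tensor is declared with shape `[1, 3, targetSize, targetSize]` (NCHW, planar), but the buffer is filled at `(y * targetSize + x) * 3 + c`, which is interleaved HWC order. If the model does expect NCHW input, a channel-major fill would look like the hedged sketch below (`toPlanarCHW` is a hypothetical helper name; the mean/std here are just the ImageNet values from the code above, which may or may not match JoyTag's training statistics):

```javascript
// Repack interleaved RGBA pixel data (the layout of Jimp's bitmap.data)
// into a planar CHW Float32Array, with per-channel normalization.
function toPlanarCHW(rgba, width, height, mean, std) {
  const plane = width * height; // size of one channel plane
  const out = new Float32Array(3 * plane);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const px = y * width + x;
      for (let c = 0; c < 3; c++) {
        // channel-major layout: all R values, then all G, then all B
        out[c * plane + px] = (rgba[px * 4 + c] / 255.0 - mean[c]) / std[c];
      }
    }
  }
  return out;
}
```

Inside `prepareImage`, this could replace the `scan` fill and the normalization loop by calling `toPlanarCHW(paddedImage.bitmap.data, targetSize, targetSize, mean, std)` before constructing the `ort.Tensor`.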
Request for Assistance
Any guidance or suggestions on why the tag predictions might be diverging so significantly would be greatly appreciated.