Idea: User-friendly training while monitoring internal llm processes #248
I think it'd be cool to see what's happening during training of the model - do you know of any tools that offer that visibility?
Cool! Yes, I would also like to see what's happening. I don't know of a tool, but LLMs have told me that extracting data like this is done more often when fine-tuning. DeepSeek-R1 said it isn't difficult and wrote Python code to query for certain data. It also said there are numerous Python libraries for extracting data and transforming it into visualizations. The real question is which data you extract; from there on it is not so difficult. ComfyUI has visualization nodes for sound, but I think it is easier to go with Python, since there are libraries made especially for visualization. Actually, because the open-source models have not been smart/stable enough for me, I haven't yet tried saving a model hundreds of times - at most 150-200 times. And a model acting weird at certain moments can also have entirely different causes. Now that I think of it, I didn't notice any difference when I didn't save the model.
Something else: I made a workaround for the chat problem. If you have longer conversations, you have to build a whole structure with agents, and that is unworkable. Unfortunately, there is no chat node that works like a chat app, with the whole conversation in one box. Ollama has a limited number of input tokens (1024). You can change that in the command shell (2048/4096), but it costs a lot of GPU memory, which means conversations can't be too long. My current workaround is saving the whole conversation, uploading it again, piping it into the agent, appending the latest comments to the whole conversation, and saving it again - in effect a conversation loop through a .txt file. This is an important hurdle, because ComfyUI needs chat nodes (and a solution for excessive GPU memory use) where you can write, act, and react in one chat node.
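That .txt conversation loop can be sketched in a few lines of Python. This is only an illustration of the workaround described above: `generate` is a stand-in for whatever agent/Ollama call is actually used, and the file name is arbitrary.

```python
# Sketch of the ".txt conversation loop" workaround: load the saved
# conversation, append the new user message, call the model, append its
# reply, and save everything back for the next turn.
from pathlib import Path

HISTORY_FILE = Path("conversation.txt")  # illustrative file name


def chat_turn(user_message, generate):
    """Run one chat turn, persisting the full conversation to disk.

    `generate` is an assumed callable standing in for the agent/Ollama
    node: it takes the full prompt text and returns the model's reply.
    """
    history = HISTORY_FILE.read_text() if HISTORY_FILE.exists() else ""
    prompt = history + "\nUser: " + user_message
    reply = generate(prompt)  # the actual model call (assumed)
    HISTORY_FILE.write_text(prompt + "\nAssistant: " + reply)
    return reply
```

Each call rebuilds the prompt from the saved file, so the "agent structure" reduces to one function plus one text file.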
Another thing: there are several strategies you can use for training by saving.
I'd love to build a chat node for ComfyUI - it would be amazing to have something like that integrated!
I'm not sure how overwriting the original model would work - because the ModelFile is based on an existing model, it might break if you try to write over the existing one?
I am not sure either, because right now it is impossible to overwrite the existing model. It might be better not to overwrite at all, but to give the model an ascending name (model01, model02, model03, etc.) when you save it, combined with functionality that automatically loads the newest saved model in the next generation. There could also be cleanup options: delete every old model, keep every model, or keep only one model per hour / per 10 saves / when the disk is full / after a certain amount of disk space, etc. The advantage of not overwriting is that you can see which generation of the model you have (how many times it was saved during training). So I no longer think "an improvement of saving the model" is needed; functionality with multiple options for deleting saved models would be better.
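The ascending-name idea plus a "keep only the last N" cleanup rule could look roughly like this. The directory layout, prefix, and retention policy here are illustrative assumptions, not part of any existing node:

```python
# Sketch of ascending model names (model01, model02, ...) with a simple
# retention policy that keeps only the newest saves.
import os
import re


def next_model_name(directory, prefix="model"):
    """Return the next name in the model01, model02, ... sequence."""
    pattern = re.compile(rf"^{prefix}(\d+)$")
    numbers = [int(m.group(1)) for f in os.listdir(directory)
               if (m := pattern.match(f))]
    return f"{prefix}{max(numbers, default=0) + 1:02d}"


def prune_old_models(directory, keep_last=3, prefix="model"):
    """Delete all but the newest `keep_last` saved models."""
    pattern = re.compile(rf"^{prefix}(\d+)$")
    models = sorted((f for f in os.listdir(directory) if pattern.match(f)),
                    key=lambda f: int(pattern.match(f).group(1)))
    for old in models[:-keep_last]:
        os.remove(os.path.join(directory, old))
```

The same numbering makes "load the newest model in the next generation" trivial: just pick the file with the highest number.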
Saving conversations (in the model) under a different name would be handy; that way you could build (change) your model in the direction you want. Also useful: a "save to file" option (yes/no) and, if yes, a text field for the .txt file name and target folder. That way you can also save the important conversations (or all of them, if you choose) externally - for example for other (better) models in the future. This functionality could be integrated in the chat node (for more user-friendliness), in the Create Model File node, or in the Save Model File node (or spread over these nodes, with each function placed where it logically belongs). That way you can create many different configurations, each with its own practical use. You could even build architectures (sets of different configurations) where you can easily switch between models (or combine multiple different base models in one workflow), each model with its own specialties, while letting the models learn from each other - saving each model after its own amount of progress (saves, disk space, etc.) and/or improved conversations and/or rules.
Examples to illustrate the possibilities: with this functionality (plus monitoring and steering functionality) you could even build complex architectures where several models monitor (and adapt) their own internal processes and/or each other's. You could also train LLMs to advise which models are best connected to which internal processes, and which models may (or may not) steer/change their own (or others') internal settings (e.g. weights) to change wanted/unwanted behavior. The challenge will be choosing which internal processes make the model's behavior visible in the most efficient way. That hurdle can be taken by using the LLM's own pattern recognition to analyze internal processes, plus trial and error, starting with known datasets, queries, and visualizations from the fine-tuning community. More challenges also means more rewards here: because of the extreme complexity of LLMs there are endless ways of extracting data, but that also brings huge opportunities, because it will unlock endless ways to improve the model, gain new insights, and make new discoveries. The community could then create sophisticated architecture workflows and share them with each other. One way would be a new GitHub project/environment where you and users can discuss and post new architecture workflows, especially those that lead to better (and/or new) functionality. The easiest way would be a section in this project with URL links to several architecture workflows (for different purposes/training methods).
I say this because as soon as you have functionality that makes these kinds of architectures possible, it will unlock endless improvements, solutions, applications, and new functionality (like giving the LLM control over all settings of all custom nodes in ComfyUI, while training the LLM to become a ComfyUI master), plus additions to and integrations with existing ComfyUI workflows. It is wise to think about when it would be useful to create which environment for users to co-create with Griptape and with each other, creating the most mutual benefit and enhancing the process of unlocking better models trained in extremely user-friendly and revolutionary new ways.
Hi Jason,
I think I've found a solution for the mixed results of saving (and reusing) an LLM (through the Save Model file node) as a way of extremely user-friendly training.
What if you also extract data (every generation) from the LLM's internal processes (for example, an attention map) and create a measuring/monitoring tool/node?
In the example of extracting an attention map (with Python), you can see which tokens the LLM has used and with how much attention. If the LLM then hallucinates, it should be visible in the attention map.
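As a rough sketch of what "reading" an attention map could look like: in practice the matrix would come from the model itself (with Hugging Face transformers, for instance, `model(..., output_attentions=True)` returns per-layer attention weights), but here a small hand-made matrix stands in so the analysis step is visible.

```python
# Illustrative sketch: given one attention matrix (tokens x tokens),
# report which token each token attends to most. The matrix below is
# made up; a real one would be extracted from the model's internals.
import numpy as np


def top_attended(tokens, attention, k=1):
    """For each token, return the k tokens it gives the most attention to."""
    result = {}
    for i, tok in enumerate(tokens):
        top = np.argsort(attention[i])[::-1][:k]
        result[tok] = [tokens[j] for j in top]
    return result


tokens = ["The", "cat", "sat"]
attention = np.array([[0.8, 0.1, 0.1],
                      [0.6, 0.3, 0.1],
                      [0.2, 0.7, 0.1]])  # each row sums to 1 (softmaxed)
print(top_attended(tokens, attention))
```

A monitoring node could flag generations where attention concentrates on unexpected tokens, which is one plausible signal of the hallucination cases mentioned above.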
Additionally, other data from the internal processes could be used to monitor whether the LLM starts overfitting (or exhibits other unwanted behavior). This can be automated by creating a feedback loop, allowing the LLM to self-correct if it starts overfitting.
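The feedback loop could start very simply: track a validation metric at each save and signal a rollback when it keeps getting worse. The rule below (rising validation loss for `patience` consecutive checks) is an assumed heuristic, not an existing node:

```python
# Minimal sketch of the proposed self-correction trigger: roll back to an
# earlier saved model once validation loss has risen for `patience`
# consecutive checks (a common early-stopping-style heuristic).
def should_roll_back(val_losses, patience=2):
    """True if the last `patience` checks each made validation loss worse."""
    if len(val_losses) <= patience:
        return False
    recent = val_losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

Combined with the ascending model names discussed earlier, "self-correct" could then mean reloading the last model saved before the loss started rising.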
You can also visualize this data (e.g., heat maps) for users, enabling them to see which input/method yields the most intelligence growth.
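A heat map of an attention matrix is a few lines with matplotlib. Again the matrix here is a stand-in for real extracted data, and the output file name is arbitrary:

```python
# Sketch of rendering an attention matrix as a heat map. Uses the "Agg"
# backend so it works headless (no display), saving to a PNG file.
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

tokens = ["The", "cat", "sat"]
attention = np.array([[0.8, 0.1, 0.1],
                      [0.6, 0.3, 0.1],
                      [0.2, 0.7, 0.1]])  # stand-in attention weights

fig, ax = plt.subplots()
im = ax.imshow(attention, cmap="viridis")
ax.set_xticks(range(len(tokens)), labels=tokens)
ax.set_yticks(range(len(tokens)), labels=tokens)
fig.colorbar(im, ax=ax, label="attention weight")
fig.savefig("attention_heatmap.png")
```

In a ComfyUI context the PNG could simply be fed into an image-preview node, which keeps the visualization side entirely inside existing workflows.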
The possibilities are endless, but I am aware that there will be many technological hurdles.
I'd like to ask for your opinion on whether this is a realistic idea. Could the open-source community adopt and develop this idea? User-friendly training and development of AI models for everyone could democratize AI, offering numerous benefits and creating a rich ecosystem of various tools, functions, and nodes built on this idea.
What do you think, Jason? Is it worthwhile to start a project on GitHub to pitch this idea to the community?
Or does anyone else have an opinion on whether this idea is worthwhile to pitch?
Thank you in advance,
MedleMedler