
Idea: User-friendly training while monitoring internal llm processes #248

Open · MedleMedler opened this issue Jan 31, 2025 · 9 comments
Labels: someday/maybe (No current plans to work on it, but might be something to do later.)

MedleMedler commented Jan 31, 2025
Hi Jason,

I think I've found a solution for the mixed results of saving (and reusing) an LLM (through the Save Model file node) as an extremely user-friendly way of training.

What if you also extracted data from the LLM's internal processes on every generation (for example, an attention map) and created a measuring/monitoring tool or node?

In the example of extracting an attention map (with Python), you can see which tokens the LLM attended to, and with how much attention. If the LLM then hallucinates, that should be visible in the attention map.

Additionally, other data from the internal processes could be used to monitor whether the LLM starts overfitting (or exhibits other unwanted behavior). This could be automated with a feedback loop, allowing the LLM to self-correct when it starts overfitting.

You could also visualize this data (e.g., as heat maps) for users, enabling them to see which input/method yields the most intelligence growth.
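For what it's worth, a minimal sketch of the extraction side, assuming you already have per-layer attention weights (e.g. from a Hugging Face transformers model called with `output_attentions=True`); the toy matrix below stands in for real model output:

```python
import numpy as np

def top_attended_tokens(attn, tokens, k=2):
    """attn: (num_heads, seq_len, seq_len) attention weights for one layer.
    Returns, for each query token, the k source tokens it attends to most."""
    avg = attn.mean(axis=0)                  # average over heads -> (seq, seq)
    result = {}
    for i, tok in enumerate(tokens):
        top = np.argsort(avg[i])[::-1][:k]   # highest-attention sources first
        result[tok] = [tokens[j] for j in top]
    return result

# Toy stand-in for real model output: 2 heads, 3 tokens, rows sum to 1.
tokens = ["The", "cat", "sat"]
attn = np.array([
    [[1.0, 0.0, 0.0], [0.6, 0.4, 0.0], [0.1, 0.7, 0.2]],
    [[1.0, 0.0, 0.0], [0.2, 0.8, 0.0], [0.3, 0.5, 0.2]],
])
summary = top_attended_tokens(attn, tokens, k=1)
```

A heat map of the same averaged matrix (e.g. with `matplotlib.pyplot.imshow`) would give the visualization described above.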

The possibilities are endless, but I am aware that there will be many technological hurdles.

I'd like to ask for your opinion on whether this is a realistic idea. Could the open-source community adopt and develop this idea? User-friendly training and development of AI models for everyone could democratize AI, offering numerous benefits and creating a rich ecosystem of various tools, functions, and nodes built on this idea.

What do you think, Jason? Is it worthwhile to start a project on GitHub to pitch this idea to the community?

Or does anyone else have an opinion on whether this idea is worthwhile to pitch?

Thank you in advance,

MedleMedler

shhlife (Collaborator) commented Jan 31, 2025

I think it'd be cool to see what's happening during training of the model - do you know of any that offer that visibility?

MedleMedler (Author) commented Jan 31, 2025

Cool!

Yes, I would also like to see what's happening.

No, but LLMs have told me that extracting this data is commonly done when fine-tuning. DeepSeek-R1 said it isn't difficult and wrote Python code to query certain data. It also said there are numerous Python libraries to extract data and transform it into visualizations.

The question is more which data you extract; from there on it is not so difficult.
With visualization you can lose a lot of data, so you have to know the bandwidth within which it becomes visible that the model is starting to act weird.

ComfyUI has visualization nodes for sound, but I think it is easier to go with Python; there are Python libraries made especially for visualization, I believe.

Actually, because the open-source models haven't been smart/stable enough for me, I haven't yet tried saving a model hundreds of times; 150-200 times at most. And acting weird at certain moments can also have completely different causes. Now that I think of it, I didn't notice any difference when I didn't save the model.
I can try saving the model many hundreds of times and see what happens. Did you notice any weirdness beyond the normal?
I wouldn't be surprised if "saving an LLM as a training method" results in a more gradual, harmonious evolution of how the model structures the tokens in its own transformer layers, in a more natural way, because this is how the LLM arranged them itself.

MedleMedler (Author) commented Jan 31, 2025

Something else:

I made a workaround for the chat problem. If you have longer conversations, you have to build a whole structure of agents, and that is unworkable. Unfortunately, there is no chat node that works like a chat app, with the whole conversation in one box.

Ollama has a limited number of input tokens (1024). You can change that in the command shell (2048/4096), but it costs a lot of GPU memory. This means the conversations cannot be too long.
A solution would be to use RAG (Rules? Something else?) as a temporary library to store the conversation. Then you can chat with your LLM all day long.
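For reference, Ollama's context window can be set per request or per model; a sketch (the model name and the 4096 value are just examples):

```shell
# Per request: pass num_ctx in the options of an /api/generate call.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello",
  "options": { "num_ctx": 4096 }
}'

# Per model: bake it into a Modelfile and create a variant:
#   FROM llama3
#   PARAMETER num_ctx 4096
# then: ollama create llama3-4k -f Modelfile
```

Either way, the GPU-memory cost of a larger window that is mentioned above still applies.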

The workaround I have now is saving the whole conversation, uploading it again, piping it into the agent, adding the latest comments to the whole conversation, and saving it again; in effect a conversation loop with a .txt file.
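The .txt loop described above can be sketched in a few lines; `llm` here is a placeholder for whatever callable sends the prompt to the model, and the file name is hypothetical:

```python
from pathlib import Path

HISTORY = Path("conversation.txt")  # hypothetical file name

def ask(llm, user_message):
    """Append the new message to the stored history, send the whole
    conversation to the model, then save the reply back to the file."""
    history = HISTORY.read_text() if HISTORY.exists() else ""
    prompt = history + "\nUser: " + user_message
    reply = llm(prompt)                      # any text-in/text-out callable
    HISTORY.write_text(prompt + "\nAssistant: " + reply)
    return reply

# Stand-in "model" that just acknowledges, to show the loop:
ask(lambda p: "ok", "hello")
ask(lambda p: "ok again", "are you there?")
```

Each call replays the full history, so the effective conversation length is still bounded by the model's context window.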

This is an important hurdle, because ComfyUI needs chat nodes (and a solution for excessive GPU memory use) where you can write, act, and react in one chat node.
Once this hurdle is taken, it can unlock ComfyUI for all kinds of advanced LLM usage and training. Then you can really interact with the LLM and develop it in the direction you want.

MedleMedler (Author) commented Feb 1, 2025

Another thing:
At the moment the Model Save node writes a new file (under another name). I think it would be ideal to have a choice between overwriting the existing model name and not overwriting (saving the model under another name). That way you could use these nodes for different "training by saving" methods.

There are several strategies you can use for training by saving:

  • Saving the model every generation;
  • Saving the model every time period (e.g. every 4 hours);
  • Saving the model only when it has become better.
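The three strategies above could be expressed as a single decision function; this is a sketch, and the function name and `state` dict are my own invention, not existing Griptape nodes:

```python
import time

def should_save(strategy, state):
    """Decide whether to save the model after this generation.
    `state` carries whatever the strategy needs (timestamps, scores)."""
    if strategy == "every_generation":
        return True
    if strategy == "every_interval":            # e.g. every 4 hours
        due = time.time() - state["last_save"] >= state["interval"]
        if due:
            state["last_save"] = time.time()
        return due
    if strategy == "on_improvement":            # only when the score got better
        better = state["score"] > state["best_score"]
        if better:
            state["best_score"] = state["score"]
        return better
    raise ValueError(f"unknown strategy: {strategy}")
```

The "only when it has become better" case assumes some score is available, which loops back to the monitoring idea at the top of this issue.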

shhlife (Collaborator) commented Feb 1, 2025

I'd love to build a chat node for comfyUI - It would be amazing to have something like that integrated!

shhlife (Collaborator) commented Feb 1, 2025

I'm not sure how it would work to overwrite the original model - because the ModelFile is based off an existing model - so it might break if you try and write over the existing one?

MedleMedler (Author) commented Feb 3, 2025

I am not sure either, because right now it is impossible to overwrite the existing model.

Maybe it would be better not to overwrite, but to give the model an ascending name (model01, model02, model03, etc.) when you save it, together with functionality so that the newly saved model is automatically loaded for the next generation. And also functionality that, for example, automatically deletes every old model, or keeps every model, or deletes all but keeps one model every hour / every 10 saves / when the hard disk is full / after a certain amount of disk space, etc.

The advantage of not overwriting is the insight into which generation of the model (how many times it was saved while training) you have. I now think that "an improvement of saving the model" is not needed, and that functionality with multiple options for deleting saved models is better.
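A sketch of the ascending naming plus one of the deletion policies (keep only the newest N); the time- and disk-space-based policies would plug in the same way. Function and file names are illustrative, not existing node behavior:

```python
import re
from pathlib import Path

def next_model_name(folder: Path, prefix: str = "model") -> str:
    """Next ascending checkpoint name: model01, model02, ..."""
    nums = []
    for p in folder.glob(prefix + "*"):
        m = re.fullmatch(prefix + r"(\d+)", p.stem)
        if m:
            nums.append(int(m.group(1)))
    return f"{prefix}{max(nums, default=0) + 1:02d}"

def prune_old_models(folder: Path, keep_last: int = 5,
                     prefix: str = "model") -> None:
    """One retention policy: keep only the newest keep_last checkpoints.
    Zero-padded names make lexicographic order equal save order."""
    checkpoints = sorted(folder.glob(prefix + "*"))
    for old in checkpoints[:-keep_last]:
        old.unlink()
```

The generation count the comment mentions falls out for free: it is simply the number in the newest file name.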

MedleMedler (Author) commented Feb 5, 2025

Saving conversations (in the model) under a different name would be handy. That way you could build (change) your model in the direction you want.
If this functionality uses RAG, it would even be useful if you decide to save only a certain refined chat conversation and leave the model itself unchanged/intact. It could well be that your RAG nodes already have this functionality.
The above also implies a switch between saving the changed model name incrementally (yes/no), and/or saving the RAG conversation (yes/no), and/or deleting old models (by name, every 10 saves, after a certain amount of disk space).

Also a "save to" text field (yes/no) and, if yes, a text field for the .txt file name, for saving to a certain folder. That way you can also save the important conversations (or all of them, if you choose) externally, e.g. for other (better) models in the future.

You could integrate this functionality in the chat node (for more user-friendliness), or in the Create Model File node, or the Save Model File node (or divide the functionalities over these nodes, each integrated where it logically belongs).

This way you can create a lot of different configurations, each with its own practical functionality.

And this way you can also build architectures (sets of different configurations) where you can easily switch between models (and/or integrate them, using multiple different base models in one workflow), each model with its own specialties, while letting the models learn from each other (saving each model after its own amount of progress/saves/disk space, etc., and/or improved conversations and/or rules).

MedleMedler (Author) commented Feb 5, 2025

Examples to better understand the possibilities:

With this functionality (plus monitoring and steering functionality) you could even build complex architectures where several models monitor (and adapt) their own internal processes (and/or each other's).

You could also train LLMs to give advice on which models are best connected, and how, to their own (or each other's) internal processes, and which models may (or may not) steer/change their own (and/or others') internal settings (e.g. weights) to change certain wanted/unwanted behavior.

The challenge will be choosing which internal processes make the behavior of the model visible in the most efficient way. This hurdle can be taken by using the pattern recognition of the LLM itself to analyze internal processes, plus trial and error. You start with known datasets, queries, and visualizations from the fine-tuning community.

But more challenges also mean more rewards in this case. There are endless ways of extracting data, because of the extreme complexity of LLMs, but that is also a huge opportunity: it unlocks endless ways to improve the model, gain new insights, and make new discoveries.

And you can start easy with known datasets that are used for fine-tuning.
E.g. you let an LLM write Python code to extract data and create an attention map (you could do this with Any Nodes, and maybe with multimodal LLMs like Janus-Pro). This map shows which tokens the LLM used, with how much attention. After that, you can have a conversation with an LLM about what you can measure with this data and how it would adapt the workflow, and let it write code to steer the workflow and/or models. You build the workflow and let the LLM monitor the internal processes, writing an analysis report with advice, insights, and discoveries about how the LLM behaves.
After certain progress/insights, you adapt the workflow and start measuring again to see whether it is an improvement.
Trying as you go will develop knowledge, and with the level of intelligence LLMs now have, this knowledge will bring improvement as a universal principle.
Universal, because even just piping the attention map back into the LLM (making the LLM aware of its own internal processes for the first time) should already give big improvements.
This is because of the capability LLMs now have to recognize very complex patterns in any data they analyze, and to give advice to improve processes accordingly.
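Piping the attention data back could be as simple as formatting a summary the model can read in its next prompt; a sketch with a tiny hand-made attention matrix (the function name and report format are invented for illustration):

```python
def attention_report(tokens, avg_attention):
    """Format averaged attention as text so it can be fed back into the
    model's own prompt for self-analysis."""
    lines = ["Token attention summary:"]
    for i, tok in enumerate(tokens):
        weights = avg_attention[i]
        j = max(range(len(weights)), key=lambda k: weights[k])
        lines.append(f"- '{tok}' attends most to '{tokens[j]}' ({weights[j]:.2f})")
    return "\n".join(lines)

report = attention_report(["The", "cat"], [[1.0, 0.0], [0.3, 0.7]])
```

The resulting text could then be prepended to the next prompt, e.g. "Here is your attention summary from the last generation; analyze it."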

The community could then create sophisticated architecture workflows and share them with each other.
Creating environments that motivate users to share their knowledge can be done in different ways, depending on what fits best for Griptape.

One way would be creating a new GitHub project/environment where you and users can discuss and post new architecture workflows, especially those that lead to better (and/or new) functionality.
The motivation for users is that they can start with ready-to-use architecture workflows for different purposes, with the possibility to adapt them themselves, while being invited to share their knowledge/workflows so the community makes the best progress for everybody.

The easiest way would be a section in this project with URL links to several architecture workflows (for different purposes/training methods).

I say this because as soon as you have functionality that makes these kinds of architectures possible, it will unlock endless improvements, solutions, applications, new functionality (like giving the LLM control over all settings of all ComfyUI custom nodes while training it to become a ComfyUI master), additions, and integrations with existing ComfyUI workflows.

It is wise to think about at which moment it would be useful to create which environment for users to co-create with Griptape and each other, creating the most optimal mutual benefits and enhancing the process of unlocking better models, trained in extremely user-friendly and revolutionary new ways.

shhlife added the someday/maybe label Feb 19, 2025