Install ollama models tinyllama and phi3:mini in dev hub for experimentation #5824

Merged: 6 commits merged into berkeley-dsep-infra:staging from the ollama branch on Jul 3, 2024

Conversation

@balajialg (Contributor) commented Jun 28, 2024

After a lengthy conversation with Greg and Eric in the ucb-datahub-staff channel about Ollama, I thought it would be nice to install a couple of small Ollama models in the dev hub. The success criterion would be to run the Ollama notebook from https://github.com/pamelafox/ollama-python-playground/tree/main in the dev hub as a proof of concept. I am also curious about the memory/CPU requirements to run these models in one of our hubs.

I don't know whether the image build will succeed, as the local build stalled after 10 minutes. I can revert this if you all think there are better options for installing Ollama. Thanks!

Ref:
https://uctech.slack.com/archives/C04NEF48SCR/p1719593690539859
https://ollama.com/library/tinyllama
https://ollama.com/library/phi3
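As a rough sketch of what the proof-of-concept notebook could look like: Ollama exposes an OpenAI-compatible API (by default on port 11434), so a notebook should be able to use the openai client against a locally running server. The model name and endpoint below are assumptions based on the models proposed in this PR, not a tested configuration.

```python
# Minimal sketch: talk to a local Ollama server through its OpenAI-compatible API.
# Assumes an Ollama server is already running in the user pod on its default port
# (11434) and that the phi3:mini model has been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string; Ollama ignores it
)

response = client.chat.completions.create(
    model="phi3:mini",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```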

@balajialg changed the title from "Installing tinyllama and phi3:mini in dev hub for experimentation" to "Installing ollama models tinyllama and phi3:mini in dev hub for experimentation" on Jun 28, 2024
@gmerritt (Contributor) commented Jun 28, 2024

I’m sure we will want the ollama binary and the supported models to exist only once on a hub’s file system. We don’t want 3-4 GB of redundant files in every user’s homedir.

If we want Ollama to run entirely within a user’s pod and be accessed via the openai libraries from a notebook, it has to run in that pod in server mode. That may be as simple as launching the binary with its server command; otherwise it would need to be installed as a Linux system service in the user’s pod.
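For illustration, a minimal sketch of "just running the binary in server mode" from inside the pod, assuming the ollama binary is already installed on the image and on PATH; nothing here is part of the current PR.

```python
# Sketch: start the ollama binary in server mode from inside the user pod.
# Assumes the `ollama` binary is already present on the image and on PATH.
import subprocess

server = subprocess.Popen(
    ["ollama", "serve"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
# The server listens on localhost:11434 until this process (or the pod) exits.
```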

@gmerritt (Contributor)

I don’t understand the git clone. The standard install is to download the binary, or to download & run the little installer script that does that plus makes it a system service.

Don’t use “ollama run”, as that will start the simple conversation app. Use “ollama pull” to get models.

But models should be stored centrally; I seem to recall you can just tell ollama where to find the models locally, or we could symlink.
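As one hedged sketch of the "store centrally" idea: for a plain user-level install, Ollama looks for models under ~/.ollama/models by default (and the OLLAMA_MODELS environment variable can override that), so a symlink to a shared, read-only copy might be enough. The shared path below is hypothetical.

```python
# Sketch: point each user's default Ollama model directory at a shared copy,
# so 3-4 GB of model files are not duplicated in every homedir.
# The shared path is hypothetical; adjust to wherever models actually live.
from pathlib import Path

shared_models = Path("/srv/shared/ollama-models")   # hypothetical shared mount
local_models = Path.home() / ".ollama" / "models"   # Ollama's default user location

local_models.parent.mkdir(parents=True, exist_ok=True)
if not local_models.exists():
    local_models.symlink_to(shared_models)
```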

@balajialg (Contributor, Author)

@gmerritt Thank you! I updated the installation to use a Docker image I found for Ollama. Currently, the Ollama models are only installed in the dev hub, so this setup is specifically for our team's experimentation (and possibly a demo) rather than for all users on the datahub.

I would like to have the app run by default so users can directly access the chat service from Jupyter notebooks without needing to run any commands in terminals.

@shaneknapp (Contributor) left a comment

lgtm!

@ryanlovett (Collaborator)

I don't believe running Docker in our container will work in this manner. This specifies executing docker run within the container build, whereas I think you want to actually run containers alongside the user pod. You can specify additional containers with c.KubeSpawner.singleuser_extra_containers. I set up something like this in the gradebook hub, where we run another app in the user pod.

You'll probably need to specify what mounts those containers will need access to, what ports they should listen on, etc. I'm not sure what client programs are going to connect to those containers. If it's just whatever is in the existing singleuser environment then that's fine.
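For reference, a minimal sketch of that sidecar approach, assuming the current KubeSpawner trait name extra_containers (older KubeSpawner versions used singleuser_-prefixed aliases) and the official ollama/ollama image. The tag, port, and volume name are placeholders, not tested settings.

```python
# Sketch: run an Ollama sidecar container alongside the single-user notebook container.
# Containers in the same pod share a network namespace, so the notebook can reach
# the sidecar at localhost:11434. Image tag and volume name below are placeholders.
c.KubeSpawner.extra_containers = [
    {
        "name": "ollama",
        "image": "ollama/ollama:0.1.48",  # pin a tag instead of :latest for reproducibility
        "ports": [{"containerPort": 11434}],
        "volumeMounts": [
            # shared model storage, so models are not duplicated per user
            {"name": "ollama-models", "mountPath": "/root/.ollama"},
        ],
    }
]
```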

@gmerritt (Contributor) commented Jul 2, 2024

If I were using a datahub Ollama instance, I might prefer to access a shared resource: a 5x bigger model running on bigger hardware, shared by 10 or 15 students (or some similar ratio), rather than each and every person having to bother running their own super-tiny model. I recognize that becomes a new and different scaling problem, of course, but note that an Ollama server caches nothing in terms of input/output data, so it is a super-scalable kind of resource, imho!

@balajialg (Contributor, Author) commented Jul 2, 2024

Thanks @ryanlovett! I will check the gradebook example and update the commit.

@gmerritt Fair point - I would love to have a setup like the one you describe in one of our hubs. I just thought we could take baby steps: start by running a smaller model on a single-user server in the dev hub, then scale to a bigger model in a shared setting once there is user demand, infra admin bandwidth, better policies around model deployment, etc.

@balajialg (Contributor, Author) commented Jul 2, 2024

@ryanlovett Added the ollama model images (phi3 and tinyllama) to the extra-container stanza, running on ports 5000 and 5001 respectively. From my limited due diligence, I couldn't find any other service running on those ports.

@ryanlovett (Collaborator)

That looks like the correct syntax.

I tried to find the phi3 and tinyllama containers, but they don't exist - e.g. try docker pull phi3:latest or docker pull tinyllama:latest on your own machine. I think the image specs need to be fixed? I looked this up in order to understand how those containers expect to be communicated with. I would also suggest not using the latest tag, to ensure reproducibility.

@ericvd-ucb (Contributor)

From Fox in my email


> "Just chatting with my team member about this.
> 
> I wonder if you could try adding ollama to your Docker image using a similar approach as:
> https://github.com/prulloac/devcontainer-features/blob/main/src/ollama/install.sh
> 
> And then seeing if you can do !ollama run ?
> 
> You could then make a wrapper that use subprocess to call that. Or you could try seeing if the actual local server works, that seems a bit dubious. 
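A hedged sketch of the subprocess-wrapper idea from that email, assuming the ollama binary is installed, its server is running, and the model has been pulled; the function name and prompt are purely illustrative.

```python
# Sketch: wrap the ollama CLI in a small helper callable from a notebook,
# instead of shelling out with `!ollama run` each time.
import subprocess

def ask_ollama(prompt: str, model: str = "phi3:mini") -> str:
    """Run a one-shot prompt through `ollama run` and return its stdout."""
    result = subprocess.run(
        ["ollama", "run", model, prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Example usage (only once ollama and the model are available in the image):
# print(ask_ollama("Summarize what a JupyterHub is in one sentence."))
```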

@balajialg (Contributor, Author)

Thanks all for your inputs. My latest commit adds a postbuild file that downloads the ollama binary and runs the phi3 and tinyllama models (in the dev hub's default image).

@gmerritt (Contributor) commented Jul 3, 2024

From Fox's email (quoted in full above):

> I wonder if you could try adding ollama to your Docker image using a similar approach as:
> https://github.com/prulloac/devcontainer-features/blob/main/src/ollama/install.sh

That linked GitHub script is just a wrapper around the standard click-through .sh installer for Linux, as shown here: https://ollama.com/download/linux

@gmerritt (Contributor) commented Jul 3, 2024

From Fox's email (quoted in full above):

> Or you could try seeing if the actual local server works, that seems a bit dubious.

...and the local server mode of ollama does work just fine on datahub...

@balajialg changed the title from "Installing ollama models tinyllama and phi3:mini in dev hub for experimentation" to "Install ollama models tinyllama and phi3:mini in dev hub for experimentation" on Jul 3, 2024
@balajialg (Contributor, Author)

Thanks folks! I will merge the postbuild script for now and do some testing in the dev-staging hub.

@balajialg merged commit 9e4e3f1 into berkeley-dsep-infra:staging on Jul 3, 2024 (22 checks passed).
@balajialg deleted the ollama branch on July 3, 2024 at 22:35.