No, Ollama is running on an old PC with a GeForce 1060 and 16 GB of RAM…
Yes, it’s a “webserver” running in the background exposing an API.
However, if I run “top” on my system without chatting, it sits at 0% usage; only when I ask something does the system peak at around 55-70% CPU.
You have to understand that there are two things here: the server and the model. The server is always running, but needs next to nothing in terms of resources.
The model is what computes the answers to your questions; this is the heavy part. It is loaded on first use, then unloaded again after a delay of inactivity.
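If you want to see the server/model split for yourself, here is a minimal sketch in Python. It assumes Ollama is listening on its default address (http://localhost:11434) and uses the /api/generate and /api/ps endpoints as I understand them from the Ollama API docs. The helper names and the model name "llama3" are just my placeholders, so swap in whatever model you have pulled. The keep_alive value controls how long the model stays loaded after a request; I believe the default is around five minutes.

```python
import json
import urllib.request

# Assumption: Ollama's default address. Change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model, prompt, keep_alive="5m"):
    """Build the JSON body for POST /api/generate (hypothetical helper)."""
    return {
        "model": model,            # e.g. "llama3" (placeholder, use a model you pulled)
        "prompt": prompt,
        "stream": False,           # ask for one JSON reply instead of a stream
        "keep_alive": keep_alive,  # how long the model stays loaded afterwards
    }


def loaded_models(base_url=OLLAMA_URL):
    """Return the names of models currently loaded in memory (GET /api/ps)."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


def ask(model, prompt, base_url=OLLAMA_URL, keep_alive="5m"):
    """Send one prompt; the server loads the model first if it is not loaded."""
    body = json.dumps(build_generate_request(model, prompt, keep_alive)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

Calling `loaded_models()` on an idle server should give an empty list, since only the lightweight server process is running. After `ask("llama3", "Say hi")` the model should show up in that list, and it should disappear again once the keep_alive delay has passed.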
TL;DR
To answer your real question: yes, you could run Ollama on the same system that you are already using.