Hello internet users. I've tried GPT4All and like it, but it's very slow on my laptop. I was wondering if anyone here knows of a solution I could run on my server (Debian 12, AMD CPU, Intel A380 GPU) through a web interface. Has anyone found a good way to do this?
text-generation-webui is kind of the standard from what I've seen for running models with a web UI, but the VRAM concerns here are accurate. Text LLMs need a lot of VRAM to hold the model weights plus the growing conversation context.
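For a rough sense of why the VRAM requirement is so steep, the weights alone take roughly (parameter count × bytes per parameter), before you even count the KV cache for the conversation context. A quick back-of-the-envelope sketch (rule of thumb only, real usage adds cache and overhead on top):

```python
def weight_vram_gb(params_billions: float, bits_per_param: int) -> float:
    """Rough VRAM needed just for model weights, in decimal GB.

    Ignores KV cache, activations, and framework overhead, so treat
    the result as a lower bound.
    """
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 7B model at fp16 needs ~14 GB for weights alone;
# 4-bit quantization brings that down to ~3.5 GB.
print(weight_vram_gb(7, 16))  # 14.0
print(weight_vram_gb(7, 4))   # 3.5
```

This is why quantized (4-bit/8-bit) builds are usually the only way to fit a decent model on a consumer GPU.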
There is an easy way with Open WebUI, but LLMs are mostly accelerated via CUDA or ROCm. CPU inference is slow, but you can try it.
I tried Hugging Face TGI yesterday, but all of the reasonable models need at least 16 GB of VRAM. The only model I got working (on a desktop machine with an AMD 6700 XT GPU) was Microsoft's Phi-2.
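Once TGI is up, any client on the network can hit its HTTP `/generate` endpoint, which is handy if you want something other than the bundled UI. A minimal sketch, assuming a TGI server listening at `localhost:8080` (the host/port are placeholders for wherever your server actually runs):

```python
import json
import urllib.request

def build_generate_request(prompt: str, max_new_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for TGI's /generate endpoint.

    localhost:8080 is an assumed address; point it at your own server.
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        "http://localhost:8080/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (needs a running TGI instance):
# with urllib.request.urlopen(build_generate_request("Hello")) as resp:
#     print(json.load(resp)["generated_text"])
```

The same endpoint is what web frontends talk to under the hood, so you can put TGI on the server and any thin UI in front of it.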