I used to think they were bots. I still do, but I used to, too.
Similar to the previous reply about MATE with font size changes, I do that with Plasma. I hadn’t seen the Plasma Bigscreen project you linked, I’ll definitely try that one out. I’ve also wondered about https://en.m.wikipedia.org/wiki/Plasma_Mobile. These sorts of niche projects don’t always get a lot of attention, so if the Bigscreen project doesn’t work out, I’d bet the Plasma Mobile project is fairly active, and given the way it scales for displays it might work really well on a TV.
Speaking of scaling, since you mentioned it: I’ve noticed scaling in general feels a lot better in Wayland. If you’ve only tried it in X11 before, you might want to see if Wayland works better for you.
First a caveat/warning - you’ll need a beefy GPU to run larger models, but there are some smaller models that perform pretty well.
Adding a medium amount of extra information for you or anyone else that might want to get into running models locally.
If you look at https://ollama.com/library?sort=featured you can see the models that are available.
Model size is measured by parameter count. Generally higher parameter models are better (more “smart”, more accurate) but it’s very challenging/slow to run anything over 25b parameters on consumer GPUs. I tend to find 8-13b parameter models are a sort of sweet spot. The 1-4b parameter models are meant more for really low power devices; they’ll give you OK results for simple requests and summarizing, but they’re not going to wow you.
If you look at the ‘tags’ for the models listed below, you’ll see things like 8b-instruct-q8_0 or 8b-instruct-q4_0. The q part refers to quantization, or shrinking/compressing a model, and the number after that is roughly how aggressively it was compressed. Note the size of each tag and how the size reduces as the quantization gets more aggressive (smaller numbers). You can roughly think of this size number as “how much video RAM do I need to run this model”. For me, I try to aim for q8 models, or fp16 if they can run in my GPU. I wouldn’t try to use anything below q4 quantization, since there seems to be a lot of quality loss below q4. Models can run partially or even fully on a CPU but that’s much slower. Ollama doesn’t yet support the new NPUs found in new laptops/processors, but work is happening there.
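To make the “size roughly equals video RAM” rule of thumb concrete, here’s a rough back-of-the-envelope sketch: the weights take about (parameter count × bits per weight ÷ 8) bytes. The function name and the nominal bit widths are my own assumptions for illustration, not anything Ollama exposes. Real tags come out somewhat larger because of extra scaling data stored alongside the weights, and you need headroom for context on top.

```python
def estimate_weight_gb(params_billion, bits_per_weight):
    """Rough size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


# Nominal bit widths per tag (an assumption; actual files carry some overhead).
NOMINAL_BITS = {"fp16": 16, "q8_0": 8, "q4_0": 4}

for tag, bits in NOMINAL_BITS.items():
    print(f"8b {tag}: ~{estimate_weight_gb(8, bits):.0f} GB")
```

For an 8b model that works out to roughly 16 GB at fp16, 8 GB at q8 and 4 GB at q4, which is why q8 or q4 is usually what fits on a consumer card.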
It’s a good thing that real open source models are getting good enough to compete with or exceed OpenAI.
I’ll preface by saying I think LLMs are useful and in the next couple years there will be some interesting new uses and existing ones getting streamlined…
But they’re just next word predictors. The best you could say about intelligence is that they have an impressive ability to encode knowledge in a pretty efficient way (the storage density, not the execution of the LLM), but there’s no logic or reasoning in their execution or interaction with them. It’s one of the reasons they’re so terrible at math.
I like the game, but agree with the over-tutorialed complaints. They have two difficulty modes, and I wish only story mode got all the handholding. I think there are enough obvious indicators to get you through all the game mechanics.
Coming from C#, then TypeScript and Next.js, rye feels very intuitive and like a nice bridge / gateway drug into Python.
Lan-mouse looks great but keep in mind that there’s no network encryption right now. There is a GitHub ticket open and the developer seems eager to add encryption. It’s just worth understanding that all your keystrokes are going across the network unencrypted.
Shoot your shot, player.
Don’t go crazy or over the top, don’t overdo it, but just say it. If they’re a good friend they won’t be scared away. If they’re like you that way you’ll both be happier.
Don’t overthink it, ask them if they’d ever like to hang out or do something more like a date.
Ballsy, direct, badass. That can be you.
Dating is awkward but life gets a lot better once you get more comfortable with it. Everyone is a dating idiot until they’re not. There’s a good chance your friend is still in the idiot stage and maybe he’ll be over the moon that you helped him push through it.
Rather than distro hopping, maybe try out a zen kernel, or compiling the kernel yourself and changing the kernel config and scheduler, or a newer version of the stock kernel?
I’m not super current on what’s in each kernel but I’d expect latest mainline to handle newer processors better than some of the older stable kernels in some of the more mainstream slower releasing distros.
Ran Asahi for several months, tried it out again recently. It’s good/fine, I just don’t love Fedora.
There’s some funkiness with the more complicated install, the AI acceleration doesn’t work, no thunderbolt / docking station.
MacBooks are great hardware but I don’t think they’re the best option for Linux right now. If you’re never going to boot into macOS then I’d look for x13 or new Qualcomm. Isn’t there a Framework arm64 option now, or was that a RISC module?
I’m also assuming you’re not looking to do any gaming? Because gaming on ARM is not really a thing right now and doesn’t feel like it will be for a long while.
It’s not a cinematic masterpiece but it had a distinctive look and vibe with a cool soundtrack, interestingly strange plot. I saw it again a few years ago and remembered why I liked it as an angsty teen.
Really love arch and the AUR. I’ve been tempted to get nix set up for the rare cases when there’s no AUR package or the AUR package is unmaintained. I figure if there’s no package in the AUR or nixpkgs, it’s probably not worth running.
btop reports some gpu, network and disk information that I don’t think shows up in htop, feels a bit more comprehensive maybe? Both are fine, but I too use btop, it’s nice.
Random trivia: I think btop has been rewritten like 3-5 times now? It’s sort of an inside joke to the point that someone suggested another rewrite from C++ to Rust ( https://github.com/aristocratos/btop/issues/5 ). I guess the guy just likes writing system monitoring console apps.
There’s quantization, which basically compresses the model by using a smaller data type for each weight. That reduces memory requirements by half or even more.
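If it helps to see the idea, here’s a minimal sketch of one simple scheme: symmetric “absmax” quantization to 8-bit integers with a single scale for the whole list. This is only an illustration with made-up function names; real runtimes use more elaborate block-wise formats, so don’t read this as how any particular tool does it.

```python
def quantize_int8(weights):
    """Map floats onto integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each restored weight is off by at most half a quantization step.
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized, scale, worst_error)
```

Each weight now needs one byte instead of two (fp16) or four (fp32), at the cost of a small rounding error per weight. Going to fewer bits shrinks the model further but makes that error bigger.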
There’s also airllm, which loads a part of the model into RAM, runs those calculations, unloads that part, loads the next part, and so on. It’s a nice option but the performance of all that loading/unloading is never going to be great, especially on a huge model like llama 405b.
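The load/run/unload loop looks roughly like the toy sketch below. To be clear, this is not airllm’s API: the function names are hypothetical and the “layers” are just small arithmetic functions standing in for weights that would be read from disk.

```python
def run_layer_by_layer(x, layer_loaders):
    """Run the input through each layer while holding only one layer at a time."""
    for load_layer in layer_loaders:
        layer = load_layer()  # bring this layer's weights into memory
        x = layer(x)          # run this part of the calculation
        del layer             # drop it before loading the next one
    return x


def make_loader(weight, bias):
    """Hypothetical stand-in for 'read one layer's weights from disk'."""
    def load_layer():
        return lambda value: value * weight + bias
    return load_layer


loaders = [make_loader(2.0, 1.0), make_loader(0.5, 0.0), make_loader(3.0, -1.0)]
result = run_layer_by_layer(4.0, loaders)
print(result)
```

Peak memory is set by the largest single layer instead of the whole model, but every token pays for re-reading all the layers, which is where the slowness comes from.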
Then there are some neat projects to distribute models across multiple computers like exo and petals. They’re more targeted at a p2p-style random collection of computers. I’ve run petals in a small cluster and it works reasonably well.
MAWP - Archer
If you go, definitely stay at Four Seasons Total Landscaping next door, best accommodations around and their convention spaces are great for any press conferences you might need to hastily put together.