"GPU acceleration capabilities in llamafiles are limited, making them primarily optimized for CPU inference. If your workflow demands GPU-intensive operations or extremely high inference throughput, you might find llamafiles less efficient compared to GPU-optimized cloud solutions."
Definitely going to be a dealbreaker for a lot of people.
Last release was May 14, and I only see a handful of commits (looks like minor refactoring). Is this actively maintained / worked on?
Particularly asking because I've been using it and it is great for what it does, but if it doesn't work on new models I'm going to feel more pressure to look at alternatives (which is a pain, because I am quite sure none of them can compete with "download this one-file executable and point it at a .gguf file" for ease of use).
Mozilla is working on it again, and they're asking for input:
https://github.com/mozilla-ai/llamafile/discussions/809
I built a file sharing CLI called ffl which is also an APE built on Cosmopolitan Libc, just like llamafile.
Since llamafiles can be quite large (multi-GB), I built ffl to help 'ship' these single-file binaries easily across different OSs using WebRTC. It feels natural to pair an APE transfer tool with APE AI models.
https://github.com/nuwainfo/ffl
Llamafile did great work on improving CPU inference and generously upstreamed it into llama.cpp.
Wonderful project, please check it out.
The issue is that the llama.cpp version shipped inside a llamafile is frozen and no longer updated; otherwise you need to download the whole thing again.