I am probably not the demographic you expect. I don’t do “production” in that sense, but I have ollama running quite often when I am working, as I use it for RAG and as a fancy knowledge-extraction engine. It is incredibly useful:
- I can test a lot of models by just pulling them (very useful as progress is very fast),
- using its command line is trivial,
- because it keeps running in the background, it starts once every few days and otherwise stays out of the way,
- it integrates nicely with langchain (and a host of other libraries), which makes it easy to set up a fairly sophisticated process while abstracting away the LLM itself (a small sketch of what I mean is below).
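To give an idea of what that looks like in practice, here is a minimal sketch assuming the langchain_community wrapper around a locally running ollama server; the model names and the prompt are purely illustrative, not my exact scripts:

```python
# Minimal sketch: calling a local ollama model through LangChain.
# Assumes `ollama serve` is running and the model has been pulled.
from langchain_community.llms import Ollama

llm = Ollama(model="mixtral")  # talks to the local ollama server on localhost
print(llm.invoke("Summarise this abstract in two sentences: ..."))

# Swapping models for a quick comparison is a one-line change,
# e.g. Ollama(model="llama2") after an `ollama pull llama2`.
```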
> what do you do after the initial experimentation?
I just keep using it. For now I keep tweaking my scripts, but I expect them to stabilise at some point, because I use these models to do real work, and that work is not monkeying about with LLMs.
> I'm not even talking about giving credit to llama.cpp, just mentioning that this product is gaining disproportionate attention and kudos compared to the value it delivers.
For me, there is nothing that comes close in terms of integration and convenience. The value it delivers is real: it lets me do useful work without wasting time on lower-level architecture details. Again, I am probably not the demographic you have in mind (I am not a CS person and my programming is usually limited to HPC), but ollama is very useful to me. Its reputation is completely deserved, as far as I am concerned.
The use case is exploratory literature review in a specific scientific field.
I have a setup that takes PDFs and does OCR and layout detection with Amazon Textract, then bundles the results with some internal reports. From there, one pipeline writes summaries of each document and another slices them into chunks, gets embeddings, and sets up a vector store for a RAG chatbot. At the moment it uses Mixtral and the command line. But I like being able to swap LLMs to experiment with different models and quantisations without hassle, and I more or less plan to move this to a remote server to free up resources on my workstation, so the web UI could come in handy. Running this locally is a must for confidentiality reasons. I’d like to get rid of Textract as well, but unfortunately I haven’t found a solution that comes close. Tesseract in particular was very disappointing.
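For what it’s worth, the chunk/embed/retrieve part of that pipeline boils down to something like the sketch below. It assumes LangChain’s community integrations for ollama, uses Chroma purely as one example of a local vector store, and the model names, paths, and the `extracted_texts` placeholder are illustrative rather than my exact setup:

```python
# Sketch of the chunk -> embed -> vector store -> RAG chat step,
# assuming LangChain community integrations and a local ollama server.
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# In practice these texts come out of the Textract OCR/layout step plus the
# internal reports; hard-coded here only to keep the sketch self-contained.
extracted_texts = {
    "paper_01.pdf": "…full text of the first document…",
    "internal_report_03.pdf": "…full text of an internal report…",
}
docs = [Document(page_content=text, metadata={"source": name})
        for name, text in extracted_texts.items()]

# Slice each document into overlapping chunks for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks with a local embedding model and build a local vector store.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
store = Chroma.from_documents(chunks, embeddings, persist_directory="./vectorstore")

# Wire the retriever to a local LLM served by ollama for question answering.
llm = Ollama(model="mixtral")
qa = RetrievalQA.from_chain_type(llm=llm,
                                 retriever=store.as_retriever(search_kwargs={"k": 4}))
answer = qa.invoke({"query": "What methods does the internal report describe?"})
print(answer["result"])
```

Because the LLM and the embeddings are both just ollama models behind a wrapper, trying a different model or quantisation is a matter of changing a string, which is exactly the convenience I was describing.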