LLM stands for Large Language Model and is a large-scale AI model that has been trained with an extensive amount of text and code. Beside the well known and widespread Chat GPT, today there are many powerful Open Source alternatives available. The advantage of an Open Source LLM is that you can use such a model in your own application within your own environment. There is no dependency on an external service provider that can raise prices, shut down services, or remove models.
But the question that inevitably arises is: Where to start? At least that’s the question I asked myself. After some research I found out that it isn’t such difficulty as it sounds to run a local LLM.
First of all there is a place called Hugging Face providing a kind of market place for all kinds of AI models. After you have registers yourself on the page you can search and download all kinds of different Models. Of course each model is different and addresses different needs and requirements. But the good news is that there is a kind of common open standard to run a LLM called LLaMA CCP. Lamma CCP allows you to run a LLM with minimal setup and state-of-the-art performance on a wide variety of hardware – locally and in the cloud. And of course there is also a Python binding available. And this makes is easy to test a LLM in a Docker container.
Continue reading “How to Run LLMs in a Docker Container”
