LLMs on your Local

Published on January 27, 2025

Running LLMs Locally

The WHY and HOW to use them …

Image from LM Studio

There are a few reasons why you should consider running LLMs on your local machine. Company policies, data privacy, and security are a few. But there are also benefits! Local deployment offers greater flexibility to customize the LLM according to specific needs, such as fine-tuning the model on proprietary datasets or adjusting performance parameters.

So let's jump into a couple of options for doing this.

LM Studio offers a nice GUI from which you can download and run different models with just a couple of clicks!

The Llama 3.2 3B parameter model now running on my local machine:

LM Studio

As shown above, you can use different endpoints to check if the model is running, send a chat history to the model to predict the next assistant response, predict the next tokens given a prompt, or generate text embeddings for a given text input:

POSTMAN WEB
POSTMAN
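If you prefer the terminal over Postman, here is a minimal sketch of the same calls with curl. It assumes LM Studio's local server is running on its default port (1234) and that the model identifier matches what LM Studio shows for your downloaded model; adjust both if yours differ:

# Check which models the server has available (also confirms it is up)
curl http://localhost:1234/v1/models

# Send a chat history and get the next assistant response
curl http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "llama-3.2-3b-instruct",
  "messages": [
    { "role": "user", "content": "Why run an LLM locally?" }
  ]
}'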

You can easily download and run any of the available models on your local machine:

LM Studio does not require an internet connection in order to work: chatting with models, chatting with documents, and running a local server can all be done offline.

Let's explore Ollama!

This is as simple as downloading it from their site and running a model (any model) from your terminal:

ollama run llama3.2

VSCODE
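If you just want to download a model or see what you already have installed, Ollama also ships a couple of handy subcommands (the model name here is simply the one used above):

# Download the model without opening a chat session
ollama pull llama3.2

# List the models installed locally
ollama list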

You can also use and test the API endpoints:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What are you?",
  "stream": false
}'
POSTMAN
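Since "stream": false returns a single JSON object, you can pull out just the generated text from its response field; a small sketch, assuming you have jq installed:

curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What are you?",
  "stream": false
}' | jq -r '.response'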

You can generate the next message in a chat. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using "stream": false. The final response object will include statistics and additional data from the request.

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why should we consider using LLM on our local?"
    }
  ],
  "stream": false
}'
POSTMAN WEB
POSTMAN
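Because the endpoint takes the full chat history, you continue a conversation by appending the assistant's previous reply and your next question to the messages array. Here is a sketch of a second turn; the assistant content below is only a placeholder for whatever your model actually answered:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why should we consider using LLM on our local?" },
    { "role": "assistant", "content": "(the previous answer from the model goes here)" },
    { "role": "user", "content": "Summarize that in one sentence." }
  ],
  "stream": false
}'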

Something to consider: you should have about 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

As mentioned before, while an active internet connection is essential for the initial setup, downloading models, and accessing certain features of Ollama, the core functionality of running models locally does not require ongoing internet access. This design allows you to use the power of large language models within a secure, controlled, and offline environment once everything is properly installed and configured.

Keep it simple and secure :)

Best!