
Intro: 🚀 Run DeepSeek AI Locally on Your Mac! (Super Easy Setup)

AI just got more accessible! DeepSeek is shaking up the industry by delivering powerful AI models that run on a fraction of the GPU power—so much so that you can even run them on your MacBook or home PC! In this tutorial, I’ll walk you through:

Installing & Running DeepSeek on macOS using Ollama & Chatbox

Choosing the Right Model Size for your system (8B, 14B, 32B…)

Comparing DeepSeek’s performance to LLaMA 3 & other models

Using DeepSeek in Python for local AI-powered applications

💡 Whether you’re an AI enthusiast, developer, or someone looking for a private, cloud-free AI assistant, this guide will get you started fast!

📌 Resources & Commands (As mentioned in the video):

🔹 Install Ollama: brew install ollama

🔹 Install Chatbox: brew install --cask chatbox

🔹 Run DeepSeek: ollama run deepseek-r1:14b

🚀 Want more hands-on AI guidance? Book a consultation session with me: [Your Website]

📢 Like, Subscribe & Comment if you found this helpful! Let me know what AI topics you’d like to see next.

#AI #DeepSeek #MacAI #LocalAI #Ollama #Chatbox #MachineLearning

Deep Dive

Artificial Intelligence is evolving fast, and one of the latest breakthroughs is DeepSeek AI. This cutting-edge model requires a fraction of the GPU power for training and inference, making it possible to run even on a MacBook or consumer-grade hardware. In this tutorial, I’ll walk you through how to set up DeepSeek AI locally—no complex configurations, no cloud dependencies, just AI running smoothly on your machine!

🚀 Why DeepSeek AI?

DeepSeek is changing the game by offering powerful AI capabilities without needing high-end GPUs. Some highlights:

  • Lower hardware requirements – Runs on a MacBook with 16GB RAM!
  • No cloud lock-in – Keep your AI local and private.
  • Better efficiency – In my testing, the distilled models outperform comparably sized LLaMA 3 models while using less storage and compute.

🔧 Setting Up DeepSeek AI on macOS

Thanks to Ollama (a fantastic tool for running local AI models), setting up DeepSeek AI is incredibly easy. Here’s how:

1️⃣ Install the Required Tools

If you have Homebrew installed, just run:

brew install ollama
brew install --cask chatbox
  • Ollama manages AI models locally, allowing you to test different LLMs with ease.
  • Chatbox is a GUI tool that provides a simple chat interface for interacting with AI models.
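
Before pulling any models, you can quickly check that the Ollama server is actually running (for example after starting it with ollama serve or brew services start ollama). Here is a minimal sketch in Python that queries Ollama's local REST API; it assumes the server is on its default port 11434 and that you have the requests package installed:

import requests

# Ollama listens on localhost:11434 by default; /api/tags lists the models you have downloaded
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Ollama is running. Local models:", models or "none downloaded yet")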

2️⃣ Choose the Right Model Size

DeepSeek AI offers different model sizes, so pick one based on your available RAM:

Model | RAM Requirement
8B    | Works on a 16GB MacBook (faster but limited)
14B   | Works well on a 16GB MacBook (better results)
32B   | Ideal for a 64GB MacBook Pro

For most users, 14B is the sweet spot between performance and quality.
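
If you want to turn that rule of thumb (pick a parameter count a bit below your installed RAM in GB) into code, here is a purely illustrative Python helper. It assumes the third-party psutil package is installed; the thresholds simply mirror the table above:

import psutil

# Rule of thumb: choose a model whose parameter count (in billions) sits a bit below your RAM in GB
ram_gb = psutil.virtual_memory().total / 1024**3
if ram_gb >= 64:
    suggestion = "deepseek-r1:32b"
elif ram_gb >= 16:
    suggestion = "deepseek-r1:14b"  # or deepseek-r1:8b if you prefer faster answers
else:
    suggestion = "deepseek-r1:8b"
print(f"Detected ~{ram_gb:.0f} GB RAM, suggested model: {suggestion}")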

3️⃣ Run DeepSeek AI Locally

Once installed, start the AI model with:

ollama run deepseek-r1:14b

If it’s your first time, it will download the model (~9GB). Once ready, you can chat with DeepSeek directly in the terminal or through Chatbox.
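
Chatbox talks to the model through Ollama's local HTTP API (http://localhost:11434 by default), and you can call that same endpoint yourself. A minimal sketch with the requests package, assuming the 14B model from above has already been downloaded:

import requests

# One non-streaming chat turn against the local Ollama server
payload = {
    "model": "deepseek-r1:14b",
    "messages": [{"role": "user", "content": "Explain in two sentences what a re-ranker does."}],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload)
resp.raise_for_status()
print(resp.json()["message"]["content"])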

🖥️ Using DeepSeek AI in Python

Want to integrate DeepSeek into your own applications? The ollama Python package (pip install ollama) makes it easy:

import ollama

# One chat turn with the locally running DeepSeek model via the Ollama Python client
messages = [{'role': 'user', 'content': 'How does a re-ranker work?'}]
response = ollama.chat(model='deepseek-r1:14b', messages=messages)
print(response['message']['content'])

This allows you to build local AI-powered apps without cloud dependencies. Perfect for privacy-sensitive projects!
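
As mentioned in the video, you can also stream the answer token by token instead of waiting for the full response. A minimal sketch with the same ollama package; the temperature of 0.6 follows the 0.5–0.7 range recommended by the DeepSeek developers and is optional:

import ollama

# Stream the answer chunk by chunk instead of blocking until it is complete
stream = ollama.chat(
    model='deepseek-r1:14b',
    messages=[{'role': 'user', 'content': 'How does a re-ranker work?'}],
    options={'temperature': 0.6},  # optional; DeepSeek recommends 0.5-0.7
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)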

🌟 Why This Matters

DeepSeek AI is a game-changer because it:

✅ Runs on local machines (no expensive cloud servers!)
✅ Works well even on consumer hardware
✅ Provides better results than comparable local models

If you’re into AI and want a powerful, private, and efficient model on your Mac, DeepSeek AI is worth checking out!

💬 What’s Next?

If you found this helpful, let me know in the comments! Also, if you need consulting on AI or hosting solutions, feel free to reach out. Let’s build something awesome together! 🚀

Full transcript

So welcome back everyone to our tutorial series, this time about AI. I’m pretty sure you’ve heard about DeepSeek shaking up the industry. Somehow they managed to make an AI model which requires just a tiny fraction of the GPU power, especially for training, and then applied that to smaller models such that you can even use it on your home machine. And in this tutorial, I’m going to show you how you can run a small version on your MacBook. And actually, it’s doing quite well.

My MacBook currently has 16 gigabytes of RAM, which is basically equivalent to the VRAM in, for example, NVIDIA or AMD GPUs, and this opens up so many possibilities. You could even host it on a cloud machine, on a cloud server. And actually the results with 16 gigabytes of VRAM, compared to, for example, LLaMA 3, are insane. It’s way better than anything else I’ve seen before, and I can just recommend that you try it out. And actually I will show you that it’s super easy to get set up. I mean, we are doing it quite visually on our Mac machine now, but you can install it on a server and it’s quite easy to get set up and started. So, like you might have seen, the main model is a 671 billion parameter model, which would still require quite a hefty server. But they distilled their model down into other versions, basically taking the LLaMA model or the Qwen model, for example, and teaching it the DeepSeek knowledge, such that we can use a smaller version as well and still get quite decent results.

I mean, you can see the differences here and there’s a lot of material, but as I want to keep this short I just want to get started. They are recommending vLLM, but in my opinion the easier combination is Ollama, which is a really nice tool for hosting different LLMs: you can really easily try out different models locally without needing to worry about different setups, especially on Mac. You don’t even need to worry about drivers and so on, no CUDA setup, and you can just get started. And in combination with that, we’re going to use Chatbox, which is the GUI component, so basically that you have a nice chat interface. But in the tutorial, I also want to dive into a Python interface and show you how you can chat with the local model or a server model.

In this case using Ollama as a backend, quite easily. But for now, let’s get started with the visual stuff. On a Mac, if you have Homebrew installed, it’s quite easy: you just have to type in brew install ollama and brew install --cask chatbox, and you’re ready to go. And Ollama is run via the terminal, even though there are different GUI applications for it. But it’s quite easy, so we can just use this. The first step is to find the right size of the model, and heading over to Ollama, they have a nice library and I already selected DeepSeek R1. As we can see here, they have the 1.5 billion, 7B, 8B, 14B and so on models. They are all the distilled versions that we’ve seen on the GitHub page of DeepSeek. Roughly said, you would choose a number that is slightly lower than your RAM amount. So for example, if you’re on a MacBook, you might get away with the 8B model, but it could be really limited because there’s a ton of other stuff running on your computer as well. But for example, with a 16 gigabyte MacBook, you can run the 14B model quite well, even though I have experienced that if you’re running a lot of GPU-intensive stuff, like having, I don’t know, browsers open with many, many tabs and different applications open, it’s maybe a little bit limited and slow.

I mean, if you need a quick response, I would choose the 8B model on a 16 gigabyte MacBook. But if you’re looking for more quality-focused results, you would choose the 14B model. Then of course, if you have one of these really big MacBooks with 64 gigabytes of RAM, you could even choose the 32B model. And of course, if you have a server, you could have a look at the 70B or even the 671B model. But as I said, in this tutorial we are going to look at consumer hardware, 14B and 8B. And they already show you the command to start it. In our case, we will choose the 14B model. So if you want to run this, you can just paste it in here: ollama run deepseek-r1:14b. If it’s your first time, it’s first going to download it.

It will be, like we’ve seen here, nine gigabytes of storage required. And we could start chatting in here. But like I said, we are going to use Chatbox, which is a nice little GUI. In Chatbox, we can either click on the global settings or, like I just did, on the chat’s settings. I have selected the Ollama API provider; there are many different ones, but Chatbox works with Ollama. The URL is set by default. And in here I can select all the Ollama models I have downloaded. In our case, we’re going to select DeepSeek. For temperature, they recommend anything between 0.5 and 0.7; this is coming from the DeepSeek developers. For this tutorial, I’m going to choose a temperature of 0.6 and hit save. They also recommend not putting the task in the system prompt like you usually would, but in the user prompt. So in our case, I will just leave the system prompt at its default, “You are a helpful assistant”. And in my message, I’m going to paste the question.

I mean, I have just pre-typed it. We can delete it as well, and I would just type it in here: how would you write a RAG in Python using a re-ranker? This is something we’re going to look at later on in another tutorial, but I just want to show you how the results look. Now I’ve hit enter, and as you can already see it’s way slower than the web version of DeepSeek, but as I said, we are going to look at the local version, and as this is running on my own computer right now, it’s of course taking a little longer. Even so, in my opinion it’s quite amazing that it even has this deep knowledge about, for example, RAG algorithms or any other algorithms in just 10 gigabytes of storage space. If you think about it, that is definitely crazy, especially because if you had run this question with LLaMA 3, for example, it would be way worse, not at all in this direction and with more hallucinations. So I’m actually quite impressed. And this would go on and on.

But like I promised you, I want to take a look at the Python interface as well. The nice thing about Ollama is that we can set it up, and then we can actually communicate with the local DeepSeek just as you would have done with OpenAI, for example. The important thing is still to select DeepSeek in here. And that way we can, like I said, just communicate from Python with our DeepSeek model in the background without too much setup. For me, this is quite a game changer, because in some applications I’m still running the really beefy questions via OpenRouter or any other API provider of public AI models.

But if I have sensitive stuff that isn’t allowed to run in the cloud, or if I just have small tasks, like asking the AI whether a text contains numbers or whatever, then I can use this local version. And as I’m not streaming the answer, the response will take quite some time, because unlike the Chatbox interface where we streamed, this Python setup waits until the response is finished, and of course that takes a while. If we were using streaming, which I will show in another tutorial, we could already see results appearing. But in order not to make the video too long, because this will take, I don’t know, maybe three to five minutes, I will end it here. So let me know if you want to find out more about AI and what tutorials I should do next. And of course, if you are up for a little consulting session, or if you have any projects in mind, just head over to my website and we could have a little chat. All right, that’s it for now. See you around. Bye.
