Getting Started with DeepSeek R1: The Open-Source Reasoning Model

Open Seas · February 1, 2025

Run DeepSeek R1 on AWS Bedrock or locally with Ollama. Learn to use its chain-of-thought reasoning for complex problem-solving, coding, and math.

DeepSeek R1 changed the game when it dropped in January 2025. An open-source reasoning model that rivals OpenAI’s o1 at a fraction of the cost? The AI community took notice. Now it’s available as a fully managed model on AWS Bedrock, making it easy to add serious reasoning capabilities to your applications.

This tutorial shows you how to run DeepSeek R1 on Bedrock and locally with Ollama, with practical examples of its chain-of-thought reasoning.

Why DeepSeek R1?

DeepSeek R1 stood out for three reasons at its January 2025 launch:

| Feature | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| Open Source | ✅ Yes | ❌ No | ❌ No |
| Reasoning | Native CoT | Native CoT | Prompted |
| Cost (per 1M tokens) | ~$0.55 | ~$15 | ~$3 |
| Local Deployment | ✅ Yes | ❌ No | ❌ No |

What’s the difference in reasoning?

  • Native Chain-of-Thought (R1, o1): These models were trained with reinforcement learning to generate explicit reasoning tokens before producing an answer. The thinking happens in a dedicated phase (<think> tags in R1, hidden in o1), and the model literally “thinks out loud” as part of its architecture.

  • Prompted reasoning (Claude, GPT-4): These models can reason step-by-step when you ask them to (“think through this carefully”), but it’s prompt engineering—not architectural. They weren’t specifically RL-trained to decompose problems before answering.

The practical difference: R1 and o1 will automatically reason through complex problems even without prompting, using more tokens but achieving higher accuracy on math and logic tasks. Claude excels at following instructions, creative tasks, and long-context analysis where explicit CoT isn’t needed.

The key innovation in R1 is transparent reasoning—you can see the model’s thought process as it works through problems, making it excellent for debugging and understanding model behavior.
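For example, a raw R1 completion has roughly this shape (illustrative only, not verbatim model output):

<think>
The user is asking about X. First, restate the constraint... then check the edge case... so the result is Y.
</think>
The answer is Y.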

Option 1: Fully Managed on AWS Bedrock

DeepSeek R1 is fully managed on Bedrock—no infrastructure to maintain, pay-per-token pricing, and enterprise security.

Enable the Model

  1. Open the Amazon Bedrock console
  2. Go to Model access in the left navigation
  3. Find DeepSeek and request access to DeepSeek-R1
  4. Access is typically granted immediately
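
Once access is granted, you can confirm the model is visible to your account with a quick boto3 check. This is a small sketch, assuming the default us-east-1 region; it simply filters the model list by ID rather than relying on a provider name.

import boto3

# Control-plane client (note: "bedrock", not "bedrock-runtime")
bedrock_control = boto3.client("bedrock", region_name="us-east-1")

# List foundation models and print any DeepSeek entries
models = bedrock_control.list_foundation_models()
for summary in models["modelSummaries"]:
    if "deepseek" in summary["modelId"].lower():
        print(summary["modelId"], summary.get("inferenceTypesSupported"))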

Basic Usage with boto3

import boto3
import json
import re

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask_deepseek(prompt: str) -> dict:
    """Query DeepSeek R1 on Bedrock."""
    response = bedrock.invoke_model(
        modelId="us.deepseek.r1-v1:0",  # Cross-region inference profile
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 4096,
            "temperature": 0.7
        })
    )

    result = json.loads(response["body"].read())
    text = result["choices"][0]["text"]

    # Parse reasoning from <think> tags
    reasoning = ""
    answer = text
    think_match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if think_match:
        reasoning = think_match.group(1).strip()
        answer = text[think_match.end():].strip()

    return {
        "reasoning": reasoning,
        "answer": answer
    }

# Test with a reasoning problem
response = ask_deepseek("""
A farmer has 17 sheep. All but 9 run away. How many sheep does the farmer have left?
""")

print("Reasoning:", response["reasoning"])
print("\nAnswer:", response["answer"])
Output
Reasoning: Let me work through this step by step.

1. The farmer starts with 17 sheep
2. "All but 9 run away" means 9 sheep did NOT run away
3. So 9 sheep remain with the farmer
4. The question asks how many sheep the farmer has left
5. The answer is 9 sheep

This is a classic word problem that tricks people into calculating 17-9=8.
The key is the phrase "all but 9" which means "except for 9" or "9 remain."

Answer: The farmer has 9 sheep left.

Streaming for Long Responses

For complex reasoning tasks, stream the response to see the model “think” in real-time:

def stream_deepseek(prompt: str):
    """Stream DeepSeek R1 response."""
    response = bedrock.invoke_model_with_response_stream(
        modelId="us.deepseek.r1-v1:0",
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 4096,
            "temperature": 0.7
        })
    )

    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"])
        if "choices" in chunk and chunk["choices"]:
            text = chunk["choices"][0].get("text", "")
            print(text, end="", flush=True)

# Complex math problem
stream_deepseek("""
Solve step by step: A train leaves Station A at 9:00 AM traveling at 60 mph.
Another train leaves Station B (120 miles away) at 10:00 AM traveling toward
Station A at 80 mph. At what time do the trains meet?
""")

Using the Converse API

For multi-turn conversations, use Bedrock’s Converse API:

def chat_deepseek(messages: list[dict]) -> dict:
    """Multi-turn chat with DeepSeek R1."""
    response = bedrock.converse(
        modelId="us.deepseek.r1-v1:0",
        messages=messages,
        inferenceConfig={
            "maxTokens": 4096,
            "temperature": 0.7
        }
    )

    content = response["output"]["message"]["content"]

    # Extract answer and reasoning from content blocks
    answer = ""
    reasoning = ""
    for block in content:
        if "text" in block:
            answer = block["text"]
        elif "reasoningContent" in block:
            reasoning = block["reasoningContent"]["reasoningText"]["text"]

    return {"answer": answer, "reasoning": reasoning}

# Multi-turn reasoning
messages = [
    {"role": "user", "content": [{"text": "What is 15% of 80?"}]}
]

result1 = chat_deepseek(messages)
print("First answer:", result1["answer"])

# Follow up - use the answer text for the assistant message
messages.append({"role": "assistant", "content": [{"text": result1["answer"]}]})
messages.append({"role": "user", "content": [{"text": "Now add 25% tax to that result"}]})

result2 = chat_deepseek(messages)
print("Second answer:", result2["answer"])

Option 2: Local Deployment with Ollama

For development or when you need to run offline, Ollama makes it easy to run DeepSeek locally.

Install and Run

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull DeepSeek R1 (7B distilled version - 4.7GB)
ollama pull deepseek-r1:7b

# Or the larger 14B version for better reasoning
ollama pull deepseek-r1:14b

# Run interactively
ollama run deepseek-r1:7b

Python Integration

import requests
import json

def ask_deepseek_local(prompt: str, model: str = "deepseek-r1:7b") -> str:
    """Query local DeepSeek via Ollama."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

# Test locally
answer = ask_deepseek_local("""
Write a Python function to check if a string is a valid IPv4 address.
Include edge cases in your reasoning.
""")
print(answer)
Output
Let me think through this step by step...

Edge cases to consider:
1. Empty string
2. Wrong number of octets (not exactly 4)
3. Non-numeric characters
4. Leading zeros (e.g., "01.02.03.04")
5. Values outside 0-255 range
6. Whitespace

Here's the implementation:

```python
def is_valid_ipv4(ip: str) -> bool:
    """Check if string is valid IPv4 address."""
    if not ip or not isinstance(ip, str):
        return False

    parts = ip.split('.')

    if len(parts) != 4:
        return False

    for part in parts:
        # Check for empty parts
        if not part:
            return False

        # Check for non-digits
        if not part.isdigit():
            return False

        # Check for leading zeros (except "0" itself)
        if len(part) > 1 and part[0] == '0':
            return False

        # Check range
        if not 0 <= int(part) <= 255:
            return False

    return True
```

This handles all the edge cases I identified.
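
Depending on the variant, the local models also emit their reasoning inline as <think> tags in the response text. If yours does, you can reuse the same regex approach from the Bedrock example to split reasoning from the final answer (a minimal sketch):

import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split an R1 completion into (reasoning, answer) on the <think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(ask_deepseek_local("What is 15% of 240?"))
print("Reasoning:", reasoning[:200])
print("Answer:", answer)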

Streaming with Ollama

def stream_deepseek_local(prompt: str, model: str = "deepseek-r1:7b"):
    """Stream response from local DeepSeek."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": True
        },
        stream=True
    )

    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            print(data.get("response", ""), end="", flush=True)

stream_deepseek_local("Explain the Transformer architecture in simple terms.")

Setting Context Window Size

Context window size significantly affects VRAM usage. For longer reasoning tasks, increase the context:

def ask_with_context(prompt: str, num_ctx: int = 32768) -> str:
    """Query with custom context window size."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:7b",
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": num_ctx}  # 8192, 16384, 32768, etc.
        }
    )
    return response.json()["response"]

Model Variants

DeepSeek offers several model sizes. VRAM requirements vary significantly with context window size:

| Model | Parameters | Download | VRAM (8K ctx) | VRAM (32K ctx) | Best For |
|---|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5B | 1.1GB | ~2GB | ~4GB | Edge devices, simple tasks |
| deepseek-r1:7b | 7B | 4.7GB | 5.5GB | 8.2GB | Development, general reasoning |
| deepseek-r1:14b | 14B | 9.0GB | 10.6GB | 17.1GB | Complex reasoning, coding |
| deepseek-r1:32b | 32B | 19GB | ~22GB | ~35GB | Advanced math, research |
| deepseek-r1:70b | 70B | 40GB | ~45GB | ~70GB | Maximum local capability |
| DeepSeek-R1 (full) | 671B | N/A | Bedrock only | Bedrock only | Production, enterprise |

VRAM measurements for 7B and 14B tested on NVIDIA L40S GPUs with Ollama (January 2026). Larger models estimated based on parameter scaling.
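
To see what a given variant actually costs in memory on your own hardware, Ollama's /api/ps endpoint reports which models are currently loaded and how much of each sits in VRAM. A quick sketch, assuming Ollama is running on the default local port and a model has already been loaded by a recent request:

import requests

# Ask the local Ollama server which models are loaded and their memory footprint
ps = requests.get("http://localhost:11434/api/ps").json()

for model in ps.get("models", []):
    total = model.get("size", 0)         # total bytes for the loaded model
    in_vram = model.get("size_vram", 0)  # bytes resident on the GPU
    print(f"{model['name']}: {in_vram / 1e9:.1f} GB in VRAM / {total / 1e9:.1f} GB total")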

Practical Examples

Example 1: Code Review with Reasoning

code_to_review = '''
def get_user(id):
    query = f"SELECT * FROM users WHERE id = {id}"
    return db.execute(query)
'''

prompt = f"""
Review this code for security issues. Explain your reasoning step by step.

```python
{code_to_review}
```
"""

response = ask_deepseek(prompt)
print(response["answer"])


Output
Let me analyze this code for security vulnerabilities...

**Step 1: Identify the pattern**
The code constructs a SQL query using string formatting (f-string) with user input.

**Step 2: Recognize the vulnerability**
This is a classic SQL injection vulnerability. The `id` parameter is directly
interpolated into the query without sanitization.

**Step 3: Demonstrate the exploit**
If `id = "1 OR 1=1"`, the query becomes:
`SELECT * FROM users WHERE id = 1 OR 1=1`
This returns ALL users, bypassing authentication.

**Step 4: Propose the fix**
Use parameterized queries:

```python
def get_user(id):
    query = "SELECT * FROM users WHERE id = ?"
    return db.execute(query, (id,))
```

**Severity: CRITICAL**
SQL injection can lead to data theft, data loss, and complete system compromise.

Example 2: Mathematical Proof

prompt = """
Prove that the square root of 2 is irrational. Show each logical step.
"""

response = ask_deepseek(prompt)
print(response["reasoning"])

Example 3: Algorithm Design

prompt = """
Design an algorithm to find the longest palindromic substring in a string.
Analyze the time and space complexity of your solution.
"""

response = ask_deepseek(prompt)
print(response["answer"])

Comparing DeepSeek R1, OpenAI o1, and Claude

When should you use each? Here’s how R1 compares to OpenAI o1 (Dec 2024) and Claude 3.5 Sonnet:

| Benchmark | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| AIME 2024 (math) | 79.8% | 74.4% | 16.0% |
| MATH-500 | 97.3% | 96.4% | 78.3% |
| Codeforces Elo | 2,029 (96.3%) | 1,673 (89%) | - |
| Cost (per 1M tokens) | $0.55 | $15.00 | $3.00 |
| Open source | Yes (MIT) | No | No |
| Local deployment | Yes | No | No |
| Context window | 128K | 128K | 200K |

Sources: DeepSeek R1 paper, OpenAI o1 announcement

Key insight: R1 matches or slightly beats o1 on reasoning benchmarks while being over 100x cheaper ($0.70 vs $75 per 1M tokens) and open source. This is what made R1’s January 2025 release significant—comparable frontier performance at a fraction of the cost.

Qualitative differences:

| Aspect | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| Reasoning visible | <think> tags | ❌ Hidden | ❌ Hidden |
| Math/code reasoning | Excellent | Excellent | Good |
| Creative writing | Serviceable | Good | Excellent |
| Instruction following | Good | Good | Better |
| Enterprise support | Community | OpenAI | Anthropic |

Cost Comparison

Running 1 million tokens through each model:

| Model | Input Cost | Output Cost | Total (1M each) |
|---|---|---|---|
| DeepSeek R1 (Bedrock) | $0.14 | $0.55 | ~$0.70 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | ~$18.00 |
| GPT-4 Turbo | $10.00 | $30.00 | ~$40.00 |
| OpenAI o1 | $15.00 | $60.00 | ~$75.00 |

For reasoning-heavy workloads, DeepSeek R1 offers 25-100x cost savings.
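
To translate the table into a budget for your own workload, a small helper like this works (prices hardcoded from the table above; treat them as a snapshot, not current pricing):

# Per-1M-token prices (USD), copied from the comparison table above
PRICES = {
    "deepseek-r1": {"input": 0.14, "output": 0.55},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "openai-o1": {"input": 15.00, "output": 60.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for a given token volume."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Example: 10M input tokens and 10M output tokens per month
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 10_000_000, 10_000_000):,.2f}/month")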

Full Example: Reasoning Agent

Here’s a complete example combining Bedrock with structured output:

"""
DeepSeek R1 Reasoning Agent
Uses chain-of-thought for complex problem solving.
"""
import boto3
import json
from pydantic import BaseModel

class ReasoningResult(BaseModel):
    """Structured reasoning output."""
    question: str
    reasoning_steps: list[str]
    answer: str
    confidence: str  # high, medium, low

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def reason(question: str) -> ReasoningResult:
    """Use DeepSeek R1 for structured reasoning."""

    prompt = f"""
    Solve this problem step by step. Format your response as:

    REASONING:
    1. [First step]
    2. [Second step]
    ...

    ANSWER: [Your final answer]

    CONFIDENCE: [high/medium/low]

    Problem: {question}
    """

    response = bedrock.invoke_model(
        modelId="us.deepseek.r1-v1:0",
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 2048,
            "temperature": 0.3
        })
    )

    result = json.loads(response["body"].read())
    text = result["choices"][0]["text"]

    # Parse structured output
    reasoning_section = text.split("ANSWER:")[0].replace("REASONING:", "").strip()
    steps = [s.strip() for s in reasoning_section.split("\n") if s.strip() and s[0].isdigit()]

    answer_section = text.split("ANSWER:")[1].split("CONFIDENCE:")[0].strip()
    confidence = text.split("CONFIDENCE:")[1].strip().lower() if "CONFIDENCE:" in text else "medium"

    return ReasoningResult(
        question=question,
        reasoning_steps=steps,
        answer=answer_section,
        confidence=confidence
    )

# Example usage
result = reason("""
A company's revenue grew 20% in Year 1, declined 10% in Year 2,
and grew 15% in Year 3. If they started with $1,000,000,
what's their revenue after Year 3?
""")

print(f"Question: {result.question}")
print(f"\nReasoning:")
for i, step in enumerate(result.reasoning_steps, 1):
    print(f"  {step}")
print(f"\nAnswer: {result.answer}")
print(f"Confidence: {result.confidence}")
Output
Question: A company's revenue grew 20% in Year 1...

Reasoning:
1. Start with $1,000,000
2. Year 1: $1,000,000 × 1.20 = $1,200,000
3. Year 2: $1,200,000 × 0.90 = $1,080,000
4. Year 3: $1,080,000 × 1.15 = $1,242,000

Answer: $1,242,000

Confidence: high

What’s Next

You’ve learned how to run DeepSeek R1 on Bedrock and locally. Key takeaways:

  1. Use Bedrock for production - Fully managed, enterprise security, pay-per-token
  2. Use Ollama for development - Fast iteration, offline work, free
  3. Leverage transparent reasoning - See the model’s thought process
  4. Consider cost - 25-100x cheaper than alternatives for reasoning tasks
