Getting Started with DeepSeek R1: The Open-Source Reasoning Model
Run DeepSeek R1 on AWS Bedrock or locally with Ollama. Learn to use its chain-of-thought reasoning for complex problem-solving, coding, and math.
DeepSeek R1 changed the game when it dropped in January 2025. An open-source reasoning model that rivals OpenAI’s o1 at a fraction of the cost? The AI community took notice. Now it’s available as a fully managed model on AWS Bedrock, making it easy to add serious reasoning capabilities to your applications.
This tutorial shows you how to run DeepSeek R1 on Bedrock and locally with Ollama, with practical examples of its chain-of-thought reasoning.
Why DeepSeek R1?
DeepSeek R1 stood out for several reasons at its January 2025 launch:
| Feature | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| Open Source | ✅ Yes | ❌ No | ❌ No |
| Reasoning | Native CoT | Native CoT | Prompted |
| Cost (per 1M tokens) | ~$0.55 | ~$15 | ~$3 |
| Local Deployment | ✅ Yes | ❌ No | ❌ No |
What’s the difference in reasoning?
- Native chain-of-thought (R1, o1): These models were trained with reinforcement learning to generate explicit reasoning tokens before producing an answer. The thinking happens in a dedicated phase (<think> tags in R1, hidden in o1), and the model literally “thinks out loud” as part of its architecture.
- Prompted reasoning (Claude, GPT-4): These models can reason step by step when you ask them to (“think through this carefully”), but that is prompt engineering, not architecture. They weren’t specifically RL-trained to decompose problems before answering.
The practical difference: R1 and o1 will automatically reason through complex problems even without prompting, using more tokens but achieving higher accuracy on math and logic tasks. Claude excels at following instructions, creative tasks, and long-context analysis where explicit CoT isn’t needed.
The key innovation in R1 is transparent reasoning—you can see the model’s thought process as it works through problems, making it excellent for debugging and understanding model behavior.
Option 1: AWS Bedrock (Recommended for Production)
DeepSeek R1 is fully managed on Bedrock—no infrastructure to maintain, pay-per-token pricing, and enterprise security.
Enable the Model
- Open the Amazon Bedrock console
- Go to Model access in the left navigation
- Find DeepSeek and request access to DeepSeek-R1
- Access is typically granted immediately; you can confirm it programmatically, as shown below
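Once access is granted, a quick way to confirm the model is visible to your account is to list DeepSeek models with the Bedrock control-plane client. This is a minimal sketch; the provider filter string and region are assumptions, so adjust them for your account:
import boto3

# Control-plane client ("bedrock"), not the runtime client ("bedrock-runtime")
bedrock_ctl = boto3.client("bedrock", region_name="us-east-1")

# List foundation models published by DeepSeek and print their IDs
models = bedrock_ctl.list_foundation_models(byProvider="DeepSeek")
for summary in models["modelSummaries"]:
    print(summary["modelId"])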
Basic Usage with boto3
import boto3
import json
import re
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
def ask_deepseek(prompt: str) -> dict:
    """Query DeepSeek R1 on Bedrock."""
    response = bedrock.invoke_model(
        modelId="us.deepseek.r1-v1:0",  # Cross-region inference profile
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 4096,
            "temperature": 0.7
        })
    )
    result = json.loads(response["body"].read())
    text = result["choices"][0]["text"]
    # Parse reasoning from <think> tags
    reasoning = ""
    answer = text
    think_match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if think_match:
        reasoning = think_match.group(1).strip()
        answer = text[think_match.end():].strip()
    return {
        "reasoning": reasoning,
        "answer": answer
    }
# Test with a reasoning problem
response = ask_deepseek("""
A farmer has 17 sheep. All but 9 run away. How many sheep does the farmer have left?
""")
print("Reasoning:", response["reasoning"])
print("\nAnswer:", response["answer"])
Reasoning: Let me work through this step by step.
1. The farmer starts with 17 sheep
2. "All but 9 run away" means 9 sheep did NOT run away
3. So 9 sheep remain with the farmer
4. The question asks how many sheep the farmer has left
5. The answer is 9 sheep

This is a classic word problem that tricks people into calculating 17 - 9 = 8. The key is the phrase "all but 9", which means "except for 9" or "9 remain."

Answer: The farmer has 9 sheep left.
Streaming for Long Responses
For complex reasoning tasks, stream the response to see the model “think” in real-time:
def stream_deepseek(prompt: str):
    """Stream DeepSeek R1 response."""
    response = bedrock.invoke_model_with_response_stream(
        modelId="us.deepseek.r1-v1:0",
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 4096,
            "temperature": 0.7
        })
    )
    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"])
        if "choices" in chunk and chunk["choices"]:
            text = chunk["choices"][0].get("text", "")
            print(text, end="", flush=True)
# Complex math problem
stream_deepseek("""
Solve step by step: A train leaves Station A at 9:00 AM traveling at 60 mph.
Another train leaves Station B (120 miles away) at 10:00 AM traveling toward
Station A at 80 mph. At what time do the trains meet?
""")
Using the Converse API
For multi-turn conversations, use Bedrock’s Converse API:
def chat_deepseek(messages: list[dict]) -> dict:
    """Multi-turn chat with DeepSeek R1."""
    response = bedrock.converse(
        modelId="us.deepseek.r1-v1:0",
        messages=messages,
        inferenceConfig={
            "maxTokens": 4096,
            "temperature": 0.7
        }
    )
    content = response["output"]["message"]["content"]
    # Extract answer and reasoning from content blocks
    answer = ""
    reasoning = ""
    for block in content:
        if "text" in block:
            answer = block["text"]
        elif "reasoningContent" in block:
            reasoning = block["reasoningContent"]["reasoningText"]["text"]
    return {"answer": answer, "reasoning": reasoning}
# Multi-turn reasoning
messages = [
{"role": "user", "content": [{"text": "What is 15% of 80?"}]}
]
result1 = chat_deepseek(messages)
print("First answer:", result1["answer"])
# Follow up - use the answer text for the assistant message
messages.append({"role": "assistant", "content": [{"text": result1["answer"]}]})
messages.append({"role": "user", "content": [{"text": "Now add 25% tax to that result"}]})
result2 = chat_deepseek(messages)
print("Second answer:", result2["answer"])
Option 2: Local Deployment with Ollama
For development or when you need to run offline, Ollama makes it easy to run DeepSeek locally.
Install and Run
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull DeepSeek R1 (7B distilled version - 4.7GB)
ollama pull deepseek-r1:7b
# Or the larger 14B version for better reasoning
ollama pull deepseek-r1:14b
# Run interactively
ollama run deepseek-r1:7b
Python Integration
import requests
import json
def ask_deepseek_local(prompt: str, model: str = "deepseek-r1:7b") -> str:
    """Query local DeepSeek via Ollama."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]
# Test locally
answer = ask_deepseek_local("""
Write a Python function to check if a string is a valid IPv4 address.
Include edge cases in your reasoning.
""")
print(answer)
Let me think through this step by step...
Edge cases to consider:
1. Empty string
2. Wrong number of octets (not exactly 4)
3. Non-numeric characters
4. Leading zeros (e.g., "01.02.03.04")
5. Values outside 0-255 range
6. Whitespace
Here's the implementation:
```python
def is_valid_ipv4(ip: str) -> bool:
    """Check if string is valid IPv4 address."""
    if not ip or not isinstance(ip, str):
        return False
    parts = ip.split('.')
    if len(parts) != 4:
        return False
    for part in parts:
        # Check for empty parts
        if not part:
            return False
        # Check for non-digits
        if not part.isdigit():
            return False
        # Check for leading zeros (except "0" itself)
        if len(part) > 1 and part[0] == '0':
            return False
        # Check range
        if not 0 <= int(part) <= 255:
            return False
    return True
```
This handles all the edge cases I identified.
Streaming with Ollama
def stream_deepseek_local(prompt: str, model: str = "deepseek-r1:7b"):
    """Stream response from local DeepSeek."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": True
        },
        stream=True
    )
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            print(data.get("response", ""), end="", flush=True)
stream_deepseek_local("Explain the Transformer architecture in simple terms.")
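Locally served R1 models emit their reasoning inside the same <think> tags. If you want to stream only the final answer and hide the thinking phase, one approach is to buffer tokens until the closing tag appears. This is a rough sketch that assumes the tags arrive verbatim in the token stream:
def stream_answer_only(prompt: str, model: str = "deepseek-r1:7b"):
    """Stream only the text that follows </think>, hiding the reasoning phase."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True
    )
    buffer = ""
    thinking_done = False
    for line in response.iter_lines():
        if not line:
            continue
        token = json.loads(line).get("response", "")
        if thinking_done:
            print(token, end="", flush=True)
        else:
            buffer += token
            if "</think>" in buffer:
                thinking_done = True
                # Print whatever already followed the closing tag in this chunk
                print(buffer.split("</think>", 1)[1], end="", flush=True)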
Setting Context Window Size
Context window size significantly affects VRAM usage. For longer reasoning tasks, increase the context:
def ask_with_context(prompt: str, num_ctx: int = 32768) -> str:
    """Query with custom context window size."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:7b",
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": num_ctx}  # 8192, 16384, 32768, etc.
        }
    )
    return response.json()["response"]
Model Variants
DeepSeek offers several model sizes. VRAM requirements vary significantly with context window size:
| Model | Parameters | Download | VRAM (8K ctx) | VRAM (32K ctx) | Best For |
|---|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5B | 1.1GB | ~2GB | ~4GB | Edge devices, simple tasks |
| deepseek-r1:7b | 7B | 4.7GB | 5.5GB | 8.2GB | Development, general reasoning |
| deepseek-r1:14b | 14B | 9.0GB | 10.6GB | 17.1GB | Complex reasoning, coding |
| deepseek-r1:32b | 32B | 19GB | ~22GB | ~35GB | Advanced math, research |
| deepseek-r1:70b | 70B | 40GB | ~45GB | ~70GB | Maximum local capability |
| DeepSeek-R1 (full) | 671B | N/A | Bedrock only | Bedrock only | Production, enterprise |
VRAM measurements for 7B and 14B tested on NVIDIA L40S GPUs with Ollama (January 2026). Larger models estimated based on parameter scaling.
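Before committing to a variant, you can check which models are already pulled on your machine via Ollama's local API. A small sketch, assuming Ollama is running on its default port:
import requests

# The /api/tags endpoint lists locally available models
tags = requests.get("http://localhost:11434/api/tags").json()
for model in tags.get("models", []):
    print(f"{model['name']}: {model['size'] / 1e9:.1f} GB on disk")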
Practical Examples
Example 1: Code Review with Reasoning
code_to_review = '''
def get_user(id):
    query = f"SELECT * FROM users WHERE id = {id}"
    return db.execute(query)
'''
prompt = f"""
Review this code for security issues. Explain your reasoning step by step.
```python
{code_to_review}
"""
response = ask_deepseek(prompt)
print(response["answer"])
Let me analyze this code for security vulnerabilities...

**Step 1: Identify the pattern**
The code constructs a SQL query using string formatting (f-string) with user input.

**Step 2: Recognize the vulnerability**
This is a classic SQL injection vulnerability. The `id` parameter is directly
interpolated into the query without sanitization.

**Step 3: Demonstrate the exploit**
If `id = "1 OR 1=1"`, the query becomes:
`SELECT * FROM users WHERE id = 1 OR 1=1`
This returns ALL users, bypassing authentication.

**Step 4: Propose the fix**
Use parameterized queries:
```python
def get_user(id):
    query = "SELECT * FROM users WHERE id = ?"
    return db.execute(query, (id,))
```

**Severity: CRITICAL**
SQL injection can lead to data theft, data loss, and complete system compromise.
Example 2: Mathematical Proof
prompt = """
prompt = """
Prove that the square root of 2 is irrational. Show each logical step.
"""
response = ask_deepseek(prompt)
print(response["reasoning"])
Example 3: Algorithm Design
prompt = """
Design an algorithm to find the longest palindromic substring in a string.
Analyze the time and space complexity of your solution.
"""
response = ask_deepseek(prompt)
print(response["answer"])
Comparing DeepSeek R1, o1, and Claude
When should you use each? Here’s how R1 compares to OpenAI o1 (Dec 2024) and Claude 3.5 Sonnet:
| Benchmark | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| AIME 2024 (math) | 79.8% | 74.4% | 16.0% |
| MATH-500 | 97.3% | 96.4% | 78.3% |
| Codeforces Elo | 2,029 (96.3%) | 1,673 (89%) | - |
| Cost (per 1M tokens) | $0.55 | $15.00 | $3.00 |
| Open source | Yes (MIT) | No | No |
| Local deployment | Yes | No | No |
| Context window | 128K | 128K | 200K |
Sources: DeepSeek R1 paper, OpenAI o1 announcement
Key insight: R1 matches or slightly beats o1 on reasoning benchmarks while being over 100x cheaper for a typical reasoning workload (~$0.70 vs ~$75 for a million input plus a million output tokens; see the cost table below) and open source. This is what made R1’s January 2025 release significant: comparable frontier performance at a fraction of the cost.
Qualitative differences:
| Aspect | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| Reasoning visible | ✅ <think> tags | ❌ Hidden | ❌ Hidden |
| Math/code reasoning | Excellent | Excellent | Good |
| Creative writing | Serviceable | Good | Excellent |
| Instruction following | Good | Good | Better |
| Enterprise support | Community | OpenAI | Anthropic |
Cost Comparison
Running 1 million tokens through each model:
| Model | Input Cost | Output Cost | Total (1M each) |
|---|---|---|---|
| DeepSeek R1 (Bedrock) | $0.14 | $0.55 | ~$0.70 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | ~$18.00 |
| GPT-4 Turbo | $10.00 | $30.00 | ~$40.00 |
| OpenAI o1 | $15.00 | $60.00 | ~$75.00 |
For reasoning-heavy workloads, DeepSeek R1 offers 25-100x cost savings.
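To make that concrete, here is a small back-of-the-envelope calculator using the per-million-token prices from the table above (illustrative only; check the current pricing pages before budgeting):
# Per-1M-token prices (USD) taken from the table above
PRICES = {
    "deepseek-r1-bedrock": {"input": 0.14, "output": 0.55},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "openai-o1": {"input": 15.00, "output": 60.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for a given token volume."""
    price = PRICES[model]
    return (input_tokens / 1e6) * price["input"] + (output_tokens / 1e6) * price["output"]

# Reasoning models emit long chains of output tokens, so output price dominates
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 1_000_000, 1_000_000):.2f}")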
Full Example: Reasoning Agent
Here’s a complete example combining Bedrock with structured output:
"""
DeepSeek R1 Reasoning Agent
Uses chain-of-thought for complex problem solving.
"""
import boto3
import json
from pydantic import BaseModel
from typing import Optional
class ReasoningResult(BaseModel):
    """Structured reasoning output."""
    question: str
    reasoning_steps: list[str]
    answer: str
    confidence: str  # high, medium, low
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
def reason(question: str) -> ReasoningResult:
    """Use DeepSeek R1 for structured reasoning."""
    prompt = f"""
Solve this problem step by step. Format your response as:
REASONING:
1. [First step]
2. [Second step]
...
ANSWER: [Your final answer]
CONFIDENCE: [high/medium/low]
Problem: {question}
"""
    response = bedrock.invoke_model(
        modelId="us.deepseek.r1-v1:0",
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 2048,
            "temperature": 0.3
        })
    )
    result = json.loads(response["body"].read())
    text = result["choices"][0]["text"]
    # Parse structured output
    reasoning_section = text.split("ANSWER:")[0].replace("REASONING:", "").strip()
    steps = [s.strip() for s in reasoning_section.split("\n") if s.strip() and s.strip()[0].isdigit()]
    answer_section = text.split("ANSWER:")[1].split("CONFIDENCE:")[0].strip()
    confidence = text.split("CONFIDENCE:")[1].strip().lower() if "CONFIDENCE:" in text else "medium"
    return ReasoningResult(
        question=question,
        reasoning_steps=steps,
        answer=answer_section,
        confidence=confidence
    )
# Example usage
result = reason("""
A company's revenue grew 20% in Year 1, declined 10% in Year 2,
and grew 15% in Year 3. If they started with $1,000,000,
what's their revenue after Year 3?
""")
print(f"Question: {result.question}")
print(f"\nReasoning:")
for i, step in enumerate(result.reasoning_steps, 1):
    print(f"  {step}")
print(f"\nAnswer: {result.answer}")
print(f"Confidence: {result.confidence}")
Question: A company's revenue grew 20% in Year 1...

Reasoning:
  1. Start with $1,000,000
  2. Year 1: $1,000,000 × 1.20 = $1,200,000
  3. Year 2: $1,200,000 × 0.90 = $1,080,000
  4. Year 3: $1,080,000 × 1.15 = $1,242,000

Answer: $1,242,000
Confidence: high
What’s Next
You’ve learned how to run DeepSeek R1 on Bedrock and locally. Key takeaways:
- Use Bedrock for production - Fully managed, enterprise security, pay-per-token
- Use Ollama for development - Fast iteration, offline work, free
- Leverage transparent reasoning - See the model’s thought process
- Consider cost - 25-100x cheaper than alternatives for reasoning tasks
Further Reading:
- DeepSeek R1 on AWS Bedrock
- DeepSeek R1 Paper
- Ollama Documentation
- Data Models for AI Applications - Structure your DeepSeek outputs
Related Tutorials:
- S3 Vectors Getting Started - Build RAG with DeepSeek
- Fishing Report Agent - Tool-calling patterns