Getting Started with DeepSeek R1: The Open-Source Reasoning Model
Run DeepSeek R1 on AWS Bedrock or locally with Ollama. Learn to use its chain-of-thought reasoning for complex problem-solving, coding, and math.
DeepSeek R1 changed the game when it dropped in January 2025. An open-source reasoning model that rivals OpenAI’s o1 at a fraction of the cost? The AI community took notice. Now it’s available as a fully managed model on AWS Bedrock, making it easy to add serious reasoning capabilities to your applications.
This tutorial shows you how to run DeepSeek R1 on Bedrock and locally with Ollama, with practical examples of its chain-of-thought reasoning.
Why DeepSeek R1?
DeepSeek R1 stood out for several reasons at its January 2025 launch:
| Feature | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| Open Source | ✅ Yes | ❌ No | ❌ No |
| Reasoning | Native CoT | Native CoT | Prompted |
| Cost (per 1M tokens) | ~$0.55 | ~$15 | ~$3 |
| Local Deployment | ✅ Yes | ❌ No | ❌ No |
What’s the difference in reasoning?
- Native chain-of-thought (R1, o1): These models were trained with reinforcement learning to generate explicit reasoning tokens before producing an answer. The thinking happens in a dedicated phase (<think> tags in R1, hidden in o1), and the model literally “thinks out loud” as part of its architecture.
- Prompted reasoning (Claude, GPT-4): These models can reason step by step when you ask them to (“think through this carefully”), but that is prompt engineering, not architecture. They weren’t specifically RL-trained to decompose problems before answering.
The practical difference: R1 and o1 will automatically reason through complex problems even without prompting, using more tokens but achieving higher accuracy on math and logic tasks. Claude excels at following instructions, creative tasks, and long-context analysis where explicit CoT isn’t needed.
The key innovation in R1 is transparent reasoning—you can see the model’s thought process as it works through problems, making it excellent for debugging and understanding model behavior.
Option 1: AWS Bedrock (Recommended for Production)
DeepSeek R1 is fully managed on Bedrock—no infrastructure to maintain, pay-per-token pricing, and enterprise security.
Enable the Model
- Open the Amazon Bedrock console
- Go to Model access in the left navigation
- Find DeepSeek and request access to DeepSeek-R1
- Access is typically granted immediately; you can confirm it programmatically, as shown below
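Once access is granted, a quick way to confirm the model is visible to your account is to list DeepSeek models with the Bedrock control-plane client. This is a minimal sketch; the provider filter string and region are assumptions, so adjust them for your account:
import boto3

# Control-plane client ("bedrock"), not the runtime client ("bedrock-runtime")
bedrock_ctl = boto3.client("bedrock", region_name="us-east-1")

# List foundation models published by DeepSeek and print their IDs
models = bedrock_ctl.list_foundation_models(byProvider="DeepSeek")
for summary in models["modelSummaries"]:
    print(summary["modelId"])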
Basic Usage with boto3
import boto3
import json
import re
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
def ask_deepseek(prompt: str) -> dict:
    """Query DeepSeek R1 on Bedrock."""
    response = bedrock.invoke_model(
        modelId="us.deepseek.r1-v1:0",  # Cross-region inference profile
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 4096,
            "temperature": 0.7
        })
    )
    result = json.loads(response["body"].read())
    text = result["choices"][0]["text"]
    # Parse reasoning from <think> tags
    reasoning = ""
    answer = text
    think_match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if think_match:
        reasoning = think_match.group(1).strip()
        answer = text[think_match.end():].strip()
    return {
        "reasoning": reasoning,
        "answer": answer
    }
# Test with a reasoning problem
response = ask_deepseek("""
A farmer has 17 sheep. All but 9 run away. How many sheep does the farmer have left?
""")
print("Reasoning:", response["reasoning"])
print("\nAnswer:", response["answer"])
Reasoning: Let me work through this step by step.
1. The farmer starts with 17 sheep
2. "All but 9 run away" means 9 sheep did NOT run away
3. So 9 sheep remain with the farmer
4. The question asks how many sheep the farmer has left
5. The answer is 9 sheep

This is a classic word problem that tricks people into calculating 17 - 9 = 8. The key is the phrase "all but 9", which means "except for 9" or "9 remain."

Answer: The farmer has 9 sheep left.
Streaming for Long Responses
For complex reasoning tasks, stream the response to see the model “think” in real-time:
def stream_deepseek(prompt: str):
    """Stream DeepSeek R1 response."""
    response = bedrock.invoke_model_with_response_stream(
        modelId="us.deepseek.r1-v1:0",
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 4096,
            "temperature": 0.7
        })
    )
    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"])
        if "choices" in chunk and chunk["choices"]:
            text = chunk["choices"][0].get("text", "")
            print(text, end="", flush=True)
# Complex math problem
stream_deepseek("""
Solve step by step: A train leaves Station A at 9:00 AM traveling at 60 mph.
Another train leaves Station B (120 miles away) at 10:00 AM traveling toward
Station A at 80 mph. At what time do the trains meet?
""")
Using the Converse API
For multi-turn conversations, use Bedrock’s Converse API:
def chat_deepseek(messages: list[dict]) -> dict:
    """Multi-turn chat with DeepSeek R1."""
    response = bedrock.converse(
        modelId="us.deepseek.r1-v1:0",
        messages=messages,
        inferenceConfig={
            "maxTokens": 4096,
            "temperature": 0.7
        }
    )
    content = response["output"]["message"]["content"]
    # Extract answer and reasoning from content blocks
    answer = ""
    reasoning = ""
    for block in content:
        if "text" in block:
            answer = block["text"]
        elif "reasoningContent" in block:
            reasoning = block["reasoningContent"]["reasoningText"]["text"]
    return {"answer": answer, "reasoning": reasoning}
# Multi-turn reasoning
messages = [
{"role": "user", "content": [{"text": "What is 15% of 80?"}]}
]
result1 = chat_deepseek(messages)
print("First answer:", result1["answer"])
# Follow up - use the answer text for the assistant message
messages.append({"role": "assistant", "content": [{"text": result1["answer"]}]})
messages.append({"role": "user", "content": [{"text": "Now add 25% tax to that result"}]})
result2 = chat_deepseek(messages)
print("Second answer:", result2["answer"])
Option 2: Local Deployment with Ollama
For development or when you need to run offline, Ollama makes it easy to run DeepSeek locally.
Install and Run
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull DeepSeek R1 (7B distilled version - 4.7GB)
ollama pull deepseek-r1:7b
# Or the larger 14B version for better reasoning
ollama pull deepseek-r1:14b
# Run interactively
ollama run deepseek-r1:7b
Python Integration
import requests
import json
def ask_deepseek_local(prompt: str, model: str = "deepseek-r1:7b") -> str:
    """Query local DeepSeek via Ollama."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]
# Test locally
answer = ask_deepseek_local("""
Write a Python function to check if a string is a valid IPv4 address.
Include edge cases in your reasoning.
""")
print(answer)
Let me think through this step by step...
Edge cases to consider:
1. Empty string
2. Wrong number of octets (not exactly 4)
3. Non-numeric characters
4. Leading zeros (e.g., "01.02.03.04")
5. Values outside 0-255 range
6. Whitespace
Here's the implementation:
```python
def is_valid_ipv4(ip: str) -> bool:
    """Check if string is valid IPv4 address."""
    if not ip or not isinstance(ip, str):
        return False
    parts = ip.split('.')
    if len(parts) != 4:
        return False
    for part in parts:
        # Check for empty parts
        if not part:
            return False
        # Check for non-digits
        if not part.isdigit():
            return False
        # Check for leading zeros (except "0" itself)
        if len(part) > 1 and part[0] == '0':
            return False
        # Check range
        if not 0 <= int(part) <= 255:
            return False
    return True
```
This handles all the edge cases I identified.
Streaming with Ollama
def stream_deepseek_local(prompt: str, model: str = "deepseek-r1:7b"):
    """Stream response from local DeepSeek."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": True
        },
        stream=True
    )
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            print(data.get("response", ""), end="", flush=True)
stream_deepseek_local("Explain the Transformer architecture in simple terms.")
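Locally served R1 models emit their reasoning inside the same <think> tags. If you want to stream only the final answer and hide the thinking phase, one approach is to buffer tokens until the closing tag appears. This is a rough sketch that assumes the tags arrive verbatim in the token stream:
def stream_answer_only(prompt: str, model: str = "deepseek-r1:7b"):
    """Stream only the text that follows </think>, hiding the reasoning phase."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True
    )
    buffer = ""
    thinking_done = False
    for line in response.iter_lines():
        if not line:
            continue
        token = json.loads(line).get("response", "")
        if thinking_done:
            print(token, end="", flush=True)
        else:
            buffer += token
            if "</think>" in buffer:
                thinking_done = True
                # Print whatever already followed the closing tag in this chunk
                print(buffer.split("</think>", 1)[1], end="", flush=True)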
Setting Context Window Size
Context window size significantly affects VRAM usage. For longer reasoning tasks, increase the context:
def ask_with_context(prompt: str, num_ctx: int = 32768) -> str:
    """Query with custom context window size."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:7b",
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": num_ctx}  # 8192, 16384, 32768, etc.
        }
    )
    return response.json()["response"]
Model Variants
DeepSeek offers several model sizes. VRAM requirements vary significantly with context window size:
| Model | Parameters | Download | VRAM (8K ctx) | VRAM (32K ctx) | Best For |
|---|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5B | 1.1GB | ~2GB | ~4GB | Edge devices, simple tasks |
| deepseek-r1:7b | 7B | 4.7GB | 5.5GB | 8.2GB | Development, general reasoning |
| deepseek-r1:14b | 14B | 9.0GB | 10.6GB | 17.1GB | Complex reasoning, coding |
| deepseek-r1:32b | 32B | 19GB | ~22GB | ~35GB | Advanced math, research |
| deepseek-r1:70b | 70B | 40GB | ~45GB | ~70GB | Maximum local capability |
| DeepSeek-R1 (full) | 671B | N/A | Bedrock only | Bedrock only | Production, enterprise |
VRAM measurements for 7B and 14B tested on NVIDIA L40S GPUs with Ollama (January 2026). Larger models estimated based on parameter scaling.
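Before committing to a variant, you can check which models are already pulled on your machine via Ollama's local API. A small sketch, assuming Ollama is running on its default port:
import requests

# The /api/tags endpoint lists locally available models
tags = requests.get("http://localhost:11434/api/tags").json()
for model in tags.get("models", []):
    print(f"{model['name']}: {model['size'] / 1e9:.1f} GB on disk")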
Practical Examples
Example 1: Code Review with Reasoning
code_to_review = '''
def get_user(id):
    query = f"SELECT * FROM users WHERE id = {id}"
    return db.execute(query)
'''
prompt = f"""
Review this code for security issues. Explain your reasoning step by step.
```python
{code_to_review}
"""
response = ask_deepseek(prompt)
print(response["answer"])
Let me analyze this code for security vulnerabilities...

**Step 1: Identify the pattern**
The code constructs a SQL query using string formatting (f-string) with user input.

**Step 2: Recognize the vulnerability**
This is a classic SQL injection vulnerability. The `id` parameter is directly
interpolated into the query without sanitization.

**Step 3: Demonstrate the exploit**
If `id = "1 OR 1=1"`, the query becomes:
`SELECT * FROM users WHERE id = 1 OR 1=1`
This returns ALL users, bypassing authentication.

**Step 4: Propose the fix**
Use parameterized queries:
```python
def get_user(id):
    query = "SELECT * FROM users WHERE id = ?"
    return db.execute(query, (id,))
```

**Severity: CRITICAL**
SQL injection can lead to data theft, data loss, and complete system compromise.
Example 2: Mathematical Proof
prompt = """
prompt = """
Prove that the square root of 2 is irrational. Show each logical step.
"""
response = ask_deepseek(prompt)
print(response["reasoning"])
Example 3: Algorithm Design
prompt = """
Design an algorithm to find the longest palindromic substring in a string.
Analyze the time and space complexity of your solution.
"""
response = ask_deepseek(prompt)
print(response["answer"])
Comparing DeepSeek R1, o1, and Claude
When should you use each? Here’s how R1 compares to OpenAI o1 (Dec 2024) and Claude 3.5 Sonnet:
| Benchmark | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| AIME 2024 (math) | 79.8% | 74.4% | 16.0% |
| MATH-500 | 97.3% | 96.4% | 78.3% |
| Codeforces Elo | 2,029 (96.3%) | 1,673 (89%) | - |
| Cost (per 1M tokens) | $0.55 | $15.00 | $3.00 |
| Open source | Yes (MIT) | No | No |
| Local deployment | Yes | No | No |
| Context window | 128K | 128K | 200K |
Sources: DeepSeek R1 paper, OpenAI o1 announcement
Key insight: R1 matches or slightly beats o1 on reasoning benchmarks while being over 100x cheaper for a typical reasoning workload (~$0.70 vs ~$75 for a million input plus a million output tokens; see the cost table below) and open source. This is what made R1’s January 2025 release significant: comparable frontier performance at a fraction of the cost.
Qualitative differences:
| Aspect | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet |
|---|---|---|---|
| Reasoning visible | ✅ <think> tags | ❌ Hidden | ❌ Hidden |
| Math/code reasoning | Excellent | Excellent | Good |
| Creative writing | Serviceable | Good | Excellent |
| Instruction following | Good | Good | Better |
| Enterprise support | Community | OpenAI | Anthropic |
Cost Comparison
Running 1 million tokens through each model:
| Model | Input Cost | Output Cost | Total (1M each) |
|---|---|---|---|
| DeepSeek R1 (Bedrock) | $0.14 | $0.55 | ~$0.70 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | ~$18.00 |
| GPT-4 Turbo | $10.00 | $30.00 | ~$40.00 |
| OpenAI o1 | $15.00 | $60.00 | ~$75.00 |
For reasoning-heavy workloads, DeepSeek R1 offers 25-100x cost savings.
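To make that concrete, here is a small back-of-the-envelope calculator using the per-million-token prices from the table above (illustrative only; check the current pricing pages before budgeting):
# Per-1M-token prices (USD) taken from the table above
PRICES = {
    "deepseek-r1-bedrock": {"input": 0.14, "output": 0.55},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "openai-o1": {"input": 15.00, "output": 60.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for a given token volume."""
    price = PRICES[model]
    return (input_tokens / 1e6) * price["input"] + (output_tokens / 1e6) * price["output"]

# Reasoning models emit long chains of output tokens, so output price dominates
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 1_000_000, 1_000_000):.2f}")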
Full Example: Reasoning Agent
Here’s a complete example combining Bedrock with structured output:
"""
DeepSeek R1 Reasoning Agent
Uses chain-of-thought for complex problem solving.
"""
import boto3
import json
from pydantic import BaseModel
from typing import Optional
class ReasoningResult(BaseModel):
    """Structured reasoning output."""
    question: str
    reasoning_steps: list[str]
    answer: str
    confidence: str  # high, medium, low
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
def reason(question: str) -> ReasoningResult:
    """Use DeepSeek R1 for structured reasoning."""
    prompt = f"""
Solve this problem step by step. Format your response as:
REASONING:
1. [First step]
2. [Second step]
...
ANSWER: [Your final answer]
CONFIDENCE: [high/medium/low]
Problem: {question}
"""
    response = bedrock.invoke_model(
        modelId="us.deepseek.r1-v1:0",
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 2048,
            "temperature": 0.3
        })
    )
    result = json.loads(response["body"].read())
    text = result["choices"][0]["text"]
    # Parse structured output
    reasoning_section = text.split("ANSWER:")[0].replace("REASONING:", "").strip()
    steps = [s.strip() for s in reasoning_section.split("\n") if s.strip() and s.strip()[0].isdigit()]
    answer_section = text.split("ANSWER:")[1].split("CONFIDENCE:")[0].strip()
    confidence = text.split("CONFIDENCE:")[1].strip().lower() if "CONFIDENCE:" in text else "medium"
    return ReasoningResult(
        question=question,
        reasoning_steps=steps,
        answer=answer_section,
        confidence=confidence
    )
# Example usage
result = reason("""
A company's revenue grew 20% in Year 1, declined 10% in Year 2,
and grew 15% in Year 3. If they started with $1,000,000,
what's their revenue after Year 3?
""")
print(f"Question: {result.question}")
print(f"\nReasoning:")
for i, step in enumerate(result.reasoning_steps, 1):
    print(f"  {step}")
print(f"\nAnswer: {result.answer}")
print(f"Confidence: {result.confidence}")
Question: A company's revenue grew 20% in Year 1...

Reasoning:
  1. Start with $1,000,000
  2. Year 1: $1,000,000 × 1.20 = $1,200,000
  3. Year 2: $1,200,000 × 0.90 = $1,080,000
  4. Year 3: $1,080,000 × 1.15 = $1,242,000

Answer: $1,242,000
Confidence: high
What’s Next
You’ve learned how to run DeepSeek R1 on Bedrock and locally. Key takeaways:
- Use Bedrock for production - Fully managed, enterprise security, pay-per-token
- Use Ollama for development - Fast iteration, offline work, free
- Leverage transparent reasoning - See the model’s thought process
- Consider cost - 25-100x cheaper than alternatives for reasoning tasks
Further Reading:
- DeepSeek R1 on AWS Bedrock
- DeepSeek R1 Paper
- Ollama Documentation
- Data Models for AI Applications - Structure your DeepSeek outputs
Related Tutorials:
- S3 Vectors Getting Started - Build RAG with DeepSeek
- Fishing Report Agent - Tool-calling patterns