Cross-Attention Fusion: Combining Text Embeddings with Structured Features
Concatenation is the default. Here's why cross-attention works better for combining text embeddings with tabular data—and how to implement it in PyTorch.
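Before the walkthrough, here is a minimal sketch of the pattern in PyTorch: project the tabular row and the text tokens into a shared space, then let the tabular features query the tokens through `nn.MultiheadAttention`. The class name `CrossAttentionFusion`, the dimensions, and the single-query design are illustrative assumptions for this sketch, not the article's exact implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse token-level text embeddings with tabular features via cross-attention.

    The tabular row acts as the query; the text token embeddings act as keys
    and values, so the model learns which tokens matter for each record.
    Dimensions below are illustrative, not prescriptive.
    """

    def __init__(self, d_text=768, n_tabular=12, d_model=256, n_heads=4):
        super().__init__()
        self.text_proj = nn.Linear(d_text, d_model)    # tokens -> shared space
        self.tab_proj = nn.Linear(n_tabular, d_model)  # tabular row -> shared space
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, text_tokens, tabular, text_padding_mask=None):
        # text_tokens: (batch, seq_len, d_text) from any encoder, e.g. a BERT-style model
        # tabular:     (batch, n_tabular) of normalized numeric/encoded features
        q = self.tab_proj(tabular).unsqueeze(1)   # (batch, 1, d_model) single query
        kv = self.text_proj(text_tokens)          # (batch, seq_len, d_model)
        fused, attn_weights = self.attn(
            q, kv, kv, key_padding_mask=text_padding_mask
        )
        fused = self.norm(fused + q).squeeze(1)   # residual + norm -> (batch, d_model)
        return self.head(fused), attn_weights     # logit + per-token attention weights


# Smoke test with random inputs
model = CrossAttentionFusion()
text = torch.randn(8, 64, 768)      # 8 records, 64 tokens each
tab = torch.randn(8, 12)            # 8 records, 12 tabular features
logits, weights = model(text, tab)
print(logits.shape, weights.shape)  # torch.Size([8, 1]) torch.Size([8, 1, 64])
```

Using the tabular row as a single query keeps the fused representation compact, and the returned attention weights are directly readable as per-token importances, which is one advantage this pattern has over plain concatenation.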