I finally understand how large language models actually work
After reading the 2025 textbook “Foundations of LLMs”
It blew my mind and cleared up years of confusion
Here’s everything I learned (in plain English):
An open-source extension for LLM serving engines – LMCache
It's like a caching layer for large-scale, production LLM inference.
LMCache implements smart KV cache management, reusing key–value states of previously seen text across GPU, CPU and local disk.
It can reuse any…
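The core idea of reusing KV states across storage tiers can be sketched in a few lines. This is a minimal sketch under stated assumptions, not LMCache's actual API. KV states are stored in fixed-size chunks, each chunk is keyed by a hash of the full token prefix that produced it, and lookups check a GPU, then a CPU, then a disk tier, modeled here as plain dicts. `TieredKVCache`, `chunk_size` and the tier names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical tier names standing in for GPU memory, CPU RAM and local disk,
# searched in this order (fastest first).
TIERS = ("gpu", "cpu", "disk")
_MISS = object()  # sentinel so a cached value of None is not mistaken for a miss


class TieredKVCache:
    """Toy prefix-keyed KV cache; each 'KV state' can be any Python object."""

    def __init__(self, chunk_size: int = 256) -> None:
        self.chunk_size = chunk_size
        self.tiers: Dict[str, Dict[str, object]] = {t: {} for t in TIERS}

    @staticmethod
    def _key(prefix: List[int]) -> str:
        # A chunk's KV state depends on every earlier token, so key it by the whole prefix.
        return hashlib.sha256(repr(list(prefix)).encode()).hexdigest()

    def store(self, tokens: List[int], kv_per_chunk: List[object], tier: str = "gpu") -> None:
        for i, kv in enumerate(kv_per_chunk):
            self.tiers[tier][self._key(tokens[: (i + 1) * self.chunk_size])] = kv

    def lookup(self, tokens: List[int]) -> Tuple[int, List[object]]:
        """Return (number of leading tokens covered by cached chunks, their KV states)."""
        hits: List[object] = []
        for i in range(len(tokens) // self.chunk_size):
            key = self._key(tokens[: (i + 1) * self.chunk_size])
            kv = _MISS
            for tier in TIERS:
                if key in self.tiers[tier]:
                    kv = self.tiers[tier][key]
                    break
            if kv is _MISS:
                break  # the first miss ends the reusable prefix
            hits.append(kv)
        return len(hits) * self.chunk_size, hits
```

Only full, matching chunks are reused. Any tokens after the first miss would be prefilled by the engine as usual.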
Prompt optimization is becoming a powerful technique for improving AI, one that can even beat SFT! Here are some of our research results with GEPA at Databricks on difficult Agent Bricks information-extraction tasks. We can match the best models at 90x lower cost, or improve them by ~6%.
What exactly is a vector index, and why does it matter for your database performance?
As databases grow larger, searching through millions or billions of vectors can become painfully slow. A vector index is designed to solve this problem by making smart trade-offs between search…
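One common index family, the inverted-file (IVF) index, makes the trade-off concrete. It is chosen here only as an illustration, since the post doesn't say which index types it covers. Each vector is placed in the bucket of its nearest centroid, and a query scans only the `nprobe` closest buckets instead of every vector. This trades some recall for speed. In this minimal sketch the centroids are passed in rather than trained with k-means, and `IVFIndex` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class IVFIndex:
    def __init__(self, centroids: List[Vector]) -> None:
        self.centroids = centroids
        # One inverted list of (id, vector) pairs per centroid.
        self.lists: Dict[int, List[Tuple[int, Vector]]] = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, v: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)), key=lambda i: math.dist(v, self.centroids[i]))
        return order[:n]

    def add(self, vec_id: int, v: Vector) -> None:
        # Each vector lives in exactly one list: that of its closest centroid.
        self.lists[self._nearest_centroids(v, 1)[0]].append((vec_id, v))

    def search(self, q: Vector, k: int = 1, nprobe: int = 1) -> List[int]:
        # Scan only the nprobe closest lists: larger nprobe means better recall but slower search.
        candidates = [item for c in self._nearest_centroids(q, nprobe) for item in self.lists[c]]
        candidates.sort(key=lambda item: math.dist(q, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]
```

With `nprobe` equal to the number of centroids, this degenerates into an exact brute-force scan. With `nprobe=1` it touches only a fraction of the data.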
How Mem0 works under the hood
1. Message enters -
A new user message triggers the pipeline.
2. Fetch context -
Mem0 pulls two things from storage:
i) A running summary of past exchanges
ii) The last m raw messages
This keeps history compact but still meaningful.
3. Extraction…
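Steps 1 and 2 can be sketched as follows. A new message triggers the pipeline, which pairs it with the running summary and only the last `m` raw messages. This is a minimal sketch with hypothetical names (`ConversationStore`, `on_new_message`), not Mem0's API. Step 3 is cut off in the post and is not reproduced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConversationStore:
    summary: str = ""  # running summary of past exchanges
    messages: List[str] = field(default_factory=list)  # full raw history

    def fetch_context(self, m: int) -> dict:
        # Step 2: pull the summary plus only the last m raw messages.
        # (Guard m <= 0 because messages[-0:] would return the whole list.)
        recent = self.messages[-m:] if m > 0 else []
        return {"summary": self.summary, "recent": recent}


def on_new_message(store: ConversationStore, message: str, m: int = 2) -> dict:
    """Step 1: a new user message triggers the pipeline."""
    context = store.fetch_context(m)
    store.messages.append(message)
    # Step 3 (extraction) would consume the message plus this compact context; elided here.
    return {"message": message, **context}
```

The context size stays bounded by the summary plus `m` messages no matter how long the conversation grows.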
How does HNSW actually work?
𝗛𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝗶𝗰𝗮𝗹 𝗡𝗮𝘃𝗶𝗴𝗮𝗯𝗹𝗲 𝗦𝗺𝗮𝗹𝗹 𝗪𝗼𝗿𝗹𝗱 (𝗛𝗡𝗦𝗪) is the algorithm behind most modern vector databases, but the algorithm can seem pretty complex.
Here's the breakdown of how it works, and why so many vector databases use…
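At the heart of HNSW search is greedy descent through the layers. Search starts at an entry point in the sparse top layer and keeps hopping to whichever neighbor is closer to the query until no neighbor improves. It then drops to the next, denser layer and repeats from that node. This minimal sketch covers only that search phase with a beam of one. It assumes the layered graph is already built, and it omits insertion, neighbor selection and the `ef` candidate list that real HNSW uses on the bottom layer. The small graph in the test is hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
# layers[0] is the dense bottom layer; each layer maps node id -> neighbor ids.
Graph = List[Dict[int, List[int]]]


def greedy_search_layer(q: Vector, entry: int, layer: Dict[int, List[int]],
                        vectors: Dict[int, Vector]) -> int:
    current, best = entry, math.dist(q, vectors[entry])
    improved = True
    while improved:  # stop at a local minimum within this layer
        improved = False
        for nb in layer.get(current, []):
            d = math.dist(q, vectors[nb])
            if d < best:
                current, best, improved = nb, d, True
    return current


def hnsw_search(q: Vector, entry: int, layers: Graph, vectors: Dict[int, Vector]) -> int:
    # Descend from the top (sparsest) layer to layer 0, reusing each result as the next entry point.
    node = entry
    for layer in reversed(layers):
        node = greedy_search_layer(q, node, layer, vectors)
    return node
```

The sparse upper layers supply long jumps across the space, so the search on the dense bottom layer starts near the answer instead of far away.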
Multi-vector embeddings are amazing for retrieval quality.
But are they worth the massive memory bills?
Multi-vector models like ColBERT and ColPali are incredible at capturing semantic information. They maintain token-level meanings in text and identify different parts of…
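Two small calculations capture both sides of the trade-off. ColBERT-style late interaction scores a query against a document with MaxSim. For each query token vector it takes the best dot product over all document token vectors, then sums those maxima. Storage, meanwhile, grows with the number of vectors kept per document. This is a minimal sketch. The dimensions and token counts used in the test (768-d single vectors vs 300 × 128-d token vectors, float32) are illustrative assumptions, not figures from the post.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    # Late interaction: each query token picks its best-matching document token.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def index_bytes(num_docs: int, vecs_per_doc: int, dim: int, bytes_per_value: int = 4) -> int:
    # Single-vector models keep 1 vector per doc; multi-vector models keep one per token or patch.
    return num_docs * vecs_per_doc * dim * bytes_per_value
```

Under those assumptions, 1M documents take about 3.1 GB as single vectors but about 154 GB as token vectors, roughly 50× more.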
Fundamentals of a 𝗩𝗲𝗰𝘁𝗼𝗿 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲.
With the rise of GenAI, Vector Databases skyrocketed in popularity. The truth: Vector Databases are also useful outside of a Large Language Model context.
When it comes to Machine Learning, we often deal with Vector Embeddings.…
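At its simplest, a vector database answers one question: which stored embeddings are closest to a query embedding? This minimal brute-force sketch uses cosine similarity. The ids and 3-dimensional vectors in the test are made up. Production systems replace the linear scan with an approximate index such as HNSW.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def top_k(query: Vector, store: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    # Exact search: score every stored vector, keep the k highest similarities (best first).
    scored = ((doc_id, cosine(query, v)) for doc_id, v in store.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```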
𝗜𝗡𝗗𝗘𝗫𝗜𝗡𝗚 vs 𝗦𝗘𝗔𝗥𝗖𝗛𝗜𝗡𝗚: 𝗧𝗵𝗲 𝗖𝗿𝗶𝘁𝗶𝗰𝗮𝗹 𝗣𝗮𝗿𝘁𝗻𝗲𝗿𝘀𝗵𝗶𝗽 𝗘𝘃𝗲𝗿𝘆 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿 𝗠𝘂𝘀𝘁 𝗠𝗮𝘀𝘁𝗲𝗿
Most teams focus on optimizing search algorithms while ignoring the foundation that makes them work.
Here's the breakdown:
What is…
This could be huge for massive memory savings.
Paper shows a new way to shrink the KV cache while keeping answers strong.
On code completion it even beats the full cache while using only 1.5% of the memory.
The KV cache stores past attention keys and values for every token and layer, so its memory cost grows quickly with context length.
The problem is that…
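The growth is easy to quantify. With standard multi-head attention, the cache holds one key vector and one value vector per token, per layer, per KV head. Its size is therefore linear in both sequence length and batch size. The sketch below is a back-of-the-envelope calculation. The 32-layer, 32-head, 128-dim, fp16 configuration is an illustrative assumption, not the paper's setup.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    # Factor 2: both keys and values are cached. bytes_per_elem=2 assumes fp16/bf16.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem
```

With those assumed numbers, each token costs 0.5 MiB, so a single 32K-token sequence needs 16 GiB of cache. Keeping only 1.5% of it would be about 0.24 GiB.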
This is a best-in-class technical explanation of the causal structure of Transformer LLMs.
If you are averse to the usual janus genre of content, just skip the sentence about “interferometric cognition”; the rest of the post is the opposite of obscurantist (namely: clarifying).
Stop chunking first. Start embedding first.
𝗟𝗮𝘁𝗲 𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴 improves retrieval quality of your RAG system:
First, let’s do a quick chunking refresher:
𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 (the basics we all started with)
• Token Chunking - split by token count
•…
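The difference between the two orders is small in code. Traditional chunking splits the tokens first and embeds each chunk in isolation. Late chunking runs the embedding model over the whole document once, then mean-pools the token embeddings inside each chunk's span, so every chunk vector carries document-wide context. In this minimal sketch, `token_chunks` and `late_chunk` are hypothetical helpers. The token embeddings would come from a long-context embedding model, which is not shown.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def token_chunks(tokens: Sequence[str], size: int) -> List[Tuple[int, int]]:
    """Token chunking: fixed-size [start, end) spans over the token sequence."""
    return [(i, min(i + size, len(tokens))) for i in range(0, len(tokens), size)]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def late_chunk(token_embeddings: Sequence[Vector], spans: List[Tuple[int, int]]) -> List[Vector]:
    # token_embeddings come from ONE forward pass over the full document,
    # so each token's vector already reflects context outside its own chunk.
    return [mean_pool(token_embeddings[start:end]) for start, end in spans]
```

The chunk boundaries are identical in both approaches. Only the point at which embedding happens moves, from before the split to after it.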
HOW INFORMATION FLOWS THROUGH TRANSFORMERS
Because I've looked at those "transformers explained" pages and they really suck at explaining.
There are two distinct information highways in the transformer architecture:
- The residual stream (black arrows): Flows vertically through…
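In code, the two highways look like this. The residual stream is a per-position vector that each sublayer only adds to (`x = x + ...`). Attention is the one place where information moves sideways between positions, and in a causal LLM it moves only from earlier positions to later ones. This minimal single-head sketch uses identity Q/K/V projections and omits layer norms and the output projection, all simplifying assumptions. `mlp` is any caller-supplied per-position function.

```python
import math
from typing import Callable, List

Vector = List[float]


def causal_attention(xs: List[Vector]) -> List[Vector]:
    """Single head, identity Q/K/V (a simplifying assumption).
    Position i attends only to positions 0..i: the 'horizontal' highway."""
    out = []
    for i, q in enumerate(xs):
        scores = [sum(a * b for a, b in zip(q, xs[j])) / math.sqrt(len(q)) for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * xs[j][d] for j, w in enumerate(weights)) / total for d in range(len(q))])
    return out


def block(xs: List[Vector], mlp: Callable[[Vector], Vector]) -> List[Vector]:
    # Residual stream ('vertical' highway): sublayer outputs are ADDED, never replace the stream.
    attn = causal_attention(xs)
    xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attn)]
    return [[a + b for a, b in zip(x, mlp(x))] for x in xs]
```

Because of the causal mask, changing a later token never changes the outputs at earlier positions, while the MLP acts on each position independently.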