jacky @cppchangeworld

c++ dev search engine backend he/him Joined July 2018

Tweets

1K
Followers

20
Following

557
Likes

1K

Hasan Toor ✪ @hasantoxr

4 days ago

I finally understand how large language models actually work after reading the 2025 textbook “Foundations of LLMs”. It blew my mind and cleared up years of confusion. Here’s everything I learned (in plain English):

45 457 3K 290K 6K

Vivek Galatage @vivekgalatage

4 days ago

Static branch prediction on newer Intel processors by Matt Godbolt xania.org/201602/bpu-par…

8 87 616 228K 500

1 15 111 8K 76

TuringPost @TheTuringPost

5 days ago

An open-source extension for LLM serving engines: LMCache. It's like a caching layer for large-scale, production LLM inference. LMCache implements smart KV cache management, reusing key-value states of previously seen text across GPU, CPU and local disk. It can reuse any…

8 89 431 23K 348

Avi Chawla @_avichawla

5 days ago

4 strategies LLMs use to generate text:

2 75 311 22K 283

Matei Zaharia @matei_zaharia

6 days ago

Prompt optimization is becoming a powerful technique for improving AI that can even beat SFT! Here are some of our research results with GEPA at Databricks, in difficult Agent Bricks info extraction tasks. We can match the best models at 90x lower cost, or improve them by ~6%.

30 124 865 97K 575

Victoria Slocum @victorialslocum

6 days ago

What exactly is a vector index, and why does it matter for your database performance? As databases grow larger, searching through millions or billions of vectors can become painfully slow. A vector index is designed to solve this problem by making smart trade-offs between search…

11 100 488 21K 393

GitHubDaily @GitHub_Daily

6 days ago

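When building AI applications, every developer runs into the same headache: multi-turn conversations or overly long inputs degrade the quality of the model's answers. I recently saw that the LangChain team has put forward 6 solutions that let AI models keep producing high-quality output in complex scenarios. Based on LangGraph…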

4 46 233 24K 268

沉浸式翻译 @immersivetran

7 days ago

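Google just released a guide to building AI Agents. As usual, we have translated it into a Chinese edition and a Chinese-English bilingual edition; downloads are at the end of the post. 1. What exactly is an AI Agent? 2. How do you assemble an AI Agent? Brain (model), hands (tools), memory (data architecture), action plan 3. How do you make an Agent reliable so it does not make things up? 4. How do you make sure nothing goes wrong once the Agent is live?…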

7 213 673 68K 876

mem0 @mem0ai

6 days ago

How Mem0 works under the hood
1. Message enters - A new user message triggers the pipeline.
2. Fetch context - Mem0 pulls two things from storage:
   i) A running summary of past exchanges
   ii) The last m raw messages
   This keeps history compact but still meaningful.
3. Extraction…

12 30 297 20K 299

Victoria Slocum @victorialslocum

7 days ago

How does HNSW actually work? Hierarchical Navigable Small World (HNSW) is the algorithm behind most modern vector databases, but the algorithm can seem pretty complex. Here's the breakdown of how it works, and why so many vector databases use…

6 109 743 37K 673

Femke Plantinga @femke_plantinga

7 days ago

Multi-vector embeddings are amazing for retrieval quality. But are they worth the massive memory bills? Multi-vector models like ColBERT and ColPali are incredible at capturing semantic information. They maintain token-level meanings in text and identify different parts of…

3 43 302 13K 275

Aurimas Griciūnas @Aurimas_Gr

a week ago

Fundamentals of a Vector Database. With the rise of GenAI, Vector Databases skyrocketed in popularity. The truth: Vector Databases are also useful outside of a Large Language Model context. When it comes to Machine Learning, we often deal with Vector Embeddings.…

11 172 889 53K 1K

Vivek Galatage @vivekgalatage

a week ago

A good, light read on hardware basics such as cache, prefetch, false sharing, and branches. needoneapp.medium.com/the-hardware-k…

8 143 956 34K 1K

jacky @cppchangeworld

a week ago

Optimizing ClickHouse for Intel's ultra-high core count processors clickhouse.com/blog/optimizin…

0 0 0 0 0

Milvus @milvusio

2 weeks ago

INDEXING vs SEARCHING: The Critical Partnership Every Developer Must Master
Most teams focus on optimizing search algorithms while ignoring the foundation that makes them work. Here's the breakdown: What is…

0 3 9 315 5

Pramod Goyal @goyal__pramod

2 weeks ago

One of the best technical blogs I have ever read, period!

11 312 3K 178K 4K

Rohan Paul @rohanpaul_ai

3 weeks ago

This could be huge for massive memory savings. Paper shows a new way to cut KV cache while keeping answers strong. On code completion it even beats full cache using 1.5% memory. KV cache saves past attention for each token and layer, so cost grows fast. The problem is that…

7 29 171 11K 129

davidad 🎇 @davidad

3 weeks ago

This is a best-in-class technical explanation of the causal structure of Transformer LLMs. If you are averse to the usual janus genre of content, just skip the sentence about “interferometric cognition”; the rest of the post is the opposite of obscurantist (namely: clarifying).

j⧉nus @repligate

3 weeks ago

45 296 2K 264K 3K

12 64 676 58K 643

Weaviate vector database @weaviate_io

3 weeks ago

Stop chunking first. Start embedding first.
Late chunking improves retrieval quality of your RAG system:
First, let’s do a quick chunking refresher:
Traditional Chunking (the basics we all started with)
• Token Chunking - split by token count
•…

11 37 327 19K 423

j⧉nus @repligate

3 weeks ago

HOW INFORMATION FLOWS THROUGH TRANSFORMERS
Because I've looked at those "transformers explained" pages and they really suck at explaining.
There are two distinct information highways in the transformer architecture:
- The residual stream (black arrows): Flows vertically through…