ML | game engines | compilers | HPC
Research Engineer @PrimeIntellect
Building my custom pytorch: https://t.co/5PnTE712WQ
github.com/MarioSieg
Berlin, Germany
Joined February 2024
my matrix multiplication kernels for magnetron now beat PyTorch's performance on my CPU:
Magnetron matmul: 1955.2 GFLOP/s
Torch matmul: 1752.3 GFLOP/s
check out the kernel code: github.com/MarioSieg/magn…
magnetron detects cache sizes and uses state of the art block tuning…
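For readers wondering how figures like these are usually derived: a dense (M×K)·(K×N) product costs roughly 2·M·N·K floating-point operations, and GFLOP/s is that count divided by elapsed seconds and 10^9. The sketch below shows that arithmetic next to a toy cache-blocked loop nest. It is pure Python and only illustrative; the function names and the block size of 32 are assumptions, not magnetron's actual kernels or tuning.

```python
def matmul_flops(m, n, k):
    # One multiply and one add per inner-product term.
    return 2 * m * n * k

def gflops(m, n, k, seconds):
    # Throughput in units of 10^9 floating-point operations per second.
    return matmul_flops(m, n, k) / seconds / 1e9

def blocked_matmul(a, b, block=32):
    # Toy cache blocking: work on block x block tiles so each tile of
    # a, b and c is reused while it is (ideally) still in cache.
    # The block size is a hypothetical default, not a tuned value.
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for p0 in range(0, k, block):
            for j0 in range(0, n, block):
                for i in range(i0, min(i0 + block, m)):
                    row_c, row_a = c[i], a[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, n)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Real kernels pick tile sizes from the detected cache sizes and use SIMD micro-kernels for the innermost loops; the loop order here is just one reasonable choice.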
My attempt at creating a creepy "Jarvis-like" sci-fi horror AI assistant. Built him a year ago in C++ with raylib for rendering and whisper.cpp for speech recognition. The mouth animation definitely needs some work
magnetron's GPT-2 inference example is working!!
one year ago I wrote the first C file to build a small pytorch clone, today LLMs can be implemented by it.
Took a lot of hours and work to get everything right and I can't wait to continue with even more advanced models like…
Upcoming features of piquant - our blazingly fast quantization library
- int2 quantization
- direct quantization of bf16 tensors
- sign quantization
- SIMD kernels for stochastic rounding
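As a rough illustration of the stochastic rounding mentioned in the list above: a value is rounded up with probability equal to its fractional part, so the rounding error averages out to zero. The scalar sketch below is a reference-style example, not piquant's SIMD implementation; the function names, the scale parameter, and the signed int2 range of -2..1 used in the test are assumptions.

```python
import math
import random

def stochastic_round(x, rng):
    # Round down, then round up with probability equal to the fractional part.
    lo = math.floor(x)
    frac = x - lo
    return lo + (1 if rng.random() < frac else 0)

def quantize_stochastic(values, scale, qmin, qmax, rng):
    # Scale each value, round stochastically, then clamp to [qmin, qmax].
    out = []
    for v in values:
        q = stochastic_round(v / scale, rng)
        out.append(max(qmin, min(qmax, q)))
    return out
```

Passing a seeded `random.Random` instance keeps runs reproducible; a SIMD kernel would generate the random numbers in vector lanes instead.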
Sometimes I have random creative "attacks" where I build random stuff.
Last time it was techno music generated with pure code, this time it's a small cryptocurrency...
It's not about money, it's about exploring, learning and having fun. This approach taught me 99% of what I…
Launching SYNTHETIC-2: our next-gen open reasoning dataset and planetary-scale synthetic data generation run.
Powered by our P2P inference stack and DeepSeek-R1-0528, it verifies traces for the hardest RL tasks.
Contribute towards AGI via open, permissionless compute.
Our fast quantization library piquant will support 2-bit quantization and new 4-bit kernels for even higher performance on AVX-512 CPUs in the next release.
Get ready to crunch those packed integers!
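For anyone unfamiliar with packed integers: two 4-bit values fit in one byte, which halves memory traffic compared with storing each in its own byte. The sketch below packs unsigned 4-bit values with the first value in the low nibble. That nibble order, the zero padding for odd lengths, and the function names are assumptions for illustration; piquant's actual layout may differ.

```python
def pack_uint4(values):
    # Pack pairs of 4-bit values into bytes, first value in the low nibble.
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad odd-length input with a zero nibble
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (0 <= lo <= 15 and 0 <= hi <= 15):
            raise ValueError("values must fit in 4 bits")
        packed.append(lo | (hi << 4))
    return bytes(packed)

def unpack_uint4(packed, count):
    # Inverse of pack_uint4; count drops the padding nibble if one was added.
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out[:count]
```

The same idea extends to 2-bit values, with four of them per byte.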
This is not PyTorch.
It’s Magnetron - my tiny ML framework with a PyTorch-like API, designed for microcontrollers and IoT.
Now supports nn.Module, nn.Linear, nn.Sequential, nn.ModuleList, nn.ModuleDict, and more.
The API got very close to PyTorch over the last month, more to come!…
To implement a GPT-2 in my custom PyTorch-like ML framework, I added boolean tensors.
Boolean tensors are used for filtering, indexing, attention and loss masks, and much more.
The main logical operators AND, OR, XOR and NOT are now supported.
One more step towards LLM…
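To make the masking use case concrete, here is a minimal sketch of how boolean masks typically feed into attention: a causal mask lets position i see only positions up to i, a logical AND combines it with a padding mask, and masked-out scores are set to negative infinity before softmax. Plain Python lists stand in for tensors, and the function names are hypothetical; this is not Magnetron's API.

```python
def causal_mask(n):
    # True where query position i may attend to key position j (j <= i).
    return [[j <= i for j in range(n)] for i in range(n)]

def logical_and(a, b):
    # Elementwise AND of two boolean matrices of the same shape.
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def masked_fill(scores, mask, fill=float("-inf")):
    # Keep scores where mask is True, replace the rest with fill.
    return [[s if m else fill for s, m in zip(rs, rm)]
            for rs, rm in zip(scores, mask)]
```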
Awesome work by @_mario_neo_ to accelerate quantization of pseudo-gradients in decentralized training settings like DiLoCo - already integrated in pccl (prime collective communication library)
7K Followers 3K Following
🛠️ Founder @AbideAI 👐 ML Engineer 👩💻☕
📚 Book Author: LLMOps (2025), ✍️ GPU Engg for AI Systems (2026)
💬🐦 Talk to me about LLMs, MLSys & GPU Training
166 Followers 4K Following
Junior@Nankai University | Major in CS | Research in CV, GenAI | Full Stack Developer | Beginner in Crypto | Runner, Cyclist, Gym-goer | Rap enthusiast
2K Followers 2K Following
🐻 7+ yrs AI × Blockchain dev
Building DeepFocus & @bemeos_
curiosity runs me, love writing getting started guides and deep tech reports on web3 and AI products
450 Followers 2K Following
Previously Head of Software at Thred, realtime simulation of factories at scale.
Previously @ Meta, co-founder SCR & Perplexico
Rusty Rob on Youtube
10K Followers 4K Following
sth new // ex Gemini RL+Inference @GoogleDeepMind // Chat AI @Meta // RL Agents @EA // ML+Information Theory @MIT+@Harvard+@GeorgiaTech // زن زندگی آزادی
848 Followers 5K Following
Research Fellow @MSFTResearch India | Ex RA @gatech_scs, (B. + M.) Tech. CSE IITD | PL, Verification and Theorem Proving | Sports + Music + Food (in that order)
9K Followers 988 Following
Developer of next-generation interactive entertainment experiences born from infinite wisdom.
Currently working on Kowloon's Curse.
29K Followers 2K Following
Software performance expert. Ranked in the top 2% of scientists globally (Stanford/Elsevier 2024) and among GitHub's top 1000 developers.
15K Followers 2K Following
Passionate about gamedev technologies. I create things. Created #raylib, #raygui, #rres and many other tools as @raylibtech. FOSS at: https://t.co/sIUDDNc2tV
35K Followers 961 Following
Author of https://t.co/arW0hnVET0 and https://t.co/RN9xXOzhON. @sourcegraph working on @ampcode. Ex-@zeddotdev. Programming where the rubber hits the road.
451K Followers 77 Following
Tensors and neural networks in Python with strong hardware acceleration. PyTorch is an open source project at the Linux Foundation. #PyTorchFoundation
1K Followers 492 Following
game engine development - building an open world crafting game - PhD MechE Univ. Colorado (laser sensing) day job = methane detection and mitigation technology
42K Followers 204 Following
Dysfunctional Programming account #1. Senior SWE at Bloomberg. I write C++ for money. ex-Haskell, ex-OCaml. All opinions are my own.
101K Followers 2K Following
Follow for posts about GitHub repos, DSPy, and agents
Subscribe for top posts
DM to share your AI project (Due to volume of DMs I'll prioritize subscribers)
4.3M Followers 3 Following
OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring: https://t.co/dJGr6Lg202
178 Followers 266 Following
I like building software | Working on a tensor library from scratch: https://t.co/CPhWN3O8rq | Blog: https://t.co/aoUjpbPpw9
61K Followers 119 Following
Founded by @MichaelLarabel in 2004, Phoronix is the largest #opensource news, #Linux hardware reviews & Linux PC/server/HPC performance benchmark site.
18K Followers 337 Following
Software engineer and logic design hobbyist. Since 2021, Building RISC-V SoCs from scratch and hacking xv6/Linux to life, TU Berlin graduated