- Tsinghua University
- Berkeley, CA
- abmfy.github.io
Stars
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Accelerating MoE with IO and Tile-aware Optimizations
An early-research-stage expert-parallel load balancer for MoE models based on linear programming (see the LP sketch after this list).
A high-throughput and memory-efficient inference and serving engine for LLMs
Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models".
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Bark is an iOS app that lets you push custom notifications to your iPhone
🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
LLM inference via Triton (flexible & modular): focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model
Open-source implementation of AlphaEvolve
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Containerization is a Swift package for running Linux containers on macOS.
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Let's make video diffusion practical!
A programmer's guide to cooking at home (Simplified Chinese only).
FlashMLA: Efficient Multi-head Latent Attention Kernels
Efficient 2:4 sparse training algorithms and implementations (see the 2:4 sparsity sketch after this list)
GPGPU processor supporting the RISC-V vector extension, developed with Chisel HDL
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Awesome LLM compression research papers and tools.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
A nanoGPT pipeline packed in a spreadsheet
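The expert-parallel load balancer entry above describes balancing MoE experts with linear programming. Below is a minimal, generic sketch of one way such a problem can be posed as an LP, where each expert's token load is split fractionally across GPUs to minimize the peak per-GPU load. The `balance_experts` helper, the formulation, and the example loads are illustrative assumptions, not the repository's actual model.

```python
# Toy LP for expert-parallel load balancing (generic sketch, not the repo's algorithm).
# Variables: x[e, g] = fraction of expert e's load placed on GPU g, plus the peak load L.
# Objective: minimize L subject to each GPU's load staying at or below L.
import numpy as np
from scipy.optimize import linprog

def balance_experts(expert_load, num_gpus):
    """Return an (experts x gpus) fractional placement minimizing the peak GPU load."""
    expert_load = np.asarray(expert_load, dtype=float)
    E, G = len(expert_load), num_gpus
    c = np.zeros(E * G + 1)
    c[-1] = 1.0                                   # minimize the peak load L (last variable)
    A_ub = np.zeros((G, E * G + 1))               # per-GPU load constraints
    for g in range(G):
        for e in range(E):
            A_ub[g, e * G + g] = expert_load[e]   # load contributed by expert e on GPU g
        A_ub[g, -1] = -1.0                        # ... minus L must be <= 0
    b_ub = np.zeros(G)
    A_eq = np.zeros((E, E * G + 1))               # each expert's load is fully placed
    for e in range(E):
        A_eq[e, e * G:(e + 1) * G] = 1.0
    b_eq = np.ones(E)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (E * G + 1), method="highs")
    return res.x[:-1].reshape(E, G), res.x[-1]

placement, peak = balance_experts([90, 40, 30, 20], num_gpus=2)
print(np.round(placement, 2), peak)               # peak load is 90 (180 tokens over 2 GPUs)
```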
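The 2:4 sparse training entry refers to the 2:4 structured-sparsity pattern, in which every group of four consecutive weights keeps at most two nonzeros. The sketch below only applies such a mask by keeping the two largest-magnitude weights per group; the `prune_2_to_4` helper is a hypothetical illustration, not the repository's training algorithm.

```python
# Apply a 2:4 structured-sparsity mask to a weight matrix (illustrative sketch).
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out all but the 2 largest-magnitude weights in every group of 4 columns."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 pruning expects a multiple of 4 columns"
    groups = weight.reshape(out_features, in_features // 4, 4)
    keep = groups.abs().topk(k=2, dim=-1).indices        # two largest-magnitude entries per group
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, True)                        # mark kept positions
    return (groups * mask).reshape(out_features, in_features)

# Example: an 8-column weight matrix becomes 50% sparse, 2 nonzeros per group of 4.
w = torch.randn(4, 8)
print(prune_2_to_4(w))
```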