abmfy

Bowen Wang (abmfy)

CS undergraduate @ THU, visiting Sky Computing Lab @ UC Berkeley

187 followers · 51 following

Tsinghua University
Berkeley, CA
00:16 (UTC -08:00)
abmfy.github.io

Achievements

x3 x2

Stars

thu-ml / TurboDiffusion

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python · 2,662 stars · 156 forks · Updated Dec 26, 2025

Dao-AILab / sonic-moe

Accelerating MoE with IO and Tile-aware Optimizations

Python · 472 stars · 33 forks · Updated Dec 25, 2025

deepseek-ai / LPLB

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python · 477 stars · 27 forks · Updated Nov 19, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 66,323 stars · 12,239 forks · Updated Dec 28, 2025

umbertocappellazzo / Omni-AVSR

Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models".

Python · 27 stars · 2 forks · Updated Nov 13, 2025

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,960 stars · 296 forks · Updated Dec 22, 2025

Finb / Bark

Bark is an iOS App which allows you to push custom notifications to your iPhone

Swift · 7,262 stars · 581 forks · Updated Dec 24, 2025

vipshop / cache-dit

🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python · 824 stars · 46 forks · Updated Dec 26, 2025

srush / Triton-Puzzles

Puzzles for learning Triton

Jupyter Notebook · 2,204 stars · 180 forks · Updated Nov 18, 2024

yichuan-w / LEANN

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python · 6,584 stars · 630 forks · Updated Dec 25, 2025

toyaix / TritonLLM

LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model

Python · 62 stars · 2 forks · Updated Oct 18, 2025

ChenmienTan / RL2

Python · 972 stars · 101 forks · Updated Dec 23, 2025

algorithmicsuperintelligence / openevolve

Open-source implementation of AlphaEvolve

Python · 4,980 stars · 765 forks · Updated Dec 24, 2025

bvaughn / react-resizable-panels

TypeScript · 5,028 stars · 200 forks · Updated Dec 27, 2025

GeeeekExplorer / nano-vllm

Nano vLLM

Python · 10,245 stars · 1,282 forks · Updated Nov 3, 2025

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ · 1,148 stars · 107 forks · Updated Dec 28, 2025