Highlights

  • Pro

Organizations

@THUDM @SAST-skill-docers @vllm-project @NOP-Processor


TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 2,662 156 Updated Dec 26, 2025

Accelerating MoE with IO and Tile-aware Optimizations

Python 472 33 Updated Dec 25, 2025

An early-research-stage expert-parallel load balancer for MoE models, based on linear programming.

Python 477 27 Updated Nov 19, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 66,323 12,239 Updated Dec 28, 2025

Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models".

Python 27 2 Updated Nov 13, 2025

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

Cuda 2,960 296 Updated Dec 22, 2025

Bark is an iOS App which allows you to push custom notifications to your iPhone

Swift 7,262 581 Updated Dec 24, 2025

🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 824 46 Updated Dec 26, 2025

Puzzles for learning Triton

Jupyter Notebook 2,204 180 Updated Nov 18, 2024

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 6,584 630 Updated Dec 25, 2025

LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model

Python 62 2 Updated Oct 18, 2025

Python 972 101 Updated Dec 23, 2025

Open-source implementation of AlphaEvolve

Python 4,980 765 Updated Dec 24, 2025

Nano vLLM

Python 10,245 1,282 Updated Nov 3, 2025

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,148 107 Updated Dec 28, 2025

Containerization is a Swift package for running Linux containers on macOS.

Swift 8,199 235 Updated Dec 23, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 697 89 Updated Dec 28, 2025

Let's make video diffusion practical!

Python 16,410 1,601 Updated Oct 16, 2025

A programmer's guide to cooking at home (Simplified Chinese only).

Dockerfile 96,717 10,730 Updated Dec 9, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,939 923 Updated Dec 15, 2025

Chinese and English stopword list (3,076 entries, including some special symbols)

21 Updated Dec 30, 2024

A year has gone by. Where did all the money you spent in the "Huazi" (Tsinghua) canteens go?

Python 472 81 Updated Dec 23, 2024

Efficient 2:4 sparse training algorithms and implementations

Python 58 1 Updated Dec 8, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 3,104 269 Updated Dec 5, 2024

GPGPU processor supporting RISCV-V extension, developed with Chisel HDL

Scala 856 118 Updated Dec 28, 2025

APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding

Python 14 Updated Jul 22, 2024

Awesome LLM compression research papers and tools.

1,750 113 Updated Nov 10, 2025

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,916 94 Updated Aug 15, 2024

A nanoGPT pipeline packed in a spreadsheet

2,142 128 Updated Jun 17, 2024