Pull requests: pytorch/FBGEMM
Replace .data_ptr with .mutable_data_ptr or .const_data_ptr
cla signed
#5267
opened Dec 20, 2025 by cyyever
Optimizations for index_select_scalar_cumsum_kernel on ROCm
cla signed
module: rocm
#5263
opened Dec 18, 2025 by amd-wsung102
Refactor TBE benchmark reporter to use structured data config
cla signed
fb-exported
meta-exported
#5260
opened Dec 18, 2025 by gchalump
Fix blackwell CUTLASS attention meta registration + actually test compile
cla signed
fb-exported
meta-exported
#5259
opened Dec 18, 2025 by jbschlosser
Optimize benchmark index generation with std::sample()
cla signed
fb-exported
meta-exported
#5254
opened Dec 17, 2025 by terdogan
Remove unused dedup_map and associated includes from benchmarks
cla signed
fb-exported
meta-exported
#5253
opened Dec 17, 2025 by terdogan
Move the prefetched info to preallocated buffers
cla signed
fb-exported
meta-exported
#5251
opened Dec 17, 2025 by chouxi
Enable direct MX4→BF16 dequantization to reduce memory (python side) (2/2)
cla signed
fb-exported
meta-exported
#5250
opened Dec 17, 2025 by armandsauzay
Add aarch64 intrinsic-based dequantization to autovec routine
cla signed
fb-exported
meta-exported
#5249
opened Dec 17, 2025 by Nicoshev
Choose _autovec version of GenerateEmbeddingSpMDMRowWiseSparse on AArch64
cla signed
fb-exported
meta-exported
#5247
opened Dec 17, 2025 by MatzeB
Specialize more cases to improve EmbeddingSpMDMNBitBenchmark
cla signed
fb-exported
meta-exported
#5245
opened Dec 17, 2025 by MatzeB
Add EmbeddingSpMDMNBitRowWiseSparse autovectorized variant
cla signed
fb-exported
meta-exported
#5244
opened Dec 17, 2025 by MatzeB
Optimize group_index_select_or_add_2d_kernel on ROCm by adding a separate codepath for small embedding dimensions
cla signed
module: rocm
#5233
opened Dec 16, 2025 by aryaman-gupta
support object cache in ssd l2 cache and add more unit tests
cla signed
fb-exported
meta-exported
#5228
opened Dec 16, 2025 by zhaojuanmao
Optimizing 4-bit dequant to FP32 on AArch64 using vectorized intrinsics in EmbeddingSpMDMAutovec
cla signed
#5224
opened Dec 15, 2025 by marma01
Update heuristic to support variant batch sizes
cla signed
fb-exported
meta-exported
#5211
opened Dec 10, 2025 by zjing14
Use H100 runners for OSS CI
cla signed
fb-exported
meta-exported
#5205
opened Dec 9, 2025 by q10
Modifying clear_all_staged_data to accommodate KV Tensor Deletion
cla signed
fb-exported
meta-exported
#5202
opened Dec 9, 2025 by Raahul46
creating delete_rocksdb_checkpoint_dir function under KV Tensor
cla signed
fb-exported
meta-exported
#5201
opened Dec 9, 2025 by Raahul46
Adding returnKVTensorMetaData flag to Staging Read Strategy
cla signed
fb-exported
meta-exported
#5200
opened Dec 9, 2025 by Raahul46
Fix jagged_to_padded_dense autograd
cla signed
fb-exported
meta-exported
#5191
opened Dec 8, 2025 by yunjiangster
Add warp parallelism to populate_bucketized_permute
cla signed
fb-exported
meta-exported
#5189
opened Dec 8, 2025 by AlbertDachiChen