Rajbhandari, Samyam
16  results:
1

System Optimizations for Enabling Training of Extreme Long ..:

, In: Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing,
 
2

System Optimizations for Enabling Training of Extreme Long ..:

, In: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW),
Jacobs, Sam Ade ; Tanaka, Masahiro ; Zhang, Chengming... - p. 1206-1208 , 2024
 
3

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimiz..:

, In: Proceedings of the 37th ACM International Conference on Supercomputing,
 
4

DeepSpeed-Inference: Enabling Efficient Inference of Trans..:

, In: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis,
 
5

1-bit LAMB: Communication Efficient Large-Scale Large-Batch..:

, In: 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC),
Li, Conglong ; Awan, Ammar Ahmad ; Tang, Hanlin.. - p. 272-281 , 2022
 
6

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Sca..:

, In: SC21: International Conference for High Performance Computing, Networking, Storage and Analysis,
 
7

ZeRO-infinity : breaking the GPU memory wall for extreme..:

, In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis,
 
8

ZeRO : memory optimizations toward training trillion par..:

, In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis,
 
9

DeepSpeed : System Optimizations Enable Training Deep Le..:

, In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,
 
10

Fast LSTM Inference by Dynamic Decomposition on Cloud Syste..:

, In: 2019 IEEE International Conference on Data Mining (ICDM),
You, Yang ; He, Yuxiong ; Rajbhandari, Samyam... - p. 748-757 , 2019
 
11

Optimizing CNNs on Multicores for Scalability, Performance ..:

, In: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems,
 
12

Optimizing the Four-Index Integral Transform Using Data Mov..:

, In: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
 
13

A domain-specific compiler for a parallel multiresolution a..:

, In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis,
 
14

On fusing recursive traversals of K-d trees:

, In: Proceedings of the 25th International Conference on Compiler Construction,
 
15

A communication-optimal framework for contracting distribut..:

, In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis,
Rajbhandari, Samyam ; Nikam, Akshay ; Lai, Pai-Wei... - p. 375-386 , 2014
 