Rajbhandari, Samyam
50  results:
1

System Optimizations for Enabling Training of Extreme Long ..:

In: Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing
 
2

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimiz..:

In: Proceedings of the 37th ACM International Conference on Supercomputing
 
3

DeepSpeed-Inference: Enabling Efficient Inference of Trans..:

In: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis
 
4

1-bit LAMB: Communication Efficient Large-Scale Large-Batch..:

In: 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC),
Li, Conglong ; Awan, Ammar Ahmad ; Tang, Hanlin.. - p. 272-281 , 2022
 
5

ZeRO-infinity : breaking the GPU memory wall for extreme..:

In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
 
6

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Sca..:

In: SC21: International Conference for High Performance Computing, Networking, Storage and Analysis
 
7

DeepSpeed : System Optimizations Enable Training Deep Le..:

In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
 
8

ZeRO : memory optimizations toward training trillion par..:

In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
 
9

Fast LSTM by dynamic decomposition on cloud and distributed..:

You, Yang ; He, Yuxiong ; Rajbhandari, Samyam...
Knowledge and Information Systems.  62 (2020)  11 - p. 4169-4197 , 2020
 
10

Fast LSTM Inference by Dynamic Decomposition on Cloud Syste..:

In: 2019 IEEE International Conference on Data Mining (ICDM),
You, Yang ; He, Yuxiong ; Rajbhandari, Samyam... - p. 748-757 , 2019
 
12

Optimizing CNNs on Multicores for Scalability, Performance ..:

Rajbhandari, Samyam ; He, Yuxiong ; Ruwase, Olatunji..
ACM SIGOPS Operating Systems Review.  51 (2017)  2 - p. 267-280 , 2017
 
14

Optimizing CNNs on Multicores for Scalability, Performance ..:

In: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
 
15

Optimizing the Four-Index Integral Transform Using Data Mov..:

In: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
 