Awan, Ammar Ahmad
507 results:
1

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimiz..:

In: Proceedings of the 37th ACM International Conference on Supercomputing
 
2

MCR-DL: Mix-and-Match Communication Runtime for Deep Learni..:

In: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS),
Anthony, Quentin ; Awan, Ammar Ahmad ; Rasley, Jeff... - p. 996-1006, 2023
 
3

1-bit LAMB: Communication Efficient Large-Scale Large-Batch..:

In: 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC),
Li, Conglong ; Awan, Ammar Ahmad ; Tang, Hanlin.. - p. 272-281, 2022
 
4

DeepSpeed-Inference: Enabling Efficient Inference of Trans..:

In: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis
 
5

NV-group : link-efficient reduction for distributed deep..:

In: Proceedings of the 34th ACM International Conference on Supercomputing
 
6

HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Pa..:

In: Lecture Notes in Computer Science; High Performance Computing
 
7

GEMS : GPU-enabled memory-aware model-parallelism system..:

In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
 
8

Efficient Training of Semantic Image Segmentation on Summit..:

In: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW),
Anthony, Quentin ; Awan, Ammar Ahmad ; Jain, Arpan.. - p. 1015-1023, 2020
 
11

Communication Profiling and Characterization of Deep Learni..:

In: 2019 IEEE Symposium on High-Performance Interconnects (HOTI)
 
12

Performance Characterization of DNN Training using TensorFl..:

In: 2019 IEEE International Conference on Cluster Computing (CLUSTER)
 
13

Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for H..:

In: 2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS)
 
14

High performance distributed deep learning : a beginner'..:

In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
 
15

Optimized Broadcast for Deep Learning Workloads on Dense-GP..:

In: Proceedings of the 25th European MPI Users' Group Meeting
 