1. MCR-DL: Mix-and-Match Communication Runtime for Deep Learni.. In: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
2. A Hybrid Tensor-Expert-Data Parallelism Approach to Optimiz.. In: Proceedings of the 37th ACM International Conference on Supercomputing
3. DeepSpeed-Inference: Enabling Efficient Inference of Trans.. In: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis
4. 1-bit LAMB: Communication Efficient Large-Scale Large-Batch.. In: 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)
5. NV-group : link-efficient reduction for distributed deep.. In: Proceedings of the 34th ACM International Conference on Supercomputing
6. GEMS : GPU-enabled memory-aware model-parallelism system.. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
7. Efficient Training of Semantic Image Segmentation on Summit.. In: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
8. HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Pa.. In: Lecture Notes in Computer Science; High Performance Computing
10. OMB-UM: Design, Implementation, and Evaluation of CUDA Unif.. In: 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)
11. Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for H.. In: 2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS)
12. Performance Characterization of DNN Training using TensorFl.. In: 2019 IEEE International Conference on Cluster Computing (CLUSTER)
14. Communication Profiling and Characterization of Deep Learni.. In: 2019 IEEE Symposium on High-Performance Interconnects (HOTI)
15. In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming