CUDA-aware MPI multi-GPU test
using MPI
using CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Select the device (relevant if there is more than one GPU per node):
# use a node-local communicator to retrieve the node-local rank.
comm_l = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_l = MPI.Comm_rank(comm_l)
gpu_id = CUDA.device!(rank_l)
# Alternatively, use the default device if the scheduler exposes a different GPU
# per rank (e.g. SLURM `--gpus-per-task=1`):
# gpu_id = CUDA.device!(0)

size = MPI.Comm_size(comm)
dst  = mod(rank + 1, size)
src  = mod(rank - 1, size)
println("rank=$rank rank_loc=$rank_l (gpu_id=$gpu_id), size=$size, dst=$dst, src=$src")

# Exchange GPU buffers directly: CUDA-aware MPI operates on CuArrays.
N = 4
send_mesg = CuArray{Float64}(undef, N)
recv_mesg = CuArray{Float64}(undef, N)
fill!(send_mesg, Float64(rank))
CUDA.synchronize()

rank == 0 && println("start sending...")
MPI.Sendrecv!(send_mesg, dst, 0, recv_mesg, src, 0, comm)
println("recv_mesg on proc $rank_l: $recv_mesg")
rank == 0 && println("done.")
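For this test to pass, MPI.jl must be backed by a CUDA-aware MPI library. Below is a minimal configuration sketch, assuming a CUDA-aware system MPI is already available on the PATH; note that `MPI.has_cuda()` can only report support for libraries that expose the corresponding query (e.g. Open MPI).

# One-off configuration sketch: point MPI.jl at the system MPI binaries
# (restart Julia afterwards for the preference to take effect).
using MPIPreferences
MPIPreferences.use_system_binary()

# Quick sanity check that the selected MPI library reports CUDA support.
using MPI
MPI.Init()
@show MPI.has_cuda()
MPI.Finalize()

The test itself can then be launched with the `mpiexecjl` wrapper shipped with MPI.jl (installed via `MPI.install_mpiexecjl()`), for example `mpiexecjl -n 4 julia --project cuda_mpi_test.jl`, where the script name is illustrative.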
Reporting issues
Please report any questions or issues you encounter related to GPU-aware MPI either on the Julia at Scale Discourse category or as an issue on MPI.jl.