GitHub Gist apdavison/36126ee26067592ee69bf51b57fd3f31
```python
"""
Creates an HDF5 file with a single dataset of shape (channels, n),
filled with random numbers.
Writing to the different channels (rows) is parallelized using MPI.

Usage:

    mpirun -np 8 python demo.py

Small shell script to run timings with different numbers of MPI processes:

    for np in 1 2 4 8 12 16 20 24 28 32; do
        echo -n "$np ";
        /usr/bin/time --format="%e" mpirun -np $np python demo.py;
    done
"""
from mpi4py import MPI
import h5py
import numpy as np

n = 100000000
channels = 32

num_processes = MPI.COMM_WORLD.size
rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)

# A different seed per rank, so each process generates different numbers.
np.random.seed(746574366 + rank)

# Every rank opens the same file collectively, via the MPI-IO driver.
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)

# Creating the dataset is collective: all ranks make this same call.
dset = f.create_dataset('test', (channels, n), dtype='f')

# Round-robin split: rank r writes rows r, r + num_processes, ...
for i in range(channels):
    if i % num_processes == rank:
        #print("rank = {}, i = {}".format(rank, i))
        data = np.random.uniform(size=n)
        dset[i] = data

f.close()

"""
Some example timings on my workstation (32 cores), elapsed seconds
as printed by the loop above, four runs per process count:

procs  run 1  run 2  run 3  run 4
    1  61.98  70.05  64.61  63.47
    2  33.22  33.53  34.85  33.45
    4  44.6   20.38  20.3   19
    8  13.3   13.76  14.5   13.55
   12  14.62  14.98  12.75  33.24
   16  12     13.19  14.76  13.68
   20  14.75  14.82  14.46  14.33
   24  16.69  15.81  16.94  15.98
   28  17.61  18     17.56  17.78
   32  35.31  35.7   16.16  39.88
"""
```
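The timings above are noisy (for example 44.6 s in the first 4-process run and 33.24 s in the last 12-process run), so the median of each row is a more robust summary than the mean. Here is a small sketch, standard library only, that computes the median time per process count and the speedup relative to a single process. The numbers are copied from the table; the TIMINGS and summarize names are just for illustration.

```python
"""Summarize the demo.py timings: median time and speedup per process count."""
from statistics import median

# Elapsed seconds, four runs per process count, copied from the table above.
TIMINGS = {
    1: [61.98, 70.05, 64.61, 63.47],
    2: [33.22, 33.53, 34.85, 33.45],
    4: [44.6, 20.38, 20.3, 19],
    8: [13.3, 13.76, 14.5, 13.55],
    12: [14.62, 14.98, 12.75, 33.24],
    16: [12, 13.19, 14.76, 13.68],
    20: [14.75, 14.82, 14.46, 14.33],
    24: [16.69, 15.81, 16.94, 15.98],
    28: [17.61, 18, 17.56, 17.78],
    32: [35.31, 35.7, 16.16, 39.88],
}


def summarize(timings):
    """Return {process_count: (median_seconds, speedup_vs_one_process)}."""
    baseline = median(timings[1])
    return {
        nprocs: (median(runs), baseline / median(runs))
        for nprocs, runs in sorted(timings.items())
    }


for nprocs, (t, speedup) in summarize(TIMINGS).items():
    print(f"{nprocs:>2} processes: median {t:6.2f} s, speedup {speedup:4.2f}x")
```

By this measure the fastest median is at 16 processes (about 13.4 s, roughly 4.8x faster than one process); with more processes than that the median time rises again.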
I'm not an h5py or mpi4py expert; I mostly posted this here as an aide-mémoire for myself, so you'd probably be better off reading the documentation for those projects and/or experimenting. I guess, however, that it doesn't matter how many datasets you have (e.g. dset1, dset2), as long as every process calls create_dataset for each of them.
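A minimal sketch of the script extended to several datasets, assuming h5py built against an MPI-enabled (parallel) HDF5 plus mpi4py. The h5py documentation treats operations that change file metadata, such as create_dataset, as collective: every process must make the same calls in the same order. Writing data into a dataset can be done independently by each process. The helper names (rows_for_rank, fill_rows, main), the dataset names and the smaller default n are hypothetical, and the sketch uses NumPy's Generator API instead of the global np.random.seed.

```python
"""Sketch: several datasets in one parallel HDF5 file, rows split round-robin.

Assumes an MPI-enabled h5py and mpi4py; all names here are illustrative.
"""
import numpy as np


def rows_for_rank(channels, size, rank):
    """Rows this rank writes: the same round-robin split as the
    i % num_processes == rank test in demo.py."""
    return range(rank, channels, size)


def fill_rows(dsets, rows, n, rng):
    """Write a row of n random numbers into each given row of each dataset.

    dsets may be h5py datasets or plain NumPy arrays, since both support
    dset[i] = data.
    """
    for dset in dsets:
        for i in rows:
            dset[i] = rng.uniform(size=n)


def main(filename="parallel_test.hdf5", names=("test", "test_2"),
         channels=32, n=1_000_000):
    # Imported here only so the helpers above can be tried without MPI.
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    # Per-rank seed, as in demo.py, but with NumPy's Generator API.
    rng = np.random.default_rng(746574366 + comm.rank)
    with h5py.File(filename, "w", driver="mpio", comm=comm) as f:
        # Collective: EVERY rank creates EVERY dataset, in the same order,
        # including datasets (or rows) it will never write to.
        dsets = [f.create_dataset(name, (channels, n), dtype="f")
                 for name in names]
        # Independent: each rank writes only its own rows.
        fill_rows(dsets, rows_for_rank(channels, comm.size, comm.rank), n, rng)
```

Saved as, say, multi_dset.py (a hypothetical name), it could be launched with mpirun -np 8 python -c "import multi_dset; multi_dset.main()". Keeping the mpi4py and h5py imports inside main() is only so the pure helpers can be tested without an MPI setup; a real script would normally import them at the top.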
I am very new to h5py/mpi4py. I am trying to write data to a single .h5 file using 2 processes: if rank == 0, the positive test data (positive dataset) should be written, and if rank == 1, the negative test data (negative dataset). But when I try to run it with mpiexec -n 2 python parallel_exec.py, I get IOError: Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'). Could you suggest anything that might help me out? Thanks in advance.
I'm sorry, I don't have any idea how to fix that problem. Maybe ask on Stack Overflow?
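We can't see parallel_exec.py, so this is only a guess: that lock error typically appears when each process opens the same file on its own in 'w' mode, and the second process then cannot acquire the file lock already held by the first. Below is a sketch of the collective alternative, assuming an MPI-enabled h5py and mpi4py. The file is opened once by both ranks with driver='mpio', both ranks create both datasets (dataset creation is collective), and then each rank writes only its own dataset. DATASET_FOR_RANK, make_data, the filename and the random placeholder data are hypothetical stand-ins for the real positive/negative test sets.

```python
"""Sketch: rank 0 fills a 'positive' dataset, rank 1 a 'negative' one.

Assumes an MPI-enabled h5py and mpi4py; names and data are placeholders.
"""
import numpy as np

# Hypothetical mapping from MPI rank to the dataset that rank writes.
DATASET_FOR_RANK = {0: "positive", 1: "negative"}


def make_data(rank, shape):
    """Placeholder data: values >= 0 for 'positive', <= 0 for 'negative'."""
    rng = np.random.default_rng(rank)
    values = rng.uniform(0.0, 1.0, size=shape)
    return values if DATASET_FOR_RANK[rank] == "positive" else -values


def main(filename="test_data.hdf5", shape=(100, 10)):
    # Imported here only so make_data can be tried without MPI.
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    if comm.size != len(DATASET_FOR_RANK):
        raise SystemExit("run with exactly 2 processes: mpiexec -n 2 ...")
    # Open the file ONCE, collectively, with the MPI-IO driver, instead of
    # a plain h5py.File(filename, "w") on each rank.
    with h5py.File(filename, "w", driver="mpio", comm=comm) as f:
        # Collective: both ranks create both datasets, in the same order.
        dsets = {name: f.create_dataset(name, shape, dtype="f")
                 for name in DATASET_FOR_RANK.values()}
        # Independent: each rank writes only its own dataset.
        name = DATASET_FOR_RANK[comm.rank]
        dsets[name][...] = make_data(comm.rank, shape)
```

As with any script that calls main(), it would be launched with mpiexec -n 2; the size check guards against running it with a different number of processes.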
What if I have more than one create_dataset call, such as dset = f.create_dataset('test', (channels, n), dtype='f') in line 42 plus another f.create_dataset('test_2', (channels, n), dtype='f')? How should I modify the code in that case?