  • Izzard, Robert Dr (Maths & Physics)
    Slurm is now basically working. · 495803e3
    I've added a new grid_option, num_processes, which is the number of processes launched by Python's multiprocessing. num_cores is used to set this:
    
    if > 0, use the number specified (as before, so backwards compatibility is preserved)
    if == 0, use the number of logical cores
    if == -1, use the number of physical cores
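
The mapping above could be sketched as follows. This is a minimal illustration, not the code in this commit: the function name resolve_num_processes is hypothetical, and the use of the third-party psutil package to count physical cores (falling back to the logical count when it is not installed) is an assumption.

```python
import os


def resolve_num_processes(num_cores):
    """Map a num_cores setting to a process count (hypothetical helper)."""
    # os.cpu_count() reports logical cores and may return None.
    logical = os.cpu_count() or 1
    if num_cores > 0:
        # Explicit value: use it unchanged (the previous behaviour).
        return num_cores
    if num_cores == 0:
        return logical
    if num_cores == -1:
        try:
            import psutil  # assumption: optional third-party dependency
            physical = psutil.cpu_count(logical=False)
        except ImportError:
            physical = None
        # Fall back to the logical count if the physical count is unknown.
        return physical or logical
    raise ValueError(f"unsupported num_cores value: {num_cores}")
```

For example, resolve_num_processes(2) returns 2, while resolve_num_processes(0) returns whatever os.cpu_count() reports on the machine.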
    
    Try running it with a command like:
    
    ---
    rm -rf /tmp/slurm ; nice python3.9 ./src/python/ensemble.py dists=Moe binaries=False r=100 verbosity=1 max_evolution_time=10 slurm_dir=/tmp/slurm slurm_partition=debug slurm_memory=100MB monte_carlo_kicks=0 save_ensemble_chunks=False num_cores=-1 slurm=1 slurm_njobs=2 num_cores=2
    ---
    
    You will want to change num_cores and slurm_njobs to suit. Each Slurm job gets num_processes cores allocated to it.
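
As a rough guide for choosing these two values, the total number of cores requested from the cluster is the product of the two settings. The helper below is a hypothetical sketch that assumes each job requests exactly num_processes cores, as stated above; it is not part of this commit.

```python
def total_cores_requested(slurm_njobs, num_processes):
    """Total cores requested across all Slurm jobs (hypothetical helper)."""
    if slurm_njobs < 1 or num_processes < 1:
        raise ValueError("slurm_njobs and num_processes must both be >= 1")
    # Each job is assumed to be allocated num_processes cores.
    return slurm_njobs * num_processes


# With slurm_njobs=2 and two processes per job, as in the command above:
print(total_cores_requested(2, 2))  # 4
```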
    
    Note: slurm_dir should point to an empty directory (hence the rm -rf in the command above). This isn't strictly required, but it makes debugging a lot easier.
    
    You also have to set slurm_partition by hand: the right value depends on your cluster, so you need to find it out yourself. In the example above I use "debug" because this is the default.
    
    There are quite a few changes internally, particularly new functions to load, save and merge Population objects and their data (mostly) correctly, and updates to the dict merging functions that this required.
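
To illustrate the kind of merging involved when results from several jobs are combined, here is a minimal sketch of a recursive dict merge. It is not the implementation in this commit: the function name merge_dicts and the rules it applies (sum numbers, concatenate lists, recurse into nested dicts, reject conflicting values) are assumptions chosen for the example.

```python
def merge_dicts(a, b):
    """Recursively merge b into a copy of a (illustrative sketch only)."""
    merged = dict(a)  # shallow copy; neither input dict is modified
    for key, b_val in b.items():
        if key not in merged:
            merged[key] = b_val
            continue
        a_val = merged[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            # Nested dicts are merged level by level.
            merged[key] = merge_dicts(a_val, b_val)
        elif isinstance(a_val, (int, float)) and isinstance(b_val, (int, float)):
            # Numeric leaves (e.g. histogram bins) are summed.
            merged[key] = a_val + b_val
        elif isinstance(a_val, list) and isinstance(b_val, list):
            # Lists are concatenated into a new list.
            merged[key] = a_val + b_val
        elif a_val != b_val:
            raise ValueError(f"cannot merge conflicting values for key {key!r}")
    return merged


job1 = {"count": 1, "hist": {"a": 2.0}}
job2 = {"count": 2, "hist": {"a": 1.0, "b": 3.0}}
print(merge_dicts(job1, job2))
```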
    
    Please report bugs, because there will be many!