Note: This discussion is about an older version of the COMSOL Multiphysics® software. The information provided may be out of date.


Running COMSOL 4.x, and COMSOL 4.x with MATLAB, on a cluster


Does anyone have successful experience running COMSOL 4.x, or COMSOL 4.x with MATLAB, on a cluster?

Could you please share your PBS job scripts for these two running modes, especially for running COMSOL with MATLAB on a cluster?


4 Replies Last Post 14 Feb 2011, 05:55 UTC−5


Posted: 11 Feb 2011, 10:37 UTC−5
First, you don't necessarily need a Cluster Computing node added to your model, at least when you are not using parametric sweeps; that is what I have been able to verify from experience. Just be sure that you are using either the MUMPS or PARDISO solver.

Here is how I run COMSOL with the MATLAB LiveLink through a shell script that PBS executes for the job. Please note that the job request (node allocation) is assumed to have been made already.

cat $PBS_NODEFILE | uniq > mpd.hosts   # generate host file list in pwd
/comsol41/bin/comsol mpd boot -nn 4 -mpirsh ssh -f mpd.hosts -v -d > mpdstart.txt   # boot the mpd ring on the allocated nodes
/comsol41/bin/comsol server -nn 4 -mpmode owner -port 2222 > server.txt &   # start the COMSOL server on port 2222
/matlab_r2010a/bin/matlab -nosplash -nodesktop -r matlab_script > output.txt   # launch MATLAB and run matlab_script.m
/comsol41/bin/comsol mpd allexit   # close mpd when the MATLAB job is over

Make sure that you change "-nn 4" above to the correct number of physical nodes you have requested from PBS. Then, in the file matlab_script.m, you need the following commands:

addpath('/comsol41/mli')   % add the COMSOL LiveLink libraries to MATLAB's path
mphstart(2222)             % connect to the COMSOL server on port 2222 (the port is passed as a number)

That is all you need, I think. I create the MATLAB scripts from COMSOL's GUI, so it essentially takes care of the rest nicely. When the PBS job ends, the COMSOL server is terminated as well, so you don't need to close the port manually.
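For completeness, here is a minimal sketch of how the full PBS job script around those commands could look. The resource request, walltime, and file names are only placeholders; adjust them to your own cluster:

#!/bin/sh
#PBS -l nodes=4:ppn=8                 # example: 4 nodes with 8 cores each (match -nn below to the node count)
#PBS -l walltime=01:00:00             # example walltime
#PBS -o livelink_output.file          # placeholder name for the job's stdout
#PBS -e livelink_error.file           # placeholder name for the job's stderr

cd $PBS_O_WORKDIR                     # run from the submission directory

cat $PBS_NODEFILE | uniq > mpd.hosts                                                # host file list, one line per node
/comsol41/bin/comsol mpd boot -nn 4 -mpirsh ssh -f mpd.hosts -v -d > mpdstart.txt   # boot the mpd ring
/comsol41/bin/comsol server -nn 4 -mpmode owner -port 2222 > server.txt &           # start the COMSOL server on port 2222
/matlab_r2010a/bin/matlab -nosplash -nodesktop -r matlab_script > output.txt        # run matlab_script.m
/comsol41/bin/comsol mpd allexit                                                    # shut the mpd ring down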

Executing a COMSOL batch job is even simpler. Here is how it works:

cat $PBS_NODEFILE | uniq > mpd.hosts   # generate host file list in pwd
/comsol41/bin/comsol mpd boot -nn 4 -mpirsh ssh -f mpd.hosts -v -d > mpdstart.txt   # boot the mpd ring on the allocated nodes
/comsol41/bin/comsol batch -nn 4 -inputfile input.mph -outputfile output.mph > batch.log   # run the distributed batch job
/comsol41/bin/comsol mpd allexit   # shut down the mpd ring
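Submitting either script is then just a matter of handing it to qsub, for example (the script name and resource values here are placeholders):

qsub -l nodes=4:ppn=8 -l walltime=01:00:00 comsol_batch_job.sh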

Feel free to write if something fails or if I missed anything. I wish you all the best!


Posted: 11 Feb 2011, 11:37 UTC−5
Thank you, Shakeeb.

As for the COMSOL batch jobs, I changed the solver to MUMPS (and also tried PARDISO), following the suggestion you gave in another thread. The .mph model I have is a 3D RF model that sweeps over frequency.

I followed the job on the cluster: I logged in to one of the nodes the job was running on and monitored the CPU usage. The CPU usage first increased to about 100-799% (I used 8 processes on each node), but after around 1-2 minutes it settled at 100% and never went above that, which was not what I expected.

As for the memory usage, the job used about 15 GB on each node (it ran on 3 nodes with 8 cores per node). I think that is too much: I also tested the same .mph model on my own computer, where it used about 15 GB and worked fine, just far too slowly. So on the cluster, in distributed-memory mode, each node should need less than 15 GB. Perhaps COMSOL was not actually running in distributed-memory mode, or perhaps each node was solving a different frequency, since the model performs a parametric sweep. In any case, I really have no idea what is going on with the job.
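For reference, I checked the usage on a compute node roughly like this (just standard Linux commands, shown only as an illustration):

top -b -n 1 | head -n 20     # one snapshot of the busiest processes and their CPU usage
free -g                      # memory usage on the node, in GB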

The PBS job script I used is below. It is almost the same as yours, except that I also specify the number of cores to use on each node. I really have no idea why it doesn't work correctly in our case; maybe something is wrong with my .mph model. Do you have any idea?


#!/bin/sh
#PBS -l nodes=3:ppn=8
#PBS -l walltime=00:30:00
#PBS -o output_comsolmpirun2_0202.file
#PBS -e error_comsolmpirun2_0202.file

cd $PBS_O_WORKDIR

module load COMSOL/4.1.0.112

sort $PBS_NODEFILE | uniq > comsolenodes                  # one line per physical node

NODES=`wc -l comsolenodes | awk '{print $1}'`             # number of physical nodes
TOTAL_TASKS=`wc -l $PBS_NODEFILE | awk '{print $1}'`      # total number of requested cores
CORES=$(($TOTAL_TASKS / $NODES))                          # cores per node


comsol -nn $NODES -mpirsh ssh mpd boot -f comsolenodes    # boot the mpd ring on the nodes
comsol -nn $NODES -np $CORES batch -inputfile mpitest.mph # run the distributed batch job
comsol -mpirsh ssh mpd allexit                            # shut down the mpd ring
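To double-check the computed values and keep the solver output for inspection, I could also extend the last part like this (the extra file names are just placeholders):

echo "running on $NODES nodes with $CORES cores per node"   # sanity check, written to the PBS output file
cat comsolenodes                                            # the hosts the mpd ring is booted on
# same batch command as above, but saving the solved model and the solver log:
comsol -nn $NODES -np $CORES batch -inputfile mpitest.mph -outputfile mpitest_solved.mph > comsolbatch.log 2>&1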



As for running COMSOL with MATLAB, I will look into it later, after I solve the batch problem.

I really appreciate your reply.


Posted: 11 Feb 2011, 12:50 UTC−5
Well, this doesn't leave me with many ideas. Just make sure that you have the latest patch installed, as that might be the difference between our respective environments; otherwise it all looks similar. And I don't think the -np switch should matter: by default COMSOL assumes all cores available on a node if no number is specified explicitly.

I have actually not tried to run sweeps yet. Only today did I manage to solve my first 3D RF problem in distributed-memory mode, both through batch and through the MATLAB interface. Does that work for you? It might be useful to first test whether you get reliable results without sweeping through parameters. I definitely acknowledge the advantages of COMSOL's parametric features, but in most cases the same thing can alternatively be achieved through a MATLAB script, at some cost in computational efficiency.

I would also suggest contacting COMSOL support without waiting to exhaust all other avenues. They can occasionally take some time, but they do try to help you out; at least that has been my experience. Make sure that you attach the detailed output of mpd as well. You can get it by starting mpd the way I described in my earlier post.

Wish you success!


Posted: 14 Feb 2011, 05:55 UTC−5
I tested it for a single frequency, without sweeping, on 3 nodes, with both MUMPS and SPOOLES (I think it is SPOOLES that runs distributed, according to the COMSOL documentation), but the same CPU and memory behavior occurred.

Meanwhile, I tested it on my own PC (with 16 GB of RAM) using an iterative (GMRES) solver, and it finished in a few minutes.


Anyway, I will contact support.
