
Optimal settings for a large 3D problem on a system with multiple cores and a large amount of RAM

I am working with a 32-core, 64-thread machine with 256 GB of RAM, running COMSOL 3.5a on a 64-bit Windows Server system.

I want to solve a large 3D problem (1,000k or 10,000k DOF) in the Structural Mechanics Module with the stationary solver.

With the default settings (UMFPACK, or SPOOLES with nested dissection preordering), the solver does not seem to benefit from the multiple cores.

There should be some solver settings that are optimized for a multi-core system.

GMRES with geometric multigrid does not help either.

Any comments would be appreciated.

12 Replies Last Post 3 Dec. 2010, 07:56 UTC−5
Jim Freels mechanical side of nuclear engineering, multiphysics analysis, COMSOL specialist

Posted: 1 decade ago 11 Nov. 2010, 22:55 UTC−5
You did not mention whether you have a 64-bit system. Also, what is your OS?

Posted: 1 decade ago 12 Nov. 2010, 01:44 UTC−5
Thank you for replying. It is a 64-bit Windows Server system.

Jim Freels mechanical side of nuclear engineering, multiphysics analysis, COMSOL specialist

Posted: 1 decade ago 12 Nov. 2010, 22:22 UTC−5
Your 64-bit OS will allow COMSOL to take full advantage of all your memory. Can you verify how much memory your problem is using?

To use the multiple cores of your machine in Windows, you must tell COMSOL how many cores to use by adding the "-np M" switch to the COMSOL startup command, where "M" is the number of cores to be used. Perhaps the best way is to modify the shortcut you use to start COMSOL and add the switch there.

Posted: 1 decade ago 14 Nov. 2010, 20:57 UTC−5
Actually, I have already tried the '-np M' option, but it just increases CPU usage without reducing the calculation time. So I am using just '-np 6' instead of the maximum, '-np 64'.

My point is that I think there should be a solver, or solver options, optimized for tens of cores and threads, or for a cluster system.

Ivar KJELBERG COMSOL Multiphysics(r) fan, retired, former "Senior Expert" at CSEM SA (CH)

Posted: 1 decade ago 15 Nov. 2010, 01:43 UTC−5
Hi

Do not forget that not all solvers can make use of the multiple CPU option, and that not all steps in a solver sequence can use multiple CPUs. So depending on what you are solving, you will see the CPU usage switch between 1 and n CPUs.

I have also noted some drastic changes in context switching, but this seems to be related to my Win-7 OS upgrades (thanks MS): since the last OS upgrade, a normal run suddenly went from >5M context switches/sec down to a few 100'000, and my multiple CPU occupation counters all turn green at 100% (again, as they did a few months ago).

So multiple CPUs do not necessarily give a direct 1/n time reduction, but for most large models they give a significant boost.

--
Good luck
Ivar
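
One common way to see why the gain is not simply 1/n is Amdahl's law. It is not mentioned in this thread and is brought in here only as an illustration. The sketch below assumes a hypothetical fraction of the solve that parallelizes perfectly; the function name and the 0.90 value are made up for the example and are not measured from COMSOL.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # only the parallel part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # 0.90 is an assumed, illustrative parallel fraction.
    for n in (1, 2, 6, 32, 64):
        print(f"{n:3d} cores: ideal speedup {amdahl_speedup(0.90, n):.2f}")
```

This model ignores synchronization and memory-bandwidth overhead, so it can account for a plateau in speedup but not for an outright slowdown with more cores.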

Jim Freels mechanical side of nuclear engineering, multiphysics analysis, COMSOL specialist

Posted: 1 decade ago 15 Nov. 2010, 14:55 UTC−5
You did not specify that you had a cluster. I had assumed you were asking about a shared-memory system only.

COMSOL, like other commercial codes, is still early in the game of working out how best to optimize solver settings for parallel processing systems. If you have a distributed parallel processing system, as in multiple CPU nodes connected by Ethernet in a cluster, then only the direct solvers MUMPS and PARDISO can even take advantage of this cluster setup. GMRES does not presently work with distributed parallel processing. The iterative solvers for distributed parallel processing are being worked on and are to be released later.

Distributed parallel processing is still a heavily researched topic in computer science in general, not just at COMSOL.

For shared-memory systems, the solver configuration is essentially the same as for a single-processor system.

I guess I don't understand your question.

Posted: 1 decade ago 22 Nov. 2010, 00:57 UTC−5
I am not using a cluster system. I mentioned clusters because I thought a similar solver algorithm would be used for any kind of parallel processing system, whether a local machine or a cluster.
But as you said, that turns out not to be true.

It seems you mean that COMSOL is already optimized for parallel processing as long as I run it on one local system, and that there is not much more I can do.

The point I wanted to make is that "more CPU cores does not mean less calculation time".
I am experiencing "a much greater slowdown with more CPU cores".

Thank you.

Ivar KJELBERG COMSOL Multiphysics(r) fan, retired, former "Senior Expert" at CSEM SA (CH)

Posted: 1 decade ago 22 Nov. 2010, 03:40 UTC−5
Hi

Check also how your CPUs are being used. On my Win-7 system I noticed that, for some months, my heavy calculations suddenly took very long because of very heavy context switching (>5M switches/s). This suddenly disappeared after the last MS update of the OS, just as it had appeared some months before. So it is not only COMSOL that is tweaking our CPU handling ;)

For parallel processing, Linux often handles RAM and disk swapping better.
--
Good luck
Ivar

Posted: 1 decade ago 22 Nov. 2010, 16:02 UTC−5
A few ideas:

1.) You might want to check out any knowledge base articles on the subject, such as www.comsol.com/support/knowledgebase/830/

2.) You mentioned that your system has 32 cores and 64 threads. You might want to try turning off hyperthreading, or setting only 32 CPUs, as this can affect some solvers. (I've noticed that Windows 7 does a good job handling this, but since you're using some other version of Windows I don't know if it will matter.)

3.) You could try running in client/server mode or in batch mode to improve your workflow, so that you can work on other models or do post-processing while you're waiting for results.

4.) I believe version 4 added some features for using Windows Server, so you might check that out.
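
For idea 2, it can help to decide up front which "-np" values are worth comparing. The sketch below is only a suggestion: the helper name is hypothetical, and it assumes two hardware threads per physical core, which matches the 32-core, 64-thread machine described above but should be checked on other hardware.

```python
import os
from typing import List


def candidate_np_values(logical_cpus: int, threads_per_core: int = 2) -> List[int]:
    """Powers of two up to the logical CPU count, plus the physical core count."""
    # Assumption: every physical core exposes `threads_per_core` logical CPUs.
    physical = max(1, logical_cpus // threads_per_core)
    values = {physical, logical_cpus}
    n = 1
    while n <= logical_cpus:
        values.add(n)
        n *= 2
    return sorted(values)


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs and may return None.
    logical = os.cpu_count() or 1
    print(candidate_np_values(logical))
```

Comparing the run at the physical core count with the run at the logical CPU count gives a direct view of whether hyperthreading helps or hurts for a given solver.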

Jim Freels mechanical side of nuclear engineering, multiphysics analysis, COMSOL specialist

Posted: 1 decade ago 22 Nov. 2010, 17:12 UTC−5
There is certainly some overhead associated with parallel processing, which makes the speedup less than linear. The overhead is greater for distributed parallel processing than for shared-memory parallel processing. It is also easier to implement a shared-memory parallel processing capability than a distributed one. Hence, COMSOL released the shared-memory capability before the distributed-memory capability.

On the hyperthreading capability of most modern Intel processors, I have found a hyperthread to be worth about 1/3 of a core. So, if you have 32 physical cores and 32 hyperthreaded cores, you might be able to achieve a speedup of 32 + 32/3 ~ 42.7 in theory, if there were no overhead associated with the shared-memory parallel processing.
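
The arithmetic in that estimate can be written out as a small helper. The function name is hypothetical, and the default of 1/3 of a core per hyperthread is the rule of thumb quoted above, not a measured constant.

```python
def estimated_equivalent_cores(physical_cores: int,
                               hyperthreads: int,
                               hyperthread_value: float = 1 / 3) -> float:
    """Upper-bound speedup estimate: each hyperthread counts as a fraction of a core."""
    return physical_cores + hyperthreads * hyperthread_value


if __name__ == "__main__":
    # 32 physical cores plus 32 hyperthreads, as in the post above.
    print(f"{estimated_equivalent_cores(32, 32):.1f}")  # prints 42.7
```

The result is a best case that assumes no parallel overhead at all, so a measured speedup should be expected to fall below it.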

I would suggest an experiment that can easily be run with COMSOL on your system. Take an example problem that uses a reasonable amount of your memory and spends most of its time in the CPU solving. Then run consecutive runs

comsol -nn 1 -np 1
comsol -nn 1 -np 2
.
.
.
comsol -nn 1 -np 64

and record the CPU time of each run. You can automate this using COMSOL in batch mode. Then compute the ratio of the CPU time recorded with N cores to that recorded with a single core (-np N / -np 1). Plot this as a function of the number of cores N and send it to us on this discussion forum, and we can help you judge whether COMSOL is behaving as it should. I would expect a slowdown between processors, but between cores on one processor I expect linear speedup.
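
The experiment could be automated roughly as follows. This is a sketch under stated assumptions: all function names are hypothetical, the harness records elapsed wall-clock time rather than CPU time, and the exact batch-mode COMSOL command line is not given in this thread, so it is left to a caller-supplied function instead of being guessed.

```python
import subprocess
import sys
import time
from typing import Callable, Dict, List


def time_command(command: List[str]) -> float:
    """Run one command to completion; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def run_scaling_study(make_command: Callable[[int], List[str]],
                      core_counts: List[int]) -> Dict[int, float]:
    """Time one run per core count; make_command(n) builds the command line."""
    return {n: time_command(make_command(n)) for n in core_counts}


def time_ratios(times: Dict[int, float]) -> Dict[int, float]:
    """Ratio of the N-core time to the 1-core time (below 1 means faster)."""
    baseline = times[1]
    return {n: times[n] / baseline for n in sorted(times)}


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere: it ignores n and only
    # starts a Python interpreter. For a real study, make_command(n) would
    # return the batch-mode COMSOL command line with "-np", str(n) included.
    def stand_in(n: int) -> List[str]:
        return [sys.executable, "-c", "pass"]

    measured = run_scaling_study(stand_in, [1, 2, 4])
    for n, ratio in time_ratios(measured).items():
        print(f"-np {n}: time ratio {ratio:.2f}")
```

With the stand-in command the ratios carry no meaning; they only show the shape of the output that would be plotted against N.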

Your hardware may have an intermediate core count beyond which COMSOL's performance on a shared-memory machine no longer improves.
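
Once the run times have been recorded, such a peak can be located mechanically. The sketch below uses hypothetical helper names and assumes the timings are held in a dictionary that maps each "-np" value to its recorded time in seconds.

```python
from typing import Dict, Optional, Tuple


def best_core_count(times: Dict[int, float]) -> Tuple[int, float]:
    """Return (cores, time) for the fastest run; ties go to fewer cores."""
    cores = min(times, key=lambda n: (times[n], n))
    return cores, times[cores]


def first_regression(times: Dict[int, float]) -> Optional[int]:
    """Smallest core count that ran slower than the next smaller count tested."""
    counts = sorted(times)
    for previous, current in zip(counts, counts[1:]):
        if times[current] > times[previous]:
            return current
    return None  # every added core count was at least as fast as the last
```

Calling best_core_count on the recorded timings gives the "-np" value to use for production runs, and first_regression shows where adding cores first starts to hurt.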

Jim Freels mechanical side of nuclear engineering, multiphysics analysis, COMSOL specialist

Posted: 1 decade ago 22 Nov. 2010, 17:19 UTC−5
I forgot to mention that it could also be a function of your operating system. You could boot your system from an Ubuntu live CD without affecting the contents of your hard drive or changing anything associated with your operating system. Then run a test to see whether COMSOL runs better in shared-memory parallel processing there than under Windows. I know that the Linux kernel has specific settings that need to be compiled in to take advantage of large-memory machines. There may be something similar in Windows: a switch or boot option that needs to be set to take full advantage of your hardware. I know that Microsoft now sells an HPC version of their server software to run distributed parallel processing. I am exclusively a Linux user, and I have not seen the speedup turn around on a shared-memory machine running COMSOL, but I also have not seen a 32-core machine. We have a 12-core machine (24 hyperthreaded) and a 64-core cluster.

Posted: 1 decade ago 3 Dec. 2010, 07:56 UTC−5
Thank you for your answer.

I'll spend some time on these suggestions and report back!
