high-end computer-guidance garbage

A couple of weeks ago we asked COMSOL for guidance on a computer upgrade for running big problems (10 to 20M DOFs).

We were directed to this page
www.comsol.com/support/knowledgebase/866/

where we are told that speed is primarily dependent on the product of the number of sockets and the number of memory channels per processor.

We followed that advice (also from several other COMSOL sources) and spent $23K on a computer upgrade that is about 30% slower than the cheap computer we'd been using for about two years. (But of course, we first had to bring our subscription up to date to transfer our license to the new computer.)

Our previous computer was a single-socket i7-3930K (6 cores, 3.2 GHz) with 64GB RAM.

The new computer is a 4-socket E5-4627 v2 (32 cores, 3.3 GHz) with 512GB RAM.

Does anyone at COMSOL care how much their customers waste on computers because of bad advice?

Is anyone at COMSOL concerned about helping customers who need to run really big problems improve the performance of COMSOL on high-end computers?



10 Replies Last Post 9 March 2016, 04:04 UTC−5
Edgar J. Kaiser Certified Consultant

Posted: 10 years ago 11 Apr. 2015, 03:58 UTC−4
David,

sorry to hear about this experience. I would recommend that you keep talking to COMSOL support. There may still be options to tune your system and/or your models.
Would you mind telling us which mainboard your system is based on and which kinds of studies you are planning to run?

Thank you and best regards
Edgar J. Kaiser

emPhys Physical Technology
www.emphys.com

Posted: 10 years ago 12 Apr. 2015, 17:21 UTC−4
Edgar,
Thanks for the interest (and sorry for my negative subject title).
1 Supermicro Barebone Tower/4U S-2011 f/ 4x E5-4600
MFG Part Number: SYS-8047R-7RFT+
4 Intel 3.30GHz Xeon E5-4627 v2 8-Core, S-2011
16 32GB load-reduced DIMMs, 1866MHz
1 SSD Drive 960GB, SanDisk Extreme PRO 960GB SATA 6.0GB/s
2 Windows Server 2012 R2

We just got the system up late Friday and I'm sure a number of things weren't optimized, including some of the preferences in COMSOL, but it was indeed very disappointing.

We're solving very complex RF problems (over 5000 surfaces, over 10M DOFs, nearly 1000 nodes in the geometry sequence, several hundred parameters and variables, etc.). We are currently using iterative solvers (GMRES). We will be looking at direct solvers now that we have sufficient RAM to consider them.

We'll be trying various things over the next few days, and hoping to get some good suggestions from someone. We'll pass along what we learn.

David

Edgar J. Kaiser Certified Consultant

Posted: 10 years ago 12 Apr. 2015, 18:55 UTC−4
Hi David,

thanks for the details. I am planning to invest in a bigger machine, so I am interested in other users' experiences.
I think you compared the smaller i7 machine and the new machine using a benchmark model that fit into the small machine.
Depending on model and solver characteristics, smaller models do not necessarily solve faster on bigger, massively parallel machines. Some model setups just cannot draw much benefit from many CPUs and cores. Small time-dependent studies are an example. I think iterative solvers also draw less benefit from a parallel architecture than direct solvers.

You obviously need a big machine due to the memory demands but smaller problems may not benefit as much from the parallel architecture or may even suffer from it due to increased overhead.

It could be interesting to limit the number of available cores for the small model on the big machine, to force the model into one single CPU and reduce overhead. This might provide a better comparison to the single CPU system.
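The core-limiting experiment suggested above could be scripted roughly as follows. This is only a sketch: it assumes the COMSOL batch launcher accepts `-np`, `-inputfile` and `-outputfile` (please check the command-line reference for your version), and `benchmark.mph` is a hypothetical file name. Note that limiting the core count does not by itself guarantee that all threads stay on one socket; that depends on the operating system's scheduling.

```python
import os
import shlex


def comsol_batch_command(model, cores, executable="comsol"):
    """Build a COMSOL batch command line restricted to `cores` cores.

    The flags used here (-np, -inputfile, -outputfile) are assumed from the
    COMSOL batch documentation; verify them for your installation.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    stem, ext = os.path.splitext(model)
    output = f"{stem}_np{cores}{ext}"
    return [executable, "batch", "-np", str(cores),
            "-inputfile", model, "-outputfile", output]


def core_sweep(model, max_cores):
    """Commands for 1, 2, 4, ... cores, always ending with max_cores."""
    counts = []
    n = 1
    while n < max_cores:
        counts.append(n)
        n *= 2
    counts.append(max_cores)
    return [comsol_batch_command(model, c) for c in counts]


if __name__ == "__main__":
    # Print the commands; run and time each one by hand or via subprocess.
    for cmd in core_sweep("benchmark.mph", 32):
        print(shlex.join(cmd))
```

Timing each run (for instance the 8-core run against the 32-core run) would show whether the benchmark model is already past its useful core count on the 4-socket machine.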

It would be interesting to define one or several benchmark models that would allow us to compare the hardware different people are using. I am also working on RF systems and would be interested in RF/microwave frequency-domain studies using iterative and direct solvers.

Good luck with the new hardware. Despite your initial frustration, I have to admit I am a bit envious ;-)
Cheers
Edgar

--
Edgar J. Kaiser
emPhys Physical Technology
www.emphys.com

Posted: 10 years ago 12 Apr. 2015, 22:04 UTC−4
Hi Edgar,
The initial benchmark wasn't exactly small: about 9M DOFs. It ran in ~15 minutes using the iterative solver on the smaller machine and took more than 20 minutes on the new machine. In both cases, about 2/3 of the total time was preconditioning, as we only ran to a relative tolerance of 0.1, which is more than sufficient for our purposes. Yes, it should run faster on the big machine using a direct solver, and there are probably other tweaks needed to both the machine and COMSOL. We'll see what we can figure out.
Cheers,
David

Walter Frei COMSOL Employee

Posted: 10 years ago 13 Apr. 2015, 11:30 UTC−4
Dear David and Edgar,

Yes, hardware selection can be a difficult topic. The single most important factor above all others should be that you have enough RAM to solve the models you want to solve in memory. We have posted some general guidelines on how to predict the memory your models will require here:
www.comsol.com/blogs/much-memory-needed-solve-large-comsol-models/

I see that your new machine has 512GB, which should be more than adequate for 20 million dof RF problems. Your 64GB machine, on the other hand, would not be able to solve such large problems in RAM and would be (much) slower, possibly to the point of being impractical for your needs. So first and foremost, it is critical that you have enough RAM.

Now once you know that you have enough RAM the secondary criterion in hardware selection would be computational speed. Computational speed based upon hardware specifications is quite difficult to predict or measure reliably. Both hardware and software are changing several times per year, so any benchmarks you might find would only have a practical lifetime of a few months and are only useful in terms of rough trends.

Although many other factors are involved, high memory bandwidth is beneficial for very large models. One thing that you will want to do is compare the actual (rather than the advertised) memory speed of these two machines. There are several utilities online that can measure this, and the results can tell you whether a performance bottleneck exists.
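As a rough illustration of what "measuring the actual memory speed" can mean, here is a minimal single-threaded copy test using only the Python standard library. It is a sketch, not a substitute for a dedicated utility such as STREAM: it exercises one core only, so it will understate what a multi-socket machine can deliver in aggregate, and the buffer size and repeat count are arbitrary choices.

```python
import time


def copy_bandwidth_gb_per_s(size_mb=64, repeats=5):
    """Best-of-N copy bandwidth in GB/s, measured on a single thread."""
    n = size_mb * 1024 * 1024
    src = bytearray(n)
    dst = bytearray(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Each copy reads n bytes and writes n bytes.
    return 2 * n / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_gb_per_s():.1f} GB/s")
```

Running the same script on both machines gives a like-for-like number per core; a large gap in favor of the older machine would point toward memory configuration (DIMM population, NUMA settings) rather than COMSOL itself.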

We are aware that this is a quite complicated topic and we will work on re-writing that knowledgebase to be more clear that memory should be the most important consideration when selecting hardware. Please also feel free to contact your COMSOL Support Team if you have any questions you want to address one-on-one.

Best Regards,
Walter

Robert Koslover Certified Consultant

Posted: 10 years ago 14 Apr. 2015, 13:25 UTC−4
Since you have so much RAM, use a direct solver instead of GMRES. I suggest you try PARDISO.

Posted: 10 years ago 14 Apr. 2015, 16:41 UTC−4
We did. On the big machine (512GB, 32 cores), a sweep of two frequencies of a 9.5M DOF RF problem took over 10 hours (used ~190GB RAM), using PARDISO.

The same problem on our small machine (64GB, 6 cores) took ~30 minutes, using GMRES.

David Doty
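For reference, the figures in this post work out as follows. The sketch below only does arithmetic on the numbers quoted above; the extrapolation to 20M DOFs assumes memory grows linearly with problem size, which is an optimistic lower bound because direct-solver memory in 3D typically grows faster than linearly.

```python
def gb_per_million_dofs(ram_gb, dofs_millions):
    """Memory footprint normalized to one million degrees of freedom."""
    return ram_gb / dofs_millions


def slowdown(slow_minutes, fast_minutes):
    """How many times longer the slow run took."""
    return slow_minutes / fast_minutes


if __name__ == "__main__":
    per_mdof = gb_per_million_dofs(190, 9.5)   # PARDISO run on the big machine
    ratio = slowdown(10 * 60, 30)              # "over 10 hours" vs ~30 minutes
    print(f"PARDISO memory: about {per_mdof:.0f} GB per million DOFs")
    print(f"PARDISO vs GMRES run time: at least {ratio:.0f}x longer")
    # Linear extrapolation (assumption, likely too low for a direct solver).
    print(f"20M DOFs, linear estimate: at least {per_mdof * 20:.0f} GB")
```

Even under the optimistic linear assumption, a 20M DOF direct solve would use most of the 512GB, so the direct solver leaves little headroom at the upper end of the target problem size.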

Walter Frei COMSOL Employee

Posted: 10 years ago 16 Apr. 2015, 16:55 UTC−4
Dear David, Edgar, and Robert,

There appear to be several slight misunderstandings here, so let me try to address this from the latest to the first.

With respect to using the PARDISO direct solver: This will not lead to faster solution times. The direct solvers will use more memory than the iterative solvers for problems with the same number of DOFs. They will almost always also be slower. Direct solvers can be faster if you have a problem with a high condition number. Physically, this could mean that you have a high contrast in material properties or highly anisotropic media, for example. For RF problems, the direct solvers are necessary for problems with Floquet periodicity, and the direct solvers will be faster for eigenvalue problems. Otherwise, the iterative solver is preferred.
If the problem solves with an iterative solver, then there is no reason to use a direct solver.

With respect to the performance of the system you are describing, there are several factors that may affect performance. It is, for example, important to know the actual memory bandwidth to perform valid comparisons between systems.
Another issue can be parallelization. Each problem can have an optimal number of processors in terms of parallel speedup. The reasons for this are introduced at a conceptual level here: www.comsol.com/blogs/understanding-parallel-computing/
In short, using too many cores on a particular problem may lead to a slowdown. This is quite problem and hardware dependent.
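The idea that each problem has an optimal core count can be illustrated with a toy model: Amdahl's law plus a cost that grows with the number of cores (communication, synchronization, contention for memory). This is a sketch under invented assumptions; the serial fraction and the per-core overhead below are made-up illustrative values, not measurements of COMSOL or of the machines in this thread.

```python
def solve_time(cores, serial=0.1, overhead=0.002):
    """Normalized run time: serial part + parallel part + per-core overhead.

    `serial` and `overhead` are illustrative assumptions, not measured values.
    """
    return serial + (1.0 - serial) / cores + overhead * cores


def best_core_count(max_cores, serial=0.1, overhead=0.002):
    """Core count in 1..max_cores that minimizes the modeled run time."""
    return min(range(1, max_cores + 1),
               key=lambda p: solve_time(p, serial, overhead))


if __name__ == "__main__":
    best = best_core_count(32)
    print(f"modeled optimum: {best} cores")
    print(f"time at optimum: {solve_time(best):.3f}")
    print(f"time at 32 cores: {solve_time(32):.3f}")
```

In this model the run time stops improving well before all 32 cores are in use and then gets worse, which is the qualitative behavior described above. The real optimum for a given model has to be found by timing runs at several core counts.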

Now, with respect to this larger 512 GB RAM machine and the relative performance: It seems that the primary motivation behind purchasing this machine is to solve problems approaching 50 million dof. Such large models are not solvable on a 64 GB RAM machine in any practical sense. In that respect you have made a very good hardware investment.

Best Regards,
Walter

Posted: 9 years ago 15 Sept. 2015, 16:47 UTC−4
Sorry to hear about your computing challenges. I work for Nor-Tech. We are a software integration partner with COMSOL. Nor-Tech provides High Performance Computing (HPC) solutions. It sounds like your jobs may have scaled beyond the architecture of a single computer.

www.nor-tech.com/solutions/hpc...-partners/comsol-multiphysics/

Nor-Tech has a demo cluster loaded with COMSOL available for testing. If you are interested in seeing how your jobs might scale beyond a single computer and what type of performance increase a cluster might offer, please contact me directly at 800-601-5250. We can run the job multiple ways on the cluster to see which configuration might offer the best results.

Thanks

Bob

Posted: 9 years ago 9 March 2016, 04:04 UTC−5

[...] as we only ran to a relative tolerance of 0.1, which is more than sufficient for our purposes. [...]


That's interesting! I wonder what the global error estimate turns out to be after you verify your solutions. (I would never have thought that such a loose local tolerance could give you meaningful results.)
