CPU Comparison for Numerical computing

Dear all,

In my lab we do numerical modeling of physical problems with COMSOL. We recently bought a new Dell compute server based on an Intel Xeon E7-4820 v3 @ 1.90 GHz, with 10 cores (2 logical cores per physical core). To our surprise, in some simulations (3D magnetostatics, 2D natural convection...) this new server takes almost twice as long as another one based on an Intel Xeon E5-1620 v2 @ 3.7 GHz with 4 cores (although the prices are very different!).
We do not think memory is the cause, since the simulation uses only about 10 GB of RAM, far less than the server's 64 GB. Both servers run Debian Linux, so I don't think it is an OS issue. I considered the single-thread/multi-thread question, but with the "htop" Linux command I can see that all (virtual) cores are used during the simulations (8 on the E5, 20 on the E7)!
So I do not understand why the E7 (which moreover has 2 SSDs in RAID) is slower than the E5... and I do not know how to find where the problem is or how to resolve it :-(
Moreover, CPU benchmarks on the internet show that the E7-4820 performs much better than the E5-1620. I asked Intel, but they suggested I contact COMSOL support... Do you have any idea what could explain this result?
Thank you.

Best regards,

3 Replies Last Post 19 oct. 2016, 12:28 UTC−4
Edgar J. Kaiser Certified Consultant

Posted: 8 years ago 19 oct. 2016, 07:57 UTC−4
Your E5 machine has almost twice the clock speed of the E7 machine. Small models frequently don't benefit much from multicore machines. The SSD has no impact on computational speed as long as the model fits into RAM.
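
As a rough illustration of the clock-versus-cores trade-off (a sketch, not a measurement), Amdahl's law can be combined with the clock speeds and core counts quoted in this thread. It assumes both CPUs do the same work per clock cycle and ignores turbo boost, cache, and memory bandwidth, none of which is established here. The function names are made up for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_runtime(parallel_fraction, clock_ghz, cores):
    """Runtime in arbitrary units. ASSUMPTION: equal work per clock cycle on every CPU."""
    return 1.0 / (clock_ghz * amdahl_speedup(parallel_fraction, cores))


def break_even_fraction(clock_a, cores_a, clock_b, cores_b):
    """Parallel fraction p at which machines A and B have the same modelled runtime."""
    # Equal runtimes give a linear equation in p:
    #   clock_b * (1 - p * ka) = clock_a * (1 - p * kb)
    ka = 1.0 - 1.0 / cores_a
    kb = 1.0 - 1.0 / cores_b
    return (clock_a - clock_b) / (clock_a * kb - clock_b * ka)


if __name__ == "__main__":
    # Clock speeds and physical core counts as quoted in the original question.
    e5 = (3.7, 4)
    e7 = (1.9, 10)
    for p in (0.5, 0.8, 0.9, 0.95, 0.99):
        ratio = relative_runtime(p, *e7) / relative_runtime(p, *e5)
        print(f"parallel fraction {p:.2f}: E7 runtime / E5 runtime = {ratio:.2f}")
    print(f"break-even parallel fraction: {break_even_fraction(*e5, *e7):.3f}")
```

Under these simplifying assumptions the slower-clocked 10-core machine only pulls ahead once roughly 94% or more of the runtime parallelizes perfectly. Real solvers may behave differently.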

--
Edgar J. Kaiser
emPhys Physical Technology
www.emphys.com

Posted: 8 years ago 19 oct. 2016, 11:46 UTC−4
Thank you for your answer. I am not sure I understand what you call "small models", but our simulations often have more than 2,000,000 DOF. As I explained, I can "see" (with the htop command) all the cores working in both cases. What I do not understand is why 20 cores working together are slower than 8 cores. I see that the clock frequency is higher on the E5, but then in which cases is the E7 better (as shown in every benchmark you can find on the Internet... and as implied by its price)?
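
A side note on the core counts: htop shows one bar per logical CPU, so the "20 cores" and "8 cores" above include hyper-threads. The minimal sketch below counts logical and physical cores from `/proc/cpuinfo`. It assumes an x86 Linux system where that file carries `processor`, `physical id`, and `core id` fields; on other platforms those fields may be absent and the physical count will come out as 0. The function name `count_cores` is made up for this example.

```python
import os


def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    package = core = None
    # The extra empty string makes sure the last block is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the description of one logical CPU.
            if package is not None and core is not None:
                physical.add((package, core))
            package = core = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            package = value
        elif key == "core id":
            core = value
    return logical, len(physical)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as handle:
            n_logical, n_physical = count_cores(handle.read())
        print(f"logical CPUs: {n_logical}, physical cores: {n_physical}")
    else:
        print("no /proc/cpuinfo on this system")
```

Whether running one solver thread per logical CPU helps or hurts depends on the workload, so the physical count is only a starting point for experiments.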

Edgar J. Kaiser Certified Consultant

Posted: 8 years ago 19 oct. 2016, 12:28 UTC−4
Your example was a 10 GB model, which is small by my standards. Here the clock frequency is key. Whether a multicore machine is faster depends very much on the task. Many models cannot easily be parallelized and thus don't benefit from many cores.
My priority regarding hardware is:
1. Maximum clock frequency
2. Maximum memory bandwidth/core
3. Number of cores

Published benchmarks are frequently tailored to the new CPU to make it look better than the cheaper ones and thus justify the price.
I do most of my work on a 3.6 GHz Core i7 machine with 64 GB RAM, and I frequently find that much bigger multicore and dual-socket machines (from my customers) are merely equal or even inferior to my < $2000 machine.
Of course for large models you need a bigger machine with lots of RAM.
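
One way to check this on a given machine is to time the same job at several core counts and see where the runtime stops improving. The sketch below is a generic timing harness. The `comsol_argv` helper and its file names are hypothetical placeholders, and the command line it builds is an assumption that should be checked against the documentation of the installed COMSOL version. The demo at the bottom times a trivial stand-in command instead.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def scaling_table(make_argv, core_counts):
    """Time the same job once per core count; returns {cores: seconds}."""
    return {n: time_command(make_argv(n)) for n in core_counts}


def comsol_argv(n):
    # ASSUMPTION: placeholder command line and file names, not taken from this
    # thread; verify the options against your COMSOL installation before use.
    return ["comsol", "batch", "-np", str(n),
            "-inputfile", "model.mph", "-outputfile", f"out_np{n}.mph"]


if __name__ == "__main__":
    # Stand-in job so the sketch runs anywhere; swap in comsol_argv for a real test.
    def demo_argv(n):
        return [sys.executable, "-c", "pass"]

    for cores, seconds in scaling_table(demo_argv, [1, 2, 4]).items():
        print(f"{cores:2d} cores: {seconds:.3f} s")
```

If the runtime flattens or rises well before all cores are in use, the model is probably limited by serial work or memory bandwidth rather than by core count.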

To some degree it is like in real life: with a van you can move a complete household, but if it is just about moving something small you are better off with a small, fast car.

Most of my work would benefit more from higher clock frequencies than from parallel concepts. Unfortunately, silicon-based CPUs seem to be stuck below 4 GHz, and nothing significantly faster is in sight. The CPU manufacturers have no alternative to parallelization because the clock speed is stuck.

--
Edgar J. Kaiser
emPhys Physical Technology
www.emphys.com
