kvs wrote: No. You are wrong. Intel and AMD x64 CPUs are wired at the CPU level for 64-bit floating point. The Elbrus is clearly using
32-bit floating point and cannot do 64-bit natively. It can do 24 floating point operations in a single cycle for 32 bits
and 12 for 64 bits. I am quoting MCST itself so you can send them a letter telling them that they do not know what they
are talking about. Originally it was claimed that the Elbrus would do 24 floating point operations per cycle for double
precision (64 bits).
You are also engaging in obfuscation with the vector MMX unit. The Elbrus does not even have an MMX-type multiple FPU
capability. What relevance does 4 x 32 have for the Elbrus? The problem is that it cannot do 1 x 64 in a single effective cycle
and requires two, thus the exact factor of two GFLOPs reduction.
https://wiki.tuflow.com/index.php?title=Hardware_Benchmarking_Topic_Single_Precision_VS_Double_Precision
The primary difference between the SP and DP numbers in the above tests is due to main memory hits. It costs more
to push 64 bit words between RAM and the caches than 32 bit words. There are also hits from FDIV and FMUL total
cycle differences between single and double precision.
The Elbrus can do 12 64-bit multiplications in one cycle, or 24 32-bit multiplications.
If you divide the 125 GFLOPS by the clock frequency and the number of cores, you get the width of the Elbrus core's FPU vector registers.
It is 768 bits wide (a strange number): 12 x 64 = 24 x 32 = 768.
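Here is that division as a small C sketch. The 125 GFLOPS figure and the 12/24 operations per cycle are from this thread. The 8 cores and 1.3 GHz used in the usage note are my assumption (they are not stated above), and the function names are made up for illustration. It follows the reasoning of the post and treats every FLOP per cycle as one vector lane.

```c
#include <assert.h>

/* Peak FLOPs one core retires per clock cycle, rounded to the nearest integer.
 * peak_flops: whole-chip peak in FLOP/s, clock_hz: core clock, cores: core count. */
long flops_per_cycle_per_core(double peak_flops, double clock_hz, int cores)
{
    assert(clock_hz > 0.0 && cores > 0);
    double per_cycle = peak_flops / (clock_hz * (double)cores);
    return (long)(per_cycle + 0.5); /* round, since marketing numbers are rounded too */
}

/* Implied vector width in bits if every FLOP per cycle is one lane of element_bits. */
long implied_vector_width_bits(double peak_flops, double clock_hz, int cores,
                               int element_bits)
{
    return flops_per_cycle_per_core(peak_flops, clock_hz, cores) * element_bits;
}
```

With the assumed 8 cores at 1.3 GHz, implied_vector_width_bits(125e9, 1.3e9, 8, 64) works out to 12 lanes x 64 bits. If the real core count or clock differs, the result changes accordingly.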
With AVX-512 the vector registers are 512 bits wide and can hold 8 64-bit floats.
4 floats matter because if you do 3D transformations (rotation, translation, projective transformation, mirroring, eigenvalue calculation or whatever) you need 4-dimensional vectors.
Packed float32 registers go back to the AMD 3DNow! instruction set (a float32 extension of the original integer MMX registers, two floats per 64-bit register).
The Intel version is SSE, introduced with the Pentium III CPUs, and since then 4 floats per 128-bit register has been the most common representation.
If you have speed issues with your program, try using the SSE registers (they are 20+ years old).
There are C header files that let you use these registers.
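To show what that looks like in practice, here is a minimal sketch of a 4x4 matrix times 4-float vector product using the SSE intrinsics from xmmintrin.h. The function name and the column-major layout are my own choices for illustration. On compilers that do not define an SSE target it falls back to plain scalar code, so it should build anywhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, ... */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* out = m * v, where m is a column-major 4x4 matrix: element (row, col) is m[4*col + row].
 * Each matrix column fits exactly one 128-bit SSE register (4 x float32). */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    assert(m != NULL && v != NULL && out != NULL);
#if HAVE_SSE
    /* Unaligned loads, so the caller does not need 16-byte aligned arrays. */
    __m128 c0 = _mm_loadu_ps(m + 0);
    __m128 c1 = _mm_loadu_ps(m + 4);
    __m128 c2 = _mm_loadu_ps(m + 8);
    __m128 c3 = _mm_loadu_ps(m + 12);

    /* Broadcast each vector component and scale the matching column by it. */
    __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
    r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));

    _mm_storeu_ps(out, r);
#else
    /* Scalar fallback for targets without SSE; tmp keeps it safe if out aliases v. */
    float tmp[4];
    for (int row = 0; row < 4; ++row) {
        tmp[row] = m[0 + row] * v[0] + m[4 + row] * v[1]
                 + m[8 + row] * v[2] + m[12 + row] * v[3];
    }
    for (int row = 0; row < 4; ++row) {
        out[row] = tmp[row];
    }
#endif
}
```

A translation by (10, 20, 30) applied to the point (1, 2, 3, 1) is a handy sanity check: the fourth component is what lets a single matrix multiply express the move, which is why 3D code works with 4-float vectors.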