It looks like a problem of GCC. Intel compiler gives the expected result.
$ g++ -I ~/program/include/eigen3 -std=c++11 -O3 a.cpp -o a && ./a
test1_us: 200087
test2_us: 320033
test3_us: 44539
$ icpc -I ~/program/include/eigen3 -std=c++11 -O3 a.cpp -o a && ./a
test1_us: 214537
test2_us: 23022
test3_us: 42099
Compared to the icpc
version, gcc
seems to have problem optimizing your test2
.
For more precise result, you may want to turn off the debug assertions by -DNDEBUG
as shown here.
EDIT
For question 1
@ggael gives an excellent answer that gcc
fails vectorizing the sum loop. My experiment also find that test2
is as fast as the hand-written naive for-loop, both with gcc
and icc
, suggesting that vectorization is the reason, and no temporary memory allocation is detected in test2
by the method mentioned below, suggesting that Eigen evaluate the expression correctly.
For question 2
Avoiding the intermediate memory is the main purpose that Eigen use expression templates. So Eigen provides a macro EIGEN_RUNTIME_NO_MALLOC and a simple function to enable you check whether an intermediate memory is allocated during calculating the expression. You can find a sample code here. Please note this may only work in debug mode.
EIGEN_RUNTIME_NO_MALLOC - if defined, a new switch is introduced which
can be turned on and off by calling set_is_malloc_allowed(bool). If
malloc is not allowed and Eigen tries to allocate memory dynamically
anyway, an assertion failure results. Not defined by default.
For question 3
There is a way to use intermediate variables, and to get the performance improvement introduced by lazy evaluation/expression templates at the same time.
The way is to use intermediate variables with correct data type. Instead of using Eigen::Matrix/Array
, which instructs the expression to be evaluated, you should use the expression type Eigen::MatrixBase/ArrayBase/DenseBase
so that the expression is only buffered but not evaluated. This means you should store the expression as intermediate, rather than the result of the expression, with the condition that this intermediate will only be used once in the following code.
As determing the template parameters in the expression type Eigen::MatrixBase/...
could be painful, you could use auto
instead. You could find some hints on when you should/should not use auto
/expression types in this page. Another page also tells you how to pass the expressions as function parameters without evaluating them.
According to the instructive experiment about .abs2()
in @ggael 's answer, I think another guideline is to avoid reinventing the wheel.