Alignment matters not only for performance, but also for correctness. Some architectures will fail with an processor trap if the data is not aligned correctly, or access the wrong memory location. On others, access to unaligned variables is broken into multiple accesses and bitshifts (often inside the hardware, sometimes by OS trap handler), losing atomicity.
The advice to sort members in descending order of size is for optimal packing / minimum space wasted by padding, not for alignment or speed. Members will be correctly aligned no matter what order you list them in, unless you request non-conformant layout using specialized pragmas (i.e. the non-portable #pragma pack
) or keywords. Although total structure size is affected by padding and also affects speed, often there is another ordering that is optimal.
For best performance, you should try to get members which are used together into the same cache line, and members that are accessed by different threads into different cache lines. Sometimes that means a lot of padding to get a cross-thread shared variable alone in its own cache line. But that's better than taking a performance hit from false sharing.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…