c++ - Undefined high-order of uint64_t while shifting and masking 32-bit values

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have some undefined behaviour in a seemingly innocuous function which is parsing a double value from a buffer. I read the double in two halves, because I am reasonably certain the language standard says that shifting char values is only valid in a 32-bit context.

inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t lo = (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
    uint64_t hi = (buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4];
    uint64_t val = (hi << 32) | lo;
    return *(double*)&val;
}

Since I am storing 32-bit values into the 64-bit variables lo and hi, I reasonably expect that the high-order 32 bits of these variables will always be 0x00000000. But sometimes they contain 0xffffffff or other non-zero rubbish.

The fix is to mask it like this:

uint64_t val = ((hi & 0xffffffffULL) << 32) | (lo & 0xffffffffULL);

Alternatively, it seems to work if I mask during the assignment instead:

uint64_t lo = ((buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0]) & 0xffffffff;
uint64_t hi = ((buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4]) & 0xffffffff;

I would like to know why this is necessary. The only explanation I can think of is that my compiler is doing all the shifting and combining for lo and hi directly on 64-bit registers, in which case I might expect undefined behaviour in the high-order 32 bits.

Can someone please confirm my suspicions or otherwise explain what is happening here, and comment on which (if any) of my two solutions is preferable?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:26:46+0000

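If you shift a char or unsigned char, you leave yourself at the mercy of the standard integer promotions. Each buf[i] is promoted to int (signed, and typically 32 bits) before the shift, so when buf[3] or buf[7] is 0x80 or greater, the shift by 24 lands in the sign bit. The resulting negative int is then sign-extended when it is converted to uint64_t, which is where the 0xffffffff in the high-order half comes from. You're better off casting the values yourself before you shift them. You don't have to separate the lower and upper halves if you do so.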

inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t val = ((uint64_t)buf[7] << 56) | ((uint64_t)buf[6] << 48) | ((uint64_t)buf[5] << 40) | ((uint64_t)buf[4] << 32) |
                   ((uint64_t)buf[3] << 24) | ((uint64_t)buf[2] << 16) | ((uint64_t)buf[1] << 8) | (uint64_t)buf[0];
    return *(double*)&val;
}
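
The promotion behind the stray high bits can be checked in isolation. The following is a minimal sketch, assuming a platform where int is 32 bits; the helper names ShiftedByteIsInt and WidenToU64 are hypothetical and exist only for illustration. It avoids the questionable shift itself and shows the two ingredients separately: the shift expression has type int, and a negative int converts to a uint64_t with all high-order bits set.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: true if shifting an unsigned char yields a plain int,
// i.e. the operand was promoted to int before the shift took place.
inline bool ShiftedByteIsInt()
{
    unsigned char b = 0;
    return std::is_same<decltype(b << 24), int>::value;
}

// Hypothetical helper: performs the same int -> uint64_t conversion that
// `uint64_t lo = <int expression>;` performs implicitly.
inline uint64_t WidenToU64( int v )
{
    return static_cast<uint64_t>( v );
}
```

With a 32-bit int, WidenToU64(INT32_MIN) yields 0xffffffff80000000: the bit pattern 0x80000000 that a byte of 0x80 shifted by 24 leaves in the int, preceded by 32 sign-extension bits.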

Assembling the value byte by byte is necessary only if the CPU is big-endian or if the buffer might not be properly aligned for the CPU architecture. Otherwise you can simplify this greatly:

    return *(double*)buf;
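
Reading the bits through a pointer cast to double is not formally permitted by the strict aliasing rules, even though most compilers accept it. A hedged alternative, not part of the original answer, is to copy the bits with std::memcpy. This sketch assumes an 8-byte IEEE 754 double whose byte order matches that of the host's integers; the function name ReadLittleEndianDoubleMemcpy is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical variant: assembles the 64-bit pattern from little-endian bytes
// (independent of host endianness and buffer alignment), then copies the bits
// into a double instead of casting pointers.
inline double ReadLittleEndianDoubleMemcpy( const unsigned char *buf )
{
    uint64_t val = 0;
    for( int i = 7; i >= 0; --i )
        val = (val << 8) | buf[i];   // val is already 64-bit, so no sign extension

    double d;
    static_assert( sizeof d == sizeof val, "assumes an 8-byte double" );
    std::memcpy( &d, &val, sizeof d );
    return d;
}
```

Because val is a uint64_t before any shift happens, each byte is OR-ed in as a non-negative value and the high-order bits stay clean without masking.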
