2013-04-23 00:23:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

[ Ugh, resending because I had mistakenly set the html bit and all the
mailing lists just refused the original... ]

So I was playing with sparse again, because the MIPS people had hit a
bug with the fact that they had made PAGE_MASK an *unsigned* type, and
then doing the ~ (binary not) operation on it does the wrong thing
when you operate on bigger types, like hardware 36-bit physical
addresses.

See commit 3b5e50edaf50 (and commit c17a6554 that it reverts).

So the issue is that let's say that you have a constant (or variable,
for that matter) that is of type "unsigned int", and then you use that
to mask a variable of a larger size (say it's an "unsigned long" on a
64-bit arch, or it's a phys_addr_t on a 32-bit arch with PAE). So the
code looks basically like a variation of something like this:

u64 value = ...;

value &= ~bitmask;

What happens?

What happens is that the "~bitmask" is done in the *narrower* type,
and then - because the narrower type is unsigned - the cast to the
wider type is done as an *unsigned* cast, so what you *think* happens
is that it clears the bits that are set in "bitmask", but what
*actually* happens is that yes, you clear the bits that are set in
:bitmask", but you *also* clear the upper bits of value.

Now, why am I posting about this MIPS-specific small detail on to x86 people?

Because I hacked up a sparse patch that looks for the pattern of
unsigned casts of a (bit) negation to a wider type, and there's a fair
number of them.

Now, it may well be that my sparse hack (and it really is a hack) is a
broken piece of crap, but from a quick look the warnings it gives look
reasonable.

And there's quite a lot of them. Even in my (fairly small) config I
use on my desktop. And the first warnings I see are in x86 code:

arch/x86/kernel/traps.c:405:16: warning: implicit unsigned
widening cast of a '~' expression
arch/x86/kernel/cpu/perf_event_p4.c:912:15: warning: implicit
unsigned widening cast of a '~' expression

that particular one is something that has an *explicit* cast to "u64",
and then it does binary 'and' operations with these things that are of
a 32-bit unsigned type with a binary not in front of them. IOW, the
types in that expression are *very* confused.

Here's a ext4 code snippet that looks like an actual bug (but seems to
only hit read-ahead):

ext4_fsblk_t b, block;

b = block & ~(EXT4_SB(sb)->s_inode_readahead_blks-1);

where "b" actually ends up having the upper bits cleared, because the
s_inode_readahead_blks thing is an unsigned int, so you're masking off
not just the low bits, but the high bits too. Ted? Of course, it's
just read-ahead, so it probably doesn't matter, but.

We have a number of generic code examples where this kind of thing
seems harmless:

*vm_flags &= ~VM_MERGEABLE;

(we only have flags in the low 32 bits, and the only reason we get
warnings for VM_MERGEABLE is because 0x80000000 is an implicitly
unsigned constant, while 0x40000000 is *not*), or

kernel/trace/trace.c:2910:32: warning: implicit unsigned widening
cast of a '~' expression

where "trace_flags" is "unsigned long" and "mask" is "unsigned int",
and the expression

trace_flags &= ~mask;

actually clears the upper 32 bits too, but presumably they are always
clear anyway so we don't *really care. But it's an example of code
that is potentially very subtly dangerous.

Now, a lot of the other warnings I get seem to be more benign - the
networking code results in a lot of these because of code like this:

#define NLMSG_ALIGNTO 4U
#define NLMSG_ALIGN(len) ( ((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1) )

which technically has the same problem if "len" is 64-bit (which
sizeof() is on x86-64), but we don't really *care* because it turns
out that the sizeof values will always have the high bits clear
*anyway*. So I'm not sure the hacky sparse warning is useful, because
my code isn't smart enough to figure out when this kind of widening
cast is a problem, and when it isn't.

That said, I'm cc'ing David and netdev too, just in case. There is
likely some *reason* why it uses an "unsigned int" for a constant that
is then commonly expanded with a binary "not" and the upper bits end
up being surprising. So this thing doesn't look like a bug, but it
does cause these warnings:

net/netlink/af_netlink.c:1889:38: warning: implicit unsigned
widening cast of a '~' expression

and maybe the networking people care about this and maybe they don't.

(It turns out that any use of "UINT_MAX" for a "long" value also
results in this warning, because we define it as "~0ul", so there are
other cases of spurious things where we *intentionally* drop the high
bits).

ANYWAY.

Only a few of the warnings looked like they might be bugs, but I'm
attaching my sparse patch here in case somebody wants to play with the
hacky thing..

Linus


Attachments:
sparse.diff (3.71 kB)

2013-04-23 09:00:26

by David Laight

[permalink] [raw]
Subject: RE: Unsigned widening casts of binary "not" operations..

> What happens is that the "~bitmask" is done in the *narrower* type,
> and then - because the narrower type is unsigned - the cast to the
> wider type is done as an *unsigned* cast, so what you *think* happens
> is that it clears the bits that are set in "bitmask", but what
> *actually* happens is that yes, you clear the bits that are set in
> :bitmask", but you *also* clear the upper bits of value.

If the narrower type is signed it is probably even more confusing!
The high bits will be preserved unless you are masking off bit 31.

David

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2013-04-23 14:29:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

On Tue, Apr 23, 2013 at 1:59 AM, David Laight <[email protected]> wrote:
>
> If the narrower type is signed it is probably even more confusing!
> The high bits will be preserved unless you are masking off bit 31.

Yes. However, that case doesn't trigger with the normal case of small
values. So "~4" works fine with widening, in a way that "~4u" does
not.

Which doesn't mean that you aren't right, it only means that it's
harder to check for in sparse. The hacky little patch I sent out with
already resulted in a lot of noise for things like "~0u" (UINT_MAX)
and the "~4u" use in NLMSG_ALIGNTO, I'd dread to do the same for
signed values.

That said, with the most minimal value analysis (ie only looking at
constants), maybe it wouldn't be too bad. I started out just worrying
about the PAGE_MASK case, though, where we're talking about (somewhat
complicated) generated constants, and then the signed case is largely
irrelevant (although a signed "1 << 31" would - as you say - trigger
this same thing too).

Linus

2013-04-23 15:26:08

by David Laight

[permalink] [raw]
Subject: RE: Unsigned widening casts of binary "not" operations..

> > If the narrower type is signed it is probably even more confusing!
> > The high bits will be preserved unless you are masking off bit 31.
>
> Yes. However, that case doesn't trigger with the normal case of small
> values. So "~4" works fine with widening, in a way that "~4u" does
> not.

Thinks ... converting:
foo &= ~bar;
to:
foo = ~(~foo | bar);
would generally DTRT.
Whether the compiler has the relevant patterns to optimise
it is another question.

David

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2013-04-23 15:42:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

On Tue, Apr 23, 2013 at 8:24 AM, David Laight <[email protected]> wrote:
>
> Thinks ... converting:
> foo &= ~bar;
> to:
> foo = ~(~foo | bar);
> would generally DTRT.

Quite frankly, I'd rather just try to make people (re)move the widening cast.

So while

foo &= ~bar;

is unsafe if bar is narrower than foo, it's unsafe simply because the
implicit cast from the C type conversions comes *after* the binary
not.

An explicit cast fixes it, and shows that you were aware of the issue:

foo &= ~(foo_t)bar;

and gcc will generate the right logic. Of course, casts then have
their own problems, which your thing avoids (as would just having a
"andn" operation in C)

The best case is for code that does bitmasking ops like this to avoid
any casts (implicit or explicit) by just avoiding mixed types. That
was, to a large degree, my hope for the sparse patch, but it's
nontrivial. Many types are rather "natural" (ie constants have a
natural "int" type, sizeof() is size_t, etc) and in other cases you
want to do the same ops for two different types (with the case that
caused me to start to look at it being the "align to page boundary"
for a virtual address vs a PAE phys_addr_t) so you can't really avoid
mixing things in some circumstances.

Linus

2013-04-23 15:52:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

On Tue, Apr 23, 2013 at 08:42:49AM -0700, Linus Torvalds wrote:
> The best case is for code that does bitmasking ops like this to avoid
> any casts (implicit or explicit) by just avoiding mixed types. That
> was, to a large degree, my hope for the sparse patch, but it's
> nontrivial. Many types are rather "natural" (ie constants have a
> natural "int" type, sizeof() is size_t, etc) and in other cases you
> want to do the same ops for two different types (with the case that
> caused me to start to look at it being the "align to page boundary"
> for a virtual address vs a PAE phys_addr_t) so you can't really avoid
> mixing things in some circumstances.

Maybe it's worth creating a magic helper function, called something
like mask_out() that handles the casting automatically, and it makes
it clear to a reader what you're trying to do?

- Ted

2013-04-23 16:05:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

On Tue, Apr 23, 2013 at 8:52 AM, Theodore Ts'o <[email protected]> wrote:
>
> Maybe it's worth creating a magic helper function, called something
> like mask_out() that handles the casting automatically, and it makes
> it clear to a reader what you're trying to do?

We have that for some things. Like the aligning code (see
include/{uapi/}linux/kernel.h). We have that whole ALIGN()
infrastructure that casts the alignment to the result type, exactly
because people got things wrong so often (and exactly the "& ~mask"
thing in particular).

But doing so becomes *less* readable when the types already match, and
it's another extra complication in kernel programming to know all the
"oh, don't use the standard C &~ constructs because it has some subtle
type handling under *some* circumstances". So for aligning things up,
we've been able to do it, because people hate doing that duplicated "
(val + mask) & ~mask" thing anyway (especially since "mask" tends to
be something like "size-1". IOW, having a helper function/macro ends
up solving more than just the type issue. But if it's just the type,
it ends up being very inconvenient.

We don't have the equivalent "align down" macro, for example, exactly
because aligning things down is *just* the logical mask, and so in
most cases it's actually just more pain to have a macro helper. So
ALIGN() only aligns upwards, even though it doesn't even say so in the
name. Inconsistent? Unclear? Yes, but there's a reason for it.

Linus

2013-04-23 17:37:36

by David Miller

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

From: Linus Torvalds <[email protected]>
Date: Tue, 23 Apr 2013 08:42:49 -0700

> An explicit cast fixes it, and shows that you were aware of the issue:
>
> foo &= ~(foo_t)bar;
>
> and gcc will generate the right logic. Of course, casts then have
> their own problems, which your thing avoids (as would just having a
> "andn" operation in C)

I just want to mention that this is dangerous in different ways, we
just recently got a patch in the networking that removed such a cast.
The problem is when the cast narrows, f.e.:

~(u8)0

doesn't do what you think it does. That doesn't evaluate to 0xff.

You all are very bright and probably know this already.

So,if it widens, which is the situation we're talking about, you're
good. But until I saw the above u8 thing I never suspected that
narrowing in this kind of expression was dangerous.

2013-04-23 17:52:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

On Tue, Apr 23, 2013 at 10:37 AM, David Miller <[email protected]> wrote:
>
> I just want to mention that this is dangerous in different ways, we
> just recently got a patch in the networking that removed such a cast.
> The problem is when the cast narrows, f.e.:
>
> ~(u8)0
>
> doesn't do what you think it does. That doesn't evaluate to 0xff.

Yeah, sparse will get that right, but won't warn about it even with my
patch. The normal "all arithmetic is done in *at*least* 'int'" will
always kick any C expression like that up to 'int' before the binary
not op is done. So in your example, the implicit cast is widening the
value *before* the binary not, not after.

Linus

2013-04-23 17:57:03

by David Miller

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

From: Linus Torvalds <[email protected]>
Date: Tue, 23 Apr 2013 10:52:33 -0700

> On Tue, Apr 23, 2013 at 10:37 AM, David Miller <[email protected]> wrote:
>>
>> I just want to mention that this is dangerous in different ways, we
>> just recently got a patch in the networking that removed such a cast.
>> The problem is when the cast narrows, f.e.:
>>
>> ~(u8)0
>>
>> doesn't do what you think it does. That doesn't evaluate to 0xff.
>
> Yeah, sparse will get that right, but won't warn about it even with my
> patch. The normal "all arithmetic is done in *at*least* 'int'" will
> always kick any C expression like that up to 'int' before the binary
> not op is done. So in your example, the implicit cast is widening the
> value *before* the binary not, not after.

If you're not bored, and could add a check for that kind of narrowing
situation, I'd really appreciate it.

2013-04-23 18:21:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

On Tue, Apr 23, 2013 at 10:56 AM, David Miller <[email protected]> wrote:
>
> If you're not bored, and could add a check for that kind of narrowing
> situation, I'd really appreciate it.

It's pretty non-trivial.

For example, this is perfectly normal C:

char a,b;
...
a &= ~b;

and the arithmetic is technically done in "int" (because C really
never does any arithmetic in a narrower type). So the value of 'b'
will be widened to int before doing the binary not, and then the
binary 'and' will be done in 'int', and then in the end it will be
cast down to 'char' again and written back to 'a'.

And none of that matters. All the normal integer operations (both
arithmetic and logical) are "stable" in the smaller size. The upper
bits can be basically random uninteresting crap (ie do zero or sign
extension depending on the smaller type), but they don't matter
because they go away in the end. In fact, the compiler will usually
then end up doing the actual arithmetic/logical op in the narrower
type, even though *conceptually* there were lots of casts going on
back and forth.

The point being that at the actual time of the widening, it's hard to
say whether "~(u8)0" is problematic or not - it will depend on how it
ends up being used. At a much later point, sparse will have optimized
away the casts and will indeed linearize it to a simple 8-bit
operation, but by then sparse will *also* have simplified away the
obviously constant arithmetic, so by the time the casts back and forth
are gone, so is the logical "not", and all you have left of the
"~(u8)0" is either a 8-bit value (255) or an "int" (-1) depending on
how it was used.

Put another way: it's not possible to say that "~(u8)0" is wrong early
on (because it may be part of something that ends up only doing byte
arithmetic) and it is *also* not possible to say that it's invalid
much later on, because by then we will have munged it into something
totally different.

I'm not saying it's impossible to do. You could certainly do it with
sparse, but it would involve a fair amount of effort. You'd have to
add a whole new separate phase of "type narrowing": sparse *does* do
that as part of the simplification, but right now it's done while it
does all the *other* simplification too, so ..

Now, if you only wanted the warning when there is an *explicit* cast,
that would be different. It would be easy enough to find the pattern
where we do a "~" op on an explicit cast to something smaller than
"int" (which means that there would be an implicit cast after the
explicit one). But quite frankly, you could likely do that almost
equally well with just a judicious use of "grep".

Linus

2013-04-24 12:36:17

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: Unsigned widening casts of binary "not" operations..

On Tue, Apr 23, 2013 at 7:37 PM, David Miller <[email protected]> wrote:
> From: Linus Torvalds <[email protected]>
> Date: Tue, 23 Apr 2013 08:42:49 -0700
>
>> An explicit cast fixes it, and shows that you were aware of the issue:
>>
>> foo &= ~(foo_t)bar;
>>
>> and gcc will generate the right logic. Of course, casts then have
>> their own problems, which your thing avoids (as would just having a
>> "andn" operation in C)
>
> I just want to mention that this is dangerous in different ways, we
> just recently got a patch in the networking that removed such a cast.
> The problem is when the cast narrows, f.e.:
>
> ~(u8)0
>
> doesn't do what you think it does. That doesn't evaluate to 0xff.

This is the definition of MAC802154_CHAN_NONE?

We _should_ have noticed this earlier, as old gcc (e.g. 4.1.2) emits a
warning when comparing it to a u8:

net/mac802154/monitor.c: In function ‘mac802154_monitor_xmit’:
net/mac802154/monitor.c:49: warning: comparison is always false due to
limited range of data type
net/mac802154/wpan.c: In function ‘mac802154_wpan_xmit’:
net/mac802154/wpan.c:323: warning: comparison is always false due to
limited range of data type

Interestingly, none of this is seen in the build logs of the linux-next build
service, which uses gcc 4.2.3, 4.2.4, 4.5.1, and 4.6...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds