LinuxLists.cc - detecting integer constant expressions in macros

2018-03-20 22:20:27

Subject: detecting integer constant expressions in macros

Hi Linus,

here is an idea:

a test for integer constant expressions which returns an
integer constant expression itself which should be suitable
for passing to __builtin_choose_expr might be:

#define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) :
(int*)1)))

This also does not evaluate x itself on gcc although this is
not guaranteed by the standard. (And I haven't tried any older
gcc.)

Best,
Martin

2018-03-20 23:08:39

by Uecker, Martin

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

talking of crazy ideas, here is another way to preserve
integer const expressions in macros by storing it a
VLA type (only for positive integers I guess):

#define MAX(a, b) sizeof(*({    \
        typedef char _Ta[a];    \
        typedef char _Tb[b];    \
        (char(*)[sizeof(_Ta) > sizeof(_Tb) ? sizeof(_Ta) :
sizeof(_Tb)])0; }))

Am Dienstag, den 20.03.2018, 23:13 +0100 schrieb Martin Uecker:
> Hi Linus,
>
> here is an idea:
>
> a test for integer constant expressions which returns an
> integer constant expression itself which should be suitable
> for passing to __builtin_choose_expr might be:
>
> #define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) :
> (int*)1)))
>
> This also does not evaluate x itself on gcc although this is
> not guaranteed by the standard. (And I haven't tried any older
> gcc.)
>
> Best,
> Martin

2018-03-20 23:10:03

by Linus Torvalds

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

On Tue, Mar 20, 2018 at 3:13 PM, Uecker, Martin
<[email protected]> wrote:
>
> here is an idea:

That's not "an idea".

That is either genius, or a seriously diseased mind.

I can't quite tell which.

> a test for integer constant expressions which returns an
> integer constant expression itself which should be suitable
> for passing to __builtin_choose_expr might be:
>
> #define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) : (int*)1)))

Ok, so I can see that (void *)((x)*0l)) turns into NULL when x is an
ICE. Fine. So with a constant, we have

sizeof( 1 ? NULL : (int *) 1)

and the rule is that if one of the sides of a ternary operation with
pointers is NULL, the end result is the other type (int *).

So yes, the above returns 'sizeof(int)'.

And if it is *not* an ICE that first pointer is still of type '(void
*)', but it is not NULL.

And yes, the type conversion rules for a ternary op with two non-NULL
pointers is different, and it now returns "void *".

So now the end result is (sizeof(*(void *)(x)), which on gcc is
generally *different* from 'int'.

So I see two issues:

- "sizeof(*(void *)1)" is not necessalily well-defined. For gcc it is
1. But it could cause warnings.

- this will break the minds of everybody who ever sees that expression.

Those two issues might be fine, though.

> This also does not evaluate x itself on gcc although this is
> not guaranteed by the standard. (And I haven't tried any older
> gcc.)

Oh, I think it's guaranteed by the standard that 'sizeof()' doesn't
evaluate the argument value, only the type.

I'm in awe of your truly marvelously disgusting hack. That is truly a
work of art.

I'm sure it doesn't work or causes warnings for various reasons, but
it's still a thing of beaty.

Linus

2018-03-20 23:11:26

by Linus Torvalds

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

On Tue, Mar 20, 2018 at 4:07 PM, Uecker, Martin
<[email protected]> wrote:
>
> talking of crazy ideas, here is another way to preserve
> integer const expressions in macros by storing it a
> VLA type (only for positive integers I guess):
>
>
> #define MAX(a, b) sizeof(*({ \
> typedef char _Ta[a]; \
> typedef char _Tb[b]; \
> (char(*)[sizeof(_Ta) > sizeof(_Tb) ? sizeof(_Ta) :
> sizeof(_Tb)])0; }))

I liked your previous hack more. This is much more limited and not
nearly as disgustingly subtle.

Linus

2018-03-21 00:19:11

by Uecker, Martin

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

Am Dienstag, den 20.03.2018, 16:08 -0700 schrieb Linus Torvalds:
> On Tue, Mar 20, 2018 at 3:13 PM, Uecker, Martin
> <[email protected]> wrote:
> >
> > here is an idea:
>
> That's not "an idea".
>
> That is either genius, or a seriously diseased mind.
>
> I can't quite tell which.
>
> > a test for integer constant expressions which returns an
> > integer constant expression itself which should be suitable
> > for passing to __builtin_choose_expr might be:
> >
> > #define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) :
> > (int*)1)))

...
> So now the end result is (sizeof(*(void *)(x)), which on gcc is
> generally *different* from 'int'.
>
> So I see two issues:
>
> - "sizeof(*(void *)1)" is not necessalily well-defined. For gcc it
> is
> 1. But it could cause warnings.

It is a documented extension which enables pointer arithmetic
on void pointers, so I am sure neither gcc nor
clang has any problem with it. But one could also use
__builtin_types_compatible_p instead.

> - this will break the minds of everybody who ever sees that
> expression.
>
> Those two issues might be fine, though.
>
> > This also does not evaluate x itself on gcc although this is
> > not guaranteed by the standard. (And I haven't tried any older
> > gcc.)
>
> Oh, I think it's guaranteed by the standard that 'sizeof()' doesn't
> evaluate the argument value, only the type.

It has to evaluate the expression for the length of an array,
but it is not specified whether this is done if it does not have
any effect on the result. I would assume that any sane compiler
does not.

> I'm in awe of your truly marvelously disgusting hack. That is truly a
> work of art.
>
> I'm sure it doesn't work or causes warnings for various reasons, but
> it's still a thing of beaty.

I thought you might like it ;-)

Martin

2018-03-21 00:32:26

by Linus Torvalds

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

On Tue, Mar 20, 2018 at 5:10 PM, Uecker, Martin
<[email protected]> wrote:
>
>>
>> So I see two issues:
>>
>> - "sizeof(*(void *)1)" is not necessalily well-defined. For gcc it
>> is
>> 1. But it could cause warnings.
>
> It is a documented extension which enables pointer arithmetic
> on void pointers, so I am sure neither gcc nor
> clang has any problem with it.

Well, so sparse does the whole "void pointer arithmetic" thing too,
but sparse actually complains about use of sizeof(void).

We could remove that warning, though. But:

> But one could also use __builtin_types_compatible_p instead.

That might be the right approach, even if I like how it only used
standard C (although _disgusting_ standard C) without it apart from
the small issue of sizeof(void)

So something like

#define __is_constant(a) \
__builtin_types_compatible_p(int *, typeof(1 ? ((void*)((a) *
0l)) : (int*)1 ) )

if I counted the parentheses right..

Linus

2018-03-21 09:53:28

by Uecker, Martin

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

Am Dienstag, den 20.03.2018, 17:30 -0700 schrieb Linus Torvalds:
> On Tue, Mar 20, 2018 at 5:10 PM, Uecker, Martin
> <[email protected]> wrote:

>
> > But one could also use __builtin_types_compatible_p instead.
>
> That might be the right approach, even if I like how it only used
> standard C (although _disgusting_ standard C) without it apart from
> the small issue of sizeof(void)
>
> So something like
>
> #define __is_constant(a) \
>         __builtin_types_compatible_p(int *, typeof(1 ? ((void*)((a) *
> 0l)) : (int*)1 ) )
>
> if I counted the parentheses right..

This seems to work fine on all recent compilers. Sadly, it
produces false positives on 4.4.7 and earlier when
tested on godbolt.org

Surprisingly, the MAX macro as defined below still seems
to do the right thing with respect to avoiding the VLA
even on the old compilers.

I am probably missing something... or there are two
compiler bugs cancelling out, or the __builting_choose_expr
changes things.

Martin

My test code:

#define ICE_P(x) (__builtin_types_compatible_p(int*, __typeof__(1 ?
((void*)((x) * 0l)) : (int*)1)))

#define SIMPLE_MAX(a, b) ((a) > (b) ? (a) : (b))
#define SAFE_MAX(a, b) ({ __typeof(a) _a = (a); __typeof(b) _b = (b);
SIMPLE_MAX(_a, _b); })
#define MAX(a, b) (__builtin_choose_expr(ICE_P(a) && ICE_P(b),
SIMPLE_MAX(a, b), SAFE_MAX(a, b)))

int foo(int x)
{
    int a[MAX(3, 4)];
    //int a[MAX(3, x)];
    //int a[SAFE_MAX(3, 4)];
    //return ICE_P(MAX(3, 4));
    return ICE_P(MAX(3, x));
}

2018-03-21 10:23:24

by Uecker, Martin

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

Am Mittwoch, den 21.03.2018, 10:51 +0100 schrieb Martin Uecker:
>
> Am Dienstag, den 20.03.2018, 17:30 -0700 schrieb Linus Torvalds:
> > On Tue, Mar 20, 2018 at 5:10 PM, Uecker, Martin
> > <[email protected]> wrote:
> >
> > > But one could also use __builtin_types_compatible_p instead.
> >
> > That might be the right approach, even if I like how it only used
> > standard C (although _disgusting_ standard C) without it apart from
> > the small issue of sizeof(void)
> >
> > So something like
> >
> > #define __is_constant(a) \
> >         __builtin_types_compatible_p(int *, typeof(1 ? ((void*)((a)
> > *
> > 0l)) : (int*)1 ) )
> >
> > if I counted the parentheses right..
>
> This seems to work fine on all recent compilers. Sadly, it
> produces false positives on 4.4.7 and earlier when
> tested on godbolt.org
>
> Surprisingly, the MAX macro as defined below still seems
> to do the right thing with respect to avoiding the VLA
> even on the old compilers.
>
> I am probably missing something... or there are two
> compiler bugs cancelling out, or the __builting_choose_expr
> changes things.

Nevermind, of course it avoids the VLA if it produces a false
positive and uses the simple version. So it is unsafe to use
on very old compilers.

Martin

> Martin
>
> My test code:
>
> #define ICE_P(x) (__builtin_types_compatible_p(int*, __typeof__(1 ?
> ((void*)((x) * 0l)) : (int*)1)))
>
> #define SIMPLE_MAX(a, b) ((a) > (b) ? (a) : (b))
> #define SAFE_MAX(a, b) ({ __typeof(a) _a = (a); __typeof(b) _b = (b);
> SIMPLE_MAX(_a, _b); })
> #define MAX(a, b) (__builtin_choose_expr(ICE_P(a) && ICE_P(b),
> SIMPLE_MAX(a, b), SAFE_MAX(a, b)))
>
>
>
> int foo(int x)
> {
>     int a[MAX(3, 4)];
>     //int a[MAX(3, x)];
>     //int a[SAFE_MAX(3, 4)];
>     //return ICE_P(MAX(3, 4));
>     return ICE_P(MAX(3, x));
> }

2018-03-21 10:36:25

by David Laight

[permalink] [raw]

Subject: RE: detecting integer constant expressions in macros

From: Uecker, Martin
> Sent: 21 March 2018 10:22
> Am Mittwoch, den 21.03.2018, 10:51 +0100 schrieb Martin Uecker:
> >
> > Am Dienstag, den 20.03.2018, 17:30 -0700 schrieb Linus Torvalds:
> > > On Tue, Mar 20, 2018 at 5:10 PM, Uecker, Martin
> > > <[email protected]> wrote:
> > >
> > > > But one could also use __builtin_types_compatible_p instead.
> > >
> > > That might be the right approach, even if I like how it only used
> > > standard C (although _disgusting_ standard C) without it apart from
> > > the small issue of sizeof(void)
> > >
> > > So something like
> > >
> > > #define __is_constant(a) \
> > > __builtin_types_compatible_p(int *, typeof(1 ? ((void*)((a) * 0l)) : (int*)1 ) )
> > >
> > > if I counted the parentheses right..
> >
> > This seems to work fine on all recent compilers. Sadly, it
> > produces false positives on 4.4.7 and earlier when
> > tested on godbolt.org
> >
> > Surprisingly, the MAX macro as defined below still seems
> > to do the right thing with respect to avoiding the VLA
> > even on the old compilers.
> >
> > I am probably missing something... or there are two
> > compiler bugs cancelling out, or the __builting_choose_expr
> > changes things.
>
> Nevermind, of course it avoids the VLA if it produces a false
> positive and uses the simple version. So it is unsafe to use
> on very old compilers.

False positives with old compilers don't matter when max() is being used
for an on-stack array.
The compilations with a new compiler will detect real VLA, the old compiler
will generate valid code with a constant sized VLA.

OTOH these horrid:
long buf[max(sizeof (struct foo), sizeof (struct bar)) + 7 / 8];
would be better replaced with:
union buf { struct foo foo; struct bar bar; };

David

2018-03-21 21:12:31

by Rasmus Villemoes

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

On 2018-03-21 00:08, Linus Torvalds wrote:
> On Tue, Mar 20, 2018 at 3:13 PM, Uecker, Martin
> <[email protected]> wrote:
>>
>> here is an idea:
>
> That's not "an idea".
>
> That is either genius, or a seriously diseased mind.
>
> I can't quite tell which.
>
>> a test for integer constant expressions which returns an
>> integer constant expression itself which should be suitable
>> for passing to __builtin_choose_expr might be:
>>
>> #define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) : (int*)1)))

Wow. This is absolutely awesome.

>> This also does not evaluate x itself on gcc although this is
>> not guaranteed by the standard. (And I haven't tried any older
>> gcc.)
>
> Oh, I think it's guaranteed by the standard that 'sizeof()' doesn't
> evaluate the argument value, only the type.

Even if there's some hypothetical scenario where x ends up containing
something of variably-modified type, could one just use 0? instead of 1?
. That doesn't affect the type of the whole expression.

> I'm in awe of your truly marvelously disgusting hack. That is truly a
> work of art.

I'm not really worthy of commenting or trying to improve on this, but I
think it's important that 'git blame' will forever give Martin the
credit for this, so two minor suggestions:

- Cast (x) to (long) to allow x to have type u64 on a 32-bit platform
without giving "warning: cast to pointer from integer of different
size". That also allows x to have pointer type (it's theoretically
useful to take max() of two pointers into an array), and not so
important for the kernel, floating point type.

- Maybe use (int*)4 to avoid the compiler complaining about creating a
non-aligned pointer.

> I'm sure it doesn't work or causes warnings for various reasons, but
> it's still a thing of beaty.

I tried putting the below ugly test code through various gcc versions.
In no cases could I get different optimization options to generate
different results. I also couldn't get -Wall -Wextra to produce any
warnings that wasn't just due to the silly input (i.e., yes, I know I
have a comma expression with no side effects....). Most importantly,
ICE_P was always accepted as an ICE itself. Everything from 4.6 onwards
gives the same and expected output, namely 1 in the first four lines, 0
otherwise. gcc 4.4 instead produces this:

4 1
sizeof(long) 1
5ull - 3u 1
3.2 1
square(2) 0
square(argc) 0
squarec(2) 1
squarec(argc) 1
1+argc-argc 1
1+argc+argc+1-argc-argc 1
bignum() - 1 0
0*bignum() 0
0*bignumc() 1
sizeof(foo) 0
p 1
p < q 1
p++ 0
main 1
malloc(8) 0
v = malloc(8) 0
v 1
x++ 0
y++ 0
(3, 2, 1) 0
({x++; 0; }) 0
({square(y--); 0; }) 0
(square(x), 3) 0
(squarec(x), 3) 0
({squarec(x); 3;}) 0
({squarec(x);}) 1

So it seems to accept/treat (x)*0l as NULL as long as x has no
side-effects - where a function call is treated as having side effects
unless the function is declared with the const attribute,
comma-expressions are hard-coded as having side-effects [and what else
would they be good for, so that's not totally unreasonable...], and
statement expressions have side effects if they have more than one
statement.

Rasmus

#include <stdio.h>
#include <stdlib.h>

static inline __attribute__((__const__)) unsigned squarec(unsigned n)
{
return n*n;
}

static inline unsigned square(unsigned n)
{
return n*n;
}

static inline unsigned long long bignum(void)
{
return 1000000000000ULL;
}

static inline __attribute__((__const__)) unsigned long long bignumc(void)
{
return 1000000000000ULL;
}

/* #define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((long)(x) *
0l)) : (int*)1))) */
#define ICE_P(x) (__builtin_types_compatible_p(typeof(0 ?
((void*)((long)(x) * 0l)) : (int*)1), int*))

#define t(x) printf("%-20s\t%d\n", #x, __builtin_choose_expr(ICE_P(x),
1, 0))

int main(int argc, char *argv[])
{
char foo[argc++];
char **p, **q;
int x = 5, y = 8;
void *v;

p = &argv[3];
q = &argv[6];

t(4);
t(sizeof(long));
t(5ull - 3u);
t(3.2);
t(square(2));
t(square(argc));
t(squarec(2));
t(squarec(argc));
t(1+argc-argc);
t(1+argc+argc+1-argc-argc);
t(bignum() - 1);
t(0*bignum());
t(0*bignumc());
t(sizeof(foo));
t(p);
t(p < q);
t(p++);
t(main);
t(malloc(8));
t(v = malloc(8));
t(v);
t(x++);
t(y++);
t((3, 2, 1));
t(({x++; 0; }));
t(({square(y--); 0; }));
t((square(x), 3));
t((squarec(x), 3));
t(({squarec(x); 3;}));
t(({squarec(x);}));

/* Shutup distracting warnings. */
if (argc > 100000) {
printf("%llu", square(argc) + squarec(argc) + bignum() + bignumc());
}

return 0;
}

2018-03-21 22:29:17

by Linus Torvalds

[permalink] [raw]

Subject: Re: detecting integer constant expressions in macros

On Wed, Mar 21, 2018 at 2:10 PM, Rasmus Villemoes
<[email protected]> wrote:
>>
>> Oh, I think it's guaranteed by the standard that 'sizeof()' doesn't
>> evaluate the argument value, only the type.
>
> Even if there's some hypothetical scenario where x ends up containing
> something of variably-modified type, could one just use 0? instead of 1?
> . That doesn't affect the type of the whole expression.

Fair enough.

> I'm not really worthy of commenting or trying to improve on this, but I
> think it's important that 'git blame' will forever give Martin the
> credit for this

You're saying "please don't ever blame me".

Gotcha ;)

> two minor suggestions:
>
> - Cast (x) to (long) to allow x to have type u64 on a 32-bit platform
> without giving "warning: cast to pointer from integer of different
> size". That also allows x to have pointer type (it's theoretically
> useful to take max() of two pointers into an array), and not so
> important for the kernel, floating point type.

Fair enough.

> - Maybe use (int*)4 to avoid the compiler complaining about creating a
> non-aligned pointer.

Yes.

> I tried putting the below ugly test code through various gcc versions.
> In no cases could I get different optimization options to generate
> different results. I also couldn't get -Wall -Wextra to produce any
> warnings that wasn't just due to the silly input (i.e., yes, I know I
> have a comma expression with no side effects....). Most importantly,
> ICE_P was always accepted as an ICE itself. Everything from 4.6 onwards
> gives the same and expected output, namely 1 in the first four lines, 0
> otherwise. gcc 4.4 instead produces this:

Ok, so since we're going to get rid of 4.4 support anyway because of
our requirement of "asm goto", and since 4.5 was apparently never
shipped in any distros, I think the minimum compiler version we care
about is gcc-4.6.

So the above sounds fine, and I won't worry about the fact that
gcc-4.4 apparently does expression simplifications early.

> So it seems to accept/treat (x)*0l as NULL as long as x has no
> side-effects - where a function call is treated as having side effects
> unless the function is declared with the const attribute,

That's actually _largely_ the main thing we care about anyway in most
places, so even the 4.4 behavior is probably acceptable in practice.

For example, for "min/max()", it's true that constants are what we're
looking at, but at the same time, the expression simplification really
only cares about "no side effects".

In some other cases, we really do want to see constants, though (the
example I mentioned earlier with build-time simplification of bswap
comes to mind).

> comma-expressions are hard-coded as having side-effects [and what else
> would they be good for, so that's not totally unreasonable...]

Actually, I think one of our more common uses of a comma expression is
not for side effects, but because of things like type warnings.

I guess that is a "side effect" too, although it's really a build-time
one, not a run-time one.

> and statement expressions have side effects if they have more than one
> statement.

Ok, that's just odd semantics.

But I guess it's actually a very natural result of "do obvious
simplifications early", ie one of those expression simplifications
that gcc-4.4 does on the early parse-tree level is to just turn a
single-statement statement expression directly into just the
expression.

So I think the gcc-4.4 behavior is "understandable", even if it
happens to be very much non-standard, and happens to break the "x*0"
trick that Martin's masterpiece uses.

If we knew exactly what gcc ends up simplifying, we might be able to
work around it. For example, is the front-end tree simplification
smart enough to realize that

1+(unsigned long)(x)+~(unsigned long)(x)

is also 0?

For a true *constant*, that's very easy for a compiler to see (just
evaluate the damn thing). For an expression that isn't constant, it's
a bit less obvious, and depends on knowing that "~x+1" is the same as
"-x" in 2's complement.

Of course, compilers *do* have that logic, but it's often deeper down
than at parse-tree time.

Doing some simple testing on godbolt, I see that gcc-4.1 actually
accepts this program:

unsigned long x;
void *ptr= (void *)(x-x);

and clearly considers "(void *)(x-x)" to be NULL.

But if you write it as

unsigned long x;
void *ptr= (void *)(1+x+~x);

it fails with 4.1.

But then it succeeds again with 4.4 and sees that it's just an odd way
to write 0.

Damn clever compiler writers to hell, they clearly started doing that
simplification early.

There are other weird ways to calculate 0, though. We're not *just*
limited to "*0" and trivial bit tricks. I bet we could come up with
something that always evaluates to zero, but is too hard for gcc to
turn into zero symbolically.

For example, gcc-4.4 does not realize that

#define rightmost(x) ((x)&-(x))
#define lowmask(x) (rightmost(x)-1)
unsigned long x;
void *ptr= (void *)(x&lowmask(x));

is a *really* weird way to write a constant zero. But if 'x' really is
a constant, it's immediately obvious because it just evaluates the
expression and get zero.

So if you use "#define x 0x1234", it turns into NULL again.

So we *could* make this work even for gcc-4.4.

Linus