2023-06-23 04:36:43

by Dave Airlie

[permalink] [raw]
Subject: arm32 build warnings in workqueue.c

Not sure what changed (maybe I ended up with -Werror on recently), but
my 32-bit arm build started to fail. 6.4.0-rc7.

gcc version 13.1.1 20230519 (Red Hat Cross 13.1.1-2) (GCC)

/home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c: In
function ‘get_work_pwq’:
/home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c:713:24:
error: cast to pointer from integer of different size
[-Werror=int-to-pointer-cast]
713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
| ^
/home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c: In
function ‘get_work_pool’:
/home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c:741:25:
error: cast to pointer from integer of different size
[-Werror=int-to-pointer-cast]
741 | return ((struct pool_workqueue *)
| ^
/home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c: In
function ‘get_work_pool_id’:
/home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c:763:25:
error: cast to pointer from integer of different size
[-Werror=int-to-pointer-cast]
763 | return ((struct pool_workqueue *)
| ^

Just a drive-by, I'll disable Werror on my 32-bit arm build for now.

Dave.


2023-06-23 18:49:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Thu, 22 Jun 2023 at 20:57, Dave Airlie <[email protected]> wrote:
>
> Not sure what changed (maybe I ended up with -Werror on recently), but
> my 32-bit arm build started to fail. 6.4.0-rc7.
>
> gcc version 13.1.1 20230519 (Red Hat Cross 13.1.1-2) (GCC)
>
> /home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c: In
> function ‘get_work_pwq’:
> /home/airlied/devel/kernel/dim/drm-fixes/kernel/workqueue.c:713:24:
> error: cast to pointer from integer of different size
> [-Werror=int-to-pointer-cast]
> 713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
> | ^

Ok, that's some nasty code, but I'm not entirely sure why gcc has
started complaining about it now.

'data' is of type 'unsigned long', and we cast things to pointers all the time..

Now, WORK_STRUCT_WQ_DATA_MASK is this odd enum type, and very very
arguably that is absolutely *horrible*, but the code does

enum {
[..]
WORK_STRUCT_FLAG_BITS = WORK_STRUCT_COLOR_SHIFT +
WORK_STRUCT_COLOR_BITS,
[..]
WORK_STRUCT_FLAG_MASK = (1UL << WORK_STRUCT_FLAG_BITS) - 1,
WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
}

and while the above is absolutely disgusting and the type is not very
well defined, an enum type should be wide enough to contain all the
values of the enum. It should all default to 'int', but gcc has some
extensions.

The 'intent' here is that WORK_STRUCT_FLAG_MASK is of type UL, but
that's not actually how enum types necessarily work. Again, the enum
should be large enough to contain the *values* of the elements, not
necessarily the *types* of the elements. The type is always going to
be that enum.

(And again, I think according to the standard, it's always of type
'int', but every compiler does some kind of type widening for larger
values as an extension, and may use a smaller type for storage).

But even if WORK_STRUCT_FLAG_MASK isn't an unsigned long, and is
something smaller, the expression '~WORK_STRUCT_FLAG_MASK' must be *at
least* an 'int', and would be a negative one if so. That's just how C
integer expressions work - the '~' operator is guaranteed to make it
at *least* an int.

Now, again, the final type of an enum is not determined by the types
of the element initializers, but by their *values*. But that means
that the final type of an enum will have two choices:

- either ~WORK_STRUCT_FLAG_MASK was extended to be 'int', and the
value is negative, and the enum type has to be a signed type that can
contain that negative value.

IOW, the enum might actually be of type 'signed char', but then
doing that 'data & enum' will sign-extend the mask and everything is
fine.

- *or* the type of WORK_STRUCT_FLAG_MASK was interpreted to be an
unsigned type >= int, and the negation made it a really big unsigned
value, and that enum must thus be a large unsigned type to fit it

IOW, the enum might be an unsigned type, but it will be at LEAST of
size 'int', and I don't see why it would be a problem on 32-bit ARM.
Making it be 'unsigned long long' sounds insane. But maybe that is
what happened.

In either case, the code above should do the right thing, at least on
32-bit, but the warning is, I feel, valid.

Because as you can tell, the 'type' of an enum really isn't very obvious.

I think the whole 'type of an enum' is not only a very very bad thing
to rely on, I think it's something that C23 ends up codifying new
rules for. It may be why gcc started complaining now.

Anyway, that code absolutely has to be fixed. Using enums for types is
wrong, wrong, wrong. You should consider an enum to be a weakly typed
integer expression with some specific advantages (the automatic
incrementing at definition time, and the compiler being able to limit
ranges in switch() statements and maybe doing limited type warnings
elsewhere)

Any time you think it has a specific type size, you're setting
yourself up for pain and suffering. Gcc is right to warn when we do
odd arithmetic with it and rely on the width of the result.

So I really think that code needs fixing, and the gcc warning was very valid.

Maybe something like the attached. Does this fix the gcc warning?
Tejun, comments?

Linus


Attachments:
patch.diff (2.99 kB)

2023-06-23 19:23:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

[ Adding clang people. See this for background:

https://lore.kernel.org/all/CAHk-=wi=eDN4Ub0qSN27ztBAvHSXCyiY2stu3_XbTpYpQX4x7w@mail.gmail.com/

where that patch not only cleans things up, but seems to make a
difference to clang ]

On Fri, 23 Jun 2023 at 11:24, Linus Torvalds
<[email protected]> wrote:
>
> So I really think that code needs fixing, and the gcc warning was very valid.
>
> Maybe something like the attached. Does this fix the gcc warning?
> Tejun, comments?

Whee. Inexplicably, that patch also improves code generation with
clang, with things like this:

- movabsq $137438953440, %rcx # imm = 0x1FFFFFFFE0
- andq %rax, %rcx
- movabsq $68719476704, %rdx # imm = 0xFFFFFFFE0
- cmpq %rdx, %rcx
+ shrq $5, %rax
+ cmpl $2147483647, %eax # imm = 0x7FFFFFFF

in several places.

Or, even more amusingly, this:

- movabsq $68719476704, %rax # imm = 0xFFFFFFFE0
- orq $1, %rax
+ movabsq $68719476705, %rax # imm = 0xFFFFFFFE1

where the old code was some truly crazy stuff.

I have *no* idea what drugs clang is on, but clearly clang does some
really really bad things with large enums, and doesn't simplify things
correctly.

That "I can't even do a constant 'or' at compile time when it involves
an enum" is all kinds of odd.

Does this matter in the big picture? No. But I think the take-away
here should be that you really shouldn't use enums for random things.
Compilers know enums are used as small enumerated constants, and get
pissy and confused when you use them as some kind of generic storage
pool for values.

My guess is that clang keeps an enum as an enum as long as possible -
including past some (really) simple simplification phases of the
optimizer.

With gcc, code generation didn't change from that patch with my
defconfig on x86-64 (apart from line numbers changing).

Now, clang improving code generation with that patch is obviously a
good thing for the patch, but it does mean that clang really messed up
before.

Linus

2023-06-23 19:43:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Fri, 23 Jun 2023 at 12:16, Nick Desaulniers <[email protected]> wrote:
>
> >
> > - movabsq $137438953440, %rcx # imm = 0x1FFFFFFFE0
> > - andq %rax, %rcx
> > - movabsq $68719476704, %rdx # imm = 0xFFFFFFFE0
> > - cmpq %rdx, %rcx
> > + shrq $5, %rax
> > + cmpl $2147483647, %eax # imm = 0x7FFFFFFF
> >
> > in several places.
>
> Sorry, are those equivalent? Before looks to me like:
>
> if (0xFFFFFFFE0 - (0x1FFFFFFFE0 & rax))
>
> and after
>
> if (0x7FFFFFFF - (rax >> 5))

I cut off some of the other changes, with the old (confused) code also doing

- shrq $5, %rax

later.

And yes, as far as I can tell, the comparisons are 100% equivalent.
Look at what happens when you shift that big mask down by 5..

> > My guess is that clang keeps an enum as an enum as long as possible -
> > including past some (really) simple simplification phases of the
> > optimizer.
>
> I don't think so.
> https://godbolt.org/z/M8746c49z
> That's LLVM IR as soon as it leaves the front end.

Hmm. I have

clang version 15.0.7 (Fedora 15.0.7-2.fc37)

and you can probably check the code generation of the current kernel
with and without that patch.

It might depend on exact config file - I'll send you mine in a
separate email to not pollute the mailing list.

Linus

2023-06-23 19:45:12

by Nick Desaulniers

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Fri, Jun 23, 2023 at 11:52 AM Linus Torvalds
<[email protected]> wrote:
>
> [ Adding clang people. See this for background:
>
> https://lore.kernel.org/all/CAHk-=wi=eDN4Ub0qSN27ztBAvHSXCyiY2stu3_XbTpYpQX4x7w@mail.gmail.com/
>
> where that patch not only cleans things up, but seems to make a
> difference to clang ]
>
> On Fri, 23 Jun 2023 at 11:24, Linus Torvalds
> <[email protected]> wrote:
> >
> > So I really think that code needs fixing, and the gcc warning was very valid.
> >
> > Maybe something like the attached. Does this fix the gcc warning?
> > Tejun, comments?
>
> Whee. Inexplicably, that patch also improves code generation with
> clang, with things like this:
>
> - movabsq $137438953440, %rcx # imm = 0x1FFFFFFFE0
> - andq %rax, %rcx
> - movabsq $68719476704, %rdx # imm = 0xFFFFFFFE0
> - cmpq %rdx, %rcx
> + shrq $5, %rax
> + cmpl $2147483647, %eax # imm = 0x7FFFFFFF
>
> in several places.

Sorry, are those equivalent? Before looks to me like:

if (0xFFFFFFFE0 - (0x1FFFFFFFE0 & rax))

and after

if (0x7FFFFFFF - (rax >> 5))

>
> Or, even more amusingly, this:
>
> - movabsq $68719476704, %rax # imm = 0xFFFFFFFE0
> - orq $1, %rax
> + movabsq $68719476705, %rax # imm = 0xFFFFFFFE1
>
> where the old code was some truly crazy stuff.

Yeah, that's stupid. Which symbol's disassembly are you looking at? I
couldn't repro with a quick test: https://godbolt.org/z/dz1fEY9Wx.

>
> I have *no* idea what drugs clang is on, but clearly clang does some
> really really bad things with large enums, and doesn't simplify things
> correctly.
>
> That "I can't even do a constant 'or' at compile time when it involves
> an enum" is all kinds of odd.
>
> Does this matter in the big picture? No. But I think the take-away
> here should be that you really shouldn't use enums for random things.
> Compilers know enums are used as small enumerated constants, and get
> pissy and confused when you use them as some kind of generic storage
> pool for values.
>
> My guess is that clang keeps an enum as an enum as long as possible -
> including past some (really) simple simplification phases of the
> optimizer.

I don't think so.
https://godbolt.org/z/M8746c49z
That's LLVM IR as soon as it leaves the front end. Notice the use of
`i32` types. AFAIK, the IR does not contain the notion of enums. Clang
lowers enums to integral values of a specific width in LLVM IR.

>
> With gcc, code generation didn't change from that patch with my
> defconfig on x86-64 (apart from line numbers changing).
>
> Now, clang improving code generation with that patch is obviously a
> good thing for the patch, but it does mean that clang really messed up
> before.
>
> Linus



--
Thanks,
~Nick Desaulniers

2023-06-23 20:02:40

by Tejun Heo

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

Hello,

On Fri, Jun 23, 2023 at 11:24:16AM -0700, Linus Torvalds wrote:
...
> enum {
> [..]
> WORK_STRUCT_FLAG_BITS = WORK_STRUCT_COLOR_SHIFT +
> WORK_STRUCT_COLOR_BITS,
> [..]
> WORK_STRUCT_FLAG_MASK = (1UL << WORK_STRUCT_FLAG_BITS) - 1,
> WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
> }
>
> and while the above is absolutely disgusting and the type is not very
> well defined, an enum type should be wide enough to contain all the
> values of the enum. It should all default to 'int', but gcc has some
> extensions.

What gcc used to do was determining the type for each enum entry in a way
that avoids surprises with sign extensions.

* If the initializer is signed, it just uses the smallest fitting integer
type. e.g. Even if the initializer is -1LL, it'd just be an int as the
negative range fits within int's range.

* If the intializer is unsigned. The size promotion rules are a bit tricky
but the compiler makes sure that it doesn't end up in a situation where an
unsigned initialized enums are sign extended through arithmetic type
promotions.

Here are results from clang 17.0.0 which I believe shows the same behavior
as older gcc's demonstrating how the types actually end up. The '-'s are
initialized with -1 or -1[L]U accordingly (source code at the end). TOGETHER
is them being defined in the same enum while SEPARATE is each enum in its
own enum block.

TOGETHER S32+ S32- U32+ U32- S64+ S64- U64+ U64-
sizeof: 4 4 4 8 4 4 4 8
signed: 1 0 1 1

SEPARATE S32+ S32- U32+ U32- S64+ S64- U64+ U64-
sizeof: 4 4 4 4 4 4 4 8
signed: 1 0 1 0

So, while it is a bit difficult to tell what exact type you'd end up with
(especially with the U64- but that also generates a warning indicating that
the enum initailizer is beyond max range), these shouldn't cause correctness
issues, although the fact that U32 with the highest bit set gets promoted to
the larger size when declared together is rather unfortunate and it's kidna
mysterious why that'd differ when each enum is defined separately.

gcc-13 dropped this behavior and uses the same widest type needed for the
entire enum definition block.

TOGETHER S32+ S32- U32+ U32- S64+ S64- U64+ U64-
sizeof: 16 16 16 16 16 16 16 16
signed: 1 0 1 0 (IGNORE)

SEPARATE S32+ S32- U32+ U32- S64+ S64- U64+ U64-
sizeof: 4 4 4 4 4 4 4 8
signed: 1 0 1 0

Because everything got pushed to 128bit for TOGETHER, the signed result
doesn't indicate anything, so please ignore that line.

Anyways, yeah, it's not great but it shouldn't cause correctness problems.
While the sizes can be difficult to expect, the signedness does get
respected.

> I think the whole 'type of an enum' is not only a very very bad thing
> to rely on, I think it's something that C23 ends up codifying new
> rules for. It may be why gcc started complaining now.

The behavior change in gcc-13 is fine on its own but it's frustrating
because there's no way to obtain previous behavior - ie. it can be
challenging to please both gcc-12 and gcc-13 with the same code when enums
are used this way.

> Anyway, that code absolutely has to be fixed. Using enums for types is
> wrong, wrong, wrong. You should consider an enum to be a weakly typed
> integer expression with some specific advantages (the automatic
> incrementing at definition time, and the compiler being able to limit
> ranges in switch() statements and maybe doing limited type warnings
> elsewhere)

The only benefit I care about is it being visible in the type system and
thus in debug info whether that's the usual one or BTF. This makes a huge
difference in tracing, monitoring and debugging. For one off's, it's fine to
track down the define values. For something more sophisticated (e.g.
non-trivial drgn, bcc and other BPF programs), the macro constants are a
source of sadness and a really dumb and insidious one.

I don't know why ppl didn't just add explicit types to enums or the size
promotion rules are like that, but we have to trade off between trivial
visibility and i-dont-know-how-big-this-is. It is really dumb but it is what
it is.

> Any time you think it has a specific type size, you're setting
> yourself up for pain and suffering. Gcc is right to warn when we do
> odd arithmetic with it and rely on the width of the result.
>
> So I really think that code needs fixing, and the gcc warning was very valid.
>
> Maybe something like the attached. Does this fix the gcc warning?
> Tejun, comments?

So, this is where I'm currently at. Given the circumstances, I don't think
we can get uniform or pretty. If we can enum our way, we should. If not,
macros. That means that we'll sometimes have to bother about these compiler
warnings and other issues and spend energy on them. One thing that's
positive is that this shouldn't lead to correctness problems. Does this make
sense to you?

For this particular case, some of the initializer values have highest bit
set, so gcc-13 must have pushed all of them to 64bit and then generated
warnings when we asked it to downcast it to a pointer.

> +/* Convenience constants - of type 'unsigned long', not 'enum'! */
> +#define WORK_OFFQ_CANCELING (1ul << __WORK_OFFQ_CANCELING)
> +#define WORK_OFFQ_POOL_NONE ((1ul << WORK_OFFQ_POOL_BITS) - 1)
> +#define WORK_STRUCT_NO_POOL (WORK_OFFQ_POOL_NONE << WORK_OFFQ_POOL_SHIFT)
> +
> +#define WORK_STRUCT_FLAG_MASK ((1ul << WORK_STRUCT_FLAG_BITS) - 1)
> +#define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)

So, yeah, as we have the "source" values in enums, this works for both
purposes. It's not difficult to recalculate these values from outside from
the source values and we get to specify the exact type internally. I'd be
happy to route it through the workqueue tree w/ patch-originally-from if
that's okay with you.

Thanks.

--
tejun

2023-06-23 20:17:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Fri, 23 Jun 2023 at 12:22, Arnd Bergmann <[email protected]> wrote:
>
> The short explanation of the change is that with the previous
> gcc and clang behavior, the type of 'enum foo' would be determined
> separately from the type of each individual constant, while the
> new behavior in gcc-13 makes them all have the same type.

Oh, I actually thought that gcc already did the new behavior long ago.
It's the only sane one. Enums should all have the same type, that's
the whole point of it.

But it explains why the warning only showed up on 32-bit, where the
enum presumably ended up as 'long long', which is the same size as a
pointer on 64-bit (so no complaint), but not on 32-bit.

Anyway, our kernel code was disgusting, and apparently relied on that
horrendously wrong model of enum types.

Linus

2023-06-23 20:19:31

by Arnd Bergmann

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Fri, Jun 23, 2023, at 20:52, Linus Torvalds wrote:
> [ Adding clang people. See this for background:
>
>
> https://lore.kernel.org/all/CAHk-=wi=eDN4Ub0qSN27ztBAvHSXCyiY2stu3_XbTpYpQX4x7w@mail.gmail.com/
>
> where that patch not only cleans things up, but seems to make a
> difference to clang ]
>
> On Fri, 23 Jun 2023 at 11:24, Linus Torvalds
> <[email protected]> wrote:
>>
>> So I really think that code needs fixing, and the gcc warning was very valid.
>>
>> Maybe something like the attached. Does this fix the gcc warning?
>> Tejun, comments?
>
> Whee. Inexplicably, that patch also improves code generation with
> clang, with things like this:
>
> - movabsq $137438953440, %rcx # imm = 0x1FFFFFFFE0
> - andq %rax, %rcx
> - movabsq $68719476704, %rdx # imm = 0xFFFFFFFE0
> - cmpq %rdx, %rcx
> + shrq $5, %rax
> + cmpl $2147483647, %eax # imm = 0x7FFFFFFF
>
> in several places.
>
> Or, even more amusingly, this:
>
> - movabsq $68719476704, %rax # imm = 0xFFFFFFFE0
> - orq $1, %rax
> + movabsq $68719476705, %rax # imm = 0xFFFFFFFE1
>
> where the old code was some truly crazy stuff.
>
> I have *no* idea what drugs clang is on, but clearly clang does some
> really really bad things with large enums, and doesn't simplify things
> correctly.

I sent a fix for the warning earlier this year, and it's been
in linux-next for a while but hasn't made it to your tree yet,
see https://lore.kernel.org/all/[email protected]/
for the initial submission.

I went with the minimal change there, your more elaborate version
looks fine as well. There were a handful of bugs that came up with
the changed behavior, in other cases they caused the code to actually
misbehave rather than just causing a build warning.

After having looked at all the other instances, I still feel that
the new gcc behavior is the sanest way to handle this, but it was
not communicated well in the gcc-13 release notes, and I would have
preferred to see a warning for source code that had a change in code
generation because of this.

The short explanation of the change is that with the previous
gcc and clang behavior, the type of 'enum foo' would be determined
separately from the type of each individual constant, while the
new behavior in gcc-13 makes them all have the same type.
IIRC with a definition of

enum {
A = -1,
B = UINT_MAX,
} var;

you had 'typeof(A)' as an 'int', 'typeof(B)' as an 'unsigned int'
and 'typeof(var)' as a 'long long', now they are all 'long long'.

Arnd

2023-06-23 22:20:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Fri, 23 Jun 2023 at 12:48, Tejun Heo <[email protected]> wrote:
>
> The behavior change in gcc-13 is fine on its own but it's frustrating
> because there's no way to obtain previous behavior

No, the previous gcc behavior was just buggy and wrong.

I'm actually surprised at what gcc did. It means that each enum, has a
different type. That's just completely wrong, and means that there is
no such thing as a proper enum type.

And the types are entirely random. They are *not* the types of the
initializers. Try this:

enum {
A = (char) 1,
B = (unsigned long) 2,
C = (long long) 3,
D = (unsigned long) 0x123456789,
};

int tA(void) { return sizeof(A); }
int tB(void) { return sizeof(B); }
int tC(void) { return sizeof(C); }
int tD(void) { return sizeof(D); }
int T(void) { return sizeof(enum bad); }
int crazy(void) { return sizeof(typeof(A)); }

and look at the insanity when you compile it with "gcc -S".

So no. There is NO WAY we want to ever "obtain previous behavior".
That is just garbage. Any code that depends on that behavior is just
actively wrong.

I had to go look at what sparse does, because I didn't think I would
ever have made it match that crazy gcc behavior.

But it does - because a few years ago Luc added the logic to match
gcc, and it never triggered me.

Oh, how very horrible.

Linus

2023-06-23 22:27:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Fri, 23 Jun 2023 at 15:15, Linus Torvalds
<[email protected]> wrote:
>
> I had to go look at what sparse does, because I didn't think I would
> ever have made it match that crazy gcc behavior.
>
> But it does - because a few years ago Luc added the logic to match
> gcc, and it never triggered me.
>
> Oh, how very horrible.

Yeah, I just went back and checked an older version of sparse - and
that older version actually DTRT, and gave every enum the proper type
(ie in that example I posted, every enum size was 8 bytes).

So even sparse now gets this wrong, because sparse was explicitly made
to match that horrid gcc behavior. I guess we got warnings from
sparse, and then that caused Luc to "fix" sparse and nobody realized
that the warnings were good.

Linus

2023-06-24 01:34:14

by Tejun Heo

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

Hello, Linus.

On Fri, Jun 23, 2023 at 03:15:51PM -0700, Linus Torvalds wrote:
> On Fri, 23 Jun 2023 at 12:48, Tejun Heo <[email protected]> wrote:
> > The behavior change in gcc-13 is fine on its own but it's frustrating
> > because there's no way to obtain previous behavior
>
> No, the previous gcc behavior was just buggy and wrong.
>
> I'm actually surprised at what gcc did. It means that each enum, has a
> different type. That's just completely wrong, and means that there is
> no such thing as a proper enum type.

I agree that the previous behavior is out there. It's just that it's been
like that forever and that's been the only way to make constants externally
visible.

> And the types are entirely random. They are *not* the types of the
> initializers. Try this:
>
> enum {
> A = (char) 1,
> B = (unsigned long) 2,
> C = (long long) 3,
> D = (unsigned long) 0x123456789,
> };
>
> int tA(void) { return sizeof(A); }
> int tB(void) { return sizeof(B); }
> int tC(void) { return sizeof(C); }
> int tD(void) { return sizeof(D); }
> int T(void) { return sizeof(enum bad); }
> int crazy(void) { return sizeof(typeof(A)); }
>
> and look at the insanity when you compile it with "gcc -S".

Oh yeah, the sizes make no senes.

> So no. There is NO WAY we want to ever "obtain previous behavior".
> That is just garbage. Any code that depends on that behavior is just
> actively wrong.

Nothing is willingly depending on the crazy type behavior. However, enums
are the only thing we've got if we want to make the constant values visible
externally, so we have to make do with them somehow. It has always been iffy
but also mostly bearable. Recent round of problems stem from the fact that
gcc-13 changed the behavior and it sometimes gets nasty to satisfy both
interpretations with the same code. Hypothetically, if we were to declare
that we no longer supported old craziness, cleaning things up would be
straight-forward.

That day will come in the future but, in the meantime, we have to deal with
enums whose sizes are going to be different depending on the complier
version. It's pretty unpleasant but likely manageable in practice. The only
way I can think of to keep the before and after behaviors the same is
defining every enum in its own enum block. ie. Explicitly force each to be
its own type. With a macro helper, code would probably look okay and we'd be
explicitly telling the compiler to pick the type that can contain the value
while maintaining signedness. That would give us consistent behaviors across
the gcc-12/13 boundary. It sure doesn't feel great tho.

Unless we declare that we don't care about external visibility and are done
with enums, none of the available options aren't great, which isn't too
surprising given the sad state enums have been in.

Thanks.

--
tejun

2023-07-31 08:38:49

by Petr Mladek

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Fri 2023-06-23 09:47:56, Tejun Heo wrote:
> Hello,
>
> On Fri, Jun 23, 2023 at 11:24:16AM -0700, Linus Torvalds wrote:
> ...
> > enum {
> > [..]
> > WORK_STRUCT_FLAG_BITS = WORK_STRUCT_COLOR_SHIFT +
> > WORK_STRUCT_COLOR_BITS,
> > [..]
> > WORK_STRUCT_FLAG_MASK = (1UL << WORK_STRUCT_FLAG_BITS) - 1,
> > WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
> > }
> >
> > Anyway, that code absolutely has to be fixed. Using enums for types is
> > wrong, wrong, wrong. You should consider an enum to be a weakly typed
> > integer expression with some specific advantages (the automatic
> > incrementing at definition time, and the compiler being able to limit
> > ranges in switch() statements and maybe doing limited type warnings
> > elsewhere)
>
> The only benefit I care about is it being visible in the type system and
> thus in debug info whether that's the usual one or BTF. This makes a huge
> difference in tracing, monitoring and debugging. For one off's, it's fine to
> track down the define values. For something more sophisticated (e.g.
> non-trivial drgn, bcc and other BPF programs), the macro constants are a
> source of sadness and a really dumb and insidious one.

Would it help to use const variables?

> > +/* Convenience constants - of type 'unsigned long', not 'enum'! */
> > +#define WORK_OFFQ_CANCELING (1ul << __WORK_OFFQ_CANCELING)

static const WORK_OFFQ_CANCELING = 1ul << __WORK_OFFQ_CANCELING;

It is kind of nasty as well. But it defines the right type and should
be visible in debuginfo. Well, it is usable only for local variables.

Best Regards,
Petr

2023-07-31 19:58:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Mon, 31 Jul 2023 at 12:14, Tejun Heo <[email protected]> wrote:
>
> PAGE_SIZE is easily available through _SC_PAGE_SIZE, so that particular one
> is never a real problem (and a lot of tools have pre-defined helpers for it
> and similarly important constants) but yeah there are other constants which
> I sometimes wish were available through debug info.

We do have that

scripts/gdb/linux/constants.py.in

thing. Which seems to be the logical place do deal with this all.

That's where other - and arguably much more fundamental - kernel
#define's are dealt with.

Now, looking at the particular constants that are listed, I get the
feeling that the people who have done that script may be mostly
interested in filesystems, but I don't see why it wouldn't be
appropriate for the workstruct stuff too...

Linus

2023-07-31 19:59:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

On Mon, 31 Jul 2023 at 01:20, Petr Mladek <[email protected]> wrote:
>
> Would it help to use const variables?

I guess it would _work_ these days (I think modern C has picked up
what C++ idiom of "const variable with an initializer is a constant"),
although with old compilers I would actually not be at all surprised
to see 'const variable' uses being turned into "load from memory",

But I don't see any real advantage, and it gets very nasty for non-static ones.

I think people just need to expand some things manually if they use a debugger.

It's not like we don't use macros absolutely EVERYWHERE ELSE, and any
gdb user already has to deal with it. The workqueue flags are the
least of your problems.

So I think the whole "gdb debug info" argument is complete garbage,
and was never true at all. If you want gdb to know about these
constants, you just do the same thing that gdb users already have to
do for other things.

We have all those gdb python helper scripts for other kernel magic -
some of it *much* deeper magic than a trivial "combine a few
constants".

And honestly, I don't understand why anybody seriously believes that
those WORK_STRUCT constants are somehow very important. We have many
*much* more fundamental constants that are #define's. Thinking that
WORK_OFFQ_CANCELING needs special gdb understanding when we have
PAGE_SIZE that does not seems entirely crazy to me.

Linus

2023-07-31 20:16:59

by Tejun Heo

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

Hello,

On Mon, Jul 31, 2023 at 11:40:04AM -0700, Linus Torvalds wrote:
> So I think the whole "gdb debug info" argument is complete garbage,
> and was never true at all. If you want gdb to know about these
> constants, you just do the same thing that gdb users already have to
> do for other things.

When manually debugging with gdb, figuring out these constants might not add
too much overhead on top but nowadays there are many other tools which can
benefit from having the source of truth available to them directly from the
kernel.

For example, it's a lot easier to write non-trivial debug scripts with drgn
(https://github.com/osandov/drgn) and it's not too unusual to have a
collection of bcc / bpftrace scripts for diagnosing common problems. They of
course don't have any interface stability expectations. However, there's no
reason to make using them unnecessarily painful given how useful they are.
Having constant values included in the kernel debug info makes it a lot
easier to use these tools and removes a silly source of subtle problems.

> And honestly, I don't understand why anybody seriously believes that
> those WORK_STRUCT constants are somehow very important. We have many
> *much* more fundamental constants that are #define's. Thinking that
> WORK_OFFQ_CANCELING needs special gdb understanding when we have
> PAGE_SIZE that does not seems entirely crazy to me.

PAGE_SIZE is easily available through _SC_PAGE_SIZE, so that particular one
is never a real problem (and a lot of tools have pre-defined helpers for it
and similarly important constants) but yeah there are other constants which
I sometimes wish were available through debug info.

It may be difficult to argue specifically for WORK_OFFQ_CANCELING in
isolation as it is directly derived from __WORK_OFFQ_CANCELNG, so no problem
there, but I hope you can see that having these constants available in debug
info in general is useful.

Even here, the value of __WORK_OFFQ_CANCELING is dependent on
CONFIG_DEBUG_OBJECTS_WORK. It's not a strong case as the option is pretty
specifically for debugging but if one is to actually debug or monitor
workqueue using external visibility tools, this can easily lead to mistakes.

Thanks.

--
tejun

2023-08-04 00:29:54

by Tejun Heo

[permalink] [raw]
Subject: Re: arm32 build warnings in workqueue.c

Hello, Linus.

On Mon, Jul 31, 2023 at 12:22:06PM -0700, Linus Torvalds wrote:
> We do have that
>
> scripts/gdb/linux/constants.py.in
>
> thing. Which seems to be the logical place do deal with this all.
>
> That's where other - and arguably much more fundamental - kernel
> #define's are dealt with.
>
> Now, looking at the particular constants that are listed, I get the
> feeling that the people who have done that script may be mostly
> interested in filesystems, but I don't see why it wouldn't be
> appropriate for the workstruct stuff too...

It can be used but there are some drawbacks:

* It requires maintaining external code, which usually doesn't scale that
well, especially over extended period of time.

* It requires pre-processing header files and then evaluating the resulting
C expressions. It's not a showstopper and some tools can already do so
but, for example, to use this with drgn, we'd have to generate this into
python expressions using something and then source it. Maybe we can
prepackage the pre-processed results but things do get more complicated
and likely more error-prone.

* This likely will cover some of the constants but there are also many which
aren't readily visible in public header files. It is possible to try to
preprocess .c files too but then now each machine has to have full source
deployed and so on.

So, not that it's impossible but if you contrast this to having the
constants available in the debug info:

* There's only one source of truth. No external maintenance is needed.

* The tools already rely heavily on having debug info available for type
information and the symbols. As needing debug info is pretty common, most
distros and prod environments provide ready access to them, whether kernel
or applications.

* Having constants in debug info doesn't add any complexity or point of
possible mistakes. As long as the binary matches the debug info, which is
a property that tools and environments watch and maintain closely and easy
to notice when broken, you know that you're using the same numers as the
running kernel, which is a huge relief.

The downside of using enums is that C's enum is not great in terms of type
predictability. I personally would much prefer specifying the types
explicitly, but, as long as the rules are stable, I think it's something we
can live with. While a bit annoying, the benefits of using them outweighs
the annoyances in general.

It's just so silly that constants of all the stuff we need for monitoring
and debugging are the one thing that we can't provide through debug info and
requires all the special, potentially fragile, workarounds.

Thanks.

--
tejun