2022-03-28 21:51:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] pin control bulk changes for v5.18

On Mon, Mar 28, 2022 at 6:08 AM Linus Walleij <[email protected]> wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git
> tags/pinctrl-v5.18-1

Hmm.

This clashes badly with the fact that we in the meantime have enabled
-Warray-bounds, and I got

drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c: In function ‘npcmgpio_irq_handler’:
./include/linux/find.h:40:23: error: array subscript ‘long unsigned
int[0]’ is partly outside array bounds of ‘u32[1]’ {aka ‘unsigned
int[1]’} [-Werror=array-bounds]
40 | val = *addr & GENMASK(size - 1, offset);
| ^~~~~
drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c:219:13: note: while
referencing ‘sts’
219 | u32 sts, en, bit;
| ^~~

as a result.

Was this not in linux-next?

Or was the array bounds checking not there?

Anyway, that cast to "const void *"

for_each_set_bit(bit, (const void *)&sts,

in that driver is completely wrong.

The bit operations are defined in arrays of 'unsigned long', and you
can't just cast the issue away, because the end result is not the same
on a big-endian machine.

I fixed it up in the merge, but what really confuses me (apart from
the apparent lack of testing in linux-next) is that I don't actually
see what made this happen now, and not before. Maybe that's why it
didn't show up in linux-next: it's some odd gcc heisenbug.

Because there seems to be no actual changes to that driver that would
explain why I get the warning now, but not before the pull.

There *was* a change from

- dev_dbg(bank->gc.parent, "...
+ dev_dbg(chip->parent_device, }...

in that function, but nothing else I notice.

VERY funky.

Kees, any idea?

Linus


2022-03-28 23:04:39

by J. Neuschäfer

[permalink] [raw]
Subject: Re: [GIT PULL] pin control bulk changes for v5.18

Hi,

On Mon, Mar 28, 2022 at 12:11:49PM -0700, Linus Torvalds wrote:
> On Mon, Mar 28, 2022 at 6:08 AM Linus Walleij <[email protected]> wrote:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git
> > tags/pinctrl-v5.18-1
>
> Hmm.
>
> This clashes badly with the fact that we in the meantime have enabled
> -Warray-bounds, and I got
>
> drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c: In function ‘npcmgpio_irq_handler’:
> ./include/linux/find.h:40:23: error: array subscript ‘long unsigned
> int[0]’ is partly outside array bounds of ‘u32[1]’ {aka ‘unsigned
> int[1]’} [-Werror=array-bounds]
> 40 | val = *addr & GENMASK(size - 1, offset);
> | ^~~~~
> drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c:219:13: note: while
> referencing ‘sts’
> 219 | u32 sts, en, bit;
> | ^~~
>
> as a result.
>
> Was this not in linux-next?

It was — but when it was noticed, the fix went through the IRQ tree, on
top of a refactoring that happened there (the switch to generic_handle_domain_irq):

https://lore.kernel.org/lkml/164751044707.389.16417510835118111853.tip-bot2@tip-bot2/

So… the issue should resolve itself when the IRQ tree is pulled.

> Or was the array bounds checking not there?
>
> Anyway, that cast to "const void *"
>
> for_each_set_bit(bit, (const void *)&sts,
>
> in that driver is completely wrong.
>
> The bit operations are defined in arrays of 'unsigned long', and you
> can't just cast the issue away, because the end result is not the same
> on a big-endian machine.
>
> I fixed it up in the merge, but what really confuses me (apart from
> the apparent lack of testing in linux-next) is that I don't actually
> see what made this happen now, and not before. Maybe that's why it
> didn't show up in linux-next: it's some odd gcc heisenbug.
>
> Because there seems to be no actual changes to that driver that would
> explain why I get the warning now, but not before the pull.

When I added the pinctrl-wpcm450 driver, I changed the pinctrl/nuvoton/
directory to obj-y and exposed the pinctrl-npcm7xx to CI bot testing.
The bug existed, untouched, since the driver was added a few years ago.


Jonathan


Attachments:
(No filename) (2.27 kB)
signature.asc (849.00 B)
Download all attachments

2022-03-28 23:06:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] pin control bulk changes for v5.18

On Mon, Mar 28, 2022 at 2:38 PM Jonathan Neuschäfer
<[email protected]> wrote:
>
> It was — but when it was noticed, the fix went through the IRQ tree, on
> top of a refactoring that happened there (the switch to generic_handle_domain_irq):
>
> https://lore.kernel.org/lkml/164751044707.389.16417510835118111853.tip-bot2@tip-bot2/

Ok, that ends up at least being identical to what I did, so thats' good.

> When I added the pinctrl-wpcm450 driver, I changed the pinctrl/nuvoton/
> directory to obj-y and exposed the pinctrl-npcm7xx to CI bot testing.
> The bug existed, untouched, since the driver was added a few years ago.

Ahh. Good. So not a heisenbug, just happened to be hidden before.

Thanks for clarifying,

Linus

2022-03-30 14:09:07

by Linus Walleij

[permalink] [raw]
Subject: Re: [GIT PULL] pin control bulk changes for v5.18

On Mon, Mar 28, 2022 at 11:41 PM Linus Torvalds
<[email protected]> wrote:
> On Mon, Mar 28, 2022 at 2:38 PM Jonathan Neuschäfer
> <[email protected]> wrote:
> >
> > It was — but when it was noticed, the fix went through the IRQ tree, on
> > top of a refactoring that happened there (the switch to generic_handle_domain_irq):
> >
> > https://lore.kernel.org/lkml/164751044707.389.16417510835118111853.tip-bot2@tip-bot2/
>
> Ok, that ends up at least being identical to what I did, so thats' good.

Ah thanks for hashing this out both of you, I got a bit sweaty there since I
spent the first week in the merge window trying to hash out issues in this
driver.

I suppose there will be a different interesting merge conflict with the IRQ
tree now but the solution should be obvious I hope.

Yours,
Linus Walleij