2022-02-17 23:49:09

by Jakob Koschel

[permalink] [raw]
Subject: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

It is unsafe to assume that &req->req != _req can only evaluate
to false if the break within the list iterator is hit.

When the break is not hit, req is set to an address derived from the
head element. If _req would match with that value of req it would
allow continuing beyond the safety check even if the _req was never
found within the list.

Signed-off-by: Jakob Koschel <[email protected]>
---
drivers/usb/gadget/udc/gr_udc.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/udc/gr_udc.c b/drivers/usb/gadget/udc/gr_udc.c
index 4b35739d3695..18ae2c7a1656 100644
--- a/drivers/usb/gadget/udc/gr_udc.c
+++ b/drivers/usb/gadget/udc/gr_udc.c
@@ -1695,6 +1695,7 @@ static int gr_dequeue(struct usb_ep *_ep, struct usb_request *_req)
struct gr_udc *dev;
int ret = 0;
unsigned long flags;
+ bool found = false;

ep = container_of(_ep, struct gr_ep, ep);
if (!_ep || !_req || (!ep->ep.desc && ep->num != 0))
@@ -1711,10 +1712,12 @@ static int gr_dequeue(struct usb_ep *_ep, struct usb_request *_req)

/* Make sure it's actually queued on this endpoint */
list_for_each_entry(req, &ep->queue, queue) {
- if (&req->req == _req)
+ if (&req->req == _req) {
+ found = true;
break;
+ }
}
- if (&req->req != _req) {
+ if (!found) {
ret = -EINVAL;
goto out;
}
--
2.25.1


2022-02-24 00:44:40

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 9:43 PM Linus Torvalds
<[email protected]> wrote:
>
> On Wed, Feb 23, 2022 at 12:25 PM Arnd Bergmann <[email protected]> wrote:
> >
> > I looked at the gcc documentation for this flag, and it tells me that
> > it's default-enabled for std=c99 or higher. Turning it on for --std=gnu89
> > shows the same warning, so at least it doesn't sound like the actual
> > behavior changed, only the warning output. clang does not warn
> > for this code at all, regardless of the --std= flag.
>
> Ok, so we should be able to basically convert '--std=gnu89' into
> '--std=gnu11 -Wno-shift-negative-value' with no expected change of
> behavior.

Yes, I think that is correct.

> Of course, maybe we need to make -Wno-shift-negative-value be
> conditional on the compiler supporting it in the first place?

I think they all do. I discussed this with Nathan Chancellor on IRC, to
see what clang does, and he points out that the warning was made
conditional on -fwrapv there a while ago:

https://github.com/llvm/llvm-project/commit/59802321785b4b9fc31b10456c62ba3a06d3a631

So the normal behavior on clang is to always warn about it, but since
we explicitly ask for sane signed integer behavior, it doesn't warn for
the kernel.

Arnd

2022-02-24 00:56:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 1:46 PM Arnd Bergmann <[email protected]> wrote:
>
> > Ok, so we should be able to basically convert '--std=gnu89' into
> > '--std=gnu11 -Wno-shift-negative-value' with no expected change of
> > behavior.
>
> Yes, I think that is correct.

Ok, somebody please remind me, and let's just try this early in the
5.18 merge window.

Because at least for me, doing

- -std=gnu89
+ -std=gnu11 -Wno-shift-negative-value

for KBUILD_CFLAGS works fine both in my gcc and clang builds. But
that's obviously just one version of each.

(I left the host compiler flags alone - I have this memory of us
having had issues with people having old host compilers and wanting
headers for tools still build with gcc-4)

Linus

2022-02-24 00:59:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 12:25 PM Arnd Bergmann <[email protected]> wrote:
>
> I looked at the gcc documentation for this flag, and it tells me that
> it's default-enabled for std=c99 or higher. Turning it on for --std=gnu89
> shows the same warning, so at least it doesn't sound like the actual
> behavior changed, only the warning output. clang does not warn
> for this code at all, regardless of the --std= flag.

Ok, so we should be able to basically convert '--std=gnu89' into
'--std=gnu11 -Wno-shift-negative-value' with no expected change of
behavior.

Of course, maybe we need to make -Wno-shift-negative-value be
conditional on the compiler supporting it in the first place?

I really would love to finally move forward on this, considering that
it's been brewing for many many years.

I think the loop iterators are the biggest user-visible thing, but
there might be others.

And some googling seems to show that the reason for
-Wshift-negative-value is that with C99 the semantics changed for
targets that weren't two's complement. We *really* don't care.

Of course, the C standard being the bunch of incompetents they are,
they in the process apparently made left-shifts undefined (rather than
implementation-defined). Christ, they keep on making the same mistakes
over and over. What was the definition of insanity again?

Linus

2022-02-24 01:05:17

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 8:23 PM Linus Torvalds
<[email protected]> wrote:
>
> On Wed, Feb 23, 2022 at 10:47 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > Arnd - remind me, please.. Was there some other problem than just gcc-4.9?


I'm pretty sure this was the only thing. And not even because gcc-4.9 didn't
support later standards, but because it caused some false-postivee
warnings like

In file included from ../drivers/usb/core/notify.c:16:0:
../include/linux/notifier.h:117:9: error: initializer element is not constant
struct blocking_notifier_head name = \
^
../drivers/usb/core/notify.c:21:8: note: in expansion of macro
'BLOCKING_NOTIFIER_HEAD'
static BLOCKING_NOTIFIER_HEAD(usb_notifier_list);
^
../include/linux/notifier.h:117:9: error: (near initialization for
'usb_notifier_list.rwsem.wait_lock')
struct blocking_notifier_head name = \
^
../drivers/usb/core/notify.c:21:8: note: in expansion of macro
'BLOCKING_NOTIFIER_HEAD'
static BLOCKING_NOTIFIER_HEAD(usb_notifier_list);
^

> Hmm. Interesting. I decided to just try it and see for the compiler I
> have, and changing the gnu89 to gnu99 I get new warnings
> (-Werror=shift-negative-value).

FWIW, I think we can go straight to gnu11 in this case even with gcc-5, no
need to take gnu99 as an intermediate step. I don't know if there is a
practical difference between the two though.

gcc-8 and higher also support --std=gnu18 and --std=gnu2x, which also
work for building the kernel but would break gcc-5/6/7 support

> Very annoying. Especially since negative values are in many contexts
> actually *safer* than positive ones when used as a mask, because as
> long as the top bit is set in 'int', if the end result is then
> expanded to some wider type, the top bit stays set.
>
> Example:
>
> unsigned long mask(unsigned long x)
> { return x & (~0 << 5); }
>
> unsigned long badmask(unsigned long x)
> { return x & (~0u << 5); }
>
> One does it properly, the other is buggy.
>
> Now, with an explicit "unsigned long" like this, some clueless
> compiler person might say "just use "~0ul", but that's completely
> wrong - because quite often the type is *not* this visible, and the
> signed version works *regardless* of type.
>
> So this Werror=shift-negative-value warning seems to be actively
> detrimental, and I'm not seeing the reason for it. Can somebody
> explain the thinking for that stupid warning?
>
> That said, we seem to only have two cases of it in the kernel, at
> least by a x86-64 allmodconfig build. So we could examine the types
> there, or we could just add '-Wno-shift-negative-value".

I looked at the gcc documentation for this flag, and it tells me that
it's default-enabled for std=c99 or higher. Turning it on for --std=gnu89
shows the same warning, so at least it doesn't sound like the actual
behavior changed, only the warning output. clang does not warn
for this code at all, regardless of the --std= flag.

Arnd

2022-02-24 01:28:24

by David Laight

[permalink] [raw]
Subject: RE: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

From: Linus Torvalds
> Sent: 23 February 2022 20:55
>
> On Wed, Feb 23, 2022 at 12:43 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > Of course, the C standard being the bunch of incompetents they are,
> > they in the process apparently made left-shifts undefined (rather than
> > implementation-defined). Christ, they keep on making the same mistakes
> > over and over. What was the definition of insanity again?
>
> Hey, some more googling on my part seems to say that somebody saw the
> light, and it's likely getting fixed in newer C standard version.
>
> So it was just a mistake, not actual malice. Maybe we can hope that
> the tide is turning against the "undefined" crowd that used to rule
> the roost in the C standards bodies. Maybe the fundamental security
> issues with undefined behavior finally convinced people how bad it
> was?

IIRC UB includes 'fire an ICBM at the writer of the standards document'.
There isn't an 'undefined value' or even 'undefined value or program trap'
option.

It also seems to me that the compiler people are picking on things
in the standard that are there to let 'obscure machines conform'
and then using them to break perfectly reasonable programs.

Signed arithmetic is not required to wrap so that cpu (eg DSP)
can do saturating maths - not so the compiler can remove some
conditionals.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-02-24 01:28:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 12:43 PM Linus Torvalds
<[email protected]> wrote:
>
> Of course, the C standard being the bunch of incompetents they are,
> they in the process apparently made left-shifts undefined (rather than
> implementation-defined). Christ, they keep on making the same mistakes
> over and over. What was the definition of insanity again?

Hey, some more googling on my part seems to say that somebody saw the
light, and it's likely getting fixed in newer C standard version.

So it was just a mistake, not actual malice. Maybe we can hope that
the tide is turning against the "undefined" crowd that used to rule
the roost in the C standards bodies. Maybe the fundamental security
issues with undefined behavior finally convinced people how bad it
was?

Linus

2022-02-24 12:28:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Thu, Feb 24, 2022 at 11:46:40AM +0100, Cristiano Giuffrida wrote:
> I think the "problem" with this solution is that it doesn't prevent
> `tmp` from being used outside the loop still (and people getting it
> wrong again)? It would be good to have 'struct gr_request *tmp;' being
> visible only inside the loop (i.e., declared in the macro).

That is a larger change, one that we can hopefully make when changing to
introduce the temp variable in the loop_for_each() macro as Linus
described elsewhere in the thread.

But for now, it should be pretty obvious to not touch tmp again after
the loop is done. That should make it easier to check for stuff like
this as well in an automated fashion (i.e. the loop variable is only
touched inside the loop.)

thanks,

greg k-h

2022-02-24 16:44:28

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 01:53:39PM -0800, Linus Torvalds wrote:
> On Wed, Feb 23, 2022 at 1:46 PM Arnd Bergmann <[email protected]> wrote:
> >
> > > Ok, so we should be able to basically convert '--std=gnu89' into
> > > '--std=gnu11 -Wno-shift-negative-value' with no expected change of
> > > behavior.
> >
> > Yes, I think that is correct.
>
> Ok, somebody please remind me, and let's just try this early in the
> 5.18 merge window.
>
> Because at least for me, doing
>
> - -std=gnu89
> + -std=gnu11 -Wno-shift-negative-value
>
> for KBUILD_CFLAGS works fine both in my gcc and clang builds. But
> that's obviously just one version of each.

I ran that diff through my set of clang builds on
v5.17-rc5-21-g23d04328444a and only found one issue:

https://github.com/ClangBuiltLinux/linux/issues/1603

I think that should be fixed on the clang side. Once it is, I think we
could just disable that warning in those translation units for older
versions of clang to keep the status quo.

Cheers,
Nathan

2022-02-26 01:58:45

by Uecker, Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

Am Mittwoch, den 23.02.2022, 20:54 +0000 schrieb Linus Torvalds:
> On Wed, Feb 23, 2022 at 12:43 PM Linus Torvalds
> <[email protected]> wrote:
> > Of course, the C standard being the bunch of incompetents they are,
> > they in the process apparently made left-shifts undefined (rather than
> > implementation-defined). Christ, they keep on making the same mistakes
> > over and over. What was the definition of insanity again?

Implementation-defined only means that it needs to be
documented (and clang does not even do this), so
I am not sure what difference this would make.

> Hey, some more googling on my part seems to say that somebody saw the
> light, and it's likely getting fixed in newer C standard version.

I don't think it is changed. But C23 will require
integers to be repreeted using two's complement,
so there is a better chance to fix things
like this in the future.

> So it was just a mistake, not actual malice. Maybe we can hope that
> the tide is turning against the "undefined" crowd that used to rule
> the roost in the C standards bodies. Maybe the fundamental security
> issues with undefined behavior finally convinced people how bad it
> was?

The right people to complain to are the
compiler vendors, because they decide what
UB does in their implementation. In the
standard body the same people argue that
the standard has to codify existing
practice. Even in cases where the standard
defines behavior, compilers sometimes simply
ignore this (e.g. pointer comparison or
pointer-to-integer round trips). So the
power is really with the compiler writers.


Martin


2022-02-26 02:36:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Fri, Feb 25, 2022 at 1:36 PM Uecker, Martin
<[email protected]> wrote:
>
> Implementation-defined only means that it needs to be
> documented (and clang does not even do this), so
> I am not sure what difference this would make.

No, implementation-defined means something *much* more than "undefined".

Yes, it means that the behavior has to be documented.

But that's not actually the big deal from a user standpoint (although
it might be a big annoyance from a compiler writer's standpoint -
compiler writers would probably much prefer "let me do the obvious
sane thing" over "let me do the obvious sane thing and then have to
write documentation about how obviously sane it is").

From a user standpoint, the big thing is that "implementation-defined"
means that the behavior has to have *SOME* well-defined behavior.
Documentation is irrelevant. But RELIABILITY is relevant.

See?

That's a big big deal. The documentation is almost incidental - the
important part is that the code acts the same on the same
architecture, regardless of compiler options, and regardless of the
phase of the moon. When you upgrade your compiler to a new version,
the end result doesn't suddenly change.

In contrast, "undefined" means that the same C code can do two totally
different things with different optimization options, or based on just
any random thing - like some random register allocation issue.

So "undefined" behavior means that changes to code that isn't even
near the code in question can change what the code generation for that
code is. And the compiler may not even report it.

That is a complete disaster. It's a disaster from a security
standpoint, it's a disaster from a maintenance standpoint, it's just
*bad*.

And the C standards committee has traditionally used to think it was a
good thing, because they listened to compiler writers that claimed
that it allowed them to do wild optimizations and improve code
generation.

Example: the completely broken type-based aliasing rules that changed
semantics of C for the worse.

Reality: it doesn't actually improve code generation all that much.
Instead it just results in a lot of problems, and any sane C code base
that cares about security and stability will just turn that off.

Same goes for integer overflow etc.

The only really valid use-case for "undefined" is when you're doing
things like accessing past your own allocations. The compiler can't do
much about things like that.

But the C standards body has actually actively screwed the pooch over
the years, and turned perfectly traditional code into "undefined" code
for no good reason. "-1 << 1" is just one particularly simplistic
example.

> > Hey, some more googling on my part seems to say that somebody saw the
> > light, and it's likely getting fixed in newer C standard version.
>
> I don't think it is changed. But C23 will require
> integers to be repreeted using two's complement,
> so there is a better chance to fix things
> like this in the future.

If integers are guaranteed to be two's complement, then the only
possible explanation for "left shift is undefined" goes away. So
presumably the thing goes hand-in-hand with just making (the de-facto)
negative shifting well-defined.

(Btw, shifting *by* a negative number - or by amounts bigger than the
width of the type - is different. There are at least real reasons to
complain about that, although I think "implementation defined" would
still be hugely preferred over "undefined")

> The right people to complain to are the
> compiler vendors, because they decide what
> UB does in their implementation.

No. The bug was in the spec to begin with. The people to complain
about are the incompetents that literally changed the C standard to be
less well-specified.

As far as I know, no actual compiler has ever made integer left-shift
done anything bad. You'd literally have to do extra work for it, so
there's no reason to do so.

But because the standards committee messed up, the compiler writers
*did* do the extra work to warn about the fact (for the trivial
"negative constant" case - never mind that technically it's undefined
for non-constants: that would have been too much work).

End result: the warning is stupid. The warning only exists for
artificially stupid reasons.

And those reasons are literally "broken C standards committee behavior".

Linus

2022-02-26 02:43:36

by Martin Uecker

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

Am Freitag, den 25.02.2022, 14:02 -0800 schrieb Linus Torvalds:
> On Fri, Feb 25, 2022 at 1:36 PM Uecker, Martin
> <[email protected]> wrote:
> > Implementation-defined only means that it needs to be
> > documented (and clang does not even do this), so
> > I am not sure what difference this would make.
>
> No, implementation-defined means something *much* more than "undefined".
>
> Yes, it means that the behavior has to be documented.
>
> But that's not actually the big deal from a user standpoint (although
> it might be a big annoyance from a compiler writer's standpoint -
> compiler writers would probably much prefer "let me do the obvious
> sane thing" over "let me do the obvious sane thing and then have to
> write documentation about how obviously sane it is").
>
> From a user standpoint, the big thing is that "implementation-defined"
> means that the behavior has to have *SOME* well-defined behavior.
> Documentation is irrelevant. But RELIABILITY is relevant.
>
> See?

In principle, a compiler writer could document
implementation-defined behavior to do something
crazy and promise that something the standard
leaves undefined has useful and consistent behavior.

But I think you are right that this still makes
a huge difference in practice because the two
notions encourage different things.

...

> In contrast, "undefined" means that the same C code can do two totally
> different things with different optimization options, or based on just
> any random thing - like some random register allocation issue.
>
> So "undefined" behavior means that changes to code that isn't even
> near the code in question can change what the code generation for that
> code is. And the compiler may not even report it.

Yes, it means the compiler is free to do this.

But it does not mean a compiler *has* to do this.

Compiler writers represented in the committee want
to have the UB because some customers seem to like
those optimizations. At the same time, a programmer
who complains about some undesired effect of such
an UB-based optimization is typically told that the
program has UB according to the C standard so
nothing can be done (bug report closed).

You see what is happening here?

Roughly the same group of people / companies that
write the compilers also control what goes into the
standard. They then like to point to the standard
and reject all responsibility for *their*
implementation choices. By complaining about
the standard, one detracts from the fact that
the compiler writers make these choices and
also largely control what goes in the standard.


> That is a complete disaster. It's a disaster from a security
> standpoint, it's a disaster from a maintenance standpoint, it's just
> *bad*.

It is sometimes pure necessity. An unbounded write
has unbounded consequences and there is no realistic
way to give it defined behavior (in the context
of C). But for many other cases of UB, I completely
agree with you.

> And the C standards committee has traditionally used to think it was a
> good thing, because they listened to compiler writers that claimed
> that it allowed them to do wild optimizations and improve code
> generation.

There are many compiler people in the standards committee,
so there is no surprise here.

> Example: the completely broken type-based aliasing rules that changed
> semantics of C for the worse.

I do not really know the history of this, but
I think these rules existed even before compilers
really exploited them for optimization.

> Reality: it doesn't actually improve code generation all that much.
> Instead it just results in a lot of problems, and any sane C code base
> that cares about security and stability will just turn that off.
>
> Same goes for integer overflow etc.

For signed overflow, I am not entirely sure what the
right choice is. Wrapping for signed overflow also seems
dangerous. I use UBsan to find such issues in my code, and
this would not really work if signed overflow was defined
to wrap.

...
> > > Hey, some more googling on my part seems to say that somebody saw the
> > > light, and it's likely getting fixed in newer C standard version.
> >
> > I don't think it is changed. But C23 will require
> > integers to be repreeted using two's complement,
> > so there is a better chance to fix things
> > like this in the future.
>
> If integers are guaranteed to be two's complement, then the only
> possible explanation for "left shift is undefined" goes away. So
> presumably the thing goes hand-in-hand with just making (the de-facto)
> negative shifting well-defined.

I think there is a fairy good chance to get this
changed.

> (Btw, shifting *by* a negative number - or by amounts bigger than the
> width of the type - is different. There are at least real reasons to
> complain about that, although I think "implementation defined" would
> still be hugely preferred over "undefined")
>
> > The right people to complain to are the
> > compiler vendors, because they decide what
> > UB does in their implementation.
>
> No. The bug was in the spec to begin with. The people to complain
> about are the incompetents that literally changed the C standard to be
> less well-specified.
>
> As far as I know, no actual compiler has ever made integer left-shift
> done anything bad. You'd literally have to do extra work for it, so
> there's no reason to do so.

I agree that the standard should be changed. But
this example also shows that UB in the standard
does not inevitably causes the compilers to do
something stupid. It is still their choice.

> But because the standards committee messed up, the compiler writers
> *did* do the extra work to warn about the fact (for the trivial
> "negative constant" case - never mind that technically it's undefined
> for non-constants: that would have been too much work).

Compilers do *not* warn about a lot of undefined behavior
(even where it would be easily possible and make perfect
sense). So why here?


Martin


2022-02-27 18:46:24

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Sat, Feb 26, 2022 at 3:43 AM Martin Uecker <[email protected]> wrote:
>
> Roughly the same group of people / companies that
> write the compilers also control what goes into the
> standard. They then like to point to the standard

Indeed, at least to a substantial degree.

> For signed overflow, I am not entirely sure what the
> right choice is. Wrapping for signed overflow also seems
> dangerous. I use UBsan to find such issues in my code, and
> this would not really work if signed overflow was defined
> to wrap.

UBsan and similar tooling may still be used to find whatever behavior
one wants, whether defined or not. UBSan already has non-UB checks.

Cheers,
Miguel

2022-02-27 21:39:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Sun, Feb 27, 2022 at 12:22 PM Segher Boessenkool
<[email protected]> wrote:
>
> Requiring to annotate every place that has UB (or *can* have UB!) by the
> user is even less friendly than having so much UB is already :-(

Yeah, I don't think that's the solution. In fact, I don't think that's
even practically the _issue_.

Honestly, a lot of "undefined behavior" in C is quite often of the
kind "the programmer knows what he wants".

Things like word size or byte order issues etc are classic "undefined
behavior" in the sense that the C compiler really doesn't understand
them. The C compiler won't silently fix any silly behavior you get
from writing files in native byte order, and them not working on other
platforms.

Same goes for things like memory allocators - they often need to do
things that the standard doesn't really cover, and shouldn't even
*try* to cover. It's very much a core example of where people do odd
pointer arithmetic and change the type of pointers.

The problem with the C model of "undefined behavior" is not that the
behavior ends up being architecture-specific and depending on various
in-memory (or in-register) representation of the data. No, those
things are often very much intentional (even if in the case of byte
order, the "intention" may be that the programmer simply does not
care, and "knows" that all the world is little endian).

If the C compiler just generates reliable code that can sanely be
debugged - including very much using tools that look for "hey, this
behavior can be surprising", ie all those "look for bad patterns at
run-time", then that would be 100% FINE.

But the problem with the C notion of undefined behavior is NOT that
"results can depend on memory layout and other architecture issues
that the compiler doesn't understand".

No, the problem is that the C standards people - and compiler people -
have then declared that "because this can be surprising, and the
compiler doesn't understand what is going on, now the compiler can do
something *else* entirely".

THAT is the problem.

The classic case - and my personal "beat a dead horse" - is the
completely broken type-based aliasing rules. The standards people
literally said "the compiler doesn't understand this, it can expose
byte order dependencies that shouldn't be explained, SO THE COMPILER
CAN NOW DO SOMETHING COMPLETELY INSANE INSTEAD".

And compiler people? They rushed out to do completely broken garbage -
at least some of them did.

You can literally find compiler people who see code like this (very
traditional, traditionally very valid and common, although):

// Return the least significant 16 bits of 'a' on LE machines
#define low_16_bits(x) (*(unsigned short *)&(x))

and say "oh, because that's undefined, I can now decide to not do what
the programmer told me to do AT ALL".

Note that the above wasn't actually even badly defined originally. It
was well-defined, it was used, and it was done by programmers that
knew what they were doing.

And then the C standards people decided that "because our job isn't to
describe all the architectural issues you can hit, we'll call it
undefined, and in the process let compiler people intentionally break
it".

THAT is a problem.

Undefined results are are often intentional in system software - or
they can be debugged using smart tools (including possibly very
expensive run-time code generation) that actively look for them.

But compilers that randomly do crazy things because the standard was
bad? That's just broken.

If compilers treated "undefined" as the same as
"implementation-defined, but not explicitly documented", then that
would be fine. But the C standards people - and a lot of compiler
people - really don't seem to understand the problems they caused.

And, btw, caused for no actual good reason. The HPC people who wanted
Fortran-style aliasing could easily have had that with an extension.
Yes, "restrict" is kind of a crappy one, but it could have been
improved upon. Instead, people said "let's just break the language".

Same exact thing goes for signed integer overflow.

Linus

2022-02-28 02:41:11

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Sun, Feb 27, 2022 at 9:19 PM Segher Boessenkool
<[email protected]> wrote:
>
> Requiring to annotate every place that has UB (or *can* have UB!) by the
> user is even less friendly than having so much UB is already :-(

Sure, but I did not suggest to annotate every place -- at least not in C.

What I said is that in cases like your division by zero example, there
is no one-fits-all solution. For instance, for integer arithmetic,
leaving the choice (e.g. wrapping, saturating, unchecked...) to the
user is a better approach.

> You need a VM like Java's to get even *close* to that. This is not the
> C target: it is slower than wanted/expected, it is hosted instead of
> embedded, and it comes with a whole host of issues of its own. One of
> the strengths of C is its tiny runtime, a few kB is a lot already!
>
> I completely agree that if you design a new "systems" language, you want
> to have much less undefined behaviour than C has. But it is self-
> delusion to think you can eradicate all (or even most).

Nobody is suggesting to remove "all UB" in that sense or to use
VM-supported languages.

However, you can """eradicate all UB""" in a sense: you can offer to
write most code in a subset that does not allow any potential-UB
operations. This can even be "all" code, depending on how you count
(e.g. all application code).

Obviously, you still need the unchecked operations to implement the
safe APIs. This is why I mentioned them.

> And there are much bigger problems in any case! If you think that if
> programmers could no longer write programs that invoke undefined
> behaviour they will write much better programs, programs with fewer
> serious functionality or security problems, even just a factor of two
> better, well...

Actually, according to several reports from Google, Microsoft, etc.,
it _is_ the biggest problem (~70%) when talking about security bugs.

So it is a good bet that it will translate into "better" programs, at
least on that axis, unless it is showed that removing UB somehow
increases the rate of other errors.

As for functionality problems, there are several ways to improve upon
C too, though it is harder to show data on that.

Cheers,
Miguel

2022-02-28 12:31:32

by David Laight

[permalink] [raw]
Subject: RE: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

From: Linus Torvalds
> Sent: 27 February 2022 21:05
...
> And then the C standards people decided that "because our job isn't to
> describe all the architectural issues you can hit, we'll call it
> undefined, and in the process let compiler people intentionally break
> it".
>
> THAT is a problem.

I'm waiting for them to decide that memset(ptr, 0, len) of
any structure that contains a pointer is UB (because a NULL
pointer need not be the all zero bit pattern) so decide
to discard the call completely (or some such).

Non-zero NULL pointers is the only reason arithmetic on NULL
pointers isn't valid.

Or maybe that character range tests are UB because '0' to '9'
don't have to be adjacent - they are even adjacent in EBCDIC.

Some of the 'strict aliasing' bits are actually useful since
they let the compiler reorder reads and writes.
But the definition is brain-dead.
Sometimes it would be nice to have byte writes reordered,
but even using int:8 doesn't work.

I have never worked out what 'restrict' actually does,
in any places I've tried it did nothing.
Although I may have been hoping it would still help when
the function got inlined.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-02-28 18:01:54

by Martin Uecker

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

Am Sonntag, den 27.02.2022, 19:12 +0100 schrieb Miguel Ojeda:
> On Sat, Feb 26, 2022 at 3:43 AM Martin Uecker <[email protected]> wrote:
> > Roughly the same group of people / companies that
> > write the compilers also control what goes into the
> > standard. They then like to point to the standard
>
> Indeed, at least to a substantial degree.
>
> > For signed overflow, I am not entirely sure what the
> > right choice is. Wrapping for signed overflow also seems
> > dangerous. I use UBsan to find such issues in my code, and
> > this would not really work if signed overflow was defined
> > to wrap.
>
> UBsan and similar tooling may still be used to find whatever behavior
> one wants, whether defined or not. UBSan already has non-UB checks.

Technically, this is true but not really in practice. If signed
overflow would be defined to wrap, then code would start to
rely on it and detecting it becomes useless because there are
too many false positives. In your own small controlled code
base it could work though.

Martin

2022-02-28 18:04:53

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Mon, Feb 28, 2022 at 8:08 AM Martin Uecker <[email protected]> wrote:
>
> Technically, this is true but not really in practice. If signed
> overflow would be defined to wrap, then code would start to
> rely on it and detecting it becomes useless because there are
> too many false positives. In your own small controlled code
> base it could work though.

Either code is written with signed overflow in mind, or not. That is
what actually matters for detection, not whether it is UB in the
standard or not.

If a project starts relying on overflow wrapping, then they are taking
the same stance as projects that already rely on `-fwrapv`. That is
their choice, same as today.

But making it non-UB in the standard does not force a project to
consider it "not an error", which is what actually matters for being
able to use UBSan effectively or not.

In fact, if you are just concerned about detectability or `-fwrapv`
being the wrong default, there is still a way out without keeping it
UB: we could consider it an error (thus people is not encouraged to
rely on it), yet not UB. That is what Rust does, and why I suggested
the past exploring the move of some existing UB in C into an
"erroneous behavior, yet defined" area.

And, for the cherry on top, if users had a way to write exactly what
they mean (per operation or per type), then we can flag exactly the
cases that are not intentional, and users can still use unchecked
operations for performance sensitive code, etc.

Cheers,
Miguel

2022-03-02 14:10:51

by Martin Uecker

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

Am Dienstag, den 01.03.2022, 12:26 -0800 schrieb Linus Torvalds:
> On Mon, Feb 28, 2022 at 5:50 AM Miguel Ojeda
> <[email protected]> wrote:
> > But making it non-UB in the standard does not force a project to
> > consider it "not an error", which is what actually matters for being
> > able to use UBSan effectively or not.
>
> Absolutely.
>
> I think people should treat UBsan and friends a bit like "runtime lint".
>
> "lint" traditionally doesn't necessarily check for just *incorrect* C.
>
> It checks for things that can be confusing to humans, even if they are
> 100% completely conforming standard C.
>
> Classic example: indentation. Having the wrong indentation is not in
> any shape of form "undefined behavior" from a C standpoint, but it
> sure is something that makes sense checking for anyway.

You can automatically re-indent code form
other sources without breaking it. Assume you
have code that relis on signed integer wrapping,
but you want to use UBSan to screen for possible
signed arithmetic errors and/or have it trap
in production to protect against exploits. You
would then have to carefully analyze each
individual case of signed arithmetic whether
it makes sense, and then somehow add an
annotation that it is actually ok (or rewrite
it which introduces new risks). This does not
seem comparable to indentation at all.

On the other hand, if you have a self-contained
code base and like wrapping signed integer, you
can now use a compiler flag and also get what
you want.

So I am still not yet convinced that the
standard was wrong making it undefined.

Whether it is wise for compilers to use it
aggressively for optimization is a different
question...


Martin

2022-03-02 17:10:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 03/13] usb: remove the usage of the list iterator after the loop

On Mon, Feb 28, 2022 at 5:50 AM Miguel Ojeda
<[email protected]> wrote:
>
> But making it non-UB in the standard does not force a project to
> consider it "not an error", which is what actually matters for being
> able to use UBSan effectively or not.

Absolutely.

I think people should treat UBsan and friends a bit like "runtime lint".

"lint" traditionally doesn't necessarily check for just *incorrect* C.

It checks for things that can be confusing to humans, even if they are
100% completely conforming standard C.

Classic example: indentation. Having the wrong indentation is not in
any shape of form "undefined behavior" from a C standpoint, but it
sure is something that makes sense checking for anyway.

I think "integer overflow" should be considered the exact same thing.
It should *not* be treated as "undefined behavior", and it should not
give the compiler the option to generate code that doesn't match what
the programmer wrote.

But having a checking tool that says "This looks wrong - you just had
an integer overflow"? THAT makes 100% sense.

The C standard rules "undefined behavior" really is a problem in the standard.

Linus