2022-02-18 00:01:50

by Jakob Koschel

[permalink] [raw]
Subject: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

It is unsafe to assume that tmp != mdev can only evaluate to false
if the break within the list iterator is hit.

When the break is not hit, tmp is set to an address derived from the
head element. If mdev would match with that value of tmp it would allow
continuing beyond the safety check even if mdev was never found within
the list

Signed-off-by: Jakob Koschel <[email protected]>
---
drivers/vfio/mdev/mdev_core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b314101237fe..e646ba5036f4 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -339,14 +339,17 @@ int mdev_device_remove(struct mdev_device *mdev)
{
struct mdev_device *tmp;
struct mdev_parent *parent = mdev->type->parent;
+ bool found = false;

mutex_lock(&mdev_list_lock);
list_for_each_entry(tmp, &mdev_list, next) {
- if (tmp == mdev)
+ if (tmp == mdev) {
+ found = true;
break;
+ }
}

- if (tmp != mdev) {
+ if (!found) {
mutex_unlock(&mdev_list_lock);
return -ENODEV;
}
--
2.25.1


2022-02-18 15:19:07

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

On Thu, Feb 17, 2022 at 07:48:20PM +0100, Jakob Koschel wrote:
> It is unsafe to assume that tmp != mdev can only evaluate to false
> if the break within the list iterator is hit.
>
> When the break is not hit, tmp is set to an address derived from the
> head element. If mdev would match with that value of tmp it would allow
> continuing beyond the safety check even if mdev was never found within
> the list

I think due to other construction this is not actually possible, but I
guess it is technically correct

This seems like just a straight up style fix with nothing to do with
speculative execution though. Why not just send it as a proper patch?

Jason

2022-02-23 20:55:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 12:19 PM Rasmus Villemoes
<[email protected]> wrote:
>
> I have often wished that the iterator macros would consistently set the
> loop variable to NULL upon reaching the end.

I really think the rule should be that to a 99% approximation, we
should strive only ever use the iterated-upon value *inside* the loop.

No, that's now how we do it now. But I think the "break out and do the
work outside the loop" case is kind of broken anyway. It makes you
test the condition twice - and while a compiler might be smart enough
to optimize the second test away, it's still just plain ugly.

Linus

2022-02-24 00:45:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 12:15 PM Jakob <[email protected]> wrote:
>
> in such a case you would still have to set the iterator value to
> NULL when reaching the terminating condition or am I missing something?

No.

Make the rule be "you never use the iterator outside the loop".

IOW, the code sequence is

some_struct *ptr, *iter;

ptr = NULL;
list_for_each_entry(iter, ...) {
if (iter_matches_condition(iter)) {
ptr = iter;
break;
}
}

.. never use 'iter' here - you use 'ptr' and check it for NULL ..

See? Same number of variables as using a separate 'bool found' flag,
but simpler code, and it matches the rule of 'don't use iter outside
the loop'.

This is how you'd have to do it anyway if we start using a C99 style
'declare iter _in_ the loop' model.

And as mentioned, it actually tends to lead to better code, since the
code outside the loop only has one variable live, not two.

Of course, compilers can do a lot of optimizations, so a 'found'
variable can be made to generate good code too - if the compiler just
tracks it and notices, and turns the 'break' into a 'goto found', and
the fallthrough into the 'goto not_found'.

So 'better code generation' is debatable, but even if the compiler can
do as good a job with a separate 'bool' variable and some cleverness,
I think we should strive for code where we make it easy for the
compiler to DTRT - and where the generated code is easier to match up
with what we wrote.

Linus

2022-02-24 00:45:27

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

On 23/02/2022 20.31, Linus Torvalds wrote:
> On Wed, Feb 23, 2022 at 11:12 AM Jason Gunthorpe <[email protected]> wrote:
>>
>> Yes, this is what I had put together as well about this patch, and I
>> think it is OK as-is. In this case the list head is in the .bss of a
>> module so I don't think it is very likely that the type confused
>> container_of() will alias a kalloc result, but it is certainly
>> technically wrong as-is.
>
> I think that the pattern we should strive to use is not top use a
> 'bool' with the
>
> - initialize to false, and then in loop: do 'found = true;' if found
>
> - use the iterator variable if 'found'.
>
> but instead strive to
>
> - either only use the iterator variable inside the loop
>
> - if you want to use it after the loop, have a externally visible
> separate pointer initialized to NULL, and set it to the iterator
> variable inside the loop
>
> IOW, instead of having a variable that is a 'bool', just make that
> variable _be_ the pointer you want. It's clearer, and it avoids using
> the iterator variable outside the loop.
>
> It also is likely to generate better code, because there are fewer
> live variables - outside the loop now only uses that one variable,
> rather than using the 'bool' variable _and_ the iterator variable.

I have often wished that the iterator macros would consistently set the
loop variable to NULL upon reaching the end. It could be done by
changing the stop condition from

i != HEAD

to

i != HEAD || (i = NULL, 0)

[the comma operator mostly for clarity, we want the side effect of the
assignment but the whole RHS to still evaluate to false, which "i=NULL"
by itself of course would].

In the vast majority of cases where the iterator variable is not used
after the loop, the compiler should see through it and just drop the
dead assignment. And it would make the test for "did we break out early
finding what we looked for, or did we iterate till the end" quite
natural, without any auxiliary bool or extra 'struct foo*' variable. But
it will probably break too much existing code.

Rasmus

2022-02-24 00:56:12

by Jakob Koschel

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop



> On 23. Feb 2022, at 21:22, Linus Torvalds <[email protected]> wrote:
>
> On Wed, Feb 23, 2022 at 12:15 PM Jakob <[email protected]> wrote:
>>
>> in such a case you would still have to set the iterator value to
>> NULL when reaching the terminating condition or am I missing something?
>
> No.
>
> Make the rule be "you never use the iterator outside the loop".
>
> IOW, the code sequence is
>
> some_struct *ptr, *iter;

with C99 iter would be defined within the loop instead right?

>
> ptr = NULL;
> list_for_each_entry(iter, ...) {
> if (iter_matches_condition(iter)) {
> ptr = iter;
> break;
> }
> }
>
> .. never use 'iter' here - you use 'ptr' and check it for NULL ..
>
> See? Same number of variables as using a separate 'bool found' flag,
> but simpler code, and it matches the rule of 'don't use iter outside
> the loop'.

ah yes this does make sense. I missed the part of using a separate
'ptr' variable. Thanks for clarifying.
I think this is a great idea.

There are cases where pos->member is used (the only legitimate way to
use it right now). I suppose those turn into something like this
(this example is inspired by dev_add_offload() (net/core/gro.c:38)):

some_struct *ptr, *iter;
list_head *list_ptr;

ptr = NULL;
list_for_each_entry(iter, head, list) {
if (iter_matches_condition(iter)) {
ptr = iter;
break;
}
}


if (ptr)
list_ptr = head->prev;
else
list_ptr = iter->list.prev;
list_add(..., list_ptr);

before it was simply
list_add(..., iter->list.prev);


The other possibility I suppose would be:

if (!ptr)
ptr = container_of(head, typeof(*ptr), list)
list_add(..., ptr->list.prev);

which leaves you with the same type confusion as before, being far from
ideal.

> This is how you'd have to do it anyway if we start using a C99 style
> 'declare iter _in_ the loop' model.
>
> And as mentioned, it actually tends to lead to better code, since the
> code outside the loop only has one variable live, not two.
>
> Of course, compilers can do a lot of optimizations, so a 'found'
> variable can be made to generate good code too - if the compiler just
> tracks it and notices, and turns the 'break' into a 'goto found', and
> the fallthrough into the 'goto not_found'.
>
> So 'better code generation' is debatable, but even if the compiler can
> do as good a job with a separate 'bool' variable and some cleverness,
> I think we should strive for code where we make it easy for the
> compiler to DTRT - and where the generated code is easier to match up
> with what we wrote.
>
> Linus

If there is interest, I'm happy to send a new patch set once the fixes are clear.

Jakob

2022-02-24 01:19:11

by Jakob Koschel

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop



> On 18. Feb 2022, at 16:12, Jason Gunthorpe <[email protected]> wrote:
>
> On Thu, Feb 17, 2022 at 07:48:20PM +0100, Jakob Koschel wrote:
>> It is unsafe to assume that tmp != mdev can only evaluate to false
>> if the break within the list iterator is hit.
>>
>> When the break is not hit, tmp is set to an address derived from the
>> head element. If mdev would match with that value of tmp it would allow
>> continuing beyond the safety check even if mdev was never found within
>> the list
>
> I think due to other construction this is not actually possible, but I
> guess it is technically correct
>
> This seems like just a straight up style fix with nothing to do with
> speculative execution though. Why not just send it as a proper patch?
>
> Jason

Thank you for your feedback.

I've raised some confusion here, I'm sorry about that.
The idea was to change list_for_each_entry() to set 'pos' to NULL
when the list terminates to avoid invalid usage in speculation.

This will break this code and I therefore included the suggested change
in this RFC. This RFC was not intended to be merged as is.

However, in this example, 'tmp' will be a out-of-bounds pointer
if the loop did finish without hitting the break, so the check past the
loop *could* match 'mdev' even though no break was ever met.

I've now realized that this is probably not realistic iff mdev always
points to a valid struct mdev_device.
(It's a slightly different scenario on PATCH 03/13).

Jakob

2022-02-24 01:20:41

by Jakob Koschel

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop



> On 23. Feb 2022, at 20:31, Linus Torvalds <[email protected]> wrote:
>
> On Wed, Feb 23, 2022 at 11:12 AM Jason Gunthorpe <[email protected]> wrote:
>>
>> Yes, this is what I had put together as well about this patch, and I
>> think it is OK as-is. In this case the list head is in the .bss of a
>> module so I don't think it is very likely that the type confused
>> container_of() will alias a kalloc result, but it is certainly
>> technically wrong as-is.
>
> I think that the pattern we should strive to use is not top use a
> 'bool' with the
>
> - initialize to false, and then in loop: do 'found = true;' if found
>
> - use the iterator variable if 'found'.
>
> but instead strive to
>
> - either only use the iterator variable inside the loop
>
> - if you want to use it after the loop, have a externally visible
> separate pointer initialized to NULL, and set it to the iterator
> variable inside the loop

in such a case you would still have to set the iterator value to
NULL when reaching the terminating condition or am I missing something?

Since the iterator value will always be set to *something* when
calling list_for_each_entry().

>
> IOW, instead of having a variable that is a 'bool', just make that
> variable _be_ the pointer you want. It's clearer, and it avoids using
> the iterator variable outside the loop.

I completely agree and was intending to do it in such a way.
However I couldn't find a way to do that without breaking the kernel
between commits.

Either you need to first set the iterator to NULL when terminating
which breaks the current code here

or

patch the code location first to do if(iterator), which does not
work with the current list_for_each_entry().


>
> It also is likely to generate better code, because there are fewer
> live variables - outside the loop now only uses that one variable,
> rather than using the 'bool' variable _and_ the iterator variable.
>
> Linus

2022-02-24 01:41:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 6:18 AM Jakob <[email protected]> wrote:
>
> However, in this example, 'tmp' will be a out-of-bounds pointer
> if the loop did finish without hitting the break, so the check past the
> loop *could* match 'mdev' even though no break was ever met.

So just as context for others, since I was hit with the same confusion
and didn't see what the relevance was for type speculation, when these
patches seemed to be not about speculation at all.

The background for this is that the list_for_each_entry() will set the
iterator variable (here 'tmp') to be not the actual internal list
pointer, but the pointer to the containing type (which is the whole
'entry' part of the name, of course).

And that is all good and true, but it's true only *WITHIN* that loop.
At the exit condition, it will have hit the 'head' of the list, and
the type that contains the list head is *not* necessarily the same
type as the list entries.

So that's where the type confusion comes from: if you access the list
iterator outside the loop, and it could have fallen off the end of the
loop, the list iterator pointer is now not actually really a valid
pointer of that 'entry' type at all.

And as such, you not only can't dereference it, but you also shouldn't
even compare pointer values - because the pointer arithmetic that was
valid for loop entries is not valid for the HEAD entry that is
embedded in another type. So the pointer arithmetic might have turned
it into a pointer outside the real container of the HEAD, and might
actually match something else.

Now, there are a number of reasons I really dislike the current RFC
patch series, so I'm not claiming the patch is something we should
apply as-is, but I'm trying to clarify why Jakob is doing what he's
doing (because clearly I wasn't the only one taken by surprise by
it).

The reasons I don't like it is:

- patches like these are very random. And very hard to read (and very
easy to re-introduce the bug).

- I think it conflates the non-speculative "use pointer of the wrong
type" issue like in this patch with the speculation

- I'm not even convinced that 'list_for_each_entry()' is that special
wrt speculative type accesses, considering that we have other uses of
doubly linked list *everywhere* - and it can happen in a lot of other
situations anyway, so it all seems to be a bit ad hoc.

but I do think the problem is real.

So elsewhere I suggested that the fix to "you can't use the pointer
outside the loop" be made to literally disallow it (using C99 for-loop
variables seems the cleanest model), and have the compiler refuse to
touch code that tries to use the loop iterator outside.

And that is then entirely separate from the issue of actual
speculative accesses (but honestly, I think that's a "you have to
teach the compiler not to do them" issue, not a "let's randomly change
*one* of our loop walkers).

Linus

2022-02-24 01:44:21

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 11:06:03AM -0800, Linus Torvalds wrote:

> And as such, you not only can't dereference it, but you also shouldn't
> even compare pointer values - because the pointer arithmetic that was
> valid for loop entries is not valid for the HEAD entry that is
> embedded in another type. So the pointer arithmetic might have turned
> it into a pointer outside the real container of the HEAD, and might
> actually match something else.

Yes, this is what I had put together as well about this patch, and I
think it is OK as-is. In this case the list head is in the .bss of a
module so I don't think it is very likely that the type confused
container_of() will alias a kalloc result, but it is certainly
technically wrong as-is.

> So elsewhere I suggested that the fix to "you can't use the pointer
> outside the loop" be made to literally disallow it (using C99 for-loop
> variables seems the cleanest model), and have the compiler refuse to
> touch code that tries to use the loop iterator outside.

Oh yes, that would be really nice solution.

Jason

2022-02-24 01:44:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 04/13] vfio/mdev: remove the usage of the list iterator after the loop

On Wed, Feb 23, 2022 at 11:12 AM Jason Gunthorpe <[email protected]> wrote:
>
> Yes, this is what I had put together as well about this patch, and I
> think it is OK as-is. In this case the list head is in the .bss of a
> module so I don't think it is very likely that the type confused
> container_of() will alias a kalloc result, but it is certainly
> technically wrong as-is.

I think that the pattern we should strive to use is not top use a
'bool' with the

- initialize to false, and then in loop: do 'found = true;' if found

- use the iterator variable if 'found'.

but instead strive to

- either only use the iterator variable inside the loop

- if you want to use it after the loop, have a externally visible
separate pointer initialized to NULL, and set it to the iterator
variable inside the loop

IOW, instead of having a variable that is a 'bool', just make that
variable _be_ the pointer you want. It's clearer, and it avoids using
the iterator variable outside the loop.

It also is likely to generate better code, because there are fewer
live variables - outside the loop now only uses that one variable,
rather than using the 'bool' variable _and_ the iterator variable.

Linus