2023-09-15 12:30:39

by Alexey Dobriyan

Subject: Buggy __free(kfree) usage pattern already in tree

__free() got some usage and some of the usage is buggy:

832 static struct fwnode_handle *
833 gpio_sim_make_bank_swnode(struct gpio_sim_bank *bank,
834 			  struct fwnode_handle *parent)
835 {
838 	char **line_names __free(kfree) = NULL;
	// returns NULL or ERR_PTR(-E)
848 	line_names = gpio_sim_make_line_names(bank, &line_names_size);
849 	if (IS_ERR(line_names))
850 		return ERR_CAST(line_names);


This pattern will result in calling kfree() on an error value.
And there is no compiler or sparse checking to catch these things.
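
For context, __free() is built on the compiler's cleanup attribute; per
include/linux/cleanup.h it expands to roughly this (lightly paraphrased):

	#define DEFINE_FREE(_name, _type, _free) \
		static inline void __free_##_name(void *p) \
		{ _type _T = *(_type *)p; _free; }

	#define __free(_name)	__cleanup(__free_##_name)

	/*
	 * On every scope exit the compiler runs __free_kfree(&line_names),
	 * i.e. kfree(line_names) -- even when line_names holds ERR_PTR(-E),
	 * which kfree() does not accept.
	 */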

This test module demonstrates the landmine:

[ 812.981089] ------------[ cut here ]------------
[ 812.981597] WARNING: CPU: 0 PID: 1326 at mm/slab_common.c:991 free_large_kmalloc+0x50/0x80
[ 813.013266] ---[ end trace 0000000000000000 ]---
[ 813.013800] object pointer: 0xfffffffffffffff4

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/cleanup.h>
#include <linux/err.h>

struct S {
	int x;
};

static struct S *f(void)
{
	struct S *s = kmalloc(sizeof(struct S), GFP_KERNEL);

	/* Simulate allocation failure so an ERR_PTR is returned. */
	s = NULL;
	return s ?: ERR_PTR(-ENOMEM);
}

static int __init xxx_module_init(void)
{
	struct S *s __free(kfree) = NULL;

	s = f();
	if (IS_ERR(s)) {
		/* __free fires here and calls kfree(ERR_PTR(-ENOMEM)) */
		return PTR_ERR(s);
	}
	return 0;
}

static void __exit xxx_module_exit(void)
{
}

module_init(xxx_module_init);
module_exit(xxx_module_exit);
MODULE_LICENSE("GPL");
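
For contrast, the one shape that is safe by construction initializes at
the declaration, so the variable only ever holds NULL or a pointer that
kfree() accepts (a minimal sketch):

	struct S *s __free(kfree) = kmalloc(sizeof(*s), GFP_KERNEL);
	if (!s)
		return -ENOMEM;
	/* use s; it is freed automatically on every return path */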


2023-09-15 17:13:55

by Linus Torvalds

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 02:56, Alexey Dobriyan <[email protected]> wrote:
>
> __free() got some usage and some of the usage is buggy:

Yeah, that's sad.

I think the '__free(kfree)' thing should *only* be used in the form

struct obj *p __free(kfree) = kmalloc(...);

which is what our docs mention. Anything else is simply buggy.

But how do we *notice* this?

I do want to stress how unhappy I was about this conversion to begin with

https://lore.kernel.org/lkml/CAHk-=wigZt6kVkY0HU1j_LJ5H1KzwPiYnwwk6CbqXqT=sGenjg@mail.gmail.com/

but I reacted to the wrong issue.

This stuff needs to be done *way* more carefully.

Linus

2023-09-15 20:37:48

by Bartosz Golaszewski

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 21:06, Linus Torvalds
<[email protected]> wrote:
>
> On Fri, 15 Sept 2023 at 10:22, Bartosz Golaszewski
> <[email protected]> wrote:
> >
> > IMO this feature has much more potential at fixing existing memory
> > leaks than introducing new ones. I agree, I should have been more
> > careful, but I wouldn't exaggerate the issue. It's a bug, I sent a
> > fix, it'll be fine in a few days. I hope it won't be seen as an
> > argument against __free(). It just needs some time to mature.
>
> Honestly, I think your "fix" is still wrong.
>
> It may *work*, but it's against the whole spirit of having an
> allocation paired with the "this is how you free it".
>
> Your model is fundamentally fragile, and honestly, it's disgusting.
>
> The fact that you literally have
>
> char ***line_names
>
> as an argument should have made you wonder. Yes, we have triple
> indirect pointers in some other parts of the tree, but it sure isn't a
> "feature".
>
> The thing is, your cleanup function should mirror your allocation
> function. It didn't, and it caused a bug.
>
> And it STILL DOES NOT, even with your change.
>
> So I claim you are completely mis-using the whole __free thing. What
> you are doing is simply WRONG.
>
> And unless you can do it right, I will revert that change of yours to
> mis-use the cleanup functions, because I do not want anybody else to
> look at your code and get all the wrong ideas.
>
> Seriously.
>
> So look at your code, and DO IT RIGHT. Don't try to claim that
> "kfree()" is the cleanup function for gpio_sim_make_line_names().
> Because it really isn't. One frees a random pointer. Another returns
> a complex data structure *and* a count. They aren't inverses.
>
> I don't care ONE WHIT if you have learnt to use these kinds of things
> from GLib/GObject, and if that kind of disgusting behavior is ok
> there.
>
> It's not going to fly in the kernel.
>
> So your pattern needs to be something like this:
>
> struct X *p __free(freeXYZ) = allocXYZ();
>
> and ABSOLUTELY NOTHING ELSE. So if you use __free(kfree), it looks like
>
> struct obj *p __free(kfree) = kmalloc(...);
>
> and not some different variation of it.
>
> And if you want to do something more complex, we literally have that
> "CLASS()" abstraction to make the above pattern particularly easy to
> use. Use it.
>
> But don't abuse the very special 'kmalloc/kfree' class that we have as
> an example. That's for kmalloc/kfree pairs, not for your "char
> ***line_names" thing.
>
> Now, just to give you a very concrete example, here are two TOTALLY
> UNTESTED patches.
>
> I wrote two, because there's two ways to fix this properly as per
> above, and use those functions properly.
>
> The *SANE* way is to just re-organize the code to count things
> separately, and then you can allocate it properly with a sane
>
> char **line_names __free(kfree) = kcalloc(lines,
> sizeof(*line_names), GFP_KERNEL);
>
> and not have that crazy "count and fill and return both the count and
> the lines" model at all. The above pairs the constructor and
> destructor correctly and clearly.
>
> So that's the first "maybe-sane.diff" version. It may be untested,
> it's probably still buggy due to that, but it is what I *think* you
> should model the real fix around.
>
> The second patch is WAY overkill, and actually creates a "class" for
> this all, and keeps the old broken "count and fill in one go", and
> returns that *one* value that is just the class, and has a destructor
> for that class etc etc.
>
> It's completely broken and ridiculously over-complicated for something
> like this, but it's trying to document that way of doing things. For
> other things that aren't just one-offs, that whole CLASS() model may
> be the right one.
>
> Either of these patches *might* work, but honestly, both of them are
> likely broken. The second one in particular is almost certainly buggy
> just because it's such a crazy overkill solution, but partly *because*
> of how crazy overkill it is, I think it might be a useful example of
> what *can* be done.
>
> Again: UNTESTED. They both build for me, but that doesn't say much.
>
> Linus

Understood. I'll go with a modified version of maybe-sane. I'll send a
v2 tomorrow and make sure to Cc you.

WRT the silly diff: we have a bunch of helpers centered around string
arrays in the kernel. I think it would make sense to package a string
array into a class because right now we have functions like
kfree_strarray() which takes the pointer to the array AND the number
of strings to free. So it may not be that silly in the end. But that's
a different story.
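
As a rough sketch only (hypothetical type, not in the tree), such a
packaging could pair kfree_strarray() with the count it needs:

	struct strarray {
		char **strs;
		size_t n;
	};

	DEFINE_FREE(strarray, struct strarray *,
		    if (_T) kfree_strarray(_T->strs, _T->n))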

Bartosz

2023-09-15 22:00:23

by Bartosz Golaszewski

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 19:04, Linus Torvalds
<[email protected]> wrote:
>
> On Fri, 15 Sept 2023 at 02:56, Alexey Dobriyan <[email protected]> wrote:
> >
> > __free() got some usage and some of the usage is buggy:
>
> Yeah, that's sad.
>
> I think the '__free(kfree)' thing should *only* be used in the form
>
> struct obj *p __free(kfree) = kmalloc(...);
>
> which is what our docs mention. Anything else is simply buggy.
>
> But how do we *notice* this?
>
> I do want to stress how unhappy I was about this conversion to begin with
>
> https://lore.kernel.org/lkml/CAHk-=wigZt6kVkY0HU1j_LJ5H1KzwPiYnwwk6CbqXqT=sGenjg@mail.gmail.com/
>
> but I reacted to the wrong issue.
>
> This stuff needs to be done *way* more carefully.
>
> Linus

This is why I started with a *testing* driver. It's got "simulator" in
the name for a reason. It doesn't deal with real HW and is mostly run
in VMs anyway. Few people even build it at all, so it makes for a good
testing ground for experimental features.

IMO this feature has much more potential at fixing existing memory
leaks than introducing new ones. I agree, I should have been more
careful, but I wouldn't exaggerate the issue. It's a bug, I sent a
fix, it'll be fine in a few days. I hope it won't be seen as an
argument against __free(). It just needs some time to mature.

Bartosz

2023-09-15 22:23:55

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, Sep 15, 2023 at 01:40:25PM -0700, Linus Torvalds wrote:

> Not because I think it's necessarily any kind of final rule, but
> because I think our whole cleanup thing is new enough that I think
> we're better off being a bit inflexible, and having a syntax where a
> simple "grep" ends up showing pretty much exactly what is going on wrt
> the pairing.

So in the perf-event conversion patches I do have this:

struct task_struct *task __free(put_task) = NULL;

...

if (pid != -1) {
	task = find_lively_task_by_vpid(pid);
	if (!task)
		return -ESRCH;
}

...

pattern. Having a task is fully optional in the code-flow.

I suppose I can try and rewrite that a little something like:

...

struct task_struct *task __free(put_task) =
	find_lively_task_by_vpid(pid); /* ensure pid==-1 returns NULL */

if (!task && pid > 0)
	return -ESRCH;

...


But a little later in that same function I then have:

do {
	struct rw_semaphore *exec_update_lock __free(up_read) = NULL;
	if (task) {
		err = down_read_interruptible(&task->signal->exec_update_lock);
		if (err)
			return err;

		exec_update_lock = &task->signal->exec_update_lock;

		if (!perf_check_permissions(&attr, task))
			return -EACCES;
	}

	... stuff serialized against exec *if* this is a task event ...

} while (0);


And that might be a little harder to 'fix'.


I suppose I'm saying that when things truly are conditional, this is a
useful pattern, but it should be avoided where reasonably possible.

2023-09-15 22:28:06

by Linus Torvalds

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 10:22, Bartosz Golaszewski
<[email protected]> wrote:
>
> IMO this feature has much more potential at fixing existing memory
> leaks than introducing new ones. I agree, I should have been more
> careful, but I wouldn't exaggerate the issue. It's a bug, I sent a
> fix, it'll be fine in a few days. I hope it won't be seen as an
> argument against __free(). It just needs some time to mature.

Honestly, I think your "fix" is still wrong.

It may *work*, but it's against the whole spirit of having an
allocation paired with the "this is how you free it".

Your model is fundamentally fragile, and honestly, it's disgusting.

The fact that you literally have

char ***line_names

as an argument should have made you wonder. Yes, we have triple
indirect pointers in some other parts of the tree, but it sure isn't a
"feature".

The thing is, your cleanup function should mirror your allocation
function. It didn't, and it caused a bug.

And it STILL DOES NOT, even with your change.

So I claim you are completely mis-using the whole __free thing. What
you are doing is simply WRONG.

And unless you can do it right, I will revert that change of yours to
mis-use the cleanup functions, because I do not want anybody else to
look at your code and get all the wrong ideas.

Seriously.

So look at your code, and DO IT RIGHT. Don't try to claim that
"kfree()" is the cleanup function for gpio_sim_make_line_names().
Because it really isn't. One frees a random pointer. Another returns
a complex data structure *and* a count. They aren't inverses.

I don't care ONE WHIT if you have learnt to use these kinds of things
from GLib/GObject, and if that kind of disgusting behavior is ok
there.

It's not going to fly in the kernel.

So your pattern needs to be something like this:

struct X *p __free(freeXYZ) = allocXYZ();

and ABSOLUTELY NOTHING ELSE. So if you use __free(kfree), it looks like

struct obj *p __free(kfree) = kmalloc(...);

and not some different variation of it.

And if you want to do something more complex, we literally have that
"CLASS()" abstraction to make the above pattern particularly easy to
use. Use it.

But don't abuse the very special 'kmalloc/kfree' class that we have as
an example. That's for kmalloc/kfree pairs, not for your "char
***line_names" thing.

Now, just to give you a very concrete example, here are two TOTALLY
UNTESTED patches.

I wrote two, because there's two ways to fix this properly as per
above, and use those functions properly.

The *SANE* way is to just re-organize the code to count things
separately, and then you can allocate it properly with a sane

char **line_names __free(kfree) = kcalloc(lines,
sizeof(*line_names), GFP_KERNEL);

and not have that crazy "count and fill and return both the count and
the lines" model at all. The above pairs the constructor and
destructor correctly and clearly.

So that's the first "maybe-sane.diff" version. It may be untested,
it's probably still buggy due to that, but it is what I *think* you
should model the real fix around.

The second patch is WAY overkill, and actually creates a "class" for
this all, and keeps the old broken "count and fill in one go", and
returns that *one* value that is just the class, and has a destructor
for that class etc etc.

It's completely broken and ridiculously over-complicated for something
like this, but it's trying to document that way of doing things. For
other things that aren't just one-offs, that whole CLASS() model may
be the right one.

Either of these patches *might* work, but honestly, both of them are
likely broken. The second one in particular is almost certainly buggy
just because it's such a crazy overkill solution, but partly *because*
of how crazy overkill it is, I think it might be a useful example of
what *can* be done.

Again: UNTESTED. They both build for me, but that doesn't say much.

Linus


Attachments:
maybe-sane.diff (3.08 kB)
silly.diff (3.28 kB)
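
As a rough, untested sketch only, the "maybe-sane" shape described above
might look something like this (the counting helper is hypothetical):

	/* count first, so constructor and destructor pair up trivially */
	unsigned int lines = gpio_sim_bank_line_count(bank); /* hypothetical */
	char **line_names __free(kfree) = kcalloc(lines, sizeof(*line_names),
						  GFP_KERNEL);
	if (!line_names)
		return ERR_PTR(-ENOMEM);
	/* fill line_names[]; kfree() now mirrors kcalloc() on every exit */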

2023-09-15 22:41:27

by Linus Torvalds

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 14:18, Peter Zijlstra <[email protected]> wrote:
>
> Hmm, perhaps I can do a class for it and write the thing like:

.. crossed emails.

Yes.

Except I think a full-fledged class thing is overkill for a one-time
use, and I think you really could just write it out as plain "this is
the constructor, this is the cleanup".

Yes, yes, that is obviously what our CLASS() thing *is*, but at least
my personal mental model is that a "class" is for when you really
export it to other uses.

If you just have a one-off, you don't need the class abstraction to
export to anybody else. You just do a one-off "alloc like this, free
like this".

Again, this I think is really just a "mental model" rather than any hard rule.
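
As a sketch of that one-off shape, pairing an existing get/put at the
point of use (using find_get_task_by_vpid() and put_task_struct()):

	DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))

	struct task_struct *task __free(put_task) = find_get_task_by_vpid(pid);
	if (!task)
		return -ESRCH;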

Linus

2023-09-16 00:46:39

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, Sep 15, 2023 at 11:08:51PM +0200, Peter Zijlstra wrote:

> But a little later in that same function I then have:
>
> do {
> 	struct rw_semaphore *exec_update_lock __free(up_read) = NULL;
> 	if (task) {
> 		err = down_read_interruptible(&task->signal->exec_update_lock);
> 		if (err)
> 			return err;
>
> 		exec_update_lock = &task->signal->exec_update_lock;
>
> 		if (!perf_check_permissions(&attr, task))
> 			return -EACCES;
> 	}
>
> 	... stuff serialized against exec *if* this is a task event ...
>
> } while (0);
>
>
> And that might be a little harder to 'fix'.

Hmm, perhaps I can do a class for it and write the thing like:

do {
	CLASS(cond_exec_update_lock, exec_lock_guard)(task); /* allow task == NULL */
	if (task && !exec_lock_guard)
		return -EINTR;

	if (task && !perf_check_permissions(&attr, task))
		return -EACCES;

	... the rest ...

} while (0);

that might be nicer..

2023-09-16 02:10:57

by Linus Torvalds

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 13:04, Bartosz Golaszewski
<[email protected]> wrote:
>
> One more question wrt the __free() coding style.

I don't think we really have much of a coding style yet.

We currently literally have _one_ use of that __free() thing, and it
was problematic.

Which is why I'd like to start off fairly strict, but I'm not sure we
should make it a "coding style" yet.

IOW, my current thinking is "let's always have the constructor and
destructor together", and see how it ends up going.

Not because I think it's necessarily any kind of final rule, but
because I think our whole cleanup thing is new enough that I think
we're better off being a bit inflexible, and having a syntax where a
simple "grep" ends up showing pretty much exactly what is going on wrt
the pairing.

Linus

2023-09-16 02:31:06

by Linus Torvalds

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 14:08, Peter Zijlstra <[email protected]> wrote:
>
> So in the perf-event conversion patches I do have this:
>
> struct task_struct *task __free(put_task) = NULL;
>
> ...
>
> if (pid != -1) {
> 	task = find_lively_task_by_vpid(pid);
> 	if (!task)
> 		return -ESRCH;
> }
>
> ...
>
> pattern. The having of task is fully optional in the code-flow.

Yeah, if you end up having conditional initialization, you can't have
the cleanup declaration in the same place, since it would be in an
inner scope and get free'd immediately.
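
That is, something like this would be broken, because the cleanup runs
at the closing brace of the inner scope (a sketch):

	if (pid != -1) {
		struct task_struct *task __free(put_task) =
			find_lively_task_by_vpid(pid);
		if (!task)
			return -ESRCH;
		/* put_task runs here, at the end of the if-block */
	}
	/* task is already gone here */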

Still, I think that's likely the exception rather than the rule.

Side note: I hope your code snippets are "something like this" rather
than the real deal.

Because code like this:

> But a little later in that same function I then have:
>
> do {
> 	struct rw_semaphore *exec_update_lock __free(up_read) = NULL;
> 	if (task) {
> 		err = down_read_interruptible(&task->signal->exec_update_lock);
>
> struct rw_semaphore *exec_update_lock __free(up_read) = NULL;

is just garbage. That's not a "freeing" function. That should be "__cleanup()".

The last thing we want is misleading naming, making people think that
you are "freeing" a lock.

Naming is hard, let's not make it worse by making it actively misleading.

And honestly, I think the above is actually a *HORRIBLE* argument for
doing that "initialize to NULL, change later". I think the above is
exactly the kind of code that we ABSOLUTELY DO NOT WANT.

You should aim for a nice

	struct rw_semaphore *exec_update_lock
		__cleanup(release_exec_update_lock) = get_exec_update_lock(task);

and simply have proper constructors and destructors. It's going to be
much cleaner.

You can literally do something like

static inline void release_exec_update_lock(struct rw_semaphore *sem)
{ if (!IS_ERR_OR_NULL(sem)) up_read(sem); }

static inline struct rw_semaphore *get_exec_update_lock(struct task_struct *task)
{
	if (!task)
		return NULL;
	if (down_read_interruptible(&task->signal->exec_update_lock))
		return ERR_PTR(-EINTR);
	return &task->signal->exec_update_lock;
}

and the code will be *much* cleaner, wouldn't you say?

Please use proper constructors and destructors when you do these kinds
of automatic cleanup things. Don't write ad-hoc garbage.

You'll thank me a year from now when the code is actually legible.

Linus

2023-09-16 02:35:53

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, Sep 15, 2023 at 02:50:48PM -0700, Linus Torvalds wrote:
> On Fri, 15 Sept 2023 at 14:32, Peter Zijlstra <[email protected]> wrote:
> >
> >
> > It also got me thinking about named_guard() for the myriad of
> > conditional locks we have.
> >
> > named_guard(try_mutex, foo_guard)(&foo->lock);
> > if (foo_guard) {
> 	// we got the lock, do our thing
> > }
>
> Hmm. It looks ugly to me. I really hate the "named_guard" thing. One
> of the reasons I liked the guard/scoped_guard() macros was because how
> it created _anonymous_ guards, and made it completely unnecessary to
> make up a pointless name.

Yeah, the anonymous thing is very nice.

My ulterior motive for the above was perhaps extending it to allow the
lock argument to be NULL. Which would give us a handy conditional
pattern.

struct rw_semaphore *exec_update_lock = task ? &task->signal->exec_update_lock : NULL;
named_guard(rwsem_read_interruptible, exec_lock_guard)(exec_update_lock);
if (task && !exec_lock_guard)
	return -EINTR;

And yes, that is definitely not pretty, but it does provide a fairly
wide array of options.

> If trylock ends up being a common pattern, I think we should strive to
> make it a lot easier to use.
>
> Can we make it act like "scoped_guard()", except the lock function is
> fundamentally conditional?
>
> Call it "cond_guard()", and make the syntax otherwise be the same as
> "scoped_guard()", iow, using a unique ID for the guard name.
>
> So
>
> cond_guard(try_mutex)(&foo->lock) {
> .. this is the "we got the lock" region ..
> }
>
> would I think be a much better syntax.
>
> Could we live with that?

For the typical use-case that is definitely the more appealing syntax.

Something like:

#define cond_guard(_name, args...) \
	for (CLASS(_name, scope)(args), *done = NULL; \
	     !done && scope; done = (void *)1)

works for the simple cases, but something like try_spinlock_irqsave
would be a bit of a challenge, because that would end up with one of
those structs that is not a pointer, and they don't cast to a boolean :/
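
For reference, DEFINE_LOCK_GUARD_1() generates a guard type along these
lines, so the guard variable itself is a struct with no truth value:

	typedef struct {
		spinlock_t *lock;
		unsigned long flags;
	} class_spinlock_irqsave_t;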

I think I can make it work, I'll go have a play, but perhaps not now,
it's past midnight ;-)

2023-09-16 02:47:30

by Linus Torvalds

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 14:32, Peter Zijlstra <[email protected]> wrote:
>
>
> It also got me thinking about named_guard() for the myriad of
> conditional locks we have.
>
> named_guard(try_mutex, foo_guard)(&foo->lock);
> if (foo_guard) {
> 	// we got the lock, do our thing
> }

Hmm. It looks ugly to me. I really hate the "named_guard" thing. One
of the reasons I liked the guard/scoped_guard() macros was because how
it created _anonymous_ guards, and made it completely unnecessary to
make up a pointless name.

If trylock ends up being a common pattern, I think we should strive to
make it a lot easier to use.

Can we make it act like "scoped_guard()", except the lock function is
fundamentally conditional?

Call it "cond_guard()", and make the syntax otherwise be the same as
"scoped_guard()", iow, using a unique ID for the guard name.

So

cond_guard(try_mutex)(&foo->lock) {
.. this is the "we got the lock" region ..
}

would I think be a much better syntax.

Could we live with that?

Linus

2023-09-16 03:42:24

by Bartosz Golaszewski

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 21:27, Bartosz Golaszewski
<[email protected]> wrote:
>

[snip!]

>
> Understood. I'll go with a modified version of maybe-sane. I'll send a
> v2 tomorrow and make sure to Cc you.
>

[snip!]

One more question wrt the __free() coding style.

Is the following acceptable:

void foo(void)
{
	char *s __free(kfree) = NULL;

	do_stuff();
	s = kmalloc(42, GFP_KERNEL);
}

Or does it always have to be:

void foo(void)
{
	do_stuff();
	char *s __free(kfree) = kmalloc(42, GFP_KERNEL);
}

?

I guess it would be useful to get these rules into
Documentation/process/coding-style.rst at some point as we also have
an ongoing discussion about whether scoped guards should always
require curly braces[1] even for single statements.

Bartosz

[1] https://lore.kernel.org/lkml/CAMuHMdVYDSPGP48OXxi-s4GFegfzUu900ASBnRmMo=18UzmCrQ@mail.gmail.com/

2023-09-16 03:47:09

by Linus Torvalds

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, 15 Sept 2023 at 14:50, Linus Torvalds
<[email protected]> wrote:
>
> Call it "cond_guard()", and make the syntax otherwise be the same as
> "scoped_guard()", iow, using a unique ID for the guard name.

I *think* you don't even need a new "cond_guard()" macro.

This might be as simple as having our current scoped_guard() macro
have the conditional

"scope.lock && !done"

instead of just the current "!done".

Then you introduce "trylock" variants that conditionally set "lock = 1".

I dunno. Maybe there's something really obvious I'm missing, but I
really think that gives us the option to literally do just

scoped_guard(try_mutex, ..somelock..) {
...
}

and all it requires is for us to declare a "try_mutex" lock guard class.

Ok, so the current DEFINE_LOCK_GUARD_x() macros clearly set .lock to 1
unconditionally, so there's a small matter of extending the lock side
to be conditional...

"SMOP".

Linus

2023-09-16 04:26:27

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Fri, Sep 15, 2023 at 02:22:02PM -0700, Linus Torvalds wrote:

> Naming is hard, let's not make it worse by making it actively misleading.

I actually did use the DEFINE_FREE() helper, will go fix. Because yes,
free is not the right word in this case.

> And honestly, I think the above is actually a *HORRIBLE* argument for
> doing that "initialize to NULL, change later". I think the above is
> exactly the kind of code that we ABSOLUTELY DO NOT WANT.
>
> You should aim for a nice
>
> 	struct rw_semaphore *exec_update_lock
> 		__cleanup(release_exec_update_lock) = get_exec_update_lock(task);

Ah, that might be nicer still than the class thing I proposed in a
follow up email.

It also got me thinking about named_guard() for the myriad of
conditional locks we have.

named_guard(try_mutex, foo_guard)(&foo->lock);
if (foo_guard) {
	// we got the lock, do our thing
}


or

named_guard(interruptible_mutex, foo_guard)(&foo->lock);
if (!foo_guard)
	return -EINTR;


Are these sane patterns?

2023-09-19 14:12:17

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Tue, Sep 19, 2023 at 02:59:54PM +0200, Peter Zijlstra wrote:


> + scoped_guard (mutex_intr, &task->signal->cred_guard_mutex) {
>
> + scoped_guard (task_lock, task) {
> + retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
> + if (retval)
> + return retval;
> + }
>
> + scoped_guard (write_lock, &tasklist_lock) {
> + if (unlikely(task->exit_state))
> + return -EPERM;
> + if (task->ptrace)
> + return -EPERM;
>
> + task->ptrace = flags;
>
> + ptrace_link(task, current);
> +
> + /* SEIZE doesn't trap tracee on attach */
> + if (!seize)
> + send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
> +
> + ptrace_set_stopped(task);
> +
> + }
> +
> + goto success;
> }
> + return -ERESTARTNOINTR;
>
> +success:
> + /*
> + * We do not bother to change retval or clear JOBCTL_TRAPPING
> + * if wait_on_bit() was interrupted by SIGKILL. The tracer will
> + * not return to user-mode, it will exit and clear this bit in
> + * __ptrace_unlink() if it wasn't already cleared by the tracee;
> + * and until then nobody can ptrace this task.
> + */
> + wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE);
> + proc_ptrace_connector(task, PTRACE_ATTACH);
> +
> + return 0;

This isn't exactly nice..

I tried something like:

scoped_cond_guard (mutex_intr, return -EINTR, &task->signal->cred_guard_mutex) {
...
}

Which I can make work, but then I also tried to capture my other case:

scoped_cond_guard (rwsem_down_intr, if (task) return -EINTR,
		   task ? &task->signal->exec_update_lock : NULL) {

	...
}

But I can't get that to work because of that extra if, the not case
doesn't fall through and do the body.

Anyway, I'll poke more..

2023-09-19 14:54:43

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Sat, Sep 16, 2023 at 12:13:32AM +0200, Peter Zijlstra wrote:

> I think I can make it work, I'll go have a play, but perhaps not now,
> it's past midnight ;-)

So I have been playing about with this a bit and it keeps falling short
of 'nice' :/

What I ended up with is the below. The simple scoped_guard () extension
is useful, albeit slightly awkward (I'll reply with another patch making
use of it). But the basic problem is that it will have to have the form:

scoped_guard (mutex_intr, &task->signal->cred_guard_mutex) {
	...
	return 0;
}
return -EINTR;

That is: the guard body must terminate such that the code after the
body is the 'fail' case. This is not always convenient.

Additionally, this did not help with my case where I need to hold the
lock conditionally (when I have a task) but need to execute the body
unconditionally. Then the above form just doesn't work.

For that, I ended up with:

cond_guard(name, stmt, args...)

Which can be used like:


do {
	cond_guard(rwsem_down_intr, if (task) return -EINTR,
		   task ? &task->signal->exec_update_lock : NULL);

	... do stuff that is conditionally guarded by @exec_update_lock ...

} while (0);


I'll continue poking for a bit,.. see if there's something better.


---
Subject: cleanup: Add conditional guard support
From: Peter Zijlstra <[email protected]>
Date: Sun Sep 17 13:22:17 CEST 2023

Adds:

- DEFINE_GUARD_COND() / DEFINE_LOCK_GUARD_1_COND() to extend existing
guards with conditional lock primitives, eg. mutex_trylock(),
mutex_lock_interruptible().

nb. both primitives allow NULL 'locks', which cause the lock to
fail (obviously).

- extends scoped_guard() to not take the body when the
conditional guard 'fails'. eg.

scoped_guard (mutex_intr, &task->signal->cred_guard_mutex) {
	...
}

will only execute the body when the mutex is held.

- provides cond_guard(name, stmt, args...); which extends guard()
to execute @stmt when the lock fails.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
Index: linux-2.6/include/linux/cleanup.h
===================================================================
--- linux-2.6.orig/include/linux/cleanup.h
+++ linux-2.6/include/linux/cleanup.h
@@ -136,14 +136,36 @@ static inline class_##_name##_t class_##
*/

 #define DEFINE_GUARD(_name, _type, _lock, _unlock) \
-	DEFINE_CLASS(_name, _type, _unlock, ({ _lock; _T; }), _type _T)
+	DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
+	static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
+	{ return *_T; }
+
+#define DEFINE_GUARD_COND(_name, _ext, _condlock) \
+	EXTEND_CLASS(_name, _ext, \
+		     ({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \
+		     class_##_name##_t _T) \
+	static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
+	{ return class_##_name##_lock_ptr(_T); }
+
+#define _guard(_name, var) \
+	class_##_name##_t var __cleanup(class_##_name##_destructor) = \
+		class_##_name##_constructor

 #define guard(_name) \
-	CLASS(_name, __UNIQUE_ID(guard))
+	_guard(_name, __UNIQUE_ID(guard))
+
+#define __guard_ptr(_name) class_##_name##_lock_ptr

 #define scoped_guard(_name, args...) \
 	for (CLASS(_name, scope)(args), \
-	     *done = NULL; !done; done = (void *)1)
+	     *done = NULL; __guard_ptr(_name)(&scope) && !done; done = (void *)1)
+
+#define _cond_guard(_name, _var, _stmt, args...) \
+	_guard(_name, _var)(args); \
+	if (!__guard_ptr(_name)(&_var)) _stmt
+
+#define cond_guard(_name, _stmt, args...) \
+	_cond_guard(_name, __UNIQUE_ID(guard), _stmt, args)

/*
* Additional helper macros for generating lock guards with types, either for
@@ -173,6 +195,11 @@ typedef struct { \
 static inline void class_##_name##_destructor(class_##_name##_t *_T) \
 { \
 	if (_T->lock) { _unlock; } \
+} \
+ \
+static inline void *class_##_name##_lock_ptr(class_##_name##_t *_T) \
+{ \
+	return _T->lock; \
 }


@@ -201,4 +228,14 @@ __DEFINE_LOCK_GUARD_1(_name, _type, _loc
__DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__) \
__DEFINE_LOCK_GUARD_0(_name, _lock)

+#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \
+	EXTEND_CLASS(_name, _ext, \
+		     ({ class_##_name##_t _t = { .lock = l }, *_T = &_t; \
+			if (_T->lock && !(_condlock)) _T->lock = NULL; \
+			_t; }), \
+		     typeof_member(class_##_name##_t, lock) l) \
+	static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
+	{ return class_##_name##_lock_ptr(_T); }
+
+
#endif /* __LINUX_GUARDS_H */
Index: linux-2.6/include/linux/mutex.h
===================================================================
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -221,6 +221,7 @@ extern void mutex_unlock(struct mutex *l
extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);

DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
-DEFINE_FREE(mutex, struct mutex *, if (_T) mutex_unlock(_T))
+DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
+DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)

#endif /* __LINUX_MUTEX_H */
Index: linux-2.6/include/linux/spinlock.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -507,6 +507,8 @@ DEFINE_LOCK_GUARD_1(raw_spinlock, raw_sp
raw_spin_lock(_T->lock),
raw_spin_unlock(_T->lock))

+DEFINE_LOCK_GUARD_1_COND(raw_spinlock, _try, raw_spin_trylock(_T->lock))
+
DEFINE_LOCK_GUARD_1(raw_spinlock_nested, raw_spinlock_t,
raw_spin_lock_nested(_T->lock, SINGLE_DEPTH_NESTING),
raw_spin_unlock(_T->lock))
@@ -515,23 +517,36 @@ DEFINE_LOCK_GUARD_1(raw_spinlock_irq, ra
raw_spin_lock_irq(_T->lock),
raw_spin_unlock_irq(_T->lock))

+DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock))
+
DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t,
raw_spin_lock_irqsave(_T->lock, _T->flags),
raw_spin_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)

+DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try,
+ raw_spin_trylock_irqsave(_T->lock, _T->flags))
+
DEFINE_LOCK_GUARD_1(spinlock, spinlock_t,
spin_lock(_T->lock),
spin_unlock(_T->lock))

+DEFINE_LOCK_GUARD_1_COND(spinlock, _try, spin_trylock(_T->lock))
+
DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t,
spin_lock_irq(_T->lock),
spin_unlock_irq(_T->lock))

+DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try,
+ spin_trylock_irq(_T->lock))
+
DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t,
spin_lock_irqsave(_T->lock, _T->flags),
spin_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)

+DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try,
+ spin_trylock_irqsave(_T->lock, _T->flags))
+
#undef __LINUX_INSIDE_SPINLOCK_H
#endif /* __LINUX_SPINLOCK_H */
Index: linux-2.6/include/linux/rwsem.h
===================================================================
--- linux-2.6.orig/include/linux/rwsem.h
+++ linux-2.6/include/linux/rwsem.h
@@ -203,11 +203,11 @@ extern void up_read(struct rw_semaphore
extern void up_write(struct rw_semaphore *sem);

DEFINE_GUARD(rwsem_read, struct rw_semaphore *, down_read(_T), up_read(_T))
-DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T))
-
-DEFINE_FREE(up_read, struct rw_semaphore *, if (_T) up_read(_T))
-DEFINE_FREE(up_write, struct rw_semaphore *, if (_T) up_write(_T))
+DEFINE_GUARD_COND(rwsem_read, _try, down_read_trylock(_T))
+DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T) == 0)

+DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T))
+DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T))

/*
* downgrade write lock to read lock

2023-09-19 18:00:25

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Tue, Sep 19, 2023 at 02:57:52PM +0200, Peter Zijlstra wrote:
> On Sat, Sep 16, 2023 at 12:13:32AM +0200, Peter Zijlstra wrote:
>
> > I think I can make it work, I'll go have a play, but perhaps not now,
> > it's past midnight ;-)
>
> So I have been playing about with this a bit and it keeps falling short
> of 'nice' :/
>
> What I ended up with is the below. The simple scoped_guard () extension
> is useful, albeit slightly awkward (I'll reply with another patch making
> use of it). But the basic problem is that it will have to have the form:

---
Index: linux-2.6/include/linux/sched/task.h
===================================================================
--- linux-2.6.orig/include/linux/sched/task.h
+++ linux-2.6/include/linux/sched/task.h
@@ -226,4 +226,6 @@ static inline void task_unlock(struct ta
spin_unlock(&p->alloc_lock);
}

+DEFINE_GUARD(task_lock, struct task_struct *, task_lock(_T), task_unlock(_T))
+
#endif /* _LINUX_SCHED_TASK_H */
Index: linux-2.6/include/linux/spinlock.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -548,5 +548,31 @@ DEFINE_LOCK_GUARD_1(spinlock_irqsave, sp
DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try,
spin_trylock_irqsave(_T->lock, _T->flags))

+DEFINE_LOCK_GUARD_1(read_lock, rwlock_t,
+ read_lock(_T->lock),
+ read_unlock(_T->lock))
+
+DEFINE_LOCK_GUARD_1(read_lock_irq, rwlock_t,
+ read_lock_irq(_T->lock),
+ read_unlock_irq(_T->lock))
+
+DEFINE_LOCK_GUARD_1(read_lock_irqsave, rwlock_t,
+ read_lock_irqsave(_T->lock, _T->flags),
+ read_unlock_irqrestore(_T->lock, _T->flags),
+ unsigned long flags)
+
+DEFINE_LOCK_GUARD_1(write_lock, rwlock_t,
+ write_lock(_T->lock),
+ write_unlock(_T->lock))
+
+DEFINE_LOCK_GUARD_1(write_lock_irq, rwlock_t,
+ write_lock_irq(_T->lock),
+ write_unlock_irq(_T->lock))
+
+DEFINE_LOCK_GUARD_1(write_lock_irqsave, rwlock_t,
+ write_lock_irqsave(_T->lock, _T->flags),
+ write_unlock_irqrestore(_T->lock, _T->flags),
+ unsigned long flags)
+
#undef __LINUX_INSIDE_SPINLOCK_H
#endif /* __LINUX_SPINLOCK_H */
Index: linux-2.6/kernel/ptrace.c
===================================================================
--- linux-2.6.orig/kernel/ptrace.c
+++ linux-2.6/kernel/ptrace.c
@@ -386,6 +386,34 @@ static int check_ptrace_options(unsigned
return 0;
}

+static inline void ptrace_set_stopped(struct task_struct *task)
+{
+ guard(spinlock)(&task->sighand->siglock);
+
+ /*
+ * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
+ * TRAPPING, and kick it so that it transits to TRACED. TRAPPING
+ * will be cleared if the child completes the transition or any
+ * event which clears the group stop states happens. We'll wait
+ * for the transition to complete before returning from this
+ * function.
+ *
+ * This hides STOPPED -> RUNNING -> TRACED transition from the
+ * attaching thread but a different thread in the same group can
+ * still observe the transient RUNNING state. IOW, if another
+ * thread's WNOHANG wait(2) on the stopped tracee races against
+ * ATTACH, the wait(2) may fail due to the transient RUNNING.
+ *
+ * The following task_is_stopped() test is safe as both transitions
+ * in and out of STOPPED are protected by siglock.
+ */
+ if (task_is_stopped(task) &&
+ task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+ task->jobctl &= ~JOBCTL_STOPPED;
+ signal_wake_up_state(task, __TASK_STOPPED);
+ }
+}
+
static int ptrace_attach(struct task_struct *task, long request,
unsigned long addr,
unsigned long flags)
@@ -393,17 +421,17 @@ static int ptrace_attach(struct task_str
bool seize = (request == PTRACE_SEIZE);
int retval;

- retval = -EIO;
if (seize) {
if (addr != 0)
- goto out;
+ return -EIO;
/*
* This duplicates the check in check_ptrace_options() because
* ptrace_attach() and ptrace_setoptions() have historically
* used different error codes for unknown ptrace options.
*/
if (flags & ~(unsigned long)PTRACE_O_MASK)
- goto out;
+ return -EIO;
+
retval = check_ptrace_options(flags);
if (retval)
return retval;
@@ -414,88 +442,58 @@ static int ptrace_attach(struct task_str

audit_ptrace(task);

- retval = -EPERM;
if (unlikely(task->flags & PF_KTHREAD))
- goto out;
+ return -EPERM;
if (same_thread_group(task, current))
- goto out;
+ return -EPERM;

/*
* Protect exec's credential calculations against our interference;
* SUID, SGID and LSM creds get determined differently
* under ptrace.
*/
- retval = -ERESTARTNOINTR;
- if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
- goto out;
-
- task_lock(task);
- retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
- task_unlock(task);
- if (retval)
- goto unlock_creds;
-
- write_lock_irq(&tasklist_lock);
- retval = -EPERM;
- if (unlikely(task->exit_state))
- goto unlock_tasklist;
- if (task->ptrace)
- goto unlock_tasklist;
-
- task->ptrace = flags;
-
- ptrace_link(task, current);
-
- /* SEIZE doesn't trap tracee on attach */
- if (!seize)
- send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
+ scoped_guard (mutex_intr, &task->signal->cred_guard_mutex) {

- spin_lock(&task->sighand->siglock);
+ scoped_guard (task_lock, task) {
+ retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+ if (retval)
+ return retval;
+ }

- /*
- * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
- * TRAPPING, and kick it so that it transits to TRACED. TRAPPING
- * will be cleared if the child completes the transition or any
- * event which clears the group stop states happens. We'll wait
- * for the transition to complete before returning from this
- * function.
- *
- * This hides STOPPED -> RUNNING -> TRACED transition from the
- * attaching thread but a different thread in the same group can
- * still observe the transient RUNNING state. IOW, if another
- * thread's WNOHANG wait(2) on the stopped tracee races against
- * ATTACH, the wait(2) may fail due to the transient RUNNING.
- *
- * The following task_is_stopped() test is safe as both transitions
- * in and out of STOPPED are protected by siglock.
- */
- if (task_is_stopped(task) &&
- task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
- task->jobctl &= ~JOBCTL_STOPPED;
- signal_wake_up_state(task, __TASK_STOPPED);
- }
+ scoped_guard (write_lock, &tasklist_lock) {
+ if (unlikely(task->exit_state))
+ return -EPERM;
+ if (task->ptrace)
+ return -EPERM;

- spin_unlock(&task->sighand->siglock);
+ task->ptrace = flags;

- retval = 0;
-unlock_tasklist:
- write_unlock_irq(&tasklist_lock);
-unlock_creds:
- mutex_unlock(&task->signal->cred_guard_mutex);
-out:
- if (!retval) {
- /*
- * We do not bother to change retval or clear JOBCTL_TRAPPING
- * if wait_on_bit() was interrupted by SIGKILL. The tracer will
- * not return to user-mode, it will exit and clear this bit in
- * __ptrace_unlink() if it wasn't already cleared by the tracee;
- * and until then nobody can ptrace this task.
- */
- wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE);
- proc_ptrace_connector(task, PTRACE_ATTACH);
+ ptrace_link(task, current);
+
+ /* SEIZE doesn't trap tracee on attach */
+ if (!seize)
+ send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
+
+ ptrace_set_stopped(task);
+
+ }
+
+ goto success;
}
+ return -ERESTARTNOINTR;

- return retval;
+success:
+ /*
+ * We do not bother to change retval or clear JOBCTL_TRAPPING
+ * if wait_on_bit() was interrupted by SIGKILL. The tracer will
+ * not return to user-mode, it will exit and clear this bit in
+ * __ptrace_unlink() if it wasn't already cleared by the tracee;
+ * and until then nobody can ptrace this task.
+ */
+ wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE);
+ proc_ptrace_connector(task, PTRACE_ATTACH);
+
+ return 0;
}

/**

2023-09-20 04:41:01

by Peter Zijlstra

Subject: Re: Buggy __free(kfree) usage pattern already in tree

On Tue, Sep 19, 2023 at 03:10:38PM +0200, Peter Zijlstra wrote:

> This isn't exactly nice..
>
> I tried something like:
>
> scoped_cond_guard (mutex_intr, return -EINTR, &task->signal->cred_guard_mutex) {
> ...
> }
>
> Which I can make work, but then I also tried to capture my other case:
>
> scoped_cond_guard (rwsem_down_intr, if (task) return -EINTR,
> 		   task ? &task->signal->exec_update_lock : NULL) {
>
> ...
> }
>
> But I can't get that to work because of that extra if, the not case
> doesn't fall through and do the body.
>
> Anyway, I'll poke more..


#define scoped_cond_guard(_name, _label, args...) \
	for (CLASS(_name, scope)(args), \
	     *done = NULL; !done; done = (void *)1) \
		if (!__guard_ptr(_name)(&scope)) goto _label; \
		else

Allows one to write:

scoped_cond_guard (rwsem_down_intr, no_lock,
		   task ? &task->signal->exec_update_lock : NULL) {

	if (0) {
no_lock:
		if (task)
			return -EINTR;
	}

	... block that holds exec_update_lock if task ...
}

Still not exactly pretty, but perhaps better than before...

2023-09-20 11:34:42

by David Laight

Subject: RE: Buggy __free(kfree) usage pattern already in tree

If this stuff is so hard to get right for non-trivial cases,
perhaps it should all be ripped out?

The trivial cases are pretty easy to eyeball-check, and
static analysis tools do a reasonable job.

Maybe I'm 'old-school' but I'd much rather see explicit
unlock() and free() than have 'some compiler magic' do
it for you.

David
