2020-04-28 02:58:49

by Bernd Edlinger

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/12/20 10:14 PM, Linus Torvalds wrote:
> On Sun, Apr 12, 2020 at 12:51 PM Oleg Nesterov <[email protected]> wrote:
>>
>> To be honest, I don't understand it... OK, suppose that the main thread
>> M execs and zap_other_threads() finds a single (and alive) sub-thread T,
>> sig->notify_count = 1.
>>
>> If T is traced, then ->notify_count won't be decremented until the tracer
>> reaps this task, so we have the same problem.
>
> Right you are.
>
> I was hoping to avoid the "move notify_count update", but you're
> right, the threads that do get properly killed will never get to that
> point, so now the live ones that we're waiting for will just hit the
> same issue that the dead ones did.
>
> Goot catch. So the optimistic simplification doesn't work.
>
>>> You do say in that old patch that we can't just share the signal
>>> state, but I wonder how true that is.
>>
>> We can share sighand_struct with TASK_ZOMBIE's. The problem is that
>> we can not unshare ->sighand until they go away, execing thread and
>> zombies must use the same sighand->siglock to serialize the access to
>> ->thread_head/etc.
>
> Yeah, they'll still touch the lock, and maybe look at it, but it's not
> like they'll be changing any state.
>
>> but see above, I don't think this makes any sense.
>
> Yeah, I think your patch is better since my simplification doesn't work.
>

Ping...
was this resolved meanwhile?


Thanks
Bernd.

> Linus
>


2020-04-28 17:09:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Mon, Apr 27, 2020 at 7:56 PM Bernd Edlinger
<[email protected]> wrote:
>
> was this resolved meanwhile?

No. I think the tentative plan is to just apply Oleg's "don't wait for
zombie threads with cred_guard_mutex held" patch, hopefully with that
de_thread() moved into install_exec_creds() (right after the dropping
of the locks).

But since it's arguably a user-level bug, and not a regression by any
means, it's not been exactly urgent. I suspect I'd like Oleg to
perhaps decide to take the patch up again.

Linus

2020-04-28 19:11:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 04/28, Linus Torvalds wrote:
>
> On Mon, Apr 27, 2020 at 7:56 PM Bernd Edlinger
> <[email protected]> wrote:
> >
> > was this resolved meanwhile?
>
> No. I think the tentative plan is to just apply Oleg's "don't wait for
> zombie threads with cred_guard_mutex held" patch, hopefully with that
> de_thread() moved into install_exec_creds() (right after the dropping
> of the locks).

Oops. I can update that old patch but somehow I thought there is a better
plan which I don't yet understand...

And, IIRC, Jan had some ideas how to rework the new creds calculation in
execve paths to avoid the cred_guard_mutex deadlock?

Oleg.

2020-04-28 20:38:08

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Tue, Apr 28, 2020 at 12:08 PM Oleg Nesterov <[email protected]> wrote:
>
> Oops. I can update that old patch but somehow I thought there is a better
> plan which I don't yet understand...

I don't think any plan survived reality.

Unless we want to do something *really* hacky.. The attached patch is
not meant to be serious.

> And, IIRC, Jan had some ideas how to rework the new creds calculation in
> execve paths to avoid the cred_guard_mutex deadlock?

I'm not sure how you'd do that.

Execve() fundamentally needs to serialize with PTRACE_ATTACH somehow,
since the whole point is that "tsk->ptrace" changes how the
credentials are interpreted.

So PTRACE_ATTACH doesn't really _change_ the credentials, but it very
much changes what execve() will do with them.

But I guess we could do a "if somebody attached to us while we did the
execve(), just repeat the whole thing"

Jann, what was your clever idea? Maybe it got lost in the long thread..

Linus


Attachments:
patch (1.29 kB)

2020-04-28 21:10:18

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Tue, Apr 28, 2020 at 10:36 PM Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 28, 2020 at 12:08 PM Oleg Nesterov <[email protected]> wrote:
> >
> > Oops. I can update that old patch but somehow I thought there is a better
> > plan which I don't yet understand...
>
> I don't think any plan survived reality.
>
> Unless we want to do something *really* hacky.. The attached patch is
> not meant to be serious.
>
> > And, IIRC, Jan had some ideas how to rework the new creds calculation in
> > execve paths to avoid the cred_guard_mutex deadlock?
>
> I'm not sure how you'd do that.
>
> Execve() fundamentally needs to serialize with PTRACE_ATTACH somehow,
> since the whole point is that "tsk->ptrace" changes how the
> credentials are interpreted.
>
> So PTRACE_ATTACH doesn't really _change_ the credentials, but it very
> much changes what execve() will do with them.
>
> But I guess we could do a "if somebody attached to us while we did the
> execve(), just repeat the whole thing"
>
> Jann, what was your clever idea? Maybe it got lost in the long thread..

My clever/horrible/overly-complex idea was basically:

In execve:

- After the point of no return, but before we start waiting for the
other threads to go away, finish calculating our post-execve creds
and stash them somewhere in the task_struct or so.
- Drop the cred_guard_mutex.
- Wait for the other threads to die.
- Take the cred_guard_mutex again.
- Clear out the pointer in the task_struct.
- Finish execve and install the new creds.
- Drop the cred_guard_mutex again.

Then in ptrace_may_access, after taking the cred_guard_mutex, we'd
know that the target task is either outside execve or in the middle of
execve, with old and new credentials known; and then we could say "you
only get to access that task if you're capable relative to *both* its
old and new credentials, since the task currently has both state from
the old executable and from the new one". (Other users that expect to
use cred_guard_mutex to synchronize with execve would also have to be
changed appropriately; e.g. seccomp tsync would have to bail out if
the task turns out to be in execve after the mutex has been acquired.)

So I think we can conceptually fix the deadlock, but it requires a bit
of refactoring. (I have an old branch somewhere in which I tried to
implement this, and where I did a bunch of refactoring around
ptrace_may_access() so that e.g. the LSM hooks for ptrace can be
invoked twice when the target task is in execve, and so that they take
the target's cred* as an argument.)

2020-04-28 21:39:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Tue, Apr 28, 2020 at 2:06 PM Jann Horn <[email protected]> wrote:
>
> In execve:
>
> - After the point of no return, but before we start waiting for the
> other threads to go away, finish calculating our post-execve creds
> and stash them somewhere in the task_struct or so.
> - Drop the cred_guard_mutex.
> - Wait for the other threads to die.
> - Take the cred_guard_mutex again.
> - Clear out the pointer in the task_struct.
> - Finish execve and install the new creds.
> - Drop the cred_guard_mutex again.
>
> Then in ptrace_may_access, after taking the cred_guard_mutex, we'd
> know that the target task is either outside execve or in the middle of
> execve, with old and new credentials known; and then we could say "you
> only get to access that task if you're capable relative to *both* its
> old and new credentials, since the task currently has both state from
> the old executable and from the new one".

That doesn't solve the problem with "check_unsafe_exec()" - you might
miss setting LSM_UNSAFE_PTRACE.

Although maybe that whole function could be moved down (to after you
get the cred_guard_mutex the second time).

Linus

2020-04-28 21:57:51

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Tue, Apr 28, 2020 at 11:37 PM Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 28, 2020 at 2:06 PM Jann Horn <[email protected]> wrote:
> > In execve:
> >
> > - After the point of no return, but before we start waiting for the
> > other threads to go away, finish calculating our post-execve creds
> > and stash them somewhere in the task_struct or so.
> > - Drop the cred_guard_mutex.
> > - Wait for the other threads to die.
> > - Take the cred_guard_mutex again.
> > - Clear out the pointer in the task_struct.
> > - Finish execve and install the new creds.
> > - Drop the cred_guard_mutex again.
> >
> > Then in ptrace_may_access, after taking the cred_guard_mutex, we'd
> > know that the target task is either outside execve or in the middle of
> > execve, with old and new credentials known; and then we could say "you
> > only get to access that task if you're capable relative to *both* its
> > old and new credentials, since the task currently has both state from
> > the old executable and from the new one".
>
> That doesn't solve the problem with "check_unsafe_exec()" - you might
> miss setting LSM_UNSAFE_PTRACE.

You don't need LSM_UNSAFE_PTRACE if the tracer has already passed a
ptrace_may_access() check against the post-execve creds of the target
- that's no different from having done PTRACE_ATTACH after execve is
over.

2020-04-28 22:19:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Tue, Apr 28, 2020 at 2:53 PM Jann Horn <[email protected]> wrote:
>
> You don't need LSM_UNSAFE_PTRACE if the tracer has already passed a
> ptrace_may_access() check against the post-execve creds of the target
> - that's no different from having done PTRACE_ATTACH after execve is
> over.

Hmm. That sounds believable, I guess.

But along these ways, I'm starting to think that we might perhaps skip
the lock entirely.

What if we made the rule instead be:

- we move check_unsafe_exec() down. As far as I can tell, there's no
reason it's that early - the flags it sets aren't actually used until
when we actually do that final set_creds..

- we add a "next cred" pointer to the signal struct (or task struct)

- make the rule be that check_unsafe_exec() checks p->ptrace under
the tasklist_lock (or sighand lock - whatever ends up being most
convenient)

- set "next cred" to be the known next cred there too under the lock.
We call this small locked region the "cred sync point".

- ptrace will check if we have the "in_exec" flag set and have one of
those "next cred" pointers, in which case it checks both the old and
the next credentials.

No cred_guard_mutex at all, instead the rule is that as execve() goes
through that "cred sync point", we have two cases

(a) either ptrace has attached (because task->ptrace is set), and it
does the LSM_UNSAFE_PTRACE dance.

or

(b) it knows that ptrace will check the next creds if it races with execve.

And then after execve has installed the final new creds, it just
clears the "next cred" pointer again, because at that point it knows
that now any subsequent PTRACE_ATTACH will be checking the new creds.

So instead of taking and dropping the cred_guard_mutex, we'd basically
get rid of it entirely.

Yeah, I didn't look at the seccomp case, but I guess the issues will be similar.

Linus

2020-04-28 23:41:41

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 28, 2020 at 2:53 PM Jann Horn <[email protected]> wrote:
> >
> > You don't need LSM_UNSAFE_PTRACE if the tracer has already passed a
> > ptrace_may_access() check against the post-execve creds of the target
> > - that's no different from having done PTRACE_ATTACH after execve is
> > over.
>
> Hmm. That sounds believable, I guess.
>
> But along these ways, I'm starting to think that we might perhaps skip
> the lock entirely.

Just as a note: One effect of that will be that if a process goes into
execve on a setuid binary, but bails out before the point of no
return, a tracer may fail to attach to it during that time window. But
that should be completely fine.

> What if we made the rule instead be:
>
> - we move check_unsafe_exec() down. As far as I can tell, there's no
> reason it's that early - the flags it sets aren't actually used until
> when we actually do that final set_creds..

Right, we should be able to do that stuff quite a bit later than it happens now.

At the moment the final security_bprm_set_creds() seems to happen
before we're calling would_dump(), which computes some of the data
we'll need for access checks... I guess we'll have to split up
security_bprm_set_creds into one hook for "here's another executable
file that's part of the execve chain" and a second hook
("security_bprm_cred_sync()"?) for "now the ->unsafe flags are ready
and we're about to drop the lock again, time to modify the creds if
you want to do that". And then LSMs can make decisions that are
influenced by ->unsafe (fiddling with the creds or rejecting the
execution - I think e.g. selinux_bprm_set_creds() can do both) inside
the "cred sync point".

> - we add a "next cred" pointer to the signal struct (or task struct)
>
> - make the rule be that check_unsafe_exec() checks p->ptrace under
> the tasklist_lock (or sighand lock - whatever ends up being most
> convenient)
>
> - set "next cred" to be the known next cred there too under the lock.
> We call this small locked region the "cred sync point".
>
> - ptrace will check if we have the "in_exec" flag set and have one of
> those "next cred" pointers, in which case it checks both the old and
> the next credentials.
>
> No cred_guard_mutex at all, instead the rule is that as execve() goes
> through that "cred sync point", we have two cases
>
> (a) either ptrace has attached (because task->ptrace is set), and it
> does the LSM_UNSAFE_PTRACE dance.
>
> or
>
> (b) it knows that ptrace will check the next creds if it races with execve.
>
> And then after execve has installed the final new creds, it just
> clears the "next cred" pointer again, because at that point it knows
> that now any subsequent PTRACE_ATTACH will be checking the new creds.
>
> So instead of taking and dropping the cred_guard_mutex, we'd basically
> get rid of it entirely.
>
> Yeah, I didn't look at the seccomp case, but I guess the issues will be similar.

Yeah, seccomp should be able to reject any thread sync if the "next
cred" pointer is set on any of the threads. It should work well as
long as the lock around the "next cred" pointer is at least on the
level of the signal_struct (or broader), so that TSYNC can decide
whether everything's good before starting to iterate over the threads.

2020-04-29 18:00:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Tue, Apr 28, 2020 at 4:36 PM Jann Horn <[email protected]> wrote:
>
> On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > - we move check_unsafe_exec() down. As far as I can tell, there's no
> > reason it's that early - the flags it sets aren't actually used until
> > when we actually do that final set_creds..
>
> Right, we should be able to do that stuff quite a bit later than it happens now.

Actually, looking at it, this looks painful for multiple reasons.

The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
when I traced it through, happened much earlier than I thought. Making
things worse, it's done by prepare_binprm(), which also potentially
gets called from random points by the low-level binfmt handlers too.

And we also have that odd "fs->in_exec" flag, which is used by thread
cloning and io_uring, and I'm not sure what the exact semantics are.

I'm _almost_ inclined to say that we should just abort the execve()
entirely if somebody tries to attach in the middle.

IOW, get rid of the locking, and replace it all just with a sequence
count. Make execve() abort if the sequence count has changed between
loading the original creds, and having installed the new creds.

You can ptrace _over_ an execve, and you can ptrace _after_ an
execve(), but trying to attach just as we execve() would just cause
the execve() to fail.

We could maybe make it conditional on the credentials actually having
changed at all (set another flag in bprm_fill_uid()). So it would only
fail for the suid exec case.

Because honestly, trying to ptrace in the middle of a suid execve()
sounds like an attack, not a useful thing.

That sequence count approach would be a much simpler change.

Linus

2020-04-29 18:37:53

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 7:58 PM Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 28, 2020 at 4:36 PM Jann Horn <[email protected]> wrote:
> >
> > On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > - we move check_unsafe_exec() down. As far as I can tell, there's no
> > > reason it's that early - the flags it sets aren't actually used until
> > > when we actually do that final set_creds..
> >
> > Right, we should be able to do that stuff quite a bit later than it happens now.
>
> Actually, looking at it, this looks painful for multiple reasons.
>
> The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
> when I traced it through, happened much earlier than I thought. Making
> things worse, it's done by prepare_binprm(), which also potentially
> gets called from random points by the low-level binfmt handlers too.

Yeah, but all of that happens before we actually need to do anything
with the accumulated credential information from the prepare_binprm()
calls. We can probably move the unsafe calculation and a new LSM hook
into flush_old_exec(), right before de_thread().

> And we also have that odd "fs->in_exec" flag, which is used by thread
> cloning and io_uring, and I'm not sure what the exact semantics are.

The idea is to ensure that once we're through check_unsafe_exec() and
have computed our LSM_UNSAFE_* flags, another thread that's still
running must not be able to fork() off a child with CLONE_FS, because
having an fs_struct that's shared with anything other than sibling
threads (which will be killed off) is supposed to only be possible if
LSM_UNSAFE_SHARE is set. So:

If check_unsafe_exec() can match each reference in the refcount
->fs->users with a reference from a sibling thread (iow the fs_struct
is not currently shared with another task), it sets p->fs->in_exec.

If another thread tries to clone(CLONE_FS) while we're in execve(),
copy_fs() will throw -EAGAIN. And if io_uring tries to grab a
reference to the fs_struct with the intent to use it on a kernel
worker thread (which conceptually is kinda similar to the
clone(CLONE_FS) case), that also aborts.

And then at the end of execve(), we clear the ->fs->in_exec flag again.

So this should work fine as long as we ensure that we can't have two
threads from the same process going through execve concurrently. (Or
if we actually want to support that, we could make ->in_exec a counter
instead of a flag, but really, preventing concurrent execve()s from a
multithreaded process seems saner...)

> I'm _almost_ inclined to say that we should just abort the execve()
> entirely if somebody tries to attach in the middle.
>
> IOW, get rid of the locking, and replace it all just with a sequence
> count. Make execve() abort if the sequence count has changed between
> loading the original creds, and having installed the new creds.
>
> You can ptrace _over_ an execve, and you can ptrace _after_ an
> execve(), but trying to attach just as we execve() would just cause
> the execve() to fail.
>
> We could maybe make it conditional on the credentials actually having
> changed at all (set another flag in bprm_fill_uid()). So it would only
> fail for the suid exec case.
>
> Because honestly, trying to ptrace in the middle of a suid execve()
> sounds like an attack, not a useful thing.
>
> That sequence count approach would be a much simpler change.

In that model, what should happen if someone tries to attach to a
process that's in execve(), but after the point of no return in
de_thread()? "Abort" after the point of no return normally means
force_sigsegv(), right?

2020-04-29 19:01:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 11:33 AM Jann Horn <[email protected]> wrote:
>
> > That sequence count approach would be a much simpler change.
>
> In that model, what should happen if someone tries to attach to a
> process that's in execve(), but after the point of no return in
> de_thread()? "Abort" after the point of no return normally means
> force_sigsegv(), right?

It would by definition have to check the sequence number at the end of
install_exec_creds() (where we currently release the
cred_guard_mutex).

And yes, that's after the point of no return, so it would cause the
usual "kill the process".

We could check earlier too (while still able to return errors) and
return -EAGAIN or something, but that wouldn't obviate the need for
that final check, iut would just shrink the window for the "fatal
exec" case.

Linus

2020-04-29 19:25:47

by Bernd Edlinger

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/29/20 7:58 PM, Linus Torvalds wrote:
> On Tue, Apr 28, 2020 at 4:36 PM Jann Horn <[email protected]> wrote:
>>
>> On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> - we move check_unsafe_exec() down. As far as I can tell, there's no
>>> reason it's that early - the flags it sets aren't actually used until
>>> when we actually do that final set_creds..
>>
>> Right, we should be able to do that stuff quite a bit later than it happens now.
>
> Actually, looking at it, this looks painful for multiple reasons.
>
> The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
> when I traced it through, happened much earlier than I thought. Making
> things worse, it's done by prepare_binprm(), which also potentially
> gets called from random points by the low-level binfmt handlers too.
>
> And we also have that odd "fs->in_exec" flag, which is used by thread
> cloning and io_uring, and I'm not sure what the exact semantics are.
>
> I'm _almost_ inclined to say that we should just abort the execve()
> entirely if somebody tries to attach in the middle.
>
> IOW, get rid of the locking, and replace it all just with a sequence
> count. Make execve() abort if the sequence count has changed between
> loading the original creds, and having installed the new creds.
>
> You can ptrace _over_ an execve, and you can ptrace _after_ an
> execve(), but trying to attach just as we execve() would just cause
> the execve() to fail.
>
> We could maybe make it conditional on the credentials actually having
> changed at all (set another flag in bprm_fill_uid()). So it would only
> fail for the suid exec case.
>
> Because honestly, trying to ptrace in the middle of a suid execve()
> sounds like an attack, not a useful thing.
>

I think the use case where a program attaches and detaches many
processes at a high rate, is either an attack or a very aggressive
virus checker, fixing a bug that prevents an attack is not a good
idea, but fixing a bug that would otherwise break a virus checker
would be a good thing.

By the way, all other attempts to fix it look much more dangerous
than my initially proposed patch, you know the one you hated, but
it does work and does not look overly complicated either.

What was the reason why that cannot be done this way?


Thanks,
Bernd.

2020-04-29 19:29:17

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 9:23 PM Bernd Edlinger
<[email protected]> wrote:
> On 4/29/20 7:58 PM, Linus Torvalds wrote:
> > On Tue, Apr 28, 2020 at 4:36 PM Jann Horn <[email protected]> wrote:
> >>
> >> On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
> >> <[email protected]> wrote:
> >>>
> >>> - we move check_unsafe_exec() down. As far as I can tell, there's no
> >>> reason it's that early - the flags it sets aren't actually used until
> >>> when we actually do that final set_creds..
> >>
> >> Right, we should be able to do that stuff quite a bit later than it happens now.
> >
> > Actually, looking at it, this looks painful for multiple reasons.
> >
> > The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
> > when I traced it through, happened much earlier than I thought. Making
> > things worse, it's done by prepare_binprm(), which also potentially
> > gets called from random points by the low-level binfmt handlers too.
> >
> > And we also have that odd "fs->in_exec" flag, which is used by thread
> > cloning and io_uring, and I'm not sure what the exact semantics are.
> >
> > I'm _almost_ inclined to say that we should just abort the execve()
> > entirely if somebody tries to attach in the middle.
> >
> > IOW, get rid of the locking, and replace it all just with a sequence
> > count. Make execve() abort if the sequence count has changed between
> > loading the original creds, and having installed the new creds.
> >
> > You can ptrace _over_ an execve, and you can ptrace _after_ an
> > execve(), but trying to attach just as we execve() would just cause
> > the execve() to fail.
> >
> > We could maybe make it conditional on the credentials actually having
> > changed at all (set another flag in bprm_fill_uid()). So it would only
> > fail for the suid exec case.
> >
> > Because honestly, trying to ptrace in the middle of a suid execve()
> > sounds like an attack, not a useful thing.
> >
>
> I think the use case where a program attaches and detaches many
> processes at a high rate, is either an attack or a very aggressive
> virus checker, fixing a bug that prevents an attack is not a good
> idea, but fixing a bug that would otherwise break a virus checker
> would be a good thing.
>
> By the way, all other attempts to fix it look much more dangerous
> than my initially proposed patch, you know the one you hated, but
> it does work and does not look overly complicated either.
>
> What was the reason why that cannot be done this way?

I'm not sure which patch you're talking about - I assume you don't
mean <https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/>?

2020-04-29 20:21:47

by Bernd Edlinger

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/29/20 9:26 PM, Jann Horn wrote:
> On Wed, Apr 29, 2020 at 9:23 PM Bernd Edlinger
> <[email protected]> wrote:
>> On 4/29/20 7:58 PM, Linus Torvalds wrote:
>>> On Tue, Apr 28, 2020 at 4:36 PM Jann Horn <[email protected]> wrote:
>>>>
>>>> On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
>>>> <[email protected]> wrote:
>>>>>
>>>>> - we move check_unsafe_exec() down. As far as I can tell, there's no
>>>>> reason it's that early - the flags it sets aren't actually used until
>>>>> when we actually do that final set_creds..
>>>>
>>>> Right, we should be able to do that stuff quite a bit later than it happens now.
>>>
>>> Actually, looking at it, this looks painful for multiple reasons.
>>>
>>> The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
>>> when I traced it through, happened much earlier than I thought. Making
>>> things worse, it's done by prepare_binprm(), which also potentially
>>> gets called from random points by the low-level binfmt handlers too.
>>>
>>> And we also have that odd "fs->in_exec" flag, which is used by thread
>>> cloning and io_uring, and I'm not sure what the exact semantics are.
>>>
>>> I'm _almost_ inclined to say that we should just abort the execve()
>>> entirely if somebody tries to attach in the middle.
>>>
>>> IOW, get rid of the locking, and replace it all just with a sequence
>>> count. Make execve() abort if the sequence count has changed between
>>> loading the original creds, and having installed the new creds.
>>>
>>> You can ptrace _over_ an execve, and you can ptrace _after_ an
>>> execve(), but trying to attach just as we execve() would just cause
>>> the execve() to fail.
>>>
>>> We could maybe make it conditional on the credentials actually having
>>> changed at all (set another flag in bprm_fill_uid()). So it would only
>>> fail for the suid exec case.
>>>
>>> Because honestly, trying to ptrace in the middle of a suid execve()
>>> sounds like an attack, not a useful thing.
>>>
>>
>> I think the use case where a program attaches and detaches many
>> processes at a high rate, is either an attack or a very aggressive
>> virus checker, fixing a bug that prevents an attack is not a good
>> idea, but fixing a bug that would otherwise break a virus checker
>> would be a good thing.
>>
>> By the way, all other attempts to fix it look much more dangerous
>> than my initially proposed patch, you know the one you hated, but
>> it does work and does not look overly complicated either.
>>
>> What was the reason why that cannot be done this way?
>
> I'm not sure which patch you're talking about - I assume you don't
> mean <https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/>?
>

No, I meant:

[PATCH v7 15/16] exec: Fix dead-lock in de_thread with ptrace_attach
https://marc.info/?l=linux-kernel&m=158559277631548&w=2

and

[PATCH v6 16/16] doc: Update documentation of ->exec_*_mutex
https://marc.info/?l=linux-kernel&m=158559277631548&w=2


I think that was the latest version, but this had several iterations already.

Thanks
Bernd.

2020-04-29 21:08:54

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 10:20 PM Bernd Edlinger
<[email protected]> wrote:
> On 4/29/20 9:26 PM, Jann Horn wrote:
> > On Wed, Apr 29, 2020 at 9:23 PM Bernd Edlinger
> > <[email protected]> wrote:
> >> On 4/29/20 7:58 PM, Linus Torvalds wrote:
> >>> On Tue, Apr 28, 2020 at 4:36 PM Jann Horn <[email protected]> wrote:
> >>>>
> >>>> On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> - we move check_unsafe_exec() down. As far as I can tell, there's no
> >>>>> reason it's that early - the flags it sets aren't actually used until
> >>>>> when we actually do that final set_creds..
> >>>>
> >>>> Right, we should be able to do that stuff quite a bit later than it happens now.
> >>>
> >>> Actually, looking at it, this looks painful for multiple reasons.
> >>>
> >>> The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
> >>> when I traced it through, happened much earlier than I thought. Making
> >>> things worse, it's done by prepare_binprm(), which also potentially
> >>> gets called from random points by the low-level binfmt handlers too.
> >>>
> >>> And we also have that odd "fs->in_exec" flag, which is used by thread
> >>> cloning and io_uring, and I'm not sure what the exact semantics are.
> >>>
> >>> I'm _almost_ inclined to say that we should just abort the execve()
> >>> entirely if somebody tries to attach in the middle.
> >>>
> >>> IOW, get rid of the locking, and replace it all just with a sequence
> >>> count. Make execve() abort if the sequence count has changed between
> >>> loading the original creds, and having installed the new creds.
> >>>
> >>> You can ptrace _over_ an execve, and you can ptrace _after_ an
> >>> execve(), but trying to attach just as we execve() would just cause
> >>> the execve() to fail.
> >>>
> >>> We could maybe make it conditional on the credentials actually having
> >>> changed at all (set another flag in bprm_fill_uid()). So it would only
> >>> fail for the suid exec case.
> >>>
> >>> Because honestly, trying to ptrace in the middle of a suid execve()
> >>> sounds like an attack, not a useful thing.
> >>>
> >>
> >> I think the use case where a program attaches and detaches many
> >> processes at a high rate, is either an attack or a very aggressive
> >> virus checker, fixing a bug that prevents an attack is not a good
> >> idea, but fixing a bug that would otherwise break a virus checker
> >> would be a good thing.
> >>
> >> By the way, all other attempts to fix it look much more dangerous
> >> than my initially proposed patch, you know the one you hated, but
> >> it does work and does not look overly complicated either.
> >>
> >> What was the reason why that cannot be done this way?
> >
> > I'm not sure which patch you're talking about - I assume you don't
> > mean <https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/>?
> >
>
> No, I meant:
>
> [PATCH v7 15/16] exec: Fix dead-lock in de_thread with ptrace_attach
> https://marc.info/?l=linux-kernel&m=158559277631548&w=2

(on lore: https://lore.kernel.org/lkml/AM6PR03MB51700577CF9EF4972FDE568AE4CB0@AM6PR03MB5170.eurprd03.prod.outlook.com/)

I mean - I guess that kinda works? It'll mean that attaching to a task
with "strace" as root won't work reliably if the task is in the middle
of execve though, there'll be a weird race where if strace first
attaches to a sibling and then tries to attach to the thread going
through execve(), it'll fail to attach to that thread, and then lose
the process entirely once de_thread() has killed off the other
threads. My perfectionist side is somewhat irked by that.

Still, it's probably far better than Linus' "simpler change" where the
target just gets killed off if someone tries to trace it at the wrong
time.

2020-04-29 22:40:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 12:23 PM Bernd Edlinger
<[email protected]> wrote:
>
> By the way, all other attempts to fix it look much more dangerous
> than my initially proposed patch, you know the one you hated, but
> it does work and does not look overly complicated either.

I don't think it works.

The whole "take lock, release it in the middle, then re-take it" is
fundamentally a broken model. We've never had it work well, and it
tends to have subtle issues. That's particularly true when you then
open-core the (only) acceptable sequence something like five times.

> What was the reason why that cannot be done this way?

If it had introduced a new locking model, and new locking helpers for
that model, with a comment in _one_ place, and nobody doing the ad-hoc
locking on their own, that might be more acceptable.

But that's not what that patch did. No way will I take something that
is so fragile and hacky, and repeats the hack N times.

If you do it properly, with a helper function instead of repeating
that fragile nasty thing, maybe it will look better to me.

That said, locks that get released in the middle aren't really locks.
But at least if the only way to take that lock had the "oh, this lock
is in that inconsistent state, I will return -EAGAIN", that would be
one thing. But when you have N different users and rely on all of them
getting that special semantic right, you're doing something wrong.

Linus

2020-04-29 23:26:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 3:38 PM Linus Torvalds
<[email protected]> wrote:
>
> If you do it properly, with a helper function instead of repeating
> that fragile nasty thing, maybe it will look better to me.

Side note: if it has a special helper function for the "get lock,
repeat if it was invalid", you can do a better job than return
-EAGAIN.

In particular, you can do this

set_thread_flag(TIF_SIGPENDING);
return -RESTARTNOINTR;

which will actually restart the system call. So a ptrace() user (or
somebody doing a "write()" to /proc/<pid>/attr/xyz, wouldn't even see
the impossible EAGAIN error.

But that all requires that you have some locking helper routines like

int lock_exec_creds(struct task_struct *);
void unlock_exec_guard(struct task_struct *);

because there's no way we put that kind of super-fragile code in
several places. It would be very much one single routine with a *HUGE*
comment on it.

Linus

2020-04-30 00:02:23

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Thu, Apr 30, 2020 at 1:22 AM Linus Torvalds
<[email protected]> wrote:
> On Wed, Apr 29, 2020 at 3:38 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > If you do it properly, with a helper function instead of repeating
> > that fragile nasty thing, maybe it will look better to me.
>
> Side note: if it has a special helper function for the "get lock,
> repeat if it was invalid", you can do a better job than return
> -EAGAIN.
>
> In particular, you can do this
>
> set_thread_flag(TIF_SIGPENDING);
> return -RESTARTNOINTR;
>
> which will actually restart the system call. So a ptrace() user (or
> somebody doing a "write()" to /proc/<pid>/attr/xyz, wouldn't even see
> the impossible EAGAIN error.

Wouldn't you end up livelocked in the scenario that currently deadlocks? Like:

- tracer attaches to thread A
- thread B goes into execve, blocks on waiting for A's death
- tracer tries to attach to B and hits the -EAGAIN

If we make the PTRACE_ATTACH call restart, the tracer will just end up
looping without ever resolving the deadlock. If we want to get through
this cleanly with this approach, userspace needs to either
deprioritize the "I want to attach to pid X" and go back into its
eventloop, or to just treat -EAGAIN as a fatal error and give up
trying to attach to that task.

2020-04-30 01:13:01

by Bernd Edlinger

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/30/20 1:59 AM, Jann Horn wrote:
> On Thu, Apr 30, 2020 at 1:22 AM Linus Torvalds
> <[email protected]> wrote:
>> On Wed, Apr 29, 2020 at 3:38 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> If you do it properly, with a helper function instead of repeating
>>> that fragile nasty thing, maybe it will look better to me.

I added the BIG FAT WARNNIG comments as a mitigation for that.
Did you like those comments?

>>
>> Side note: if it has a special helper function for the "get lock,
>> repeat if it was invalid", you can do a better job than return
>> -EAGAIN.
>>
>> In particular, you can do this
>>
>> set_thread_flag(TIF_SIGPENDING);
>> return -RESTARTNOINTR;
>>
>> which will actually restart the system call. So a ptrace() user (or
>> somebody doing a "write()" to /proc/<pid>/attr/xyz, wouldn't even see
>> the impossible EAGAIN error.
>
> Wouldn't you end up livelocked in the scenario that currently deadlocks? Like:
>
> - tracer attaches to thread A
> - thread B goes into execve, blocks on waiting for A's death
> - tracer tries to attach to B and hits the -EAGAIN
>
> If we make the PTRACE_ATTACH call restart, the tracer will just end up
> looping without ever resolving the deadlock. If we want to get through
> this cleanly with this approach, userspace needs to either
> deprioritize the "I want to attach to pid X" and go back into its
> eventloop, or to just treat -EAGAIN as a fatal error and give up
> trying to attach to that task.
>

Yes, exactly, the point is the caller is expected to call wait in that
scenario, otherwise the -EAGAIN just repeats forever, that is an API
change, yes, but something unavoidable, and the patch tries hard to
limit it to cases where the live-lock or pseudo-dead-lock is unavoidable
anyway.


Bernd.

2020-04-30 02:18:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 5:00 PM Jann Horn <[email protected]> wrote:
>
> Wouldn't you end up livelocked in the scenario that currently deadlocks?

The test case that we already know is broken, and any fix will have to
change anyway?

Let's just say that I don't care in the least.

But Bernd's patch as-is breaks a test-case that currently *works*,
namely something as simple as

echo xyz > /proc/<pid>/attr/something

and honestly, breaking something that _works_ and may be used in
reality, in orderf to make a known buggy user testcase work?

Because no, "write()" returning -EAGAIN isn't ok.

Linus

2020-04-30 02:24:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 6:08 PM Bernd Edlinger
<[email protected]> wrote:
>
> I added the BIG FAT WARNNIG comments as a mitigation for that.
> Did you like those comments?

No.

What's the point olf saying "THIS CODE IS GARBAGE" and then expecting
that to make it ok?

No,m that doesn't make it ok. It just means that it should have been
done differently.

> Yes, exactly, the point is the caller is expected to call wait in that
> scenario, otherwise the -EAGAIN just repeats forever, that is an API
> change, yes, but something unavoidable, and the patch tries hard to
> limit it to cases where the live-lock or pseudo-dead-lock is unavoidable
> anyway.

I'm getting really fed up with your insistence on that KNOWN BROKEN
garbage test-case.

It's shit. The test-case is wrong. I've told you before.

Your patch as-is breaks other cases that are *not* wrong in the kernel
currently, and that don't have test-cases because they JustWork(tm).

The livelock isn't interesting. The test-case that shows it is pure
garbage, and is written wrong.

IF that test-case hadn't been buggy in the first place, it would have
had ignored its child (or had a handler for SIGCHLD), and not
livelocked.

Linus

2020-04-30 03:04:54

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Thu, Apr 30, 2020 at 4:20 AM Linus Torvalds
<[email protected]> wrote:
> On Wed, Apr 29, 2020 at 6:08 PM Bernd Edlinger
> <[email protected]> wrote:
> >
> > I added the BIG FAT WARNNIG comments as a mitigation for that.
> > Did you like those comments?
>
> No.
>
> What's the point olf saying "THIS CODE IS GARBAGE" and then expecting
> that to make it ok?
>
> No,m that doesn't make it ok. It just means that it should have been
> done differently.
>
> > Yes, exactly, the point is the caller is expected to call wait in that
> > scenario, otherwise the -EAGAIN just repeats forever, that is an API
> > change, yes, but something unavoidable, and the patch tries hard to
> > limit it to cases where the live-lock or pseudo-dead-lock is unavoidable
> > anyway.
>
> I'm getting really fed up with your insistence on that KNOWN BROKEN
> garbage test-case.
>
> It's shit. The test-case is wrong. I've told you before.
>
> Your patch as-is breaks other cases that are *not* wrong in the kernel
> currently, and that don't have test-cases because they JustWork(tm).
>
> The livelock isn't interesting. The test-case that shows it is pure
> garbage, and is written wrong.
>
> IF that test-case hadn't been buggy in the first place, it would have
> had ignored its child (or had a handler for SIGCHLD), and not
> livelocked.

But if we go with Bernd's approach together with your restart
suggestion, then simply doing PTRACE_ATTACH on two threads A and B
would be enough to livelock, right?

tracer: PTRACE_ATTACHes to A
B: enters de_thread()
tracer: attempts to PTRACE_ATTACH to B

Now the tracer will loop on PTRACE_ATTACH, right?

2020-04-30 03:28:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 8:00 PM Jann Horn <[email protected]> wrote:
>
> But if we go with Bernd's approach together with your restart
> suggestion,

So repeat after me: Bernd's approach _without_ the restart is unacceptable.

It's unacceptable because it breaks things that currently work, and
returns EAGAIN in situations where that is simple not a valid error
code.

His original patch also happens to be unacceptable because it's an
unmaintainable mess, but that's independent of the bug it introduced.

That bug has nothing to do with ptrace(). It's literally a "write()"
to a file in /proc.

What is so hard to get about this basic thing?

> then simply doing PTRACE_ATTACH on two threads A and B
> would be enough to livelock, right?

The same case that just causes a recursive wait. Yes. No worse off than we were.

And the fact is, *THAT* case looks truly trivial to work around.

Just make the ptrace() code - but not the fs/proc/base.c code - do
something like

if (lock_exec_creds(tsk))
return -EINTR;

and now ptrace() doesn't repeat (simply because it doesn't return that
ERESTARTNOINTR. It would go through that "return through signal
handling code" in the kernel, but it wouldn't actually retry the
system call).

But I'm getting less and less interested in trying to "fix" this
problem, when people don't seem to realize that the important case is
to not break _good_ programs, and the pointless buggy garbage
test-case is entirely uninteresting. It's buggy user code. If it
causes a wait or a livelock, nobody sane should care in the least. Fix
the bug in user space.

Introducing new bugs in the kernel where they didn't exist before - in
order to try to work around buggy user-space that has never ever
worked - is not acceptable.

Linus

2020-04-30 03:43:52

by Jann Horn

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Thu, Apr 30, 2020 at 5:26 AM Linus Torvalds
<[email protected]> wrote:
> On Wed, Apr 29, 2020 at 8:00 PM Jann Horn <[email protected]> wrote:
> >
> > But if we go with Bernd's approach together with your restart
> > suggestion,
>
> So repeat after me: Bernd's approach _without_ the restart is unacceptable.
>
> It's unacceptable because it breaks things that currently work, and
> returns EAGAIN in situations where that is simple not a valid error
> code.

Sure, makes sense to me. I'm not eager to start randomly throwing
EAGAIN where it couldn't happen before either (and I initially missed
that Bernd's patch did that for procfs files, too).

> That bug has nothing to do with ptrace(). It's literally a "write()"
> to a file in /proc.
>
> What is so hard to get about this basic thing?

You said:

| So a ptrace() user (or [...] wouldn't even see the impossible EAGAIN error.

So I assumed you explicitly wanted ptrace() to restart, too. I was
just pointing out that that didn't make sense to me.

2020-04-30 03:53:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 8:41 PM Jann Horn <[email protected]> wrote:
>
> | So a ptrace() user (or [...] wouldn't even see the impossible EAGAIN error.
>
> So I assumed you explicitly wanted ptrace() to restart, too. I was
> just pointing out that that didn't make sense to me.

I'm actually ok with the restart option, simply because I continue to
maintain that the program is buggy. "Anything goes".

To not be buggy, the program needs to install a SIGCHLD handler so
that it can reap its (pseudo-)children.

At which point it doesn't actually make any difference whether we fix
the kernel or not, because then the non-buggy program will just work -
even with a non-modified kernel.

Honestly, the main argument for the kernel doing anything different at
all is that from a user-mode perspective, silently hanging in the
kernel waiting for something to happen is likely the least easy to
debug.

But if you do a return to user space - even if it's to just rinse and
repeat - it's at least not "silent" any more, even if the main noise
it makes is just to waste 100% CPU time. At least that's a big hint to
somebody to take a look.

But yes, we can make ptrace() - and _only_ ptrace() - then not repeat,
and return a new error code that it has never returned before. Like
EAGAIN. Mainly because in that case we're only breaking semantics of
something that was already broken - unlike "write()", which has
perfectly well-defined semantics and wasn't broken.

Linus

2020-04-30 13:40:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Wed, Apr 29, 2020 at 8:25 PM Linus Torvalds
<[email protected]> wrote:
>
> Bernd's approach _without_ the restart is unacceptable.
>
> It's unacceptable because it breaks things that currently work, and
> returns EAGAIN in situations where that is simple not a valid error
> code.

Looking at my restart thing, I think it's a hack, and I don't think
that's acceptable either.

I was pleased with how clever it was, but it's one of those "clever
hacks" that is in the end more "hack" than "clever".

The basic issue is that releasing a lock in the middle just
fundamentally defeats the purpose of the lock unless you have a way to
redo the operation after fixing whatever caused the drop.

And the system call restart thing is dodgy, because there's none of
that "fixing".

It can cause that "write()" call to do the CPU busy loop too if it
hits that "execve() in process" situation.

The only difference with the "write()" case vs "ptrace()" is that
nobody has ever written an insane test-case that doesn't wait for
children, and then does a "write()" to the /proc file that can then
require zombie children to be reaped.

So I don't think the approach is valid even with the restart. Not
restarting isn't acceptable for write(), but restarting doesn't really
work either.

I guess we could have a very special lock that does something like

int lock_exec_cred_mutex(struct task_struct *task)
{
if (mutex_trylock(&task->signal->cred_guard_mutex))
return 0;

if (lock_can_deadlock(task))
return -EDEADLK;

return mutex_lock_interruptible(&task->signal->cred_guard_mutex);
}

might work. But that "lock_can_deadlock()" needs some kind of oracle
or heuristic.

And I can't come up with a perfect one, although I can come up with
things like "if the target has threads, and those threads have a
reaoer that is you, then you have to have SIGCHLD enabled". But it
gets ugly and hacky.

But I think actually releasing the lock in the middle of execve()
before it's done with is worse than ugly and hacky - it's
fundamentally broken.

Moving things around? Sure - like waiting for the threads _after_ the
lock and having done all the cred calculations. So I think Oleg's
patch works.

Linus

2020-04-30 13:42:14

by Bernd Edlinger

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/30/20 4:16 AM, Linus Torvalds wrote:
> On Wed, Apr 29, 2020 at 5:00 PM Jann Horn <[email protected]> wrote:
>>
>> Wouldn't you end up livelocked in the scenario that currently deadlocks?
>
> The test case that we already know is broken, and any fix will have to
> change anyway?
>

The purpose of the test case was only to test the behaviour of my
later patch. The test case _must_ be adjusted to the follow-up
patch, I have no problem with that. Anybody may change the test case
when we know how to fix the API. I did just not anticipate that Eric
would only apply 14 of 16 patches = 87.5% of the patch series. Now that
causes some tension, but it is not really a problem IMHO.

> Let's just say that I don't care in the least.
>
> But Bernd's patch as-is breaks a test-case that currently *works*,
> namely something as simple as
>
> echo xyz > /proc/<pid>/attr/something
>

Excuse me, but what in my /proc folder there is no attr/something
is there a procfs equivalent of pthread_attach ?

What exactly is "attr/something" ?

> and honestly, breaking something that _works_ and may be used in
> reality, in orderf to make a known buggy user testcase work?
>
> Because no, "write()" returning -EAGAIN isn't ok.
>

write can return -EAGAIN if the file is non-blocking, it is
never the case for a disk file, but for a NFS that is not at all
clear, depends on a mount option, and once I had a deadlock in
one of my test systems after OOM-stress, but I never was able
to reproduce, the umount deadlocked, then I was not able to
reboot, could be an alpha-particle of course, who knows...


Hmmm.. maybe a stupid idea:

We could keep the old deadlock-capable API,
and add a new _flag_ somewhere to the PTHREAD_ATTACH call,
that _enables_ the non-blocking behavior, how about that.



Thanks
Bernd.

> Linus
>

2020-04-30 13:52:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Thu, Apr 30, 2020 at 6:39 AM Bernd Edlinger
<[email protected]> wrote:
>
> Excuse me, but what in my /proc folder there is no attr/something
> is there a procfs equivalent of pthread_attach ?
>
> What exactly is "attr/something" ?

Anything that uses that proc_pid_attr_write().

Which you should have realized, since you wrote the patch that changed
that function to return -EAGAIN.

That's

/proc/<pid>/attr/{current,exec,fscreate,keycreate,prev,sockcreate}

and some smack files.

Your patch definitely made them return -EINVAL if they happen in that
execve() black hole, instead of waiting for the execve() to just
complete and then just work.

Dropping a lock really is broken. It';s broken even if you then set a
flag saying "I dropped the lock, now you can't use it".

Linus

2020-04-30 14:31:52

by Bernd Edlinger

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

Hi Linus,

On 4/30/20 3:47 PM, Linus Torvalds wrote:
> On Thu, Apr 30, 2020 at 6:39 AM Bernd Edlinger
> <[email protected]> wrote:
>>
>> Excuse me, but what in my /proc folder there is no attr/something
>> is there a procfs equivalent of pthread_attach ?
>>
>> What exactly is "attr/something" ?
>
> Anything that uses that proc_pid_attr_write().
>
> Which you should have realized, since you wrote the patch that changed
> that function to return -EAGAIN.
>

Ah, now I see, that was of course not the intended effect,
but that is not where the pseudo-deadlock happens at all,
would returning -RESTARTNOINTR in this function make this
patch acceptable, it will not have an effect on the test case?


Bernd.

> That's
>
> /proc/<pid>/attr/{current,exec,fscreate,keycreate,prev,sockcreate}
>
> and some smack files.
>
> Your patch definitely made them return -EINVAL if they happen in that
> execve() black hole, instead of waiting for the execve() to just
> complete and then just work.
>
> Dropping a lock really is broken. It';s broken even if you then set a
> flag saying "I dropped the lock, now you can't use it".
>
> Linus
>

2020-04-30 16:45:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On Thu, Apr 30, 2020 at 7:29 AM Bernd Edlinger
<[email protected]> wrote:
>
> Ah, now I see, that was of course not the intended effect,
> but that is not where the pseudo-deadlock happens at all,
> would returning -RESTARTNOINTR in this function make this
> patch acceptable, it will not have an effect on the test case?

So that was why I suggested doing it all with a helper function, and
also doing that

set_thread_flag(TIF_SIGPENDING);

because without going through the "check-for-signals" code at return
to user space, -ERESTARTNOINTR doesn't actually _do_ any restart.

However, the more I looked at it, the less I actually liked that hack.

Part of it is simply because it can cause the exact same problem that
ptrace() does (at least in theory). And even if you don't get the
livelock thing, you can get the "use 100% CPU time" thing, because if
that case ever triggers, and we re-try, it will generally just _keep_
on triggering (think "execve is waiting for a zombie, nobody is
reaping it").

IOW, restarting doesn't really fix the problem, or guarantee any
forward progress.

So I'd have been ok with your "unsafe_exec_flag" if

(a) it had been done in one place with a helper function.

(b) it would _only_ trigger for ptrace (and perhaps seccomp).

but I don't think it works for that write() case.

That said, I'm not 100% convinced that that write() case really even
needs that cred_guard_mutex (renamed or not).

Maybe we can introduce a new mutex just against concurrent ptrace
(which is what at least the _comment_ says_ that
security_setprocattr() wants - I didn't check the actual low-level
security code).

So maybe that proc_pid_attr_write() case could be done some other way entirely.

Th emore we go through all this, the more I really think that Oleg's
patch to just delay the waiting for things until after dropping the
mutex in execve() is the way to go.

Is it a "simple" and small patch? No. But it really addresses the core
issue, without introducing new odd rules or special cases, or making a
lock that doesn't reliably work as a lock.

Linus

2020-05-02 04:14:06

by Bernd Edlinger

[permalink] [raw]
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/30/20 6:40 PM, Linus Torvalds wrote:
> On Thu, Apr 30, 2020 at 7:29 AM Bernd Edlinger
> <[email protected]> wrote:
>>
>> Ah, now I see, that was of course not the intended effect,
>> but that is not where the pseudo-deadlock happens at all,
>> would returning -RESTARTNOINTR in this function make this
>> patch acceptable, it will not have an effect on the test case?
>
> So that was why I suggested doing it all with a helper function, and
> also doing that
>
> set_thread_flag(TIF_SIGPENDING);
>
> because without going through the "check-for-signals" code at return
> to user space, -ERESTARTNOINTR doesn't actually _do_ any restart.
>
> However, the more I looked at it, the less I actually liked that hack.
>
> Part of it is simply because it can cause the exact same problem that
> ptrace() does (at least in theory). And even if you don't get the
> livelock thing, you can get the "use 100% CPU time" thing, because if
> that case ever triggers, and we re-try, it will generally just _keep_
> on triggering (think "execve is waiting for a zombie, nobody is
> reaping it").
>
> IOW, restarting doesn't really fix the problem, or guarantee any
> forward progress.
>

Right, if it is a real time process it will result in priority-inversion.
Correct.

If it is a virus checker it will be real time priority and it will be
very aggressive ;-) I can feel its aggressiveness already :-) shiver...

And this little zombie-process will paralyze it immediately, nice try.

You see what I mean?

> So I'd have been ok with your "unsafe_exec_flag" if
>
> (a) it had been done in one place with a helper function.
>
> (b) it would _only_ trigger for ptrace (and perhaps seccomp).
>
> but I don't think it works for that write() case.
>
> That said, I'm not 100% convinced that that write() case really even
> needs that cred_guard_mutex (renamed or not).
>
> Maybe we can introduce a new mutex just against concurrent ptrace
> (which is what at least the _comment_ says_ that
> security_setprocattr() wants - I didn't check the actual low-level
> security code).
>
> So maybe that proc_pid_attr_write() case could be done some other way entirely.
>
> Th emore we go through all this, the more I really think that Oleg's
> patch to just delay the waiting for things until after dropping the
> mutex in execve() is the way to go.
>
> Is it a "simple" and small patch? No. But it really addresses the core
> issue, without introducing new odd rules or special cases, or making a
> lock that doesn't reliably work as a lock.
>

Hmm. I think I can agree, that this problem deserves to be solved
really slowly.

Oleg where was your last patch, does it still work or does it
need to be re-based?

And I almost forgot about Eric, are you still with us?


Thanks
Bernd.