2009-03-27 09:35:33

by Metzger, Markus T

[permalink] [raw]
Subject: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

Bts tracing had been stopped in __ptrace_unlink() which is called
with interrupts disabled and tasklist_lock write-held.

Release the bts tracer early in do_exit() when interrupts are enabled
and tasklist_lock is not held.

Resolve races between exit and detach by atomically exchanging the
bts tracer pointer with NULL.


Signed-off-by: Markus Metzger <[email protected]>
---

Index: git-tip/arch/x86/include/asm/ptrace.h
===================================================================
--- git-tip.orig/arch/x86/include/asm/ptrace.h 2009-03-27 07:59:22.000000000 +0100
+++ git-tip/arch/x86/include/asm/ptrace.h 2009-03-27 08:18:46.000000000 +0100
@@ -235,12 +235,14 @@ extern int do_get_thread_area(struct tas
extern int do_set_thread_area(struct task_struct *p, int idx,
struct user_desc __user *info, int can_allocate);

-extern void x86_ptrace_untrace(struct task_struct *);
+extern void x86_ptrace_untrace(struct task_struct *child);
extern void x86_ptrace_fork(struct task_struct *child,
unsigned long clone_flags);
+extern void x86_ptrace_report_exit(long exit_code);

#define arch_ptrace_untrace(tsk) x86_ptrace_untrace(tsk)
#define arch_ptrace_fork(child, flags) x86_ptrace_fork(child, flags)
+#define arch_ptrace_report_exit(code) x86_ptrace_report_exit(code)

#endif /* __KERNEL__ */

Index: git-tip/arch/x86/kernel/ptrace.c
===================================================================
--- git-tip.orig/arch/x86/kernel/ptrace.c 2009-03-27 07:59:23.000000000 +0100
+++ git-tip/arch/x86/kernel/ptrace.c 2009-03-27 08:27:46.000000000 +0100
@@ -789,41 +789,150 @@ static void ptrace_bts_fork(struct task_
tsk->thread.bts_ovfl_signal = 0;
}

-static void ptrace_bts_untrace(struct task_struct *child)
+/*
+ * Release child's bts tracer and free the bts trace buffer.
+ *
+ * This is called for:
+ * - a normal detach
+ * - an exiting tracer
+ * - an exiting tracee
+ *
+ * The various calls may race.
+ *
+ * The first to grab the bts tracer pointer will release first the
+ * tracer and then the buffer.
+ *
+ * Tracer and tracee will call this with the tracer's mm.
+ *
+ * Must be called with interrupts enabled and tasklist_lock not held.
+ */
+static void ptrace_bts_release(struct task_struct *child,
+ struct mm_struct *mm)
{
- if (unlikely(child->bts)) {
- ds_release_bts(child->bts);
- child->bts = NULL;
-
- /* We cannot update total_vm and locked_vm since
- child's mm is already gone. But we can reclaim the
- memory. */
+ struct bts_tracer *tracer;
+
+ tracer = xchg(&child->bts, NULL);
+ if (tracer) {
+ ds_release_bts(tracer);
+
+ release_locked_buffer_on_behalf(mm, child->bts_buffer,
+ child->bts_size);
kfree(child->bts_buffer);
child->bts_buffer = NULL;
child->bts_size = 0;
}
}

-static void ptrace_bts_detach(struct task_struct *child)
+static void ptrace_bts_exit_tracer(void)
{
+ struct task_struct *child, *next;
+
/*
- * Ptrace_detach() races with ptrace_untrace() in case
- * the child dies and is reaped by another thread.
+ * We must not hold the tasklist_lock when we release the bts
+ * tracer, since we need to make sure the cpu executing the
+ * tracee stops tracing before we may free the trace buffer.
*
- * We only do the memory accounting at this point and
- * leave the buffer deallocation and the bts tracer
- * release to ptrace_bts_untrace() which will be called
- * later on with tasklist_lock held.
+ * read_lock(&tasklist_lock) and smp_call_function() may
+ * deadlock with write_lock_irq(&tasklist_lock).
+ *
+ * We need to hold the tasklist_lock while iterating over our list
+ * of ptraced children, though, to protect against inserts
+ * (e.g. traceme).
+ *
+ * New additions will not have bts tracers, yet. Removals will have
+ * their bts tracer released by the removing task.
+ * All we need to care about is that we complete one traversal.
+ *
+ * We use the next pointer in the safe walk to detect any changes
+ * involving the current element that might affect our traversal and
+ * start over again.
*/
- release_locked_buffer(child->bts_buffer, child->bts_size);
+ read_lock(&tasklist_lock);
+ repeat:
+ list_for_each_entry_safe(child, next, &current->ptraced, ptrace_entry) {
+ int start_over;
+
+ get_task_struct(child);
+ read_unlock(&tasklist_lock);
+
+ ptrace_bts_release(child, current->mm);
+
+ read_lock(&tasklist_lock);
+ start_over =
+ next != list_entry(child->ptrace_entry.next,
+ typeof(*child), ptrace_entry);
+ put_task_struct(child);
+ if (start_over)
+ goto repeat;
+ }
+ read_unlock(&tasklist_lock);
+}
+
+static void ptrace_bts_exit_tracee(void)
+{
+ struct mm_struct *mm = NULL;
+
+ /*
+ * This code may race with ptrace reparenting.
+ *
+ * We hold the tasklist lock while we check whether we're
+ * ptraced and grab our ptracer's mm.
+ *
+ * It does not really matter if we're reparented,
+ * afterwards. We hold onto the correct mm.
+ *
+ * If we're reparented before we get the tasklist_lock, our
+ * former ptrace parent will release the bts tracer.
+ */
+ read_lock(&tasklist_lock);
+ if (current->ptrace)
+ mm = get_task_mm(current->parent);
+ read_unlock(&tasklist_lock);
+
+ if (mm) {
+ ptrace_bts_release(current, mm);
+ mmput(mm);
+ }
+}
+
+/*
+ * Cleanup bts tracing for an exiting task.
+ *
+ * Tracer and tracee may race to release the bts tracer. Whoever
+ * reaches it first, will release it.
+ * Both will use the tracer's mm for refunding the allocated and
+ * locked memory.
+ *
+ * Must be called with interrupts enabled and no locks held.
+ */
+static void ptrace_bts_exit(void)
+{
+ if (unlikely(!list_empty(&current->ptraced)))
+ ptrace_bts_exit_tracer();
+
+ if (unlikely(current->bts))
+ ptrace_bts_exit_tracee();
+}
+
+/*
+ * Called from __ptrace_unlink() after the child has been moved back
+ * to its original parent.
+ *
+ * By then, branch tracing should have been disabled, already; either
+ * via ptrace_detach() or via do_exit() of either tracer or tracee.
+ */
+static void ptrace_bts_untrace(struct task_struct *child)
+{
+ WARN(child->bts, "leaking BTS tracer\n");
}
#else
static inline void ptrace_bts_fork(struct task_struct *tsk) {}
-static inline void ptrace_bts_detach(struct task_struct *child) {}
static inline void ptrace_bts_untrace(struct task_struct *child) {}
+static inline void ptrace_bts_exit(void) {}
#endif /* CONFIG_X86_PTRACE_BTS */

-void x86_ptrace_fork(struct task_struct *child, unsigned long clone_flags)
+void x86_ptrace_fork(struct task_struct *child,
+ unsigned long clone_flags)
{
ptrace_bts_fork(child);
}
@@ -833,6 +942,11 @@ void x86_ptrace_untrace(struct task_stru
ptrace_bts_untrace(child);
}

+void x86_ptrace_report_exit(long exit_code)
+{
+ ptrace_bts_exit();
+}
+
/*
* Called by kernel/ptrace.c when detaching..
*
@@ -844,7 +958,7 @@ void ptrace_disable(struct task_struct *
#ifdef TIF_SYSCALL_EMU
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
#endif
- ptrace_bts_detach(child);
+ ptrace_bts_release(child, current->mm);
}

#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


2009-03-27 14:38:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

On 03/27, Markus Metzger wrote:
>
> +static void ptrace_bts_exit(void)
> +{
> + if (unlikely(!list_empty(&current->ptraced)))
> + ptrace_bts_exit_tracer();
> +
> + if (unlikely(current->bts))
> + ptrace_bts_exit_tracee();
> +}

Could you explain why do we need ptrace_bts_exit_tracee() ?

If current is traced, the tracer should do ptrace_bts_release()
eventually, no?

And if we really need to do ptrace_bts_exit_tracee(), then
"if (unlikely(current->bts))" check is racy. The tracer
can do PTRACE_BTS_CONFIG right after the check.

Oleg.

2009-03-27 15:36:34

by Metzger, Markus T

[permalink] [raw]
Subject: RE: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

>-----Original Message-----
>From: Oleg Nesterov [mailto:[email protected]]
>Sent: Friday, March 27, 2009 3:35 PM


>> +static void ptrace_bts_exit(void)
>> +{
>> + if (unlikely(!list_empty(&current->ptraced)))
>> + ptrace_bts_exit_tracer();
>> +
>> + if (unlikely(current->bts))
>> + ptrace_bts_exit_tracee();
>> +}
>
>Could you explain why do we need ptrace_bts_exit_tracee() ?
>
>If current is traced, the tracer should do ptrace_bts_release()
>eventually, no?

If current is traced and exits, it may be reaped by another thread that is not
the tracer (that's actually your example you made in an earlier thread to
describe the race between a normal detach and an exiting tracee).

The ptrace_unlink() call to detach the tracer is executed with irq's disabled.
I need irq's enabled (see the other discussion, to wait for the traced task).

Therefore, I have the tracee disable bts tracing itself when it exits.


>And if we really need to do ptrace_bts_exit_tracee(), then
>"if (unlikely(current->bts))" check is racy. The tracer
>can do PTRACE_BTS_CONFIG right after the check.

The ptrace system call to do this would require the tracee to be stopped.


regards,
markus.

---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2009-03-27 17:28:11

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

On 03/27, Metzger, Markus T wrote:
>
> >-----Original Message-----
> >From: Oleg Nesterov [mailto:[email protected]]
> >Sent: Friday, March 27, 2009 3:35 PM
>
> >> +static void ptrace_bts_exit(void)
> >> +{
> >> + if (unlikely(!list_empty(&current->ptraced)))
> >> + ptrace_bts_exit_tracer();
> >> +
> >> + if (unlikely(current->bts))
> >> + ptrace_bts_exit_tracee();
> >> +}
> >
> >Could you explain why do we need ptrace_bts_exit_tracee() ?
> >
> >If current is traced, the tracer should do ptrace_bts_release()
> >eventually, no?
>
> If current is traced and exits, it may be reaped by another thread that is not
> the tracer (that's actually your example you made in an earlier thread to
> describe the race between a normal detach and an exiting tracee).
>
> The ptrace_unlink() call to detach the tracer is executed with irq's disabled.
> I need irq's enabled (see the other discussion, to wait for the traced task).

OK,

> Therefore, I have the tracee disable bts tracing itself when it exits.
>
>
> >And if we really need to do ptrace_bts_exit_tracee(), then
> >"if (unlikely(current->bts))" check is racy. The tracer
> >can do PTRACE_BTS_CONFIG right after the check.
>
> The ptrace system call to do this would require the tracee to be stopped.

Yes, but this doesn't matter.

The tracer starts ptrace(PTRACE_BTS_CONFIG) and stops the tracee.
But when the tracer calls ptrace_bts_config() the tracee can be already
killed, and it can exit and bypass ptrace_bts_exit_tracee().

Oleg.

2009-03-27 19:26:45

by Markus Metzger

[permalink] [raw]
Subject: Re: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

On Fri, 2009-03-27 at 18:24 +0100, Oleg Nesterov wrote:
> On 03/27, Metzger, Markus T wrote:
> >
> > >-----Original Message-----
> > >From: Oleg Nesterov [mailto:[email protected]]
> > >Sent: Friday, March 27, 2009 3:35 PM
> >
> > >> +static void ptrace_bts_exit(void)
> > >> +{
> > >> + if (unlikely(!list_empty(&current->ptraced)))
> > >> + ptrace_bts_exit_tracer();
> > >> +
> > >> + if (unlikely(current->bts))
> > >> + ptrace_bts_exit_tracee();
> > >> +}
> > >
> > >Could you explain why do we need ptrace_bts_exit_tracee() ?
> > >
> > >If current is traced, the tracer should do ptrace_bts_release()
> > >eventually, no?
> >
> > If current is traced and exits, it may be reaped by another thread that is not
> > the tracer (that's actually your example you made in an earlier thread to
> > describe the race between a normal detach and an exiting tracee).
> >
> > The ptrace_unlink() call to detach the tracer is executed with irq's disabled.
> > I need irq's enabled (see the other discussion, to wait for the traced task).
>
> OK,
>
> > Therefore, I have the tracee disable bts tracing itself when it exits.
> >
> >
> > >And if we really need to do ptrace_bts_exit_tracee(), then
> > >"if (unlikely(current->bts))" check is racy. The tracer
> > >can do PTRACE_BTS_CONFIG right after the check.
> >
> > The ptrace system call to do this would require the tracee to be stopped.
>
> Yes, but this doesn't matter.
>
> The tracer starts ptrace(PTRACE_BTS_CONFIG) and stops the tracee.
> But when the tracer calls ptrace_bts_config() the tracee can be already
> killed, and it can exit and bypass ptrace_bts_exit_tracee().

Thanks for your review! This is really helpful.


This exit race is a nasty thing.
I think I need to either prevent a task from running while I change the
tracing configuration, or have the tracee execute all the setup/cleanup
code.

The former looks more attractive to me. I would expect that there is
some infrastructure for that. I asked for pointers in a previous email,
already.


thanks and regards,
markus.

2009-03-28 11:32:19

by Markus Metzger

[permalink] [raw]
Subject: Re: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

On Fri, 2009-03-27 at 18:24 +0100, Oleg Nesterov wrote:
> On 03/27, Metzger, Markus T wrote:
> >
> > >-----Original Message-----
> > >From: Oleg Nesterov [mailto:[email protected]]
> > >Sent: Friday, March 27, 2009 3:35 PM
> >
> > >> +static void ptrace_bts_exit(void)
> > >> +{
> > >> + if (unlikely(!list_empty(&current->ptraced)))
> > >> + ptrace_bts_exit_tracer();
> > >> +
> > >> + if (unlikely(current->bts))
> > >> + ptrace_bts_exit_tracee();
> > >> +}
> > >
> > >Could you explain why do we need ptrace_bts_exit_tracee() ?
> > >
> > >If current is traced, the tracer should do ptrace_bts_release()
> > >eventually, no?
> >
> > If current is traced and exits, it may be reaped by another thread that is not
> > the tracer (that's actually your example you made in an earlier thread to
> > describe the race between a normal detach and an exiting tracee).
> >
> > The ptrace_unlink() call to detach the tracer is executed with irq's disabled.
> > I need irq's enabled (see the other discussion, to wait for the traced task).
>
> OK,
>
> > Therefore, I have the tracee disable bts tracing itself when it exits.
> >
> >
> > >And if we really need to do ptrace_bts_exit_tracee(), then
> > >"if (unlikely(current->bts))" check is racy. The tracer
> > >can do PTRACE_BTS_CONFIG right after the check.
> >
> > The ptrace system call to do this would require the tracee to be stopped.
>
> Yes, but this doesn't matter.
>
> The tracer starts ptrace(PTRACE_BTS_CONFIG) and stops the tracee.
> But when the tracer calls ptrace_bts_config() the tracee can be already
> killed, and it can exit and bypass ptrace_bts_exit_tracee().

Could something like that work?

ds_release_bts(struct bts_tracer *tracer)
{
struct task_struct *task =
tracer->ds.context->task;

do {
set_task_state(task, TASK_UNINTERRUPTIBLE);
while (!wait_task_inactive(task, TASK_UNINTERRUPTIBLE));

ds_suspend_bts(tracer);
ds_free_bts(tracer);

wake_up_process(task);
}

I guess it would not work in general, since task could already sleep
on some event and be woken up after the do loop.

I was thinking it might work for the exit race, since we don't sleep
during exit, but it might have started sleeping before the SIGKILL.


Isn't this a general problem for ptrace?

Ptrace uses a similar pattern in ptrace_check_attach().
It stops the traced task, but that task might wake up immediately after
that check. It might be scheduled in any time during a ptrace operation.

In that case, must I always assume I'm operating on a running task?


regards,
markus.

2009-03-30 01:26:48

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

On 03/28, Markus Metzger wrote:
>
> ds_release_bts(struct bts_tracer *tracer)
> {
> struct task_struct *task =
> tracer->ds.context->task;
>
> do {
> set_task_state(task, TASK_UNINTERRUPTIBLE);

Oh, this is not right,

> while (!wait_task_inactive(task, TASK_UNINTERRUPTIBLE));

and can't help to wait_task_inactive(). Again, unless this task
does a blocking call wait_task_inactive() can't force this task
to be deactivated. Even we set TASK_UNINTERRUPTIBLE.

> ds_suspend_bts(tracer);
> ds_free_bts(tracer);
>
> wake_up_process(task);

makes it TASK_RUNNING. This can't restores the proper ->state
if it was not running (say, TASK_STOPPED).

> }
>
> I guess it would not work in general, since task could already sleep
> on some event and be woken up after the do loop.

This too,

> I was thinking it might work for the exit race, since we don't sleep
> during exit,

We do. In the context of bts problems, we don't care if the task sleeps
after _free_tracee(). But this task can have tracees too, it can in turn
sleep in wait_task_inactive().

> Isn't this a general problem for ptrace?
>
> Ptrace uses a similar pattern in ptrace_check_attach().
> It stops the traced task, but that task might wake up immediately after
> that check. It might be scheduled in any time during a ptrace operation.

Yes. ptrace() can assume the tracee is TASK_TRACED, or it is dying/dead.

If you need to make sure it is still traced, you can re-check ->state
under ->siglock. Until you drop this lock, the tracee (if still traced)
can't escape from ptrace_stop/do_signal_stop, even if scheduled.

Oleg.

2009-03-30 06:56:12

by Metzger, Markus T

[permalink] [raw]
Subject: RE: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit

>-----Original Message-----
>From: Oleg Nesterov [mailto:[email protected]]
>Sent: Monday, March 30, 2009 3:22 AM
>To: Markus Metzger


>> ds_release_bts(struct bts_tracer *tracer)
>> {
>> struct task_struct *task =
>> tracer->ds.context->task;
>>
>> do {
>> set_task_state(task, TASK_UNINTERRUPTIBLE);
>
>Oh, this is not right,

Agreed.

I wonder if it is ever safe to change another task's state.
There's a lot of code that sets the task state seemingly
unprotected - but always for current.


>> Isn't this a general problem for ptrace?
>>
>> Ptrace uses a similar pattern in ptrace_check_attach().
>> It stops the traced task, but that task might wake up immediately after
>> that check. It might be scheduled in any time during a ptrace operation.
>
>Yes. ptrace() can assume the tracee is TASK_TRACED, or it is dying/dead.
>
>If you need to make sure it is still traced, you can re-check ->state
>under ->siglock. Until you drop this lock, the tracee (if still traced)
>can't escape from ptrace_stop/do_signal_stop, even if scheduled.

I can't hold the siglock for the entire ptrace operation, can I?

Instead, I put all bts stuff into a context struct and I use use-counts
to make sure the tracer is not released while it is used by a ptrace operation.

regards,
markus.

---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2009-03-30 10:44:36

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 3/14] x86, ptrace, bts: stop bts tracing early in do_exit


>
>>> ds_release_bts(struct bts_tracer *tracer)
>>> {
>>> struct task_struct *task =
>>> tracer->ds.context->task;
>>>
>>> do {
>>> set_task_state(task, TASK_UNINTERRUPTIBLE);
>> Oh, this is not right,
>
> Agreed.
>
> I wonder if it is ever safe to change another task's state.
> There's a lot of code that sets the task state seemingly
> unprotected - but always for current.

It should be safe when you can guarantee that the other task is stopped
(as in ptrace) and you protect against yourself (only a single
external writer)

-Andi