2010-07-22 06:46:53

by Benjamin Herrenschmidt

Subject: Possible Oprofile crash/race when stopping

Hi folks !

We've hit a strange crash internally, that we -think- we have tracked
down to an oprofile bug. It's hard to hit, so I can't guarantee yet that
we have fully smashed it but I'd like to share our findings in case you
guys have a better idea.

So the initial observation is a spinlock bad magic followed by a crash
in the spinlock debug code:

[ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
[ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03

Backtrace looks like:

spin_bug+0x74/0xd4
._raw_spin_lock+0x48/0x184
._spin_lock+0x10/0x24
.get_task_mm+0x28/0x8c
.sync_buffer+0x1b4/0x598
.wq_sync_buffer+0xa0/0xdc
.worker_thread+0x1d8/0x2a8
.kthread+0xa8/0xb4
.kernel_thread+0x54/0x70

So we are accessing a freed task struct in the work queue when
processing the samples.

Now that shouldn't happen, since the dying tasks should be handed off to us
via our task handoff notifier, and we should only free them after all the
events have been processed.
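
For reference, the handoff path in buffer_sync.c looks roughly like this
(paraphrased from memory, not a verbatim copy): when a task is about to be
freed, our notifier is called, we put the task on a "dying" list and claim
ownership, and the final free is deferred until the mortuary processing has
run:

static LIST_HEAD(dying_tasks);
static DEFINE_SPINLOCK(task_mortuary);

/*
 * Called from the task handoff chain when a task is about to be freed.
 * Returning NOTIFY_OK claims the task, so the actual free is deferred
 * until process_task_mortuary() drops our reference.
 */
static int task_free_notify(struct notifier_block *self,
                            unsigned long val, void *data)
{
        unsigned long flags;
        struct task_struct *task = data;

        spin_lock_irqsave(&task_mortuary, flags);
        list_add(&task->tasks, &dying_tasks);
        spin_unlock_irqrestore(&task_mortuary, flags);

        return NOTIFY_OK;
}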

However I also observed using xmon that at the point of the crash, the
task_free_notifier structure had nothing attached to it.

Thus my tentative explanation:

When we stop oprofile, we call sync_stop() which I copied below:

void sync_stop(void)
{
        unregister_module_notifier(&module_load_nb);
        profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
        profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
        task_handoff_unregister(&task_free_nb);
        end_sync();
        free_cpumask_var(marked_cpus);
}

You see the call to task_handoff_unregister(). This is when we take
ourselves off the list. So any task freed after that point will not be
retained by oprofile.

However, we have done nothing to clear the pending sample buffers yet.
So if we have a buffer with samples for a task, and that task gets
destroyed just after we have unregistered the handoff, and before we have
cancelled the work queue, then we can potentially access a stale task
struct.
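
In other words, the window looks roughly like this:

   sync_stop()                          elsewhere
   -----------                          ---------
   task_handoff_unregister()
                                        task T (with samples still in the
                                        buffer) exits; nobody claims it,
                                        so its task_struct is freed and
                                        slab-poisoned (the 0x6b pattern)
                                        events/N worker:
                                          wq_sync_buffer()
                                            sync_buffer()
                                              get_task_mm(T)  <- boom
   end_sync()
     end_cpu_work()                     (cancels/flushes, but too late)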

Now, end_sync() does that cancelling, as you see, -after- we removed the
handoff:

static void end_sync(void)
{
        end_cpu_work();        <--- This is where we stop the workqueues
        /* make sure we don't leak task structs */
        process_task_mortuary();
        process_task_mortuary();
}

I think the right sequence, however, requires breaking up end_sync(). I.e.,
we need to do, in that order (rough sketch below):

- cancel the workqueues
- unregister the notifier
- process the mortuary
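
Roughly, sync_stop() would then become something like the below (untested
sketch, just to illustrate the ordering):

void sync_stop(void)
{
        /* 1) stop and drain the per-cpu work while the handoff
         *    notifier is still registered */
        end_cpu_work();

        /* 2) only now stop being handed dying tasks */
        unregister_module_notifier(&module_load_nb);
        profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
        profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
        task_handoff_unregister(&task_free_nb);

        /* 3) free whatever task structs we are still holding */
        process_task_mortuary();
        process_task_mortuary();

        free_cpumask_var(marked_cpus);
}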

What do you think ?

Cheers,
Ben.


Subject: Re: Possible Oprofile crash/race when stopping

On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> We've hit a strange crash internally, that we -think- we have tracked
> down to an oprofile bug. It's hard to hit, so I can't guarantee yet that
> we have fully smashed it but I'd like to share our findings in case you
> guys have a better idea.
>
> So the initial observation is a spinlock bad magic followed by a crash
> in the spinlock debug code:

Benjamin,

thanks for reporting this. I have been trying to reproduce this with
various loads and scenarios, but without success so far. Can you give me a
hint about the load you have (number of processes running, CPU load, do
you switch off oprofile while many processes are still running)? Are
you able to trigger it regularly?

> I think the right sequence however requires breaking up end_sync. Ie, we
> need to do in that order:
>
> - cancel the workqueues
> - unregister the notifier
> - process the mortuary
>
> What do you think ?

This could potentially fix it; I will have to look deeper into the
code and will try to do this next week.

Thanks,

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2010-08-03 01:55:57

by Benjamin Herrenschmidt

Subject: Re: Possible Oprofile crash/race when stopping


> Benjamin,
>
> thanks for reporting this. I was trying to reproduce this with various
> loads and scenarios, but without success so far. Can you give me a
> hint of the load you have (number of processes running, cpu load, do
> you switch off oprofile while many processes are still running)? Are
> you able to regularly trigger it?

Hi, sorry, I missed your reply; Michael just gave me a wake-up call.

I can't tell you that much about the workload; I don't have access to it
either. Let's say that from my point of view it's a "customer" binary
blob.

I can re-trigger it though.

> > I think the right sequence however requires breaking up end_sync. Ie, we
> > need to do in that order:
> >
> > - cancel the workqueues
> > - unregister the notifier
> > - process the mortuary
> >
> > What do you think ?
>
> This could potentially fix it, I will have to look deeper into the
> code. Try to do this next week.

Thanks. Let me know.

Cheers,
Ben.

Subject: [PATCH] oprofile: fix crash when accessing freed task structs

On 02.08.10 21:39:33, Benjamin Herrenschmidt wrote:

> I can't tell that much about the workload, I don't have access to it
> either, let's say that from my point of view it's a "customer" binary
> blob.
>
> I can re-trigger it though.

Benjamin,

can you try the patch below?

Thanks,

-Robert

From 4435322debc38097e9e863e14597ab3f78814d14 Mon Sep 17 00:00:00 2001
From: Robert Richter <[email protected]>
Date: Fri, 13 Aug 2010 16:29:04 +0200
Subject: [PATCH] oprofile: fix crash when accessing freed task structs

This patch fixes a crash during shutdown reported below. The crash is
caused by accessing already freed task structs. The fix changes the
order for registering and unregistering notifier callbacks.

All notifiers must be initialized before buffers start working. To
stop buffer synchronization we cancel all workqueues, unregister the
notifier callback and then flush all buffers. After all of this we
finally can free all tasks listed.

This should avoid accessing freed tasks.

On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:

> So the initial observation is a spinlock bad magic followed by a crash
> in the spinlock debug code:
>
> [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
>
> Backtrace looks like:
>
> spin_bug+0x74/0xd4
> ._raw_spin_lock+0x48/0x184
> ._spin_lock+0x10/0x24
> .get_task_mm+0x28/0x8c
> .sync_buffer+0x1b4/0x598
> .wq_sync_buffer+0xa0/0xdc
> .worker_thread+0x1d8/0x2a8
> .kthread+0xa8/0xb4
> .kernel_thread+0x54/0x70
>
> So we are accessing a freed task struct in the work queue when
> processing the samples.

Reported-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/buffer_sync.c | 27 ++++++++++++++-------------
drivers/oprofile/cpu_buffer.c | 2 --
2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index a9352b2..b7e755f 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -141,16 +141,6 @@ static struct notifier_block module_load_nb = {
.notifier_call = module_load_notify,
};

-
-static void end_sync(void)
-{
- end_cpu_work();
- /* make sure we don't leak task structs */
- process_task_mortuary();
- process_task_mortuary();
-}
-
-
int sync_start(void)
{
int err;
@@ -158,7 +148,7 @@ int sync_start(void)
if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
return -ENOMEM;

- start_cpu_work();
+ mutex_lock(&buffer_mutex);

err = task_handoff_register(&task_free_nb);
if (err)
@@ -173,7 +163,10 @@ int sync_start(void)
if (err)
goto out4;

+ start_cpu_work();
+
out:
+ mutex_unlock(&buffer_mutex);
return err;
out4:
profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
@@ -182,7 +175,6 @@ out3:
out2:
task_handoff_unregister(&task_free_nb);
out1:
- end_sync();
free_cpumask_var(marked_cpus);
goto out;
}
@@ -190,11 +182,20 @@ out1:

void sync_stop(void)
{
+ /* flush buffers */
+ mutex_lock(&buffer_mutex);
+ end_cpu_work();
unregister_module_notifier(&module_load_nb);
profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
task_handoff_unregister(&task_free_nb);
- end_sync();
+ mutex_unlock(&buffer_mutex);
+ flush_scheduled_work();
+
+ /* make sure we don't leak task structs */
+ process_task_mortuary();
+ process_task_mortuary();
+
free_cpumask_var(marked_cpus);
}

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 219f79e..f179ac2 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -120,8 +120,6 @@ void end_cpu_work(void)

cancel_delayed_work(&b->work);
}
-
- flush_scheduled_work();
}

/*
--
1.7.1.1



--
Advanced Micro Devices, Inc.
Operating System Research Center

2010-08-15 22:22:32

by Benjamin Herrenschmidt

Subject: Re: [PATCH] oprofile: fix crash when accessing freed task structs

On Fri, 2010-08-13 at 17:39 +0200, Robert Richter wrote:
> On 02.08.10 21:39:33, Benjamin Herrenschmidt wrote:
>
> > I can't tell that much about the workload, I don't have access to it
> > either, let's say that from my point of view it's a "customer" binary
> > blob.
> >
> > I can re-trigger it though.
>
> Benjamin,
>
> can you try the patch below?

Thanks. I'll see if the folks who have a repro-case can give it a spin
for me.

Cheers,
Ben.

> Thanks,
>
> -Robert
>
> >From 4435322debc38097e9e863e14597ab3f78814d14 Mon Sep 17 00:00:00 2001
> From: Robert Richter <[email protected]>
> Date: Fri, 13 Aug 2010 16:29:04 +0200
> Subject: [PATCH] oprofile: fix crash when accessing freed task structs
>
> This patch fixes a crash during shutdown reported below. The crash is
> caused by accessing already freed task structs. The fix changes the
> order for registering and unregistering notifier callbacks.
>
> All notifiers must be initialized before buffers start working. To
> stop buffer synchronization we cancel all workqueues, unregister the
> notifier callback and then flush all buffers. After all of this we
> finally can free all tasks listed.
>
> This should avoid accessing freed tasks.
>
> On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
>
> > So the initial observation is a spinlock bad magic followed by a crash
> > in the spinlock debug code:
> >
> > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
> >
> > Backtrace looks like:
> >
> > spin_bug+0x74/0xd4
> > ._raw_spin_lock+0x48/0x184
> > ._spin_lock+0x10/0x24
> > .get_task_mm+0x28/0x8c
> > .sync_buffer+0x1b4/0x598
> > .wq_sync_buffer+0xa0/0xdc
> > .worker_thread+0x1d8/0x2a8
> > .kthread+0xa8/0xb4
> > .kernel_thread+0x54/0x70
> >
> > So we are accessing a freed task struct in the work queue when
> > processing the samples.
>
> Reported-by: Benjamin Herrenschmidt <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/oprofile/buffer_sync.c | 27 ++++++++++++++-------------
> drivers/oprofile/cpu_buffer.c | 2 --
> 2 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
> index a9352b2..b7e755f 100644
> --- a/drivers/oprofile/buffer_sync.c
> +++ b/drivers/oprofile/buffer_sync.c
> @@ -141,16 +141,6 @@ static struct notifier_block module_load_nb = {
> .notifier_call = module_load_notify,
> };
>
> -
> -static void end_sync(void)
> -{
> - end_cpu_work();
> - /* make sure we don't leak task structs */
> - process_task_mortuary();
> - process_task_mortuary();
> -}
> -
> -
> int sync_start(void)
> {
> int err;
> @@ -158,7 +148,7 @@ int sync_start(void)
> if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
> return -ENOMEM;
>
> - start_cpu_work();
> + mutex_lock(&buffer_mutex);
>
> err = task_handoff_register(&task_free_nb);
> if (err)
> @@ -173,7 +163,10 @@ int sync_start(void)
> if (err)
> goto out4;
>
> + start_cpu_work();
> +
> out:
> + mutex_unlock(&buffer_mutex);
> return err;
> out4:
> profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
> @@ -182,7 +175,6 @@ out3:
> out2:
> task_handoff_unregister(&task_free_nb);
> out1:
> - end_sync();
> free_cpumask_var(marked_cpus);
> goto out;
> }
> @@ -190,11 +182,20 @@ out1:
>
> void sync_stop(void)
> {
> + /* flush buffers */
> + mutex_lock(&buffer_mutex);
> + end_cpu_work();
> unregister_module_notifier(&module_load_nb);
> profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
> profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
> task_handoff_unregister(&task_free_nb);
> - end_sync();
> + mutex_unlock(&buffer_mutex);
> + flush_scheduled_work();
> +
> + /* make sure we don't leak task structs */
> + process_task_mortuary();
> + process_task_mortuary();
> +
> free_cpumask_var(marked_cpus);
> }
>
> diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
> index 219f79e..f179ac2 100644
> --- a/drivers/oprofile/cpu_buffer.c
> +++ b/drivers/oprofile/cpu_buffer.c
> @@ -120,8 +120,6 @@ void end_cpu_work(void)
>
> cancel_delayed_work(&b->work);
> }
> -
> - flush_scheduled_work();
> }
>
> /*
> --
> 1.7.1.1
>
>
>

Subject: Re: [PATCH] oprofile: fix crash when accessing freed task structs

On 15.08.10 18:22:04, Benjamin Herrenschmidt wrote:
> > >From 4435322debc38097e9e863e14597ab3f78814d14 Mon Sep 17 00:00:00 2001
> > From: Robert Richter <[email protected]>
> > Date: Fri, 13 Aug 2010 16:29:04 +0200
> > Subject: [PATCH] oprofile: fix crash when accessing freed task structs
> >
> > This patch fixes a crash during shutdown reported below. The crash is
> > caused by accessing already freed task structs. The fix changes the
> > order for registering and unregistering notifier callbacks.
> >
> > All notifiers must be initialized before buffers start working. To
> > stop buffer synchronization we cancel all workqueues, unregister the
> > notifier callback and then flush all buffers. After all of this we
> > finally can free all tasks listed.
> >
> > This should avoid accessing freed tasks.
> >
> > On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
> >
> > > So the initial observation is a spinlock bad magic followed by a crash
> > > in the spinlock debug code:
> > >
> > > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> > > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
> > >
> > > Backtrace looks like:
> > >
> > > spin_bug+0x74/0xd4
> > > ._raw_spin_lock+0x48/0x184
> > > ._spin_lock+0x10/0x24
> > > .get_task_mm+0x28/0x8c
> > > .sync_buffer+0x1b4/0x598
> > > .wq_sync_buffer+0xa0/0xdc
> > > .worker_thread+0x1d8/0x2a8
> > > .kthread+0xa8/0xb4
> > > .kernel_thread+0x54/0x70
> > >
> > > So we are accessing a freed task struct in the work queue when
> > > processing the samples.
> >
> > Reported-by: Benjamin Herrenschmidt <[email protected]>
> > Signed-off-by: Robert Richter <[email protected]>

I have applied the patch to

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git urgent

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center