Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751462Ab0HOWWc (ORCPT ); Sun, 15 Aug 2010 18:22:32 -0400
Received: from gate.crashing.org ([63.228.1.57]:41166 "EHLO gate.crashing.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751169Ab0HOWWa (ORCPT ); Sun, 15 Aug 2010 18:22:30 -0400
Subject: Re: [PATCH] oprofile: fix crash when accessing freed task structs
From: Benjamin Herrenschmidt
To: Robert Richter
Cc: "linux-kernel@vger.kernel.org" ,
	"Carl E. Love" , Michael Ellerman , oprofile-list
In-Reply-To: <20100813153910.GD26154@erda.amd.com>
References: <1279775680.1970.13.camel@pasglop>
	<20100728122111.GO26154@erda.amd.com>
	<1280799573.1902.81.camel@pasglop>
	<20100813153910.GD26154@erda.amd.com>
Content-Type: text/plain; charset="UTF-8"
Date: Mon, 16 Aug 2010 08:22:04 +1000
Message-ID: <1281910924.2811.0.camel@pasglop>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.3
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2010-08-13 at 17:39 +0200, Robert Richter wrote:
> On 02.08.10 21:39:33, Benjamin Herrenschmidt wrote:
>
> > I can't tell that much about the workload, I don't have access to it
> > either, let's say that from my point of view it's a "customer" binary
> > blob.
> >
> > I can re-trigger it though.
>
> Benjamin,
>
> can you try the patch below?

Thanks. I'll see if the folks who have a repro-case can give it a spin
for me.

Cheers,
Ben.

> Thanks,
>
> -Robert
>
> >From 4435322debc38097e9e863e14597ab3f78814d14 Mon Sep 17 00:00:00 2001
> From: Robert Richter
> Date: Fri, 13 Aug 2010 16:29:04 +0200
> Subject: [PATCH] oprofile: fix crash when accessing freed task structs
>
> This patch fixes a crash during shutdown reported below. The crash is
> caused by accessing already freed task structs. The fix changes the
> order for registering and unregistering notifier callbacks.
>
> All notifiers must be initialized before buffers start working. To
> stop buffer synchronization we cancel all workqueues, unregister the
> notifier callback and then flush all buffers. After all of this we
> can finally free all listed tasks.
>
> This should avoid accessing freed tasks.
>
> On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
>
> > So the initial observation is a spinlock bad magic followed by a crash
> > in the spinlock debug code:
> >
> > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
> >
> > Backtrace looks like:
> >
> >     spin_bug+0x74/0xd4
> >     ._raw_spin_lock+0x48/0x184
> >     ._spin_lock+0x10/0x24
> >     .get_task_mm+0x28/0x8c
> >     .sync_buffer+0x1b4/0x598
> >     .wq_sync_buffer+0xa0/0xdc
> >     .worker_thread+0x1d8/0x2a8
> >     .kthread+0xa8/0xb4
> >     .kernel_thread+0x54/0x70
> >
> > So we are accessing a freed task struct in the work queue when
> > processing the samples.
>
> Reported-by: Benjamin Herrenschmidt
> Signed-off-by: Robert Richter
> ---
>  drivers/oprofile/buffer_sync.c |   27 ++++++++++++++-------------
>  drivers/oprofile/cpu_buffer.c  |    2 --
>  2 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
> index a9352b2..b7e755f 100644
> --- a/drivers/oprofile/buffer_sync.c
> +++ b/drivers/oprofile/buffer_sync.c
> @@ -141,16 +141,6 @@ static struct notifier_block module_load_nb = {
>  	.notifier_call = module_load_notify,
>  };
>  
> -
> -static void end_sync(void)
> -{
> -	end_cpu_work();
> -	/* make sure we don't leak task structs */
> -	process_task_mortuary();
> -	process_task_mortuary();
> -}
> -
> -
>  int sync_start(void)
>  {
>  	int err;
> @@ -158,7 +148,7 @@ int sync_start(void)
>  	if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
>  		return -ENOMEM;
>  
> -	start_cpu_work();
> +	mutex_lock(&buffer_mutex);
>  
>  	err = task_handoff_register(&task_free_nb);
>  	if (err)
> @@ -173,7 +163,10 @@ int sync_start(void)
>  	if (err)
>  		goto out4;
>  
> +	start_cpu_work();
> +
>  out:
> +	mutex_unlock(&buffer_mutex);
>  	return err;
>  out4:
>  	profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
> @@ -182,7 +175,6 @@ out3:
>  out2:
>  	task_handoff_unregister(&task_free_nb);
>  out1:
> -	end_sync();
>  	free_cpumask_var(marked_cpus);
>  	goto out;
>  }
> @@ -190,11 +182,20 @@ out1:
>  
>  void sync_stop(void)
>  {
> +	/* flush buffers */
> +	mutex_lock(&buffer_mutex);
> +	end_cpu_work();
>  	unregister_module_notifier(&module_load_nb);
>  	profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
>  	profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
>  	task_handoff_unregister(&task_free_nb);
> -	end_sync();
> +	mutex_unlock(&buffer_mutex);
> +	flush_scheduled_work();
> +
> +	/* make sure we don't leak task structs */
> +	process_task_mortuary();
> +	process_task_mortuary();
> +
>  	free_cpumask_var(marked_cpus);
>  }
>  
> diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
> index 219f79e..f179ac2 100644
> --- a/drivers/oprofile/cpu_buffer.c
> +++ b/drivers/oprofile/cpu_buffer.c
> @@ -120,8 +120,6 @@ void end_cpu_work(void)
>  
>  		cancel_delayed_work(&b->work);
>  	}
> -
> -	flush_scheduled_work();
>  }
>  
>  /*
> -- 
> 1.7.1.1
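
For readers who want to see the ordering in isolation: below is a minimal
userspace sketch of the shutdown sequence the patch enforces, not the
oprofile code itself. All names in it (sample_obj, worker, work_enabled,
NOBJS) are made up for illustration, and buffer_mutex, sync_start/sync_stop,
end_cpu_work(), flush_scheduled_work() and process_task_mortuary() are only
loosely mirrored. The point is the order in sync_stop(): stop queuing new
work under the lock, flush the worker, and only then free the objects the
worker might still be touching.

/*
 * Minimal userspace analogy of the teardown ordering the patch enforces
 * (hypothetical illustration; not the oprofile code itself).  A worker
 * keeps dereferencing shared objects; shutdown must (1) tell the worker
 * to stop, (2) wait for it to finish (the "flush" step), and only then
 * (3) free the objects.  Freeing before the flush is the same class of
 * use-after-free that showed up above as the get_task_mm() crash.
 */
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct sample_obj {			/* stands in for the task structs */
	int data;
};

#define NOBJS 4

static struct sample_obj *objs[NOBJS];
static pthread_mutex_t buffer_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool work_enabled;		/* analogous to the cpu work being scheduled */

static void *worker(void *arg)
{
	/* Loosely like wq_sync_buffer()/sync_buffer(): walk the objects
	 * under buffer_mutex for as long as work is enabled. */
	for (;;) {
		pthread_mutex_lock(&buffer_mutex);
		if (!work_enabled) {
			pthread_mutex_unlock(&buffer_mutex);
			break;
		}
		for (int i = 0; i < NOBJS; i++)
			objs[i]->data++;	/* safe only while objs are alive */
		pthread_mutex_unlock(&buffer_mutex);
		sched_yield();			/* let the shutdown path grab the lock */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;

	for (int i = 0; i < NOBJS; i++)
		objs[i] = calloc(1, sizeof(*objs[i]));

	/* "sync_start": enable work only after everything is set up. */
	pthread_mutex_lock(&buffer_mutex);
	work_enabled = true;
	pthread_mutex_unlock(&buffer_mutex);
	pthread_create(&tid, NULL, worker, NULL);

	/* "sync_stop": stop new work under the lock ... */
	pthread_mutex_lock(&buffer_mutex);
	work_enabled = false;
	pthread_mutex_unlock(&buffer_mutex);

	/* ... flush the worker (the flush_scheduled_work() step) ... */
	pthread_join(tid, NULL);

	/* ... and only now free the objects (the process_task_mortuary() step). */
	for (int i = 0; i < NOBJS; i++)
		free(objs[i]);

	puts("clean shutdown");
	return 0;
}

Built with "cc -pthread", the sketch runs to completion; swapping the join
and the frees would reintroduce the same kind of freed-object access that
the backtrace above shows hitting get_task_mm().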