Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753896Ab3GOCnQ (ORCPT ); Sun, 14 Jul 2013 22:43:16 -0400 Received: from e23smtp08.au.ibm.com ([202.81.31.141]:44904 "EHLO e23smtp08.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753846Ab3GOCnO (ORCPT ); Sun, 14 Jul 2013 22:43:14 -0400 Message-ID: <51E3619B.6060901@linux.vnet.ibm.com> Date: Mon, 15 Jul 2013 10:42:35 +0800 From: Michael Wang User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: Sergey Senozhatsky CC: Jiri Kosina , Borislav Petkov , "Rafael J. Wysocki" , Viresh Kumar , "Srivatsa S. Bhat" , linux-kernel@vger.kernel.org, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org Subject: Re: [LOCKDEP] cpufreq: possible circular locking dependency detected References: <20130625211544.GA2270@swordfish> <51D10899.1080501@linux.vnet.ibm.com> <20130710231305.GA4046@swordfish> <51DE1BE1.3090707@linux.vnet.ibm.com> <20130714114721.GB2178@swordfish> In-Reply-To: <20130714114721.GB2178@swordfish> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Content-Scanned: Fidelis XPS MAILER x-cbid: 13071502-5140-0000-0000-000003847BA7 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9153 Lines: 253 On 07/14/2013 07:47 PM, Sergey Senozhatsky wrote: [snip] > > Hello, > > I just realized that lockdep was disabling itself at startup (after recent AMD > radeon patch set) due to radeon kms error: > > [ 4.790019] [drm] Loading CEDAR Microcode > [ 4.790943] r600_cp: Failed to load firmware "radeon/CEDAR_smc.bin" > [ 4.791152] [drm:evergreen_startup] *ERROR* Failed to load firmware! > [ 4.791330] radeon 0000:01:00.0: disabling GPU acceleration > [ 4.792633] INFO: trying to register non-static key. > [ 4.792792] the code is fine but needs lockdep annotation. > [ 4.792953] turning off the locking correctness validator. > > > Now, as I fixed radeon kms, I can see: > > [ 806.660530] ------------[ cut here ]------------ > [ 806.660539] WARNING: CPU: 0 PID: 2389 at arch/x86/kernel/smp.c:124 > native_smp_send_reschedule+0x57/0x60() > > [ 806.660572] Workqueue: events od_dbs_timer > [ 806.660575] 0000000000000009 ffff8801531cfbd8 ffffffff816044ee > 0000000000000000 > [ 806.660577] ffff8801531cfc10 ffffffff8104995d 0000000000000003 > ffff8801531f8000 > [ 806.660579] 000000010001ee39 0000000000000003 0000000000000003 > ffff8801531cfc20 > [ 806.660580] Call Trace: > [ 806.660587] [] dump_stack+0x4e/0x82 > [ 806.660591] [] warn_slowpath_common+0x7d/0xa0 > [ 806.660593] [] warn_slowpath_null+0x1a/0x20 > [ 806.660595] [] native_smp_send_reschedule+0x57/0x60 > [ 806.660599] [] wake_up_nohz_cpu+0x61/0xb0 > [ 806.660603] [] add_timer_on+0x8d/0x1e0 > [ 806.660607] [] __queue_delayed_work+0x166/0x1a0 > [ 806.660609] [] ? try_to_grab_pending+0xd9/0x1a0 > [ 806.660611] [] mod_delayed_work_on+0x4f/0x90 > [ 806.660613] [] gov_queue_work+0x56/0xd0 > [ 806.660615] [] od_dbs_timer+0xc0/0x160 > [ 806.660617] [] process_one_work+0x1cd/0x6a0 > [ 806.660619] [] ? process_one_work+0x163/0x6a0 > [ 806.660622] [] worker_thread+0x121/0x3a0 > [ 806.660627] [] ? trace_hardirqs_on+0xd/0x10 > [ 806.660629] [] ? manage_workers.isra.24+0x2a0/0x2a0 > [ 806.660633] [] kthread+0xdb/0xe0 > [ 806.660635] [] ? insert_kthread_work+0x70/0x70 > [ 806.660639] [] ret_from_fork+0x7c/0xb0 > [ 806.660640] [] ? insert_kthread_work+0x70/0x70 > [ 806.660642] ---[ end trace 01ae278488a0ad6d ]--- So it back again... Currently I have some assumptions in my mind: 1. we still failed to stop od_dbs_timer after STOP notify. 2. there is some code else which restart the work after STOP notify. 3. policy->cpus is broken. I think we need a more detail investigation this time, let's catch the mouse out ;-) Regards, Michael Wang > > > The same problem why get/put_online_cpus() has been added to __gov_queue_work() > > commit 2f7021a815f20f3481c10884fe9735ce2a56db35 > Author: Michael Wang > > cpufreq: protect 'policy->cpus' from offlining during > __gov_queue_work() > > -ss > >> Regards, >> Michael Wang >> >> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c >> index dc9b72e..a64b544 100644 >> --- a/drivers/cpufreq/cpufreq_governor.c >> +++ b/drivers/cpufreq/cpufreq_governor.c >> @@ -178,13 +178,14 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, >> { >> int i; >> >> + if (dbs_data->queue_stop) >> + return; >> + >> if (!all_cpus) { >> __gov_queue_work(smp_processor_id(), dbs_data, delay); >> } else { >> - get_online_cpus(); >> for_each_cpu(i, policy->cpus) >> __gov_queue_work(i, dbs_data, delay); >> - put_online_cpus(); >> } >> } >> EXPORT_SYMBOL_GPL(gov_queue_work); >> @@ -193,12 +194,27 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, >> struct cpufreq_policy *policy) >> { >> struct cpu_dbs_common_info *cdbs; >> - int i; >> + int i, round = 2; >> >> + dbs_data->queue_stop = 1; >> +redo: >> + round--; >> for_each_cpu(i, policy->cpus) { >> cdbs = dbs_data->cdata->get_cpu_cdbs(i); >> cancel_delayed_work_sync(&cdbs->work); >> } >> + >> + /* >> + * Since there is no lock to prvent re-queue the >> + * cancelled work, some early cancelled work might >> + * have been queued again by later cancelled work. >> + * >> + * Flush the work again with dbs_data->queue_stop >> + * enabled, this time there will be no survivors. >> + */ >> + if (round) >> + goto redo; >> + dbs_data->queue_stop = 0; >> } >> >> /* Will return if we need to evaluate cpu load again or not */ >> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h >> index e16a961..9116135 100644 >> --- a/drivers/cpufreq/cpufreq_governor.h >> +++ b/drivers/cpufreq/cpufreq_governor.h >> @@ -213,6 +213,7 @@ struct dbs_data { >> unsigned int min_sampling_rate; >> int usage_count; >> void *tuners; >> + int queue_stop; >> >> /* dbs_mutex protects dbs_enable in governor start/stop */ >> struct mutex mutex; >> >>> >>> Signed-off-by: Sergey Senozhatsky >>> >>> --- >>> >>> drivers/cpufreq/cpufreq.c | 5 +---- >>> drivers/cpufreq/cpufreq_governor.c | 17 +++++++++++------ >>> drivers/cpufreq/cpufreq_stats.c | 2 +- >>> 3 files changed, 13 insertions(+), 11 deletions(-) >>> >>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>> index 6a015ad..f8aacf1 100644 >>> --- a/drivers/cpufreq/cpufreq.c >>> +++ b/drivers/cpufreq/cpufreq.c >>> @@ -1943,13 +1943,10 @@ static int __cpuinit cpufreq_cpu_callback(struct notifier_block *nfb, >>> case CPU_ONLINE: >>> cpufreq_add_dev(dev, NULL); >>> break; >>> - case CPU_DOWN_PREPARE: >>> + case CPU_POST_DEAD: >>> case CPU_UP_CANCELED_FROZEN: >>> __cpufreq_remove_dev(dev, NULL); >>> break; >>> - case CPU_DOWN_FAILED: >>> - cpufreq_add_dev(dev, NULL); >>> - break; >>> } >>> } >>> return NOTIFY_OK; >>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c >>> index 4645876..681d5d6 100644 >>> --- a/drivers/cpufreq/cpufreq_governor.c >>> +++ b/drivers/cpufreq/cpufreq_governor.c >>> @@ -125,7 +125,11 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, >>> unsigned int delay) >>> { >>> struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); >>> - >>> + /* cpu offline might block existing gov_queue_work() user, >>> + * unblocking it after CPU_DEAD and before CPU_POST_DEAD. >>> + * thus potentially we can hit offlined CPU */ >>> + if (unlikely(cpu_is_offline(cpu))) >>> + return; >>> mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay); >>> } >>> >>> @@ -133,15 +137,14 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, >>> unsigned int delay, bool all_cpus) >>> { >>> int i; >>> - >>> + get_online_cpus(); >>> if (!all_cpus) { >>> __gov_queue_work(smp_processor_id(), dbs_data, delay); >>> } else { >>> - get_online_cpus(); >>> for_each_cpu(i, policy->cpus) >>> __gov_queue_work(i, dbs_data, delay); >>> - put_online_cpus(); >>> } >>> + put_online_cpus(); >>> } >>> EXPORT_SYMBOL_GPL(gov_queue_work); >>> >>> @@ -354,8 +357,10 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, >>> /* Initiate timer time stamp */ >>> cpu_cdbs->time_stamp = ktime_get(); >>> >>> - gov_queue_work(dbs_data, policy, >>> - delay_for_sampling_rate(sampling_rate), true); >>> + /* hotplug lock already held */ >>> + for_each_cpu(j, policy->cpus) >>> + __gov_queue_work(j, dbs_data, >>> + delay_for_sampling_rate(sampling_rate)); >>> break; >>> >>> case CPUFREQ_GOV_STOP: >>> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c >>> index cd9e817..833816e 100644 >>> --- a/drivers/cpufreq/cpufreq_stats.c >>> +++ b/drivers/cpufreq/cpufreq_stats.c >>> @@ -355,7 +355,7 @@ static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb, >>> case CPU_DOWN_PREPARE: >>> cpufreq_stats_free_sysfs(cpu); >>> break; >>> - case CPU_DEAD: >>> + case CPU_POST_DEAD: >>> cpufreq_stats_free_table(cpu); >>> break; >>> case CPU_UP_CANCELED_FROZEN: >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> Please read the FAQ at http://www.tux.org/lkml/ >>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/