Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c
From: Peter Zijlstra
To: Johannes Weiner
Cc: Nageswara R Sastry, linux-kernel@vger.kernel.org, balbir@linux.vnet.ibm.com,
    ego@linux.vnet.ibm.com, svaidy@linux.vnet.ibm.com, davej@codemonkey.org.uk
Date: Wed, 16 Jul 2008 15:44:56 +0200

On Thu, 2008-07-10 at 13:11 +0200, Johannes Weiner wrote:
> Hi,
> 
> [added Peter on CC, lockdep confuses me]
> 
> Nageswara R Sastry writes:
> 
> > Hi Johannes,
> > 
> > Johannes Weiner wrote:
> > 
> >>> * I am seeing the circular locking dependency with the above patch
> >>> too.
> >> 
> >> Uhm. Failure or no failure? A possible dead-lock report _is_ a
> >> failure. So, do you get one or not?
> >> And if so, could you send me the
> >> dmesg parts?
> >> 
> >> Thanks a lot,
> >> 
> >> 	Hannes
> > 
> > I could see a circular locking dependency and sysfs hang with the new
> > patch named "cpufreq: cancel self-rearming work synchroneously"
> > also. Please find the dmesg output below. Please let me know if you
> > need more information. Thanks a lot for your coordination.
> > 
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.25.9.cpufreq #2
> > -------------------------------------------------------
> > S06cpuspeed/3427 is trying to acquire lock:
> >  (&(&dbs_info->work)->work){--..}, at: [] __cancel_work_timer+0x80/0x177
> > 
> > but task is already holding lock:
> >  (dbs_mutex){--..}, at: [] cpufreq_governor_dbs+0x26a/0x2f7
> 
> Hmmm, it's weird. This path should be okay. I wonder where the
> dependency work -> dbs_mutex comes from. The mutex is nowhere taken
> with the work lock held (I removed this in the new version of the
> patch; can you double-check that you applied the correct patch?).
> 
> So the chain should really be dbs_mutex -> work-lock.
> 
> > which lock already depends on the new lock.
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #2 (dbs_mutex){--..}:
> >        [] add_lock_to_list+0x61/0x83
> >        [] __lock_acquire+0x953/0xb05
> >        [] cpufreq_governor_dbs+0x72/0x2f7
> >        [] lock_acquire+0x5f/0x79
> >        [] cpufreq_governor_dbs+0x72/0x2f7
> >        [] mutex_lock_nested+0xce/0x222
> >        [] cpufreq_governor_dbs+0x72/0x2f7
> >        [] cpufreq_governor_dbs+0x72/0x2f7
> >        [] cpufreq_governor_dbs+0x72/0x2f7
> >        [] __cpufreq_governor+0x73/0xa6
> >        [] __cpufreq_set_policy+0x13b/0x19e
> >        [] cpufreq_add_dev+0x3b4/0x4aa
> >        [] handle_update+0x0/0x21
> >        [] sysdev_driver_register+0x48/0x9a
> >        [] cpufreq_register_driver+0x9b/0x147
> >        [] kernel_init+0x130/0x26f
> >        [] kernel_init+0x0/0x26f
> >        [] kernel_init+0x0/0x26f
> >        [] kernel_thread_helper+0x7/0x10
> >        [] 0xffffffff
> > 
> > -> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
> >        [] __lock_acquire+0x953/0xb05
> >        [] lock_policy_rwsem_write+0x30/0x56
> >        [] save_stack_trace+0x1a/0x35
> >        [] lock_acquire+0x5f/0x79
> >        [] lock_policy_rwsem_write+0x30/0x56
> >        [] down_write+0x2b/0x44
> >        [] lock_policy_rwsem_write+0x30/0x56
> >        [] lock_policy_rwsem_write+0x30/0x56
> >        [] do_dbs_timer+0x40/0x24f
> >        [] run_workqueue+0x81/0x187
> >        [] run_workqueue+0xbc/0x187
> >        [] run_workqueue+0x81/0x187
> >        [] do_dbs_timer+0x0/0x24f
> >        [] worker_thread+0x0/0xbd
> >        [] worker_thread+0xb3/0xbd
> >        [] autoremove_wake_function+0x0/0x2d
> >        [] kthread+0x38/0x5d
> >        [] kthread+0x0/0x5d
> >        [] kernel_thread_helper+0x7/0x10
> >        [] 0xffffffff
> > 
> > -> #0 (&(&dbs_info->work)->work){--..}:
> >        [] print_circular_bug_tail+0x2a/0x61
> >        [] __lock_acquire+0x878/0xb05
> >        [] lock_acquire+0x5f/0x79
> >        [] __cancel_work_timer+0x80/0x177
> >        [] __cancel_work_timer+0xab/0x177
> >        [] __cancel_work_timer+0x80/0x177
> >        [] mark_held_locks+0x39/0x53
> >        [] mutex_lock_nested+0x20f/0x222
> >        [] trace_hardirqs_on+0xe7/0x10e
> >        [] mutex_lock_nested+0x21a/0x222
> >        [] cpufreq_governor_dbs+0x26a/0x2f7
> >        [] cpufreq_governor_dbs+0x27c/0x2f7
> >        [] __cpufreq_governor+0x73/0xa6
> >        [] __cpufreq_set_policy+0x129/0x19e
> >        [] store_scaling_governor+0x112/0x135
> >        [] handle_update+0x0/0x21
> >        [] atkbd_set_leds+0x5/0xcf
> >        [] store_scaling_governor+0x0/0x135
> >        [] store+0x3c/0x54
> >        [] sysfs_write_file+0xa9/0xdd
> >        [] sysfs_write_file+0x0/0xdd
> >        [] vfs_write+0x83/0xf6
> >        [] sys_write+0x3c/0x63
> >        [] sysenter_past_esp+0x5f/0xa5
> >        [] 0xffffffff
> 
> Uhm, this dependency is as new as the actual lockdep detection (the same
> backtrace as the whole event, see below). What is lockdep doing here?
> Shouldn't this be the callpath where the lock was taken for the first
> time?
> 
> I cannot see where the chain is ever work-lock -> dbs_mutex, so how
> does lockdep come to the conclusion this would be the correct order?

Lockdep does not directly observe work-lock -> dbs_mutex, but an
indirect dependency through cpu_policy_rwsem.

So in do_dbs_timer() we get:

  work-lock
    cpu_policy_rwsem

then from:

  cpufreq_add_dev()
    __cpufreq_set_policy()
      __cpufreq_governor()

we obtain the dependency:

  cpu_policy_rwsem
    dbs_mutex

So together we get the chain:

  work-lock
    cpu_policy_rwsem
      dbs_mutex

And the trace below shows us trying to obtain the work-lock while
holding dbs_mutex, which is in clear violation of the above locking
rules.
> > other info that might help us debug this:
> > 
> > 3 locks held by S06cpuspeed/3427:
> >  #0:  (&buffer->mutex){--..}, at: [] sysfs_write_file+0x24/0xdd
> >  #1:  (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [] lock_policy_rwsem_write+0x30/0x56
> >  #2:  (dbs_mutex){--..}, at: [] cpufreq_governor_dbs+0x26a/0x2f7
> > 
> > stack backtrace:
> > Pid: 3427, comm: S06cpuspeed Not tainted 2.6.25.9.cpufreq #2
> >  [] print_circular_bug_tail+0x57/0x61
> >  [] __lock_acquire+0x878/0xb05
> >  [] lock_acquire+0x5f/0x79
> >  [] __cancel_work_timer+0x80/0x177
> >  [] __cancel_work_timer+0xab/0x177
> >  [] __cancel_work_timer+0x80/0x177
> >  [] mark_held_locks+0x39/0x53
> >  [] mutex_lock_nested+0x20f/0x222
> >  [] trace_hardirqs_on+0xe7/0x10e
> >  [] mutex_lock_nested+0x21a/0x222
> >  [] cpufreq_governor_dbs+0x26a/0x2f7
> >  [] cpufreq_governor_dbs+0x27c/0x2f7
> >  [] __cpufreq_governor+0x73/0xa6
> >  [] __cpufreq_set_policy+0x129/0x19e
> >  [] store_scaling_governor+0x112/0x135
> >  [] handle_update+0x0/0x21
> >  [] atkbd_set_leds+0x5/0xcf
> >  [] store_scaling_governor+0x0/0x135
> >  [] store+0x3c/0x54
> >  [] sysfs_write_file+0xa9/0xdd
> >  [] sysfs_write_file+0x0/0xdd
> >  [] vfs_write+0x83/0xf6
> >  [] sys_write+0x3c/0x63
> >  [] sysenter_past_esp+0x5f/0xa5
> > =======================
> > 
> > Thanks!!
> > Regards
> > R.Nageswara Sastry
> 
> 	Hannes