Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752039AbaABMcF (ORCPT ); Thu, 2 Jan 2014 07:32:05 -0500 Received: from canardo.mork.no ([148.122.252.1]:43286 "EHLO canardo.mork.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750782AbaABMcB convert rfc822-to-8bit (ORCPT ); Thu, 2 Jan 2014 07:32:01 -0500 From: =?utf-8?Q?Bj=C3=B8rn_Mork?= To: "Rafael J. Wysocki" Cc: Viresh Kumar , linaro-kernel@lists.linaro.org, patches@linaro.org, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2] cpufreq: try to resume policies which failed on last resume Organization: m References: <5562479.pVWRuDL0y6@vostro.rjw.lan> Date: Thu, 02 Jan 2014 13:15:10 +0100 In-Reply-To: <5562479.pVWRuDL0y6@vostro.rjw.lan> (Rafael J. Wysocki's message of "Thu, 26 Dec 2013 02:05:50 +0100") Message-ID: <87zjne7f75.fsf@nemi.mork.no> User-Agent: Gnus/5.11002 (No Gnus v0.20) Emacs/23.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8425 Lines: 181 "Rafael J. Wysocki" writes: > On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote: >> __cpufreq_add_dev() can fail sometimes while we are resuming our system. >> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that >> could make userspace unstable. But if we suspend/resume again, we should atleast >> try to bring back those policies. >> >> This patch fixes this issue by clearing fallback data on failure and trying to >> allocate a new struct cpufreq_policy on second resume. >> >> Reported-and-tested-by: Bjørn Mork >> Signed-off-by: Viresh Kumar > > Well, while I appreciate the work done here, I don't like the changelog, > I don't really like the way the code is written and I don't like the comments. > Sorry about that. > > Bjorn, can you please test the patch below instead along with the [2/2] > from this series (on top of linux-pm.git/pm-cpufreq)? I tested this series with your modified 1/2 today, but on top of a current mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq. Shouldn't make much difference since Linus already has pulled your 'pm+acpi-3.13-rc6' tag, which included that pm-cpufreq branch. This series fixes the reported bug for me. But I observed this, most likely unrelated, splat during the test: ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282) ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536) ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536) ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536) ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140) ====================================================== [ INFO: possible circular locking dependency detected ] 3.13.0-rc6+ #181 Not tainted ------------------------------------------------------- s2disk/5651 is trying to acquire lock: (s_active#170){++++.+}, at: [] sysfs_addrm_finish+0x28/0x46 but task is already holding lock: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x28/0x50 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (cpu_hotplug.lock){+.+.+.}: [] lock_acquire+0xfb/0x144 [] mutex_lock_nested+0x6c/0x397 [] get_online_cpus+0x3c/0x50 [] store+0x20/0xad [] sysfs_write_file+0x138/0x18b [] vfs_write+0x9c/0x102 [] SyS_write+0x50/0x85 [] system_call_fastpath+0x16/0x1b -> #0 (s_active#170){++++.+}: [] __lock_acquire+0xae3/0xe68 [] lock_acquire+0xfb/0x144 [] sysfs_deactivate+0xa5/0x108 [] sysfs_addrm_finish+0x28/0x46 [] sysfs_remove+0x2a/0x31 [] sysfs_remove_dir+0x66/0x6b [] kobject_del+0x18/0x42 [] kobject_cleanup+0xe1/0x14f [] kobject_put+0x45/0x49 [] cpufreq_policy_put_kobj+0x37/0x83 [] __cpufreq_add_dev.isra.18+0x75e/0x78c [] cpufreq_cpu_callback+0x53/0x88 [] notifier_call_chain+0x67/0x92 [] __raw_notifier_call_chain+0x9/0xb [] __cpu_notify+0x1b/0x32 [] cpu_notify+0xe/0x10 [] _cpu_up+0xf1/0x124 [] enable_nonboot_cpus+0x52/0xbf [] hibernation_snapshot+0x1be/0x2ed [] snapshot_ioctl+0x1e5/0x431 [] do_vfs_ioctl+0x45e/0x4a8 [] SyS_ioctl+0x52/0x82 [] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug.lock); lock(s_active#170); lock(cpu_hotplug.lock); lock(s_active#170); *** DEADLOCK *** 7 locks held by s2disk/5651: #0: (pm_mutex){+.+.+.}, at: [] snapshot_ioctl+0x4b/0x431 #1: (device_hotplug_lock){+.+.+.}, at: [] lock_device_hotplug+0x12/0x14 #2: (acpi_scan_lock){+.+.+.}, at: [] acpi_scan_lock_acquire+0x12/0x14 #3: (console_lock){+.+.+.}, at: [] suspend_console+0x20/0x38 #4: (cpu_add_remove_lock){+.+.+.}, at: [] cpu_maps_update_begin+0x12/0x14 #5: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x28/0x50 #6: (cpufreq_rwsem){.+.+.+}, at: [] __cpufreq_add_dev.isra.18+0x7f/0x78c stack backtrace: CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549 ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928 ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006 Call Trace: [] dump_stack+0x4e/0x68 [] print_circular_bug+0x1f8/0x209 [] __lock_acquire+0xae3/0xe68 [] ? debug_check_no_locks_freed+0x12c/0x143 [] lock_acquire+0xfb/0x144 [] ? sysfs_addrm_finish+0x28/0x46 [] ? lockdep_init_map+0x14e/0x160 [] sysfs_deactivate+0xa5/0x108 [] ? sysfs_addrm_finish+0x28/0x46 [] sysfs_addrm_finish+0x28/0x46 [] sysfs_remove+0x2a/0x31 [] sysfs_remove_dir+0x66/0x6b [] kobject_del+0x18/0x42 [] kobject_cleanup+0xe1/0x14f [] kobject_put+0x45/0x49 [] cpufreq_policy_put_kobj+0x37/0x83 [] __cpufreq_add_dev.isra.18+0x75e/0x78c [] ? __lock_is_held+0x32/0x54 [] cpufreq_cpu_callback+0x53/0x88 [] notifier_call_chain+0x67/0x92 [] __raw_notifier_call_chain+0x9/0xb [] __cpu_notify+0x1b/0x32 [] cpu_notify+0xe/0x10 [] _cpu_up+0xf1/0x124 [] enable_nonboot_cpus+0x52/0xbf [] hibernation_snapshot+0x1be/0x2ed [] snapshot_ioctl+0x1e5/0x431 [] do_vfs_ioctl+0x45e/0x4a8 [] ? fcheck_files+0xa1/0xe4 [] ? fget_light+0x33/0x9a [] SyS_ioctl+0x52/0x82 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] system_call_fastpath+0x16/0x1b CPU1 is up ACPI: Waking up from system sleep state S4 PM: noirq thaw of devices complete after 2.727 msecs PM: early thaw of devices complete after 1.137 msecs This warning appeared when I tried cancelling hibernation, which is known to fail when using the acpi-cpufreq driver. The point of the test was to verify that such failures still produce the expected result: - all stale sysfs files are cleaned up and removed - later suspend/resume actions will restore (or actually reinitiate) the cpufreq driver for the non-boot CPUs Which was successful, except that it produced the above lockdep warning. I have not verified that this is a new warning (which means that it most likely is not). And I expect the whole acpi-cpufreq problem, including that warning, to go away when you eventually push http://www.spinics.net/lists/cpufreq/msg08714.html with your improved changelog (thanks for doing that BTW). So I don't worry too much about it. Just wanted to let you know. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/