From: "Rafael J. Wysocki"
To: Gu Zheng
Cc: ACPI Devel Mailing List, Greg Kroah-Hartman, Toshi Kani, LKML,
    Yasuaki Ishimatsu
Subject: Re: [PATCH] driver core / ACPI: Avoid device removal locking problems
Date: Mon, 26 Aug 2013 14:42:09 +0200
Message-ID: <4785337.sqQqDSPcpL@vostro.rjw.lan>
User-Agent: KMail/4.10.5 (Linux/3.11.0-rc6+; KDE/4.10.5; x86_64; ; )
In-Reply-To: <521AC7C9.2000408@cn.fujitsu.com>
References: <1543475.L7gSB7lLAu@vostro.rjw.lan> <521AC7C9.2000408@cn.fujitsu.com>

On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
> Hi Rafael,

Hi,

> On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:
>
> > From: Rafael J. Wysocki
> >
> > There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
> > around the acpi_bus_trim() call in acpi_scan_hot_remove() which
> > generally removes devices (it removes ACPI device objects at least,
> > but it may also remove "physical" device objects through .detach()
> > callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
> > attributes are removed under these locks, and to remove those
> > attributes it is necessary to hold the s_active references of their
> > directory entries for writing.
> >
> > On the other hand, the execution of a .show() or .store() callback
> > from a sysfs attribute is carried out with that attribute's s_active
> > reference held for reading.  Consequently, if any device sysfs
> > attribute that may be removed from within acpi_scan_hot_remove()
> > through acpi_bus_trim() has a .store() or .show() callback which
> > acquires either acpi_scan_lock or device_hotplug_lock, the execution
> > of that callback may deadlock with the removal of the attribute.
> > [Unfortunately, the "online" device attribute of CPUs and memory
> > blocks and the "eject" attribute of ACPI device objects are affected
> > by this issue.]
> >
> > To avoid those deadlocks, introduce a new protection mechanism that
> > can be used by the device sysfs attributes in question.  Namely,
> > if a device sysfs attribute's .store() or .show() callback routine
> > is about to acquire device_hotplug_lock or acpi_scan_lock, it can
> > first execute read_lock_device_remove() and return an error code if
> > that function returns false.  If true is returned, the lock in
> > question may be acquired and read_unlock_device_remove() must be
> > called.  [This mechanism is implemented by means of an additional
> > rwsem in drivers/base/core.c.]
> >
> > Make the affected sysfs attributes in the driver core and ACPI core
> > use read_lock_device_remove() and read_unlock_device_remove() as
> > described above.
> >
> > Signed-off-by: Rafael J. Wysocki
> > Reported-by: Gu Zheng
>
> I'm sorry I forgot to mention that the original reporter is
> Yasuaki Ishimatsu.  I continued the investigation and found more
> issues.
>
> We tested this patch on kernel 3.11-rc6, but it seems that the
> issue is still there.  Detailed info follows.
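
For reference, the mechanism described in the changelog above amounts to
roughly the following.  This is only a minimal sketch assuming the
rwsem-based implementation mentioned there; device_remove_rwsem is the name
used further down in this message, and the function bodies are illustrative
rather than taken from the actual patch:

#include <linux/rwsem.h>
#include <linux/types.h>

/* Held for writing by the removal path around sysfs attribute removal. */
static DECLARE_RWSEM(device_remove_rwsem);

bool read_lock_device_remove(void)
{
	/*
	 * Fail instead of blocking if device removal currently holds the
	 * rwsem for writing; the calling .show()/.store() callback is then
	 * expected to bail out with an error code (e.g. -EBUSY).
	 */
	return down_read_trylock(&device_remove_rwsem);
}

void read_unlock_device_remove(void)
{
	up_read(&device_remove_rwsem);
}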
Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
because we'll need to take acpi_scan_lock during system suspend for PCI hot
remove to work and that's under pm_mutex.  So I wonder if we can simply drop
the system sleep locking from lock/unlock_memory_hotplug().

But that's a side note, because dropping it won't help here.  Now ->

> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF
> -------------------------------------------------------
> kworker/0:2/754 is trying to acquire lock:
>  (s_active#73){++++.+}, at: [] sysfs_addrm_finish+0x3b/0x70
>
> but task is already holding lock:
>  (mem_sysfs_mutex){+.+.+.}, at: [] remove_memory_block+0x1d/0xa0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #4 (mem_sysfs_mutex){+.+.+.}:
>        [] validate_chain+0x70c/0x870
>        [] __lock_acquire+0x36f/0x5f0
>        [] lock_acquire+0xa0/0x130
>        [] mutex_lock_nested+0x7b/0x3b0
>        [] add_memory_section+0x51/0x150
>        [] register_new_memory+0x1b/0x20
>        [] __add_pages+0x111/0x120
>        [] arch_add_memory+0x64/0xf0
>        [] add_memory+0xd7/0x1e0
>        [] acpi_memory_enable_device+0x77/0x12b
>        [] acpi_memory_device_add+0x18f/0x198
>        [] acpi_bus_device_attach+0x9a/0x100
>        [] acpi_ns_walk_namespace+0xbe/0x17d
>        [] acpi_walk_namespace+0x8a/0xc4
>        [] acpi_bus_scan+0x91/0x9c
>        [] acpi_scan_bus_device_check+0x7e/0x10f
>        [] acpi_scan_device_check+0x13/0x15
>        [] acpi_os_execute_deferred+0x27/0x34
>        [] process_one_work+0x1e8/0x560
>        [] worker_thread+0x120/0x3a0
>        [] kthread+0xee/0x100
>        [] ret_from_fork+0x7c/0xb0
>
> -> #3 (pm_mutex){+.+.+.}:
>        [] validate_chain+0x70c/0x870
>        [] __lock_acquire+0x36f/0x5f0
>        [] lock_acquire+0xa0/0x130
>        [] mutex_lock_nested+0x7b/0x3b0
>        [] lock_memory_hotplug+0x35/0x40
>        [] add_memory+0x2f/0x1e0
>        [] acpi_memory_enable_device+0x77/0x12b
>        [] acpi_memory_device_add+0x18f/0x198
>        [] acpi_bus_device_attach+0x9a/0x100
>        [] acpi_ns_walk_namespace+0xbe/0x17d
>        [] acpi_walk_namespace+0x8a/0xc4
>        [] acpi_bus_scan+0x91/0x9c
>        [] acpi_scan_init+0x66/0x15a
>        [] acpi_init+0xa1/0xbb
>        [] do_one_initcall+0xf2/0x1a0
>        [] do_basic_setup+0x9d/0xbb
>        [] kernel_init_freeable+0x20a/0x28a
>        [] kernel_init+0xe/0xf0
>        [] ret_from_fork+0x7c/0xb0
>
> -> #2 (mem_hotplug_mutex){+.+.+.}:
>        [] validate_chain+0x70c/0x870
>        [] __lock_acquire+0x36f/0x5f0
>        [] lock_acquire+0xa0/0x130
>        [] mutex_lock_nested+0x7b/0x3b0
>        [] lock_memory_hotplug+0x17/0x40
>        [] add_memory+0x2f/0x1e0
>        [] acpi_memory_enable_device+0x77/0x12b
>        [] acpi_memory_device_add+0x18f/0x198
>        [] acpi_bus_device_attach+0x9a/0x100
>        [] acpi_ns_walk_namespace+0xbe/0x17d
>        [] acpi_walk_namespace+0x8a/0xc4
>        [] acpi_bus_scan+0x91/0x9c
>        [] acpi_scan_bus_device_check+0x7e/0x10f
>        [] acpi_scan_device_check+0x13/0x15
>        [] acpi_os_execute_deferred+0x27/0x34
>        [] process_one_work+0x1e8/0x560
>        [] worker_thread+0x120/0x3a0
>        [] kthread+0xee/0x100
>        [] ret_from_fork+0x7c/0xb0
>
> -> #1 (device_hotplug_lock){+.+.+.}:
>        [] validate_chain+0x70c/0x870
>        [] __lock_acquire+0x36f/0x5f0
>        [] lock_acquire+0xa0/0x130
>        [] mutex_lock_nested+0x7b/0x3b0
>        [] lock_device_hotplug+0x17/0x20

 -> we should have grabbed device_remove_rwsem for reading here with the
    patch applied, which means that -->

>        [] show_online+0x33/0x80
>        [] dev_attr_show+0x27/0x50
>        [] fill_read_buffer+0x74/0x100
>        [] sysfs_read_file+0xac/0xe0
>        [] vfs_read+0xb1/0x130
>        [] SyS_read+0x62/0xb0
>        [] system_call_fastpath+0x16/0x1b
>
> -> #0 (s_active#73){++++.+}:
>        [] check_prev_add+0x598/0x5d0
>        [] validate_chain+0x70c/0x870
>        [] __lock_acquire+0x36f/0x5f0
>        [] lock_acquire+0xa0/0x130
>        [] sysfs_deactivate+0x157/0x1c0
>        [] sysfs_addrm_finish+0x3b/0x70
>        [] sysfs_hash_and_remove+0x60/0xb0
>        [] sysfs_remove_file+0x39/0x50
>        [] device_remove_file+0x17/0x20
>        [] device_remove_attrs+0x2a/0xe0
>        [] device_del+0x12b/0x1f0
>        [] device_unregister+0x22/0x60
>        [] remove_memory_block+0x6d/0xa0
>        [] unregister_memory_section+0x1f/0x30
>        [] __remove_pages+0xc8/0x150
>        [] arch_remove_memory+0x94/0xd0
>        [] remove_memory+0x6f/0x90
>        [] acpi_memory_remove_memory+0x7e/0xa3
>        [] acpi_memory_device_remove+0x27/0x33
>        [] acpi_bus_device_detach+0x3d/0x5e
>        [] acpi_ns_walk_namespace+0xd6/0x17d
>        [] acpi_walk_namespace+0x8a/0xc4
>        [] acpi_bus_trim+0x33/0x7a
>        [] acpi_scan_hot_remove+0x160/0x281

 --> device_hotplug_lock is already held by show_online() at this point (or,
     if it is not held, that function will return -EBUSY after attempting to
     grab device_remove_rwsem for reading), so acpi_scan_hot_remove() will
     wait for it to be released before calling acpi_bus_trim().  So the
     situation in which one thread holds s_active for reading and blocks on
     device_hotplug_lock while another thread holds device_hotplug_lock over
     device removal looks clearly impossible to me, and I'm not sure how
     device_hotplug_lock is still involved here.

>        [] acpi_bus_hot_remove_device+0x37/0x73
>        [] acpi_os_execute_deferred+0x27/0x34
>        [] process_one_work+0x1e8/0x560
>        [] worker_thread+0x120/0x3a0
>        [] kthread+0xee/0x100
>        [] ret_from_fork+0x7c/0xb0
>
> other info that might help us debug this:
>
> Chain exists of:
>   s_active#73 --> pm_mutex --> mem_sysfs_mutex
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(mem_sysfs_mutex);
>                                lock(pm_mutex);
>                                lock(mem_sysfs_mutex);
>   lock(s_active#73);
>
>  *** DEADLOCK ***

Thanks,
Rafael
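
To make the .show() side concrete, a callback like show_online() would use
the guard described above roughly as below.  This is a sketch based only on
the changelog and the stack trace in this thread (the -EBUSY return and the
lock_device_hotplug() call follow the discussion above; the exact body is an
assumption, not the actual patch):

#include <linux/device.h>
#include <linux/kernel.h>

static ssize_t show_online(struct device *dev, struct device_attribute *attr,
			   char *buf)
{
	bool val;

	/* Refuse the read if device removal is currently in progress
	 * (read_lock_device_remove() as sketched earlier in this message). */
	if (!read_lock_device_remove())
		return -EBUSY;

	lock_device_hotplug();
	val = !dev->offline;
	unlock_device_hotplug();

	read_unlock_device_remove();

	return sprintf(buf, "%u\n", val ? 1 : 0);
}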