Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751976Ab3H0CIO (ORCPT ); Mon, 26 Aug 2013 22:08:14 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:24391 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1751391Ab3H0CIM (ORCPT ); Mon, 26 Aug 2013 22:08:12 -0400 X-IronPort-AV: E=Sophos;i="4.89,964,1367942400"; d="scan'208";a="8311642" Message-ID: <521C0907.6000500@cn.fujitsu.com> Date: Tue, 27 Aug 2013 10:03:51 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: "Rafael J. Wysocki" CC: ACPI Devel Maling List , Greg Kroah-Hartman , Toshi Kani , LKML , Yasuaki Ishimatsu Subject: Re: [PATCH] driver core / ACPI: Avoid device removal locking problems References: <1543475.L7gSB7lLAu@vostro.rjw.lan> <521AC7C9.2000408@cn.fujitsu.com> <4785337.sqQqDSPcpL@vostro.rjw.lan> <3389643.Fdir3g6c6B@vostro.rjw.lan> In-Reply-To: <3389643.Fdir3g6c6B@vostro.rjw.lan> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2013/08/27 10:06:13, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2013/08/27 10:06:13, Serialize complete at 2013/08/27 10:06:13 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 32639 Lines: 339 Hi Rafael, On 08/26/2013 10:43 PM, Rafael J. Wysocki wrote: > On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote: >> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote: >>> Hi Rafael, >> >> Hi, >> >>> On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote: >>> >>>> From: Rafael J. Wysocki >>>> >>>> There are two mutexes, device_hotplug_lock and acpi_scan_lock, held >>>> around the acpi_bus_trim() call in acpi_scan_hot_remove() which >>>> generally removes devices (it removes ACPI device objects at least, >>>> but it may also remove "physical" device objects through .detach() >>>> callbacks of ACPI scan handlers). Thus, potentially, device sysfs >>>> attributes are removed under these locks and to remove those >>>> attributes it is necessary to hold the s_active references of their >>>> directory entries for writing. >>>> >>>> On the other hand, the execution of a .show() or .store() callback >>>> from a sysfs attribute is carried out with that attribute's s_active >>>> reference held for reading. Consequently, if any device sysfs >>>> attribute that may be removed from within acpi_scan_hot_remove() >>>> through acpi_bus_trim() has a .store() or .show() callback which >>>> acquires either acpi_scan_lock or device_hotplug_lock, the execution >>>> of that callback may deadlock with the removal of the attribute. >>>> [Unfortunately, the "online" device attribute of CPUs and memory >>>> blocks and the "eject" attribute of ACPI device objects are affected >>>> by this issue.] >>>> >>>> To avoid those deadlocks introduce a new protection mechanism that >>>> can be used by the device sysfs attributes in question. Namely, >>>> if a device sysfs attribute's .store() or .show() callback routine >>>> is about to acquire device_hotplug_lock or acpi_scan_lock, it can >>>> first execute read_lock_device_remove() and return an error code if >>>> that function returns false. If true is returned, the lock in >>>> question may be acquired and read_unlock_device_remove() must be >>>> called. [This mechanism is implemented by means of an additional >>>> rwsem in drivers/base/core.c.] >>>> >>>> Make the affected sysfs attributes in the driver core and ACPI core >>>> use read_lock_device_remove() and read_unlock_device_remove() as >>>> described above. >>>> >>>> Signed-off-by: Rafael J. Wysocki >>>> Reported-by: Gu Zheng >>> >>> I'm sorry to forget to mention that the original reporter is >>> Yasuaki Ishimatsu . I continued >>> the investigation and found more issues. >>> >>> We tested this patch on kernel 3.11-rc6, but it seems that the >>> issue is still there. Detail info as following. >> >> Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway, >> because we'll need to take acpi_scan_lock during system suspend for PCI hot >> remove to work and that's under pm_mutex. So I wonder if we can simply >> drop the system sleep locking from lock/unlock_memory_hotplug(). But that's >> a side note, because dropping it won't help here. >> >> Now -> >> >>> ====================================================== >>> [ INFO: possible circular locking dependency detected ] >>> 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF >>> ------------------------------------------------------- >>> kworker/0:2/754 is trying to acquire lock: >>> (s_active#73){++++.+}, at: [] sysfs_addrm_finish+0x3b/0x70 >>> >>> but task is already holding lock: >>> (mem_sysfs_mutex){+.+.+.}, at: [] remove_memory_block+0x1d/0xa0 >>> >>> which lock already depends on the new lock. >>> >>> >>> the existing dependency chain (in reverse order) is: >>> >>> -> #4 (mem_sysfs_mutex){+.+.+.}: >>> [] validate_chain+0x70c/0x870 >>> [] __lock_acquire+0x36f/0x5f0 >>> [] lock_acquire+0xa0/0x130 >>> [] mutex_lock_nested+0x7b/0x3b0 >>> [] add_memory_section+0x51/0x150 >>> [] register_new_memory+0x1b/0x20 >>> [] __add_pages+0x111/0x120 >>> [] arch_add_memory+0x64/0xf0 >>> [] add_memory+0xd7/0x1e0 >>> [] acpi_memory_enable_device+0x77/0x12b >>> [] acpi_memory_device_add+0x18f/0x198 >>> [] acpi_bus_device_attach+0x9a/0x100 >>> [] acpi_ns_walk_namespace+0xbe/0x17d >>> [] acpi_walk_namespace+0x8a/0xc4 >>> [] acpi_bus_scan+0x91/0x9c >>> [] acpi_scan_bus_device_check+0x7e/0x10f >>> [] acpi_scan_device_check+0x13/0x15 >>> [] acpi_os_execute_deferred+0x27/0x34 >>> [] process_one_work+0x1e8/0x560 >>> [] worker_thread+0x120/0x3a0 >>> [] kthread+0xee/0x100 >>> [] ret_from_fork+0x7c/0xb0 >>> >>> -> #3 (pm_mutex){+.+.+.}: >>> [] validate_chain+0x70c/0x870 >>> [] __lock_acquire+0x36f/0x5f0 >>> [] lock_acquire+0xa0/0x130 >>> [] mutex_lock_nested+0x7b/0x3b0 >>> [] lock_memory_hotplug+0x35/0x40 >>> [] add_memory+0x2f/0x1e0 >>> [] acpi_memory_enable_device+0x77/0x12b >>> [] acpi_memory_device_add+0x18f/0x198 >>> [] acpi_bus_device_attach+0x9a/0x100 >>> [] acpi_ns_walk_namespace+0xbe/0x17d >>> [] acpi_walk_namespace+0x8a/0xc4 >>> [] acpi_bus_scan+0x91/0x9c >>> [] acpi_scan_init+0x66/0x15a >>> [] acpi_init+0xa1/0xbb >>> [] do_one_initcall+0xf2/0x1a0 >>> [] do_basic_setup+0x9d/0xbb >>> [] kernel_init_freeable+0x20a/0x28a >>> [] kernel_init+0xe/0xf0 >>> [] ret_from_fork+0x7c/0xb0 >>> >>> -> #2 (mem_hotplug_mutex){+.+.+.}: >>> [] validate_chain+0x70c/0x870 >>> [] __lock_acquire+0x36f/0x5f0 >>> [] lock_acquire+0xa0/0x130 >>> [] mutex_lock_nested+0x7b/0x3b0 >>> [] lock_memory_hotplug+0x17/0x40 >>> [] add_memory+0x2f/0x1e0 >>> [] acpi_memory_enable_device+0x77/0x12b >>> [] acpi_memory_device_add+0x18f/0x198 >>> [] acpi_bus_device_attach+0x9a/0x100 >>> [] acpi_ns_walk_namespace+0xbe/0x17d >>> [] acpi_walk_namespace+0x8a/0xc4 >>> [] acpi_bus_scan+0x91/0x9c >>> [] acpi_scan_bus_device_check+0x7e/0x10f >>> [] acpi_scan_device_check+0x13/0x15 >>> [] acpi_os_execute_deferred+0x27/0x34 >>> [] process_one_work+0x1e8/0x560 >>> [] worker_thread+0x120/0x3a0 >>> [] kthread+0xee/0x100 >>> [] ret_from_fork+0x7c/0xb0 >>> >>> -> #1 (device_hotplug_lock){+.+.+.}: >>> [] validate_chain+0x70c/0x870 >>> [] __lock_acquire+0x36f/0x5f0 >>> [] lock_acquire+0xa0/0x130 >>> [] mutex_lock_nested+0x7b/0x3b0 >>> [] lock_device_hotplug+0x17/0x20 >> >> -> we should have grabbed device_remove_rwsem for reading here with the patch >> applied, which means that --> >> >>> [] show_online+0x33/0x80 >>> [] dev_attr_show+0x27/0x50 >>> [] fill_read_buffer+0x74/0x100 >>> [] sysfs_read_file+0xac/0xe0 >>> [] vfs_read+0xb1/0x130 >>> [] SyS_read+0x62/0xb0 >>> [] system_call_fastpath+0x16/0x1b >>> >>> -> #0 (s_active#73){++++.+}: >>> [] check_prev_add+0x598/0x5d0 >>> [] validate_chain+0x70c/0x870 >>> [] __lock_acquire+0x36f/0x5f0 >>> [] lock_acquire+0xa0/0x130 >>> [] sysfs_deactivate+0x157/0x1c0 >>> [] sysfs_addrm_finish+0x3b/0x70 >>> [] sysfs_hash_and_remove+0x60/0xb0 >>> [] sysfs_remove_file+0x39/0x50 >>> [] device_remove_file+0x17/0x20 >>> [] device_remove_attrs+0x2a/0xe0 >>> [] device_del+0x12b/0x1f0 >>> [] device_unregister+0x22/0x60 >>> [] remove_memory_block+0x6d/0xa0 >>> [] unregister_memory_section+0x1f/0x30 >>> [] __remove_pages+0xc8/0x150 >>> [] arch_remove_memory+0x94/0xd0 >>> [] remove_memory+0x6f/0x90 >>> [] acpi_memory_remove_memory+0x7e/0xa3 >>> [] acpi_memory_device_remove+0x27/0x33 >>> [] acpi_bus_device_detach+0x3d/0x5e >>> [] acpi_ns_walk_namespace+0xd6/0x17d >>> [] acpi_walk_namespace+0x8a/0xc4 >>> [] acpi_bus_trim+0x33/0x7a >>> [] acpi_scan_hot_remove+0x160/0x281 >> >> --> device_hotplug_lock is already held by show_online() at this point (or if >> it is not held, that function will return -EBUSY after attempting to grab >> device_remove_rwsem for reading), so acpi_scan_hot_remove() will wait for it >> to be released before calling acpi_bus_trim(). >> >> So the situation in which one thread holds s_active for reading and blocks on >> device_hotplug_lock while another thread is holding it over device removal is >> clearly impossible to me. >> >> So I'm not sure how device_hotplug_lock is still involved? > > Well, it is not involved, but lockdep doesn't see that anyway. Bummer. > >>> [] acpi_bus_hot_remove_device+0x37/0x73 >>> [] acpi_os_execute_deferred+0x27/0x34 >>> [] process_one_work+0x1e8/0x560 >>> [] worker_thread+0x120/0x3a0 >>> [] kthread+0xee/0x100 >>> [] ret_from_fork+0x7c/0xb0 >>> >>> other info that might help us debug this: >>> >>> Chain exists of: >>> s_active#73 --> pm_mutex --> mem_sysfs_mutex >>> >>> Possible unsafe locking scenario: >>> >>> CPU0 CPU1 >>> ---- ---- >>> lock(mem_sysfs_mutex); >>> lock(pm_mutex); >>> lock(mem_sysfs_mutex); >>> lock(s_active#73); >>> >>> *** DEADLOCK *** > > OK, so the patch below is quick and dirty and overkill, but it should make the > splat go away at least. Yeah, this patch is the same as the one I attached in previous discussion threads. It does make the splat go away, but as you said, it's a bit overkill. Regards, Gu > > Thanks, > Rafael > > > --- > drivers/acpi/scan.c | 3 ++- > drivers/base/core.c | 36 ++++++++++++++++++++---------------- > 2 files changed, 22 insertions(+), 17 deletions(-) > > Index: linux-pm/drivers/base/core.c > =================================================================== > --- linux-pm.orig/drivers/base/core.c > +++ linux-pm/drivers/base/core.c > @@ -49,6 +49,18 @@ static struct kobject *dev_kobj; > struct kobject *sysfs_dev_char_kobj; > struct kobject *sysfs_dev_block_kobj; > > +static DEFINE_MUTEX(device_hotplug_lock); > + > +void lock_device_hotplug(void) > +{ > + mutex_lock(&device_hotplug_lock); > +} > + > +void unlock_device_hotplug(void) > +{ > + mutex_unlock(&device_hotplug_lock); > +} > + > #ifdef CONFIG_BLOCK > static inline int device_is_not_partition(struct device *dev) > { > @@ -408,9 +420,11 @@ static ssize_t show_online(struct device > { > bool val; > > - lock_device_hotplug(); > + if (!mutex_trylock(&device_hotplug_lock)) > + return -EAGAIN; > + > val = !dev->offline; > - unlock_device_hotplug(); > + mutex_unlock(&device_hotplug_lock); > return sprintf(buf, "%u\n", val); > } > > @@ -424,9 +438,11 @@ static ssize_t store_online(struct devic > if (ret < 0) > return ret; > > - lock_device_hotplug(); > + if (!mutex_trylock(&device_hotplug_lock)) > + return -EAGAIN; > + > ret = val ? device_online(dev) : device_offline(dev); > - unlock_device_hotplug(); > + mutex_unlock(&device_hotplug_lock); > return ret < 0 ? ret : count; > } > > @@ -1479,18 +1495,6 @@ EXPORT_SYMBOL_GPL(put_device); > EXPORT_SYMBOL_GPL(device_create_file); > EXPORT_SYMBOL_GPL(device_remove_file); > > -static DEFINE_MUTEX(device_hotplug_lock); > - > -void lock_device_hotplug(void) > -{ > - mutex_lock(&device_hotplug_lock); > -} > - > -void unlock_device_hotplug(void) > -{ > - mutex_unlock(&device_hotplug_lock); > -} > - > static int device_check_offline(struct device *dev, void *not_used) > { > int ret; > Index: linux-pm/drivers/acpi/scan.c > =================================================================== > --- linux-pm.orig/drivers/acpi/scan.c > +++ linux-pm/drivers/acpi/scan.c > @@ -510,7 +510,8 @@ acpi_eject_store(struct device *d, struc > if (ACPI_FAILURE(status) || !acpi_device->flags.ejectable) > return -ENODEV; > > - mutex_lock(&acpi_scan_lock); > + if (!mutex_trylock(&acpi_scan_lock)) > + return -EAGAIN; > > if (acpi_device->flags.eject_pending) { > /* ACPI eject notification event. */ > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/