Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755219AbaDNNKT (ORCPT ); Mon, 14 Apr 2014 09:10:19 -0400 Received: from mo4-p00-ob.smtp.rzone.de ([81.169.146.161]:28558 "EHLO mo4-p00-ob.smtp.rzone.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754437AbaDNNKP convert rfc822-to-8bit (ORCPT ); Mon, 14 Apr 2014 09:10:15 -0400 X-RZG-AUTH: :JGIXVUS7cutRB/49FwqZ7WcKdUCnXG6JabOfSXKWrat+htPvyuOM X-RZG-CLASS-ID: mo00 Subject: Re: w1: 3.14-rc7 - possible recursive locking detected Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: "Dr. H. Nikolaus Schaller" In-Reply-To: <20140413222401.GE5096@spacedout.fries.net> Date: Mon, 14 Apr 2014 15:10:04 +0200 Cc: Belisko Marek , Evgeniy Polyakov , LKML , Greg Kroah-Hartman Content-Transfer-Encoding: 8BIT Message-Id: <4DE9F573-55AC-4BEB-AF66-F721B145C081@goldelico.com> References: <20140323215041.GB5096@spacedout.fries.net> <698551395662480@web15j.yandex.ru> <20140413222401.GE5096@spacedout.fries.net> To: David Fries X-Mailer: Apple Mail (2.1085) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi David, I have tested and it appears to suppress the locking issue on our GTA04 board. BR, Nikolaus Am 14.04.2014 um 00:24 schrieb David Fries: > Belisko Marek, > Here is a possible solution, could you give it a try and report back? > > Greg Kroah-Hartman, > Evgeniy asked me to look into this report. I don't have the > reporter's hardware configuration, but I wouldn't think that would be > needed, just some w1 bus master (even W1_MASTER_GPIO might work), then > loading the slave device and manually adding a slave device with that > family id. Even then I didn't reproduce the reported recursive > locking error. I saw unrelated locking reports, but not this one. I > wrote up the included patch, but that undoes the notify changes that > you added earlier in commit 47eba33a0997fc7, and I wanted to ask about > that commit. Specifically these two lines, > > err = device_register(&sl->dev); > ... > + dev_set_uevent_suppress(&sl->dev, false); > + kobject_uevent(&sl->dev.kobj, KOBJ_ADD); > > Wouldn't the default be to not suppress? Nothing in the W1 system > enables suppressing so is that even needed? (I'm fine with saying > it's a good idea). > device_register at some point must call device_add and device_add > calls kobject_uevent(&dev->kobj, KOBJ_ADD); > so doesn't the KOBJ_ADD send the add a second time? As in it > shouldn't be needed? > Can the suppress be called before device_register to avoid the > automatic notify, then after it returns setup the slave device as this > patch does to avoid this problem report, and then call the KOBJ_ADD to > make everything happy? > > I'm based on commit ce7613db2d8d4d5af2587a. > > On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote: >> Hi, >> >> On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov wrote: >>> Hi everyone >>> >>> 24.03.2014, 01:50, "David Fries" : >>>> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: >>>>> booting latest 3.14-rc7 on gta04 board gives following warning: >>>>> >>>>> [ 3.101409] Driver for 1-wire Dallas network protocol. >>>>> [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in >>>>> [ 3.400146] CPU0 >>>>> [ 3.402709] ---- >>>>> [ 3.405273] lock(&(&priv->bus_notifier)->rwsem); >>>>> [ 3.410308] lock(&(&priv->bus_notifier)->rwsem); >>>>> [ 3.415374] >>>>> [ 3.415374] *** DEADLOCK *** >>>>> >>>>> AFAIK we didn't see that on (at least 3/14-rc2). >>>> >>>> The last drivers/w1 change went in before v3.14-rc1, so if it is >>>> something in the w1 code, it has either been there or something else >>>> in the rest of the kernel changed to expose it. I'm using the ds2490 >>>> USB w1 dongle, I didn't see that problem when enabling lock debugging >>>> on 3.14.0-rc5+ which includes some changes I'm working on. >>>> >>>> https://github.com/dfries/linux.git w1_rework >>> >>> Can it be a bug in omap w1 driver? >> We test with actual Linus master and it's even worse (it hangs completely). >> Trace look little bit different and it only happens when booting via >> device tree (not with board file). >> >> [ 2.807434] Driver for 1-wire Dallas network protocol. >> [ 2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in >> Interrupt mode >> [ 3.034271] call_modprobe: w1-family-0x1 >> [ 3.051818] >> [ 3.053405] ============================================= >> [ 3.059051] [ INFO: possible recursive locking detected ] >> [ 3.064727] 3.14.0-gta04 #556 Not tainted >> [ 3.068939] --------------------------------------------- >> [ 3.074615] w1_bus_master1/795 is trying to acquire lock: >> [ 3.080261] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: >> [] __blocking_notifier_call_chain+0x28/0x58 >> [ 3.090911] >> [ 3.090911] but task is already holding lock: >> [ 3.097045] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: >> [] __blocking_notifier_call_chain+0x28/0x58 >> [ 3.107666] >> [ 3.107666] other info that might help us debug this: >> [ 3.114501] Possible unsafe locking scenario: >> [ 3.114501] >> [ 3.120727] CPU0 >> [ 3.123291] ---- >> [ 3.125854] lock(&(&priv->bus_notifier)->rwsem); >> [ 3.130889] lock(&(&priv->bus_notifier)->rwsem); >> [ 3.135925] >> [ 3.135925] *** DEADLOCK *** >> [ 3.135925] >> [ 3.142150] May be due to missing lock nesting notation >> [ 3.142150] >> [ 3.149291] 2 locks held by w1_bus_master1/795: >> [ 3.154052] #0: (&dev->mutex#3){+.+.+.}, at: [] >> w1_attach_slave_device+0xc4/0x1c8 >> [ 3.163055] #1: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: >> [] __blocking_notifier_call_chain+0x28/0x58 >> [ 3.174163] >> [ 3.174163] stack backtrace: >> [ 3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted >> 3.14.0-gta04 #556 >> [ 3.186370] [] (unwind_backtrace) from [] >> (show_stack+0x10/0x14) >> [ 3.194488] [] (show_stack) from [] >> (dump_stack+0x6c/0x88) >> [ 3.202117] [] (dump_stack) from [] >> (print_deadlock_bug+0xc0/0xf0) >> [ 3.210449] [] (print_deadlock_bug) from [] >> (validate_chain.isra.29+0x4dc/0x534) >> [ 3.220031] [] (validate_chain.isra.29) from [] >> (__lock_acquire+0x728/0x834) >> [ 3.229248] [] (__lock_acquire) from [] >> (lock_acquire+0xf4/0x118) >> [ 3.237487] [] (lock_acquire) from [] >> (down_read+0x24/0x34) >> [ 3.245178] [] (down_read) from [] >> (__blocking_notifier_call_chain+0x28/0x58) >> [ 3.254516] [] (__blocking_notifier_call_chain) from >> [] (blocking_notifier_call_chain+0x14/0x18) >> [ 3.265563] [] (blocking_notifier_call_chain) from >> [] (device_add+0x1f4/0x368) >> [ 3.274993] [] (device_add) from [] >> (platform_device_add+0x138/0x1c8) >> [ 3.283569] [] (platform_device_add) from [] >> (w1_bq27000_add_slave+0x44/0x74) >> [ 3.292907] [] (w1_bq27000_add_slave) from [] >> (w1_bus_notify.part.0+0x48/0xc8) >> [ 3.302337] [] (w1_bus_notify.part.0) from [] >> (notifier_call_chain+0x38/0x68) >> [ 3.311645] [] (notifier_call_chain) from [] >> (__blocking_notifier_call_chain+0x44/0x58) >> [ 3.321899] [] (__blocking_notifier_call_chain) from >> [] (blocking_notifier_call_chain+0x14/0x18) >> [ 3.332946] [] (blocking_notifier_call_chain) from >> [] (device_add+0x1f4/0x368) >> [ 3.342346] [] (device_add) from [] >> (__w1_attach_slave_device+0x9c/0x134) >> [ 3.351318] [] (__w1_attach_slave_device) from >> [] (w1_attach_slave_device+0x134/0x1c8) >> [ 3.361480] [] (w1_attach_slave_device) from [] >> (w1_slave_found+0x7c/0x98) >> [ 3.370513] [] (w1_slave_found) from [] >> (omap_w1_search_bus+0x54/0x5c) >> [ 3.379211] [] (omap_w1_search_bus) from [] >> (w1_search_devices+0x3c/0x48) >> [ 3.388153] [] (w1_search_devices) from [] >> (w1_search_process_cb+0x64/0x108) >> [ 3.397399] [] (w1_search_process_cb) from [] >> (w1_process+0x68/0x14c) >> [ 3.405975] [] (w1_process) from [] (kthread+0xdc/0xf0) >> [ 3.413299] [] (kthread) from [] >> (ret_from_fork+0x14/0x2c) >> [ 24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19) >> [ 24.463134] INFO: Stall ended before state dump start >> [ 87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19) >> [ 87.513122] INFO: Stall ended before state dump start >> [ 150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19) >> [ 150.563232] INFO: Stall ended before state dump start >> [ 213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19) >> [ 213.613220] INFO: Stall ended before state dump start >> [ 276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19) >> [ 276.663238] INFO: Stall ended before state dump start >> >> BR, >> >> marek >> >> -- >> as simple and primitive as possible >> ------------------------------------------------- >> Marek Belisko - OPEN-NANDRA >> Freelance Developer >> >> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic >> Tel: +421 915 052 184 >> skype: marekwhite >> twitter: #opennandra >> web: http://open-nandra.com >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > > -- > David Fries PGP pub CB1EE8F0 > http://fries.net/~david/ > > > commit b3afc24a3ca6c261a8fe2c08b5bbccb51233c09a > Author: David Fries > Date: Sun Apr 13 15:36:45 2014 -0500 > > w1: avoid recursive device_add > > __w1_attach_slave_device calls device_add which calls w1_bus_notify > which calls for the w1_bq27000 slave driver, which calls > platform_device_add and device_add and deadlocks on getting > &(&priv->bus_notifier)->rwsem as it is still held in the previous > device_add. This avoids the problem by processing the family > add/remove outside of the slave device_add call. This does mean that > the slave device isn't setup when the notification that a slave is > available goes out. > > Reported-by: Belisko Marek > > diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c > index b96f61b..dde09c1 100644 > --- a/drivers/w1/w1.c > +++ b/drivers/w1/w1.c > @@ -614,27 +614,11 @@ end: > return err; > } > > -/* > - * Handle sysfs file creation and removal here, before userspace is told that > - * the device is added / removed from the system > - */ > -static int w1_bus_notify(struct notifier_block *nb, unsigned long action, > - void *data) > +static int w1_family_notify(unsigned long action, struct w1_slave *sl) > { > - struct device *dev = data; > - struct w1_slave *sl; > struct w1_family_ops *fops; > int err; > > - /* > - * Only care about slave devices at the moment. Yes, we should use a > - * separate "type" for this, but for now, look at the release function > - * to know which type it is... > - */ > - if (dev->release != w1_slave_release) > - return 0; > - > - sl = dev_to_w1_slave(dev); > fops = sl->family->fops; > > if (!fops) > @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, > return 0; > } > > -static struct notifier_block w1_bus_nb = { > - .notifier_call = w1_bus_notify, > -}; > - > static int __w1_attach_slave_device(struct w1_slave *sl) > { > int err; > @@ -705,6 +685,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl) > dev_name(&sl->dev), err); > return err; > } > + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl); > > > dev_set_uevent_suppress(&sl->dev, false); > @@ -799,6 +780,7 @@ int w1_unref_slave(struct w1_slave *sl) > msg.type = W1_SLAVE_REMOVE; > w1_netlink_send(sl->master, &msg); > > + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); > device_unregister(&sl->dev); > #ifdef DEBUG > memset(sl, 0, sizeof(*sl)); > @@ -1186,10 +1168,6 @@ static int __init w1_init(void) > goto err_out_exit_init; > } > > - retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb); > - if (retval) > - goto err_out_bus_unregister; > - > retval = driver_register(&w1_master_driver); > if (retval) { > printk(KERN_ERR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/