2014-04-16 14:46:43

by Stanislav Meduna

[permalink] [raw]
Subject: w1-gpio: sleeping function called from invalid context

Hi,

trying 1-wire with rt-patched Linux 3.12.15-rt25+ and lock debugging:

[ 12.055550] BUG: sleeping function called from invalid context at /home/stano/Kernels/linux-3.12-rt/kernel/rtmutex.c:673
[ 12.055573] in_atomic(): 0, irqs_disabled(): 128, pid: 92, name: w1_bus_master1
[ 12.055588] 2 locks held by w1_bus_master1/92:
[ 12.055675] #0: (&dev->mutex){+.+...}, at: [<c0233560>] w1_process+0xbc/0xe0
[ 12.055729] #1: (&dev->bus_mutex){+.+...}, at: [<c0232a50>] w1_search+0x78/0x1e8
[ 12.055741] irq event stamp: 374
[ 12.055787] hardirqs last enabled at (373): [<c02e04cc>] _raw_spin_unlock_irqrestore+0x6c/0x74
[ 12.055815] hardirqs last disabled at (374): [<c0234598>] w1_touch_bit+0x50/0xc4
[ 12.055860] softirqs last enabled at (0): [<c001bacc>] copy_process+0x37c/0x1178
[ 12.055876] softirqs last disabled at (0): [< (null)>] (null)
[ 12.055899] CPU: 0 PID: 92 Comm: w1_bus_master1 Not tainted 3.12.15-rt25+ #66
[ 12.055973] [<c00151bc>] (unwind_backtrace+0x0/0xf4) from [<c0012c00>] (show_stack+0x10/0x14)
[ 12.056020] [<c0012c00>] (show_stack+0x10/0x14) from [<c02df920>] (rt_spin_lock+0x20/0x60)
[ 12.056080] [<c02df920>] (rt_spin_lock+0x20/0x60) from [<c01bc98c>] (gpiod_direction_output+0x50/0x2fc)
[ 12.056122] [<c01bc98c>] (gpiod_direction_output+0x50/0x2fc) from [<c02345ac>] (w1_touch_bit+0x64/0xc4)
[ 12.056156] [<c02345ac>] (w1_touch_bit+0x64/0xc4) from [<c0234794>] (w1_write_8+0x54/0x7c)
[ 12.056196] [<c0234794>] (w1_write_8+0x54/0x7c) from [<c0232a78>] (w1_search+0xa0/0x1e8)
[ 12.056239] [<c0232a78>] (w1_search+0xa0/0x1e8) from [<c0233414>] (w1_search_process_cb+0x50/0xe0)
[ 12.056280] [<c0233414>] (w1_search_process_cb+0x50/0xe0) from [<c0233570>] (w1_process+0xcc/0xe0)
[ 12.056334] [<c0233570>] (w1_process+0xcc/0xe0) from [<c003cf24>] (kthread+0xa0/0xa8)
[ 12.056379] [<c003cf24>] (kthread+0xa0/0xa8) from [<c000f3e0>] (ret_from_fork+0x14/0x34)

This happens reproducibly on any access.

Please Cc: me when replying

Thanks
--
Stano


2014-04-16 19:21:42

by Paul Gortmaker

[permalink] [raw]
Subject: Re: w1-gpio: sleeping function called from invalid context

On 14-04-16 10:46 AM, Stanislav Meduna wrote:
> Hi,
>
> trying 1-wire with rt-patched Linux 3.12.15-rt25+ and lock debugging:

I didn't have 3.12-rt handy, but looking at vanilla 3.12, I see
that w1_touch_bit() turned off irqs via:

static int w1_disable_irqs = 0;
module_param_named(disable_irqs, w1_disable_irqs, int, 0);

...

static void w1_write_bit(struct w1_master *dev, int bit)
{
unsigned long flags = 0;

if(w1_disable_irqs) local_irq_save(flags);

...

so, if you are using the above module parameter, it will do the below
because gpiod_direction_output() does

spin_lock_irqsave(&gpio_lock, flags);

i.e. note it is not a raw lock, so it calls might_sleep, and then
you get the unravel below.

I've no idea why that module parameter exists. Why do you set it?

Paul.
--

>
> [ 12.055550] BUG: sleeping function called from invalid context at /home/stano/Kernels/linux-3.12-rt/kernel/rtmutex.c:673
> [ 12.055573] in_atomic(): 0, irqs_disabled(): 128, pid: 92, name: w1_bus_master1
> [ 12.055588] 2 locks held by w1_bus_master1/92:
> [ 12.055675] #0: (&dev->mutex){+.+...}, at: [<c0233560>] w1_process+0xbc/0xe0
> [ 12.055729] #1: (&dev->bus_mutex){+.+...}, at: [<c0232a50>] w1_search+0x78/0x1e8
> [ 12.055741] irq event stamp: 374
> [ 12.055787] hardirqs last enabled at (373): [<c02e04cc>] _raw_spin_unlock_irqrestore+0x6c/0x74
> [ 12.055815] hardirqs last disabled at (374): [<c0234598>] w1_touch_bit+0x50/0xc4
> [ 12.055860] softirqs last enabled at (0): [<c001bacc>] copy_process+0x37c/0x1178
> [ 12.055876] softirqs last disabled at (0): [< (null)>] (null)
> [ 12.055899] CPU: 0 PID: 92 Comm: w1_bus_master1 Not tainted 3.12.15-rt25+ #66
> [ 12.055973] [<c00151bc>] (unwind_backtrace+0x0/0xf4) from [<c0012c00>] (show_stack+0x10/0x14)
> [ 12.056020] [<c0012c00>] (show_stack+0x10/0x14) from [<c02df920>] (rt_spin_lock+0x20/0x60)
> [ 12.056080] [<c02df920>] (rt_spin_lock+0x20/0x60) from [<c01bc98c>] (gpiod_direction_output+0x50/0x2fc)
> [ 12.056122] [<c01bc98c>] (gpiod_direction_output+0x50/0x2fc) from [<c02345ac>] (w1_touch_bit+0x64/0xc4)
> [ 12.056156] [<c02345ac>] (w1_touch_bit+0x64/0xc4) from [<c0234794>] (w1_write_8+0x54/0x7c)
> [ 12.056196] [<c0234794>] (w1_write_8+0x54/0x7c) from [<c0232a78>] (w1_search+0xa0/0x1e8)
> [ 12.056239] [<c0232a78>] (w1_search+0xa0/0x1e8) from [<c0233414>] (w1_search_process_cb+0x50/0xe0)
> [ 12.056280] [<c0233414>] (w1_search_process_cb+0x50/0xe0) from [<c0233570>] (w1_process+0xcc/0xe0)
> [ 12.056334] [<c0233570>] (w1_process+0xcc/0xe0) from [<c003cf24>] (kthread+0xa0/0xa8)
> [ 12.056379] [<c003cf24>] (kthread+0xa0/0xa8) from [<c000f3e0>] (ret_from_fork+0x14/0x34)
>
> This happens reproducibly on any access.
>
> Please Cc: me when replying
>
> Thanks
>

2014-04-16 20:00:19

by Stanislav Meduna

[permalink] [raw]
Subject: Re: w1-gpio: sleeping function called from invalid context

On April 16, 2014 9:21:23 PM CEST, Paul Gortmaker <[email protected]> wrote:

> static int w1_disable_irqs = 0;
> module_param_named(disable_irqs, w1_disable_irqs, int, 0);
...
>I've no idea why that module parameter exists. Why do you set it?

Hm... I do not, at least not knowingly. I'll try to trace what exactly happens.

Thanks
--
Stano

2014-04-17 14:18:34

by Stanislav Meduna

[permalink] [raw]
Subject: Re: w1-gpio: sleeping function called from invalid context

On 16.04.2014 21:21, Paul Gortmaker wrote:

> static void w1_write_bit(struct w1_master *dev, int bit)
> {
> unsigned long flags = 0;
>
> if(w1_disable_irqs) local_irq_save(flags);

Well, it is actually the w1_read_bit that does the writing

static u8 w1_read_bit(struct w1_master *dev)
{
int result;
unsigned long flags = 0;

/* sample timing is critical here */
local_irq_save(flags);
...

The timing is indeed sensitive - just commenting out gets rid
of the error but also makes the connected device not to appear
(at least when compiled into the kernel and many things run
concurrently at boot). Raising the priority of the master thread
makes it work (but obviously not necessarily 100% reliable,
the timings needed are comparable to jitter caused by interrupts).

The root of the problem is probably that the w1-gpio does
tricks with the direction to communicate with the 1-wire bus.
A normal gpio get/set value does not need to lock anything
(there are special _cansleep versions for pins beyond controllers
that can sleep), but manipulating the pin direction does. To me
it looks the 1-wire is only possible under RT with real
open drain ports.

Maybe the gpiod_direction_input/output could be split to do the
stuff without locking - but I understand that this has to be
carefully designed.

For the record: the following makes things work in my environment:

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index fa932c2..2223c87 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -1011,6 +1011,11 @@ int w1_process(void *data)
* time can be calculated in jiffies once.
*/
const unsigned long jtime = msecs_to_jiffies(w1_timeout * 1000);
+ static struct sched_param sp = {
+ .sched_priority = 55
+ };
+
+ sched_setscheduler(current, SCHED_RR, &sp);

while (!kthread_should_stop()) {
if (dev->search_count) {
diff --git a/drivers/w1/w1_io.c b/drivers/w1/w1_io.c
index e10acc2..7065486 100644
--- a/drivers/w1/w1_io.c
+++ b/drivers/w1/w1_io.c
@@ -170,14 +170,14 @@ static u8 w1_read_bit(struct w1_master *dev)
unsigned long flags = 0;

/* sample timing is critical here */
- local_irq_save(flags);
+ if(w1_disable_irqs) local_irq_save(flags);
dev->bus_master->write_bit(dev->bus_master->data, 0);
w1_delay(6);
dev->bus_master->write_bit(dev->bus_master->data, 1);
w1_delay(9);

result = dev->bus_master->read_bit(dev->bus_master->data);
- local_irq_restore(flags);
+ if(w1_disable_irqs) local_irq_restore(flags);

w1_delay(55);

--
Stano

Subject: Re: w1-gpio: sleeping function called from invalid context

* Stanislav Meduna | 2014-04-17 16:18:16 [+0200]:

>diff --git a/drivers/w1/w1_io.c b/drivers/w1/w1_io.c
>index e10acc2..7065486 100644
>--- a/drivers/w1/w1_io.c
>+++ b/drivers/w1/w1_io.c
>@@ -170,14 +170,14 @@ static u8 w1_read_bit(struct w1_master *dev)
> unsigned long flags = 0;
>
> /* sample timing is critical here */
>- local_irq_save(flags);
>+ if(w1_disable_irqs) local_irq_save(flags);
> dev->bus_master->write_bit(dev->bus_master->data, 0);
> w1_delay(6);
> dev->bus_master->write_bit(dev->bus_master->data, 1);
> w1_delay(9);
>
> result = dev->bus_master->read_bit(dev->bus_master->data);
>- local_irq_restore(flags);
>+ if(w1_disable_irqs) local_irq_restore(flags);
>
> w1_delay(55);
>

there is local_irq_save_nort() and its friends.

Sebastian