2012-06-25 20:52:19

by Franky Lin

Subject: Panda ES board hang when using GPIO as interrupt

Hi Kevin, Tarun,

We are using the expansion connector A on Panda board to mount an SDIO
WiFi dongle on MMC2 with a level triggered interrupt signal connected to
GPIO 138. It had been working fine until 3.5 rc1. Now the board hangs
randomly within 5 mins during a network traffic test. After bisecting we
found the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
*_runtime_suspend()" [1].

I noticed Kevin raised some similar cases on other platforms and also
provided two patches in the patch mail thread. But unfortunately those
two patches don't help in our case. I tested the driver with 3.5-rc3
mainline kernel and the issue is still there. I can only "fix" the hang
by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
hang only happens on Panda ES board. Old Panda with 4430 works fine.

Any thoughts and suggestions?

Thanks,
Franky

[1] http://article.gmane.org/gmane.linux.ports.arm.omap/75708/



2012-06-29 00:59:13

by Franky Lin

Subject: Re: Panda ES board hang when using GPIO as interrupt

On 06/28/2012 04:54 PM, Jon Hunter wrote:
> I am wondering if this could be the bug ... on start-up I see that we do
> a context restore on bank1 during the probe which is before we have done
> the first suspend! In other words, we could restore a bad/uninitialised
> context for bank1. In the case of bank1, the loss count starts at 1 and
> not 0 and so we falsely think we need to perform a restore :-(
>
> [ 0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
> [ 0.177276] omap_gpio_runtime_resume: count 0, now 1
> [ 0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
> [ 0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>
> Can you try ...
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..9623408 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct
> platform_device *pdev)
> #ifdef CONFIG_OF_GPIO
> bank->chip.of_node = of_node_get(node);
> #endif
> + if (bank->get_context_loss_count)
> + bank->context_loss_count =
> + bank->get_context_loss_count(bank->dev);
>
> bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
> if (bank->irq_base < 0) {
>

Looks like you found the culprit. :) It does fix the problem.

Franky


2012-06-28 15:37:01

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt

Hi Franky,

On 06/27/2012 08:03 PM, Franky Lin wrote:
> On 06/27/2012 04:43 PM, Jon Hunter wrote:
>> Hi Franky,
>>
>> On 06/25/2012 03:52 PM, Franky Lin wrote:
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>>> within 5 mins during a network traffic test. After bisecting we found
>>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>>> *_runtime_suspend()" [1].
>>
>> I have been looking into this today to see if I can replicate the
>> problem that you have reported. However, so far I have not had any luck.
>> Please note that my test setup is not exactly the same as yours as I
>> don't have your wlan module. However, I have been using a 2nd board to
>> generate gpio events to a panda-es to see I can make it lock up. I have
>> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
>> problems after sending 100k gpio events (over many minutes). My setup is
>> as follows ...
>>
>> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
>> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
>> - Created a simple kernel module that acquires gpio-138 and sets up a
>> IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
>> - GPIO events are triggered roughly every 1ms
>
> Don't know if it's related, but we also mux several other pins on
> connector A:
> /* MMC2 Mux for extension board */
> /* MMC2 CMD */
> OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> /* MMC2 CLK */
> OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> /* MMC2 DAT 0-3 */
> OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> /* GPIO MUX for OOB interrupt of dongle */
> OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
> /* GPIO MUX for WLAN_ENABLE for dongle */
> OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

I would not have thought so. However, I will think about that, thanks.

>> Can you confirm ...
>> 1. You are just using omap2plus_defconfig with no changes?
> No, we enable following options
> CONFIG_DEVTMPFS=y
> CONFIG_DEVTMPFS_MOUNT=y
> CONFIG_USB_OHCI_HCD=y

Ok, thanks.

>> 2. Rough frequency of gpio events?
> 3367 interrupts were triggered during a 10 secs throughput test.
>
>> 3. Is the gpio configured for active low or high?
> active high
>
>> 4. When the hang occurs, what is the state of the gpio? Active or
>> inactive? Can you probe it with a scope? If it was always active I
>> could see that this would lock the device up, but I am not sure how
>> that would relate to the results from your bisect???
>
> I dont have a scope nearby. Let me see if I can find one tomorrow.

Great, that would be good.

>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the hang
>>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>>
>> It does not make sense to me yet why this would only impact 4460, but I
>> will keep this in mind.
>>
>> In your wlan driver are you acquiring and freeing the gpio often? Or are
>> you only acquiring the gpio on boot?
>>
>> The reason I ask is because for omap4, it seems that we are not
>> currently calling omap2_gpio_prepare_for_idle() during idle and so the
>> only time I see us call the runtime_suspend/resume handlers for omap4 is
>> during probe and when we acquire and free the gpio.
>>
>> So if you were not acquiring and freeing the gpio and are using the
>> stock kernel, then as far as I can tell, the runtime pm code is not
>> being exercised much. My test is not acquiring and releasing the gpio
>> and so I am wondering if that is the secret to reproducing this
>> problem :-)
>
> We only request the irq once during initialization. But we do frequently
> disable and re-enable it since we need to access to the module through
> SDIO to clear the interrupt. Apparently we can't finish all this in irq
> handler.

Ok, thanks. I don't see why that would cause a problem, but I can try
that too.

> Hope these could help.

Yes, good info to have.

Thanks
Jon

2012-06-28 15:41:57

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt


On 06/27/2012 07:41 PM, Franky Lin wrote:
> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>> "Franky Lin" <[email protected]> writes:
>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the
>>> hang by either reverting the commit or disabling
>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>> Panda with 4430 works good.
>>>
>>> Any thoughts and suggestions?
>>
>> If reverting the patch fixes your problem, can you isolate down to which
>> part of that patch causes the problem? IOW, can you fix your problem if
>> you undo just the hunk added in runtime_suspend or undo just the moved
>> hunk runtime_resume? Or is reverting both required?
>>
>> I suspect the added runtime_suspend hunk is causing the problems, so can
>> you see if just undoing that part works[1]. If that works, I will give
>> a bit more of a thinking on it tomorrow.
>
> runtime_suspend hunk is fine. The hang still exist after reverting it.
> The culprit is the moved hunk in runtime_resume. Reverting it makes the
> hang disappear.

Thanks. From reviewing the code the only thing that appears suspect based
upon your findings is the return if we find the context has not been lost.
We are not checking if "workaround_enabled" is set before we return.

Could you try the following change on top of v3.5-rc3?

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..3b89e85 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1238,12 +1238,8 @@ static int omap_gpio_runtime_resume(struct device *dev)
 	if (bank->get_context_loss_count) {
 		context_lost_cnt_after =
 			bank->get_context_loss_count(bank->dev);
-		if (context_lost_cnt_after != bank->context_loss_count) {
+		if (context_lost_cnt_after != bank->context_loss_count)
 			omap_gpio_restore_context(bank);
-		} else {
-			spin_unlock_irqrestore(&bank->lock, flags);
-			return 0;
-		}
 	}

Also, could you add a print in the runtime_suspend/resume() functions so
we can see how often these are being called. In my case, I really don't see
these being exercised and I am wondering how often you see suspend/resume
being called in your setup.

Cheers
Jon

2012-06-28 21:55:46

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt


On 06/28/2012 04:24 PM, Franky Lin wrote:
> On 06/28/2012 08:42 AM, Jon Hunter wrote:
>>
>> On 06/27/2012 07:41 PM, Franky Lin wrote:
>>> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>>>> "Franky Lin" <[email protected]> writes:
>>>>> I noticed Kevin raised some similar cases on other platforms and also
>>>>> provided two patches in the patch mail thread. But unfortunately those
>>>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>>>> mainline kernel and the issue is still there. I can only "fix" the
>>>>> hang by either reverting the commit or disabling
>>>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>>>> Panda with 4430 works good.
>>>>>
>>>>> Any thoughts and suggestions?
>>>>
>>>> If reverting the patch fixes your problem, can you isolate down to
>>>> which
>>>> part of that patch causes the problem? IOW, can you fix your
>>>> problem if
>>>> you undo just the hunk added in runtime_suspend or undo just the moved
>>>> hunk runtime_resume? Or is reverting both required?
>>>>
>>>> I suspect the added runtime_suspend hunk is causing the problems, so
>>>> can
>>>> you see if just undoing that part works[1]. If that works, I will give
>>>> a bit more of a thinking on it tomorrow.
>>>
>>> runtime_suspend hunk is fine. The hang still exist after reverting it.
>>> The culprit is the moved hunk in runtime_resume. Reverting it makes the
>>> hang disappear.
>>
>> Thanks. From reviewing the code the only thing that appears suspect based
>> upon your findings is the return if we find the context has not been
>> lost.
>> We are not checking if "workaround_enabled" is set before we return.
>>
>> Could you try the following change on top of v3.5-rc3?
>>
>
> The patch doesn't help. And I also managed to probe the signal. It's
> active when it hung.

Ok. Is there any way to manually reset the wlan module to deactivate the
gpio when it is hung? I am wondering whether the board comes back to life
once the gpio is deactivated, which would indicate it is stuck in the
interrupt somewhere.

>> Also, could you add a print in the runtime_suspend/resume() functions so
>> we can see how often these are being called. In my case, I really
>> don't see
>> these being exercised and I am wondering how often you see suspend/resume
>> being called in your setup.
>
> Well, the runtime_suspend/resume never get called during the test.

Well, at least that is consistent with what I see, but it is also
perplexing that it takes some time to fail. Can you try the following as
a debug patch to see if the context restore is the problem? From your
testing and bisect, the only possible difference in the current kernel is
that it could perform the context restore when acquiring the gpio.

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..a2401bd 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
 #if defined(CONFIG_PM_RUNTIME)
 static void omap_gpio_restore_context(struct gpio_bank *bank)
 {
+	return;
+
 	__raw_writel(bank->context.wake_en,
 				bank->base + bank->regs->wkup_en);
 	__raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);

Cheers
Jon

2012-06-28 22:59:05

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt


On 06/28/2012 05:53 PM, Franky Lin wrote:
> On 06/28/2012 02:55 PM, Jon Hunter wrote:
>> Ok. Any way to manually reset the wlan module to deactivate the gpio
>> when it is hung? I am wondering if the gpio is deactivated if the board
>> comes back to life, indicating it is stuck in the interrupt somewhere.
>
> The only way I can think of is removing the module manually. But it
> didn't bring the board back to live.
>
>> Well, at least that is consistent with what I see, but also perplexing
>> that it takes sometime to fail. Can you try the following as a debug
>> patch to see if it is in the context restore that is the problem. From
>> your testing and bisect, the only possible difference in the current
>> kernel is that it could perform the context restore when acquiring the
>> gpio.
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..a2401bd 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
>> #if defined(CONFIG_PM_RUNTIME)
>> static void omap_gpio_restore_context(struct gpio_bank *bank)
>> {
>> + return;
>> +
>> __raw_writel(bank->context.wake_en,
>> bank->base + bank->regs->wkup_en);
>> __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);
>>
>
> This one works! It can run more than 20 mins.

Great! I need to dig into the context restore some more.

> I found one interesting thing. When I added the print info to see when
> runtime_suspend/resume get called, it seems like the suspend/resume is
> unbalance during boot. Resume got called more than suspend. So I hack
> the code to make sure suspend and resume are called in pair. A resume
> without suspend will do nothing and return immediately. This also makes
> the hang vanish.

I am not 100% sure I follow. On boot I would expect to see a
resume/suspend due to the probe on the irq bank and then I would expect
to see another resume from the acquisition of the gpio, however, I would
not expect a suspend until the gpio is freed, which I don't believe you
are doing.

Can you share your hack? Just paste the diff? This may help me
understand more.
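
Going only by the description quoted above (a resume that was not
preceded by a suspend does nothing and returns immediately), the idea can
be pictured with a small userspace model. This is a hypothetical sketch,
not the actual diff, which does not appear in this message. The struct
and function names are invented for illustration and none of them exist
in gpio-omap.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-bank flag that pairs each resume with a prior suspend. */
struct pair_model {
	bool suspended;   /* set by suspend, cleared by the matching resume */
	int resume_runs;  /* how many times the resume body actually ran */
};

/* Models runtime_suspend: record that a suspend has happened. */
static void model_paired_suspend(struct pair_model *p)
{
	assert(p);
	p->suspended = true;
}

/*
 * Models runtime_resume with the pairing guard.
 * Returns true if the resume body ran, false if it was skipped
 * because no suspend preceded it.
 */
static bool model_paired_resume(struct pair_model *p)
{
	assert(p);
	if (!p->suspended)
		return false;   /* unpaired resume: do nothing */
	p->suspended = false;
	p->resume_runs++;
	return true;
}
```

In this model a resume issued before any suspend is skipped, so whatever
the resume body would have restored is left untouched until a real
suspend has run at least once.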

Thanks
Jon

2012-06-27 03:36:59

by Kevin Hilman

Subject: Re: Panda ES board hang when using GPIO as interrupt

Hello,

"Franky Lin" <[email protected]> writes:

> Hi Kevin, Tarun,
>
> We are using the expansion connector A on Panda board to mount a SDIO
> WiFi dongle on MMC2 with a level triggered interrupt signal connected
> to GPIO 138. It's been working fine until 3.5 rc1. The board hang
> randomly within 5 mins during a network traffic test. After bisecting
> we found the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
> *_runtime_suspend()" [1].

<grumble>

As you might guess, that patch has caused me enough headaches that
reverting it sounds like a good idea now. But, I'd still like to better
understand exactly what's going on.

> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those
> two patches doesn't help in our case. I tested the driver with 3.5-rc3
> mainline kernel and the issue is still there. I can only "fix" the
> hang by either reverting the commit or disabling
> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
> Panda with 4430 works good.
>
> Any thoughts and suggestions?

If reverting the patch fixes your problem, can you isolate down to which
part of that patch causes the problem? IOW, can you fix your problem if
you undo just the hunk added in runtime_suspend or undo just the moved
hunk runtime_resume? Or is reverting both required?

I suspect the added runtime_suspend hunk is causing the problems, so can
you see if just undoing that part works[1]? If that works, I will give
it a bit more thought tomorrow.

Thanks for reporting the problem! Bug reports like this that have
clearly been thoroughly researched and bisected are greatly appreciated!

Kevin

[1] patch against v3.5-rc4

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..2a6067f 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1177,9 +1177,6 @@ static int omap_gpio_runtime_suspend(struct device *dev)
 	__raw_writel(wake_hi | bank->context.risingdetect,
 		     bank->base + bank->regs->risingdetect);
 
-	if (!bank->enabled_non_wakeup_gpios)
-		goto update_gpio_context_count;
-
 	if (bank->power_mode != OFF_MODE) {
 		bank->power_mode = 0;
 		goto update_gpio_context_count;




2012-06-28 01:04:06

by Franky Lin

Subject: Re: Panda ES board hang when using GPIO as interrupt

On 06/27/2012 04:43 PM, Jon Hunter wrote:
> Hi Franky,
>
> On 06/25/2012 03:52 PM, Franky Lin wrote:
>> Hi Kevin, Tarun,
>>
>> We are using the expansion connector A on Panda board to mount a SDIO
>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>> within 5 mins during a network traffic test. After bisecting we found
>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>> *_runtime_suspend()" [1].
>
> I have been looking into this today to see if I can replicate the
> problem that you have reported. However, so far I have not had any luck.
> Please note that my test setup is not exactly the same as yours as I
> don't have your wlan module. However, I have been using a 2nd board to
> generate gpio events to a panda-es to see I can make it lock up. I have
> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
> problems after sending 100k gpio events (over many minutes). My setup is
> as follows ...
>
> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
> - Created a simple kernel module that acquires gpio-138 and sets up a
> IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
> - GPIO events are triggered roughly every 1ms

Don't know if it's related, but we also mux several other pins on
connector A:
	/* MMC2 Mux for extension board */
	/* MMC2 CMD */
	OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
	/* MMC2 CLK */
	OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
	/* MMC2 DAT 0-3 */
	OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
	OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
	OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
	OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
	/* GPIO MUX for OOB interrupt of dongle */
	OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
	/* GPIO MUX for WLAN_ENABLE for dongle */
	OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

> Can you confirm ...
> 1. You are just using omap2plus_defconfig with no changes?
No, we enable the following options:
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_USB_OHCI_HCD=y

> 2. Rough frequency of gpio events?
3367 interrupts were triggered during a 10 secs throughput test.

> 3. Is the gpio configured for active low or high?
active high

> 4. When the hang occurs, what is the state of the gpio? Active or
> inactive? Can you probe it with a scope? If it was always active I
> could see that this would lock the device up, but I am not sure how
> that would relate to the results from your bisect???

I don't have a scope nearby. Let me see if I can find one tomorrow.
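
To compare the answer to question 2 with the "roughly every 1ms" spacing
in the quoted test setup, the count can be turned into an average gap
between interrupts. The helper below is a hypothetical sketch (the
function name is invented). It gives only a mean, so it says nothing
about how bursty the real traffic is.

```c
#include <assert.h>

/*
 * Average gap between events in microseconds, rounded down.
 * events: number of interrupts counted
 * window_seconds: length of the measurement window
 */
static long mean_period_us(long events, long window_seconds)
{
	assert(events > 0 && window_seconds > 0);
	return (window_seconds * 1000000L) / events;
}
```

With 3367 interrupts in 10 seconds this comes to about 2970 us, so on
average the dongle interrupts at roughly a third of the rate used in the
1 ms test.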

>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those
>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>> mainline kernel and the issue is still there. I can only "fix" the hang
>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>
> It does not make sense to me yet why this would only impact 4460, but I
> will keep this in mind.
>
> In your wlan driver are you acquiring and freeing the gpio often? Or are
> you only acquiring the gpio on boot?
>
> The reason I ask is because for omap4, it seems that we are not
> currently calling omap2_gpio_prepare_for_idle() during idle and so the
> only time I see us call the runtime_suspend/resume handlers for omap4 is
> during probe and when we acquire and free the gpio.
>
> So if you were not acquiring and freeing the gpio and are using the
> stock kernel, then as far as I can tell, the runtime pm code is not
> being exercised much. My test is not acquiring and releasing the gpio
> and so I am wondering if that is the secret to reproducing this problem :-)

We only request the irq once during initialization. But we do frequently
disable and re-enable it since we need to access the module through
SDIO to clear the interrupt. Apparently we can't finish all this in the
irq handler.
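
The disable and re-enable pattern described here can be pictured with a
small userspace model. This is only a sketch under assumptions: it
assumes the handler masks the line (as the kernel's disable_irq_nosync()
would), defers the SDIO access to a context that may sleep, and unmasks
again (as enable_irq() would) once the source is cleared. The struct and
function names are hypothetical and are not taken from the wlan driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a level-triggered OOB interrupt line. */
struct oob_irq_model {
	bool line_high;     /* dongle is asserting the interrupt */
	bool irq_enabled;   /* host side: interrupt is unmasked */
	bool work_pending;  /* deferred SDIO work has been queued */
	int handled;        /* completed service cycles */
};

/*
 * Hard-IRQ part: the source cannot be cleared here, so only mask the
 * line and queue the real work. Returns true if the interrupt fired.
 */
static bool model_hard_irq(struct oob_irq_model *m)
{
	assert(m);
	if (!m->irq_enabled || !m->line_high)
		return false;
	m->irq_enabled = false;   /* stands in for disable_irq_nosync() */
	m->work_pending = true;
	return true;
}

/*
 * Deferred part: clear the source over SDIO, which drops the line,
 * then unmask the interrupt again.
 */
static void model_deferred_work(struct oob_irq_model *m)
{
	assert(m);
	if (!m->work_pending)
		return;
	m->line_high = false;     /* source cleared through SDIO */
	m->work_pending = false;
	m->irq_enabled = true;    /* stands in for enable_irq() */
	m->handled++;
}
```

The point of masking in the handler is that a level-triggered line stays
active until the source is cleared, so without the mask the handler
would be re-entered continuously while the deferred work waits to run.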

Hope this helps.

Regards,
Franky


2012-06-28 23:53:58

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt


On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
>
> OK.
> This is what I saw in the log:
> [ 0.171844] dummy:
> [ 0.172912] NET: Registered protocol family 16
> [ 0.173431] GPMC revision 6.0
> [ 0.173492] gpmc: irq-52 could not claim: err -22
> [ 0.177551] ??????omap_gpio_runtime_resume
> [ 0.178619] OMAP GPIO hardware version 0.1
> [ 0.178649] !!!!!omap_gpio_runtime_suspend
> [ 0.178771] ??????omap_gpio_runtime_resume
> [ 0.179351] !!!!!omap_gpio_runtime_suspend
> [ 0.179504] ??????omap_gpio_runtime_resume
> [ 0.180023] !!!!!omap_gpio_runtime_suspend
> [ 0.180145] ??????omap_gpio_runtime_resume
> [ 0.180694] !!!!!omap_gpio_runtime_suspend
> [ 0.180847] ??????omap_gpio_runtime_resume
> [ 0.181365] !!!!!omap_gpio_runtime_suspend
> [ 0.181518] ??????omap_gpio_runtime_resume
> [ 0.182037] !!!!!omap_gpio_runtime_suspend
> [ 0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [ 0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [ 0.186584] error setting wl12xx data: -38
> [ 0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [ 0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [ 0.239501] ??????omap_gpio_runtime_resume
> [ 0.239532] ??????omap_gpio_runtime_resume
> [ 0.241058] usbhs_omap: alias fck already exists
> [ 0.244781] ??????omap_gpio_runtime_resume

I am wondering if this could be the bug ... on start-up I see that we do
a context restore on bank1 during the probe which is before we have done
the first suspend! In other words, we could restore a bad/uninitialised
context for bank1. In the case of bank1, the loss count starts at 1 and
not 0 and so we falsely think we need to perform a restore :-(

[ 0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
[ 0.177276] omap_gpio_runtime_resume: count 0, now 1
[ 0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
[ 0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000

Can you try ...

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..9623408 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
 #ifdef CONFIG_OF_GPIO
 	bank->chip.of_node = of_node_get(node);
 #endif
+	if (bank->get_context_loss_count)
+		bank->context_loss_count =
+			bank->get_context_loss_count(bank->dev);
 
 	bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
 	if (bank->irq_base < 0) {
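
The suspected failure and the proposed fix can be illustrated with a
small userspace model. It is a sketch only, and it assumes that the
cached count starts at 0 (as in the "count 0, now 1" log line), that
resume restores whenever the cached and reported counts differ, and that
suspend refreshes the cached count. The struct and function names are
hypothetical stand-ins, not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-bank state involved in the bug. */
struct bank_model {
	int context_loss_count;  /* count cached by the driver */
	int hw_loss_count;       /* what get_context_loss_count() would report */
	bool context_saved;      /* true once a suspend has saved registers */
	int bad_restores;        /* restores performed with nothing saved */
};

/* Models probe; apply_fix caches the current loss count as in the patch. */
static void model_probe(struct bank_model *b, bool apply_fix)
{
	assert(b);
	b->context_loss_count = 0;   /* matches "count 0" in the log */
	b->context_saved = false;
	b->bad_restores = 0;
	if (apply_fix)
		b->context_loss_count = b->hw_loss_count;
}

/* Models the count comparison in resume; returns true if a restore ran. */
static bool model_runtime_resume(struct bank_model *b)
{
	assert(b);
	if (b->hw_loss_count == b->context_loss_count)
		return false;
	if (!b->context_saved)
		b->bad_restores++;   /* would write an uninitialised context */
	return true;
}

/* Models suspend: save the registers and refresh the cached count. */
static void model_runtime_suspend(struct bank_model *b)
{
	assert(b);
	b->context_saved = true;
	b->context_loss_count = b->hw_loss_count;
}
```

With a reported count of 1 and no fix, the first resume in this model
performs a restore before anything has been saved. With the fix applied
at probe time the counts already match, so the first resume skips the
restore.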

Thanks
Jon


2012-06-28 00:42:04

by Franky Lin

Subject: Re: Panda ES board hang when using GPIO as interrupt

On 06/26/2012 08:37 PM, Kevin Hilman wrote:
> "Franky Lin" <[email protected]> writes:
>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those
>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>> mainline kernel and the issue is still there. I can only "fix" the
>> hang by either reverting the commit or disabling
>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>> Panda with 4430 works good.
>>
>> Any thoughts and suggestions?
>
> If reverting the patch fixes your problem, can you isolate down to which
> part of that patch causes the problem? IOW, can you fix your problem if
> you undo just the hunk added in runtime_suspend or undo just the moved
> hunk runtime_resume? Or is reverting both required?
>
> I suspect the added runtime_suspend hunk is causing the problems, so can
> you see if just undoing that part works[1]. If that works, I will give
> a bit more of a thinking on it tomorrow.

The runtime_suspend hunk is fine. The hang still exists after reverting it.
The culprit is the moved hunk in runtime_resume. Reverting it makes the
hang disappear.

>
> Thanks for reporting the problem! Bug reports like this that have
> clearly been thoroughly researched and bisected are greatly appreciated!
>
> Kevin
>

You are welcome.

Regards,
Franky


2012-06-28 21:24:45

by Franky Lin

Subject: Re: Panda ES board hang when using GPIO as interrupt

On 06/28/2012 08:42 AM, Jon Hunter wrote:
>
> On 06/27/2012 07:41 PM, Franky Lin wrote:
>> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>>> "Franky Lin" <[email protected]> writes:
>>>> I noticed Kevin raised some similar cases on other platforms and also
>>>> provided two patches in the patch mail thread. But unfortunately those
>>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>>> mainline kernel and the issue is still there. I can only "fix" the
>>>> hang by either reverting the commit or disabling
>>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>>> Panda with 4430 works good.
>>>>
>>>> Any thoughts and suggestions?
>>>
>>> If reverting the patch fixes your problem, can you isolate down to which
>>> part of that patch causes the problem? IOW, can you fix your problem if
>>> you undo just the hunk added in runtime_suspend or undo just the moved
>>> hunk runtime_resume? Or is reverting both required?
>>>
>>> I suspect the added runtime_suspend hunk is causing the problems, so can
>>> you see if just undoing that part works[1]. If that works, I will give
>>> a bit more of a thinking on it tomorrow.
>>
>> runtime_suspend hunk is fine. The hang still exist after reverting it.
>> The culprit is the moved hunk in runtime_resume. Reverting it makes the
>> hang disappear.
>
> Thanks. From reviewing the code the only thing that appears suspect based
> upon your findings is the return if we find the context has not been lost.
> We are not checking if "workaround_enabled" is set before we return.
>
> Could you try the following change on top of v3.5-rc3?
>

The patch doesn't help. I also managed to probe the signal. It was
active when the board hung.

> Also, could you add a print in the runtime_suspend/resume() functions so
> we can see how often these are being called. In my case, I really don't see
> these being exercised and I am wondering how often you see suspend/resume
> being called in your setup.

Well, the runtime_suspend/resume never get called during the test.

Thanks,
Franky


2012-06-27 23:43:25

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt

Hi Franky,

On 06/25/2012 03:52 PM, Franky Lin wrote:
> Hi Kevin, Tarun,
>
> We are using the expansion connector A on Panda board to mount a SDIO
> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
> within 5 mins during a network traffic test. After bisecting we found
> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
> *_runtime_suspend()" [1].

I have been looking into this today to see if I can replicate the
problem that you have reported. However, so far I have not had any luck.
Please note that my test setup is not exactly the same as yours as I
don't have your wlan module. However, I have been using a 2nd board to
generate gpio events to a panda-es to see if I can make it lock up. I have
tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
problems after sending 100k gpio events (over many minutes). My setup is
as follows ...

- OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
- Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
- Created a simple kernel module that acquires gpio-138 and sets up an
IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
- GPIO events are triggered roughly every 1ms

Can you confirm ...
1. You are just using omap2plus_defconfig with no changes?
2. Rough frequency of gpio events?
3. Is the gpio configured for active low or high?
4. When the hang occurs, what is the state of the gpio? Active or
inactive? Can you probe it with a scope? If it was always active I
could see that this would lock the device up, but I am not sure how
that would relate to the results from your bisect???

> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those
> two patches doesn't help in our case. I tested the driver with 3.5-rc3
> mainline kernel and the issue is still there. I can only "fix" the hang
> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
> hang only happens on Panda ES board. Old Panda with 4430 works good.

It does not make sense to me yet why this would only impact 4460, but I
will keep this in mind.

In your wlan driver are you acquiring and freeing the gpio often? Or are
you only acquiring the gpio on boot?

The reason I ask is because for omap4, it seems that we are not
currently calling omap2_gpio_prepare_for_idle() during idle and so the
only time I see us call the runtime_suspend/resume handlers for omap4 is
during probe and when we acquire and free the gpio.

So if you were not acquiring and freeing the gpio and are using the
stock kernel, then as far as I can tell, the runtime pm code is not
being exercised much. My test is not acquiring and releasing the gpio
and so I am wondering if that is the secret to reproducing this problem :-)

Cheers
Jon


2012-06-29 15:53:11

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt


On 06/28/2012 11:07 PM, DebBarma, Tarun Kanti wrote:
> On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin wrote:
>> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>>
>>> I am wondering if this could be the bug ... on start-up I see that we do
>>> a context restore on bank1 during the probe which is before we have done
>>> the first suspend! In other words, we could restore a bad/uninitialised
>>> context for bank1. In the case of bank1, the loss count starts at 1 and
>>> not 0 and so we falsely think we need to perform a restore :-(
>>>
>>> [ 0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
>>> [ 0.177276] omap_gpio_runtime_resume: count 0, now 1
>>> [ 0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>>> [ 0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>>>
>>> Can you try ...
>>>
>>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>>> index c4ed172..9623408 100644
>>> --- a/drivers/gpio/gpio-omap.c
>>> +++ b/drivers/gpio/gpio-omap.c
>>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct
>>> platform_device *pdev)
>>> #ifdef CONFIG_OF_GPIO
>>> bank->chip.of_node = of_node_get(node);
>>> #endif
>>> + if (bank->get_context_loss_count)
>>> + bank->context_loss_count =
>>> + bank->get_context_loss_count(bank->dev);
>>>
>>> bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>> if (bank->irq_base < 0) {
>>>
>>
>> Looks like you found the culprit. :) It does fix the problem.
> So this looks similar to what NeilBrown reported in another thread.
> The reason was context_loss_count = 1 for GPIO BANK#0 which of course is
> in the WKUP domain. In fact he tried out with the same fix. Anyways, we
> should hear from Kevin now whether it is feasible to fix the
> context_loss_count for the WKUP GPIO bank or to put the workaround here
> in the gpio driver.

Ok, so I have been looking at this some more today. I believe that the
actual bug is that we are not checking to see if "loses_context" is true
before populating "get_context_loss_count" (see omap dmtimer driver).
For bank0 loses_context is false and so we should never be calling
"get_context_loss_count" in the first place.

I will send out a patch to fix this and will copy Kevin and Franky.
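
The guard described above, populating "get_context_loss_count" only when
"loses_context" is true, can be sketched in plain userspace C. This is a
simplified model under stated assumptions, not the driver code: struct
model_bank, model_loss_count(), model_probe() and model_resume() are
hypothetical names, and the callback is assumed to report a count of 1,
as in the "count 0, now 1" log line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's per-bank state. */
struct model_bank {
	bool loses_context;                  /* false for an always-on bank */
	int (*get_context_loss_count)(void); /* NULL when never needed */
	int context_loss_count;              /* count remembered by the driver */
	int restores;                        /* context restores performed */
};

/* Stand-in for the platform callback; assumed to report 1. */
int model_loss_count(void)
{
	return 1;
}

/* Probe: install the callback only if the bank can really lose context. */
void model_probe(struct model_bank *bank, bool loses_context)
{
	bank->loses_context = loses_context;
	bank->get_context_loss_count = loses_context ? model_loss_count : NULL;
	bank->context_loss_count = 0;
	bank->restores = 0;
}

/* Resume: returns true if a context restore was performed. */
bool model_resume(struct model_bank *bank)
{
	if (!bank->get_context_loss_count)
		return false; /* bank never loses context, nothing to compare */
	if (bank->get_context_loss_count() == bank->context_loss_count)
		return false; /* count unchanged, context still valid */
	bank->restores++;
	return true;
}
```

In this model a bank probed with loses_context == false never restores
on resume, while a bank probed with loses_context == true restores once
because its remembered count (0) differs from the reported count (1).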

Franky, if you can test and confirm it works that would be great.

Kevin, if you can review that would be great too.

Cheers
Jon

2012-06-28 23:28:11

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt


On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
>
> OK.
> This is what I saw in the log:
> [ 0.171844] dummy:
> [ 0.172912] NET: Registered protocol family 16
> [ 0.173431] GPMC revision 6.0
> [ 0.173492] gpmc: irq-52 could not claim: err -22
> [ 0.177551] ??????omap_gpio_runtime_resume
> [ 0.178619] OMAP GPIO hardware version 0.1
> [ 0.178649] !!!!!omap_gpio_runtime_suspend
> [ 0.178771] ??????omap_gpio_runtime_resume
> [ 0.179351] !!!!!omap_gpio_runtime_suspend
> [ 0.179504] ??????omap_gpio_runtime_resume
> [ 0.180023] !!!!!omap_gpio_runtime_suspend
> [ 0.180145] ??????omap_gpio_runtime_resume
> [ 0.180694] !!!!!omap_gpio_runtime_suspend
> [ 0.180847] ??????omap_gpio_runtime_resume
> [ 0.181365] !!!!!omap_gpio_runtime_suspend
> [ 0.181518] ??????omap_gpio_runtime_resume
> [ 0.182037] !!!!!omap_gpio_runtime_suspend

There are 6 resume/suspend pairs here, one for probing each of the 6 gpio
banks. So this makes sense.

> [ 0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [ 0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [ 0.186584] error setting wl12xx data: -38
> [ 0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [ 0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [ 0.239501] ??????omap_gpio_runtime_resume
> [ 0.239532] ??????omap_gpio_runtime_resume
> [ 0.241058] usbhs_omap: alias fck already exists
> [ 0.244781] ??????omap_gpio_runtime_resume

Yes, these 3 resumes at the end are most likely caused by calls to
omap_gpio_request(). In other words, 3 gpios are acquired. So that is
expected and looks fine to me.
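
The pattern in this log (6 resume/suspend pairs, then 3 unpaired
resumes) can be reproduced with a small userspace usage-count model.
This is only a sketch: the model_* names are hypothetical, and it
assumes a bank is resumed when its first user appears and suspended when
its last user goes away. Only gpio-138 comes from this thread; the other
GPIO numbers in the usage note are made up for illustration.

```c
#include <string.h>

#define MODEL_NUM_BANKS  6   /* six banks, as in the boot log */
#define MODEL_BANK_WIDTH 32  /* 32 GPIOs per bank ("GPIOs 0 to 31") */

struct model_pm_bank {
	int usage;     /* active users of this bank */
	int resumes;   /* runtime_resume calls seen */
	int suspends;  /* runtime_suspend calls seen */
};

/* The first user of a bank triggers a resume. */
void model_pm_get(struct model_pm_bank *bank)
{
	if (bank->usage++ == 0)
		bank->resumes++;
}

/* The last user leaving a bank triggers a suspend. */
void model_pm_put(struct model_pm_bank *bank)
{
	if (--bank->usage == 0)
		bank->suspends++;
}

/* Probe: each bank is resumed for setup and then allowed to suspend. */
void model_pm_probe_all(struct model_pm_bank *banks)
{
	memset(banks, 0, MODEL_NUM_BANKS * sizeof(*banks));
	for (int i = 0; i < MODEL_NUM_BANKS; i++) {
		model_pm_get(&banks[i]);
		model_pm_put(&banks[i]);
	}
}

/* Acquiring a GPIO keeps its bank active until the GPIO is freed. */
void model_gpio_request(struct model_pm_bank *banks, int gpio)
{
	model_pm_get(&banks[gpio / MODEL_BANK_WIDTH]);
}

void model_gpio_free(struct model_pm_bank *banks, int gpio)
{
	model_pm_put(&banks[gpio / MODEL_BANK_WIDTH]);
}

/* Sum resumes (want_resumes != 0) or suspends over all banks. */
int model_pm_total(const struct model_pm_bank *banks, int want_resumes)
{
	int total = 0;

	for (int i = 0; i < MODEL_NUM_BANKS; i++)
		total += want_resumes ? banks[i].resumes : banks[i].suspends;
	return total;
}
```

Probing all banks and then requesting three GPIOs in three different
banks (say 1, 62 and 138) gives 9 resumes and 6 suspends in this model,
which matches the shape of the log. A second request in an already
active bank adds no further resume.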

> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..bca3985 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1146,7 +1146,7 @@ static int __devinit omap_gpio_probe(struct
> platform_device *pdev)
>
> #if defined(CONFIG_PM_RUNTIME)
> static void omap_gpio_restore_context(struct gpio_bank *bank);
> -
> +static int flag = 0;
> static int omap_gpio_runtime_suspend(struct device *dev)
> {
> struct platform_device *pdev = to_platform_device(dev);
> @@ -1155,6 +1155,8 @@ static int omap_gpio_runtime_suspend(struct device
> *dev)
> unsigned long flags;
> u32 wake_low, wake_hi;
>
> + flag ++;
> +
> spin_lock_irqsave(&bank->lock, flags);
>
> /*
> @@ -1221,6 +1223,11 @@ static int omap_gpio_runtime_resume(struct device
> *dev)
> u32 l = 0, gen, gen0, gen1;
> unsigned long flags;
>
> + if (flag)
> + flag--;
> + else
> + return 0;
> +
> spin_lock_irqsave(&bank->lock, flags);
> _gpio_dbck_enable(bank);

I guess that this would also avoid the context restore, so I could see
it would work, but this is definitely not right. Ok, well let me look
into the restore.

Thanks
Jon

2012-06-28 22:53:40

by Franky Lin

Subject: Re: Panda ES board hang when using GPIO as interrupt

On 06/28/2012 02:55 PM, Jon Hunter wrote:
> Ok. Any way to manually reset the wlan module to deactivate the gpio
> when it is hung? I am wondering if the gpio is deactivated if the board
> comes back to life, indicating it is stuck in the interrupt somewhere.

The only way I can think of is removing the module manually. But it
didn't bring the board back to life.

> Well, at least that is consistent with what I see, but also perplexing
> that it takes sometime to fail. Can you try the following as a debug
> patch to see if it is in the context restore that is the problem. From
> your testing and bisect, the only possible difference in the current
> kernel is that it could perform the context restore when acquiring the gpio.
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..a2401bd 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
> #if defined(CONFIG_PM_RUNTIME)
> static void omap_gpio_restore_context(struct gpio_bank *bank)
> {
> + return;
> +
> __raw_writel(bank->context.wake_en,
> bank->base + bank->regs->wkup_en);
> __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);
>

This one works! It can run more than 20 mins.

I found one interesting thing. When I added the print info to see when
runtime_suspend/resume get called, it seems like the suspend/resume is
unbalanced during boot. Resume got called more than suspend. So I hacked
the code to make sure suspend and resume are called in pairs. A resume
without a suspend will do nothing and return immediately. This also makes
the hang vanish.

Regards,
Franky



2012-06-29 04:07:19

by DebBarma, Tarun Kanti

Subject: Re: Panda ES board hang when using GPIO as interrupt

On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin wrote:
> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>
>> I am wondering if this could be the bug ... on start-up I see that we do
>> a context restore on bank1 during the probe which is before we have done
>> the first suspend! In other words, we could restore a bad/uninitialised
>> context for bank1. In the case of bank1, the loss count starts at 1 and
>> not 0 and so we falsely think we need to perform a restore :-(
>>
>> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
>> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
>> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>>
>> Can you try ...
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..9623408 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct
>> platform_device *pdev)
>>  #ifdef CONFIG_OF_GPIO
>>         bank->chip.of_node = of_node_get(node);
>>  #endif
>> +       if (bank->get_context_loss_count)
>> +               bank->context_loss_count =
>> +                               bank->get_context_loss_count(bank->dev);
>>
>>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>         if (bank->irq_base < 0) {
>>
>
> Looks like you found the culprit. :) It does fix the problem.
So this looks similar to what NeilBrown reported in another thread.
The reason was context_loss_count = 1 for GPIO BANK#0 which of course is
in the WKUP domain. In fact he tried out with the same fix. Anyways, we
should hear from Kevin now whether it is feasible to fix the
context_loss_count for the WKUP GPIO bank or to put the workaround here
in the gpio driver.
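
To illustrate why a mismatched count matters, here is a userspace
sketch of a spurious restore and of the probe-time seeding from the diff
quoted above. It is a model under assumptions, not the driver: the
model_ctx_* names and the single "reg" field are hypothetical, and which
hardware registers are actually affected is not established in this
thread.

```c
#include <stdbool.h>

/* Hypothetical one-register model of a GPIO bank. */
struct model_ctx_bank {
	unsigned int reg;        /* live "register" value */
	unsigned int saved_reg;  /* copy written back by a restore */
	int hw_loss_count;       /* count reported by the platform */
	int context_loss_count;  /* count remembered by the driver */
};

/*
 * Probe: with seed_count set, the remembered count starts at the
 * reported value (the workaround); otherwise it starts at 0.
 */
void model_ctx_probe(struct model_ctx_bank *bank, int hw_loss_count,
		     bool seed_count)
{
	bank->reg = 0;
	bank->saved_reg = 0; /* nothing has been saved yet */
	bank->hw_loss_count = hw_loss_count;
	bank->context_loss_count = seed_count ? hw_loss_count : 0;
}

/* Resume: restore the saved copy if the counts differ. */
bool model_ctx_resume(struct model_ctx_bank *bank)
{
	if (bank->hw_loss_count == bank->context_loss_count)
		return false; /* no loss detected, keep live state */
	bank->reg = bank->saved_reg; /* overwrites whatever was live */
	bank->context_loss_count = bank->hw_loss_count;
	return true;
}
```

Probing with a reported count of 1 and no seeding, then setting reg to
an arbitrary value such as 0x400 and resuming, overwrites reg with the
never-saved value 0. With seeding, the counts already match and the
resume leaves reg untouched.
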
--
Tarun
>
> Franky
>

2012-06-27 13:29:42

by DebBarma, Tarun Kanti

Subject: Re: Panda ES board hang when using GPIO as interrupt

On Tue, Jun 26, 2012 at 11:50 PM, Franky Lin wrote:
> On 06/26/2012 12:21 AM, DebBarma, Tarun Kanti wrote:
>>
>> On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin wrote:
>>>
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi
>>> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
>>> 138. It's been working fine until 3.5 rc1. The board hang randomly within
>>> 5
>>> mins during a network traffic test. After bisecting we found the culprit
>>> is
>>> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>>>
>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two
>>> patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline
>>> kernel and the issue is still there. I can only "fix" the hang by either
>>> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
>>> happens on Panda ES board. Old Panda with 4430 works good.
>>>
>>> Any thoughts and suggestions?
>>
>> I just had a quick look at the code. Can you please check if the
>> attached patch solves
>> the issue? I just boot tested on Panda and Blaze.
>> --
>> Tarun
>>
>
> Thanks for the prompt reply.
>
> Booting is fine even without the patch and revert. The wifi dongle generates
> interrupt whenever there is data packet available for host to read. So
> during a traffic test a significant numbers of interrupt will be triggered
> through the GPIO. So I assume it has something to do with the interrupt
> GPIO.
>
> With the patch, the kernel still crashes. But the symptom is slightly
> different. Now it has a panic log every time. See attachment.
I tried comparing the present code with the older version with regard
to the enabled_non_wakeup_gpios check. The obvious difference I
observed is that in the older version this check is performed after the
off-mode check, whereas in the present code it is done just prior to the
off-mode check. But then, as Kevin pointed out, we need to understand
the exact problem. I am trying to have a setup to reproduce the
problem. BTW, you can ignore my patch because I realized that
saved_datain is part of the workaround.
---
Tarun

>
> Regards,
> Franky

2012-06-28 23:35:12

by Jon Hunter

Subject: Re: Panda ES board hang when using GPIO as interrupt


On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
>
> OK.
> This is what I saw in the log:
> [ 0.171844] dummy:
> [ 0.172912] NET: Registered protocol family 16
> [ 0.173431] GPMC revision 6.0
> [ 0.173492] gpmc: irq-52 could not claim: err -22
> [ 0.177551] ??????omap_gpio_runtime_resume
> [ 0.178619] OMAP GPIO hardware version 0.1
> [ 0.178649] !!!!!omap_gpio_runtime_suspend
> [ 0.178771] ??????omap_gpio_runtime_resume
> [ 0.179351] !!!!!omap_gpio_runtime_suspend
> [ 0.179504] ??????omap_gpio_runtime_resume
> [ 0.180023] !!!!!omap_gpio_runtime_suspend
> [ 0.180145] ??????omap_gpio_runtime_resume
> [ 0.180694] !!!!!omap_gpio_runtime_suspend
> [ 0.180847] ??????omap_gpio_runtime_resume
> [ 0.181365] !!!!!omap_gpio_runtime_suspend
> [ 0.181518] ??????omap_gpio_runtime_resume
> [ 0.182037] !!!!!omap_gpio_runtime_suspend
> [ 0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [ 0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [ 0.186584] error setting wl12xx data: -38
> [ 0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [ 0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [ 0.239501] ??????omap_gpio_runtime_resume
> [ 0.239532] ??????omap_gpio_runtime_resume
> [ 0.241058] usbhs_omap: alias fck already exists
> [ 0.244781] ??????omap_gpio_runtime_resume

Sorry, can you do one more test? :-)

Add the following and send me the output?

Thanks!
Jon

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..3aa0f96 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1155,6 +1155,7 @@ static int omap_gpio_runtime_suspend(struct device
*dev)
unsigned long flags;
u32 wake_low, wake_hi;

+ pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
spin_lock_irqsave(&bank->lock, flags);

/*
@@ -1221,6 +1222,7 @@ static int omap_gpio_runtime_resume(struct device
*dev)
u32 l = 0, gen, gen0, gen1;
unsigned long flags;

+ pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
spin_lock_irqsave(&bank->lock, flags);
_gpio_dbck_enable(bank);

@@ -1239,6 +1241,7 @@ static int omap_gpio_runtime_resume(struct device
*dev)
context_lost_cnt_after =
bank->get_context_loss_count(bank->dev);
if (context_lost_cnt_after != bank->context_loss_count) {
+ pr_info("%s: count %d, now %d", __func__, bank->context_loss_count, context_lost_cnt_after);
omap_gpio_restore_context(bank);
} else {
spin_unlock_irqrestore(&bank->lock, flags);
@@ -1341,6 +1344,7 @@ void omap2_gpio_resume_after_idle(void)
#if defined(CONFIG_PM_RUNTIME)
static void omap_gpio_restore_context(struct gpio_bank *bank)
{
+ pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
__raw_writel(bank->context.wake_en,
bank->base + bank->regs->wkup_en);
__raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);


2012-06-26 07:21:26

by DebBarma, Tarun Kanti

Subject: Re: Panda ES board hang when using GPIO as interrupt

On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin wrote:
> Hi Kevin, Tarun,
>
> We are using the expansion connector A on Panda board to mount a SDIO WiFi
> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
> 138. It's been working fine until 3.5 rc1. The board hang randomly within 5
> mins during a network traffic test. After bisecting we found the culprit is
> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>
> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those two
> patches doesn't help in our case. I tested the driver with 3.5-rc3 mainline
> kernel and the issue is still there. I can only "fix" the hang by either
> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
> happens on Panda ES board. Old Panda with 4430 works good.
>
> Any thoughts and suggestions?
I just had a quick look at the code. Can you please check if the
attached patch solves
the issue? I just boot tested on Panda and Blaze.
--
Tarun

From 0e1b322451b7a49487d2d17a147db1aa1d1119fa Mon Sep 17 00:00:00 2001
From: Tarun Kanti DebBarma <[email protected]>
Date: Tue, 26 Jun 2012 12:13:47 +0530
Subject: [PATCH] gpio/omap: enabled_non_wakeup_gpios check skips
bank->saved_datain

Commit b3c64bc30af67ed328a8d919e41160942b870451
(gpio/omap: (re)fix wakeups on level-triggered GPIOs)
still skips update of bank->saved_datain in *_runtime_suspend()
which must be done irrespective of edge/level trigger types.
Therefore, move the enabled_non_wakeup_gpios check after the
bank->saved_datain is updated.

Signed-off-by: Tarun Kanti DebBarma <[email protected]>
---
drivers/gpio/gpio-omap.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..94ecdcf 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1177,9 +1177,6 @@ static int omap_gpio_runtime_suspend(struct device *dev)
__raw_writel(wake_hi | bank->context.risingdetect,
bank->base + bank->regs->risingdetect);

- if (!bank->enabled_non_wakeup_gpios)
- goto update_gpio_context_count;
-
if (bank->power_mode != OFF_MODE) {
bank->power_mode = 0;
goto update_gpio_context_count;
@@ -1191,6 +1188,10 @@ static int omap_gpio_runtime_suspend(struct device *dev)
*/
bank->saved_datain = __raw_readl(bank->base +
bank->regs->datain);
+
+ if (!bank->enabled_non_wakeup_gpios)
+ goto update_gpio_context_count;
+
l1 = bank->context.fallingdetect;
l2 = bank->context.risingdetect;

--
1.7.0.4



>
> Thanks,
> Franky
>
> [1] http://article.gmane.org/gmane.linux.ports.arm.omap/75708/
>


Attachments:
0001-gpio-omap-enabled_non_wakeup_gpios-check-skips-bank-.patch (1.52 kB)

2012-06-26 18:20:58

by Franky Lin

Subject: Re: Panda ES board hang when using GPIO as interrupt

On 06/26/2012 12:21 AM, DebBarma, Tarun Kanti wrote:
> On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin wrote:
>> Hi Kevin, Tarun,
>>
>> We are using the expansion connector A on Panda board to mount a SDIO WiFi
>> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
>> 138. It's been working fine until 3.5 rc1. The board hang randomly within 5
>> mins during a network traffic test. After bisecting we found the culprit is
>> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>>
>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those two
>> patches doesn't help in our case. I tested the driver with 3.5-rc3 mainline
>> kernel and the issue is still there. I can only "fix" the hang by either
>> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
>> happens on Panda ES board. Old Panda with 4430 works good.
>>
>> Any thoughts and suggestions?
> I just had a quick look at the code. Can you please check if the
> attached patch solves
> the issue? I just boot tested on Panda and Blaze.
> --
> Tarun
>

Thanks for the prompt reply.

Booting is fine even without the patch and revert. The wifi dongle
generates an interrupt whenever there is a data packet available for the
host to read. So during a traffic test a significant number of interrupts
will be triggered through the GPIO. So I assume it has something to do
with the interrupt GPIO.

With the patch, the kernel still crashes. But the symptom is slightly
different. Now it has a panic log every time. See attachment.

Regards,
Franky


Attachments:
panic.log (10.15 kB)

2012-06-28 23:10:34

by Franky Lin

Subject: Re: Panda ES board hang when using GPIO as interrupt

On 06/28/2012 03:59 PM, Jon Hunter wrote:
>
> On 06/28/2012 05:53 PM, Franky Lin wrote:
>> I found one interesting thing. When I added the print info to see when
>> runtime_suspend/resume get called, it seems like the suspend/resume is
>> unbalance during boot. Resume got called more than suspend. So I hack
>> the code to make sure suspend and resume are called in pair. A resume
>> without suspend will do nothing and return immediately. This also makes
>> the hang vanish.
>
> I am not 100% sure I follow. On boot I would expect to see a
> resume/suspend due to the probe on the irq bank and then I would expect
> to see another resume from the acquisition of the gpio, however, I would
> not expect a suspend until the gpio is freed, which I don't believe you
> are doing.
>
> Can you share your hack? Just paste the diff? This may help me
> understand more.
>

OK.
This is what I saw in the log:
[ 0.171844] dummy:
[ 0.172912] NET: Registered protocol family 16
[ 0.173431] GPMC revision 6.0
[ 0.173492] gpmc: irq-52 could not claim: err -22
[ 0.177551] ??????omap_gpio_runtime_resume
[ 0.178619] OMAP GPIO hardware version 0.1
[ 0.178649] !!!!!omap_gpio_runtime_suspend
[ 0.178771] ??????omap_gpio_runtime_resume
[ 0.179351] !!!!!omap_gpio_runtime_suspend
[ 0.179504] ??????omap_gpio_runtime_resume
[ 0.180023] !!!!!omap_gpio_runtime_suspend
[ 0.180145] ??????omap_gpio_runtime_resume
[ 0.180694] !!!!!omap_gpio_runtime_suspend
[ 0.180847] ??????omap_gpio_runtime_resume
[ 0.181365] !!!!!omap_gpio_runtime_suspend
[ 0.181518] ??????omap_gpio_runtime_resume
[ 0.182037] !!!!!omap_gpio_runtime_suspend
[ 0.185089] omap_mux_init: Add partition: #1: core, flags: 2
[ 0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
[ 0.186584] error setting wl12xx data: -38
[ 0.189788] _omap_mux_get_by_name: Could not find signal
uart1_rx.uart1_rx
[ 0.189788] _omap_mux_get_by_name: Could not find signal
uart1_rx.uart1_rx
[ 0.239501] ??????omap_gpio_runtime_resume
[ 0.239532] ??????omap_gpio_runtime_resume
[ 0.241058] usbhs_omap: alias fck already exists
[ 0.244781] ??????omap_gpio_runtime_resume

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..bca3985 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1146,7 +1146,7 @@ static int __devinit omap_gpio_probe(struct
platform_device *pdev)

#if defined(CONFIG_PM_RUNTIME)
static void omap_gpio_restore_context(struct gpio_bank *bank);
-
+static int flag = 0;
static int omap_gpio_runtime_suspend(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
@@ -1155,6 +1155,8 @@ static int omap_gpio_runtime_suspend(struct device
*dev)
unsigned long flags;
u32 wake_low, wake_hi;

+ flag ++;
+
spin_lock_irqsave(&bank->lock, flags);

/*
@@ -1221,6 +1223,11 @@ static int omap_gpio_runtime_resume(struct device
*dev)
u32 l = 0, gen, gen0, gen1;
unsigned long flags;

+ if (flag)
+ flag--;
+ else
+ return 0;
+
spin_lock_irqsave(&bank->lock, flags);
_gpio_dbck_enable(bank);

Regards,
Franky