2019-05-24 11:13:48

by Brian Masney

[permalink] [raw]
Subject: [PATCH] mmc: sdhci: queue work after sdhci_defer_done()

WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
moved from using a tasklet to a work queue. That patch also changed
sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
sdhci_defer_done() is true. Change it to queue work to the complete work
queue if sdhci_defer_done() is true so that the functionality is
equilivent to what was there when the finish_tasklet was present. This
corrects the WiFi breakage on the Nexus 5 phone.

Signed-off-by: Brian Masney <[email protected]>
Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
---
See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for
details about how the WiFi is wired into sdhci on this platform.

bisect log:

git bisect start
# bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
# bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux
git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf
# bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block
git bisect bad 67a242223958d628f0ba33283668e3ddd192d057
# good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3
# good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q
git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc
# bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38
# good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e
# good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf
# bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400
git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d
# good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant
git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18
# good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data'
git bisect good ade024f130f742725da9219624b01666f04bc4a6
# bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling
git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb
# good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible
git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903
# bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume
git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b
# bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet
git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612
# first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet

drivers/mmc/host/sdhci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 97158344b862..3563c3bc57c9 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
continue;

if (sdhci_defer_done(host, mrq)) {
- result = IRQ_WAKE_THREAD;
+ queue_work(host->complete_wq, &host->complete_work);
} else {
mrqs_done[i] = mrq;
host->mrqs_done[i] = NULL;
--
2.20.1


2019-05-24 12:19:03

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done()

On 24/05/19 2:10 PM, Brian Masney wrote:
> WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
> to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
> moved from using a tasklet to a work queue. That patch also changed
> sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
> sdhci_defer_done() is true. Change it to queue work to the complete work
> queue if sdhci_defer_done() is true so that the functionality is
> equilivent to what was there when the finish_tasklet was present. This
> corrects the WiFi breakage on the Nexus 5 phone.
>
> Signed-off-by: Brian Masney <[email protected]>
> Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
> ---
> See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for
> details about how the WiFi is wired into sdhci on this platform.
>
> bisect log:
>
> git bisect start
> # bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
> git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb
> # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
> git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
> # bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux
> git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf
> # bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block
> git bisect bad 67a242223958d628f0ba33283668e3ddd192d057
> # good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3
> # good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q
> git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc
> # bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38
> # good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e
> # good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
> git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf
> # bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400
> git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d
> # good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant
> git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18
> # good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data'
> git bisect good ade024f130f742725da9219624b01666f04bc4a6
> # bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling
> git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb
> # good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible
> git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903
> # bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume
> git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b
> # bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet
> git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612
> # first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet
>
> drivers/mmc/host/sdhci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 97158344b862..3563c3bc57c9 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
> continue;
>
> if (sdhci_defer_done(host, mrq)) {
> - result = IRQ_WAKE_THREAD;
> + queue_work(host->complete_wq, &host->complete_work);

The IRQ thread has a lot less latency than the work queue, which is why it
is done that way.

I am not sure why you say this change is equivalent to what was there
before, nor why it fixes your problem.

Can you explain some more?

> } else {
> mrqs_done[i] = mrq;
> host->mrqs_done[i] = NULL;
>

2019-05-24 13:03:52

by Brian Masney

[permalink] [raw]
Subject: Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done()

On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote:
> On 24/05/19 2:10 PM, Brian Masney wrote:
> > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
> > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
> > moved from using a tasklet to a work queue. That patch also changed
> > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
> > sdhci_defer_done() is true. Change it to queue work to the complete work
> > queue if sdhci_defer_done() is true so that the functionality is
> > equilivent to what was there when the finish_tasklet was present. This
> > corrects the WiFi breakage on the Nexus 5 phone.
> >
> > Signed-off-by: Brian Masney <[email protected]>
> > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
> > ---
> > See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for
> > details about how the WiFi is wired into sdhci on this platform.
> >
> > bisect log:
> >
> > git bisect start
> > # bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
> > git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb
> > # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
> > git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
> > # bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux
> > git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf
> > # bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block
> > git bisect bad 67a242223958d628f0ba33283668e3ddd192d057
> > # good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3
> > # good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q
> > git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc
> > # bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> > git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38
> > # good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> > git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e
> > # good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
> > git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf
> > # bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400
> > git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d
> > # good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant
> > git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18
> > # good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data'
> > git bisect good ade024f130f742725da9219624b01666f04bc4a6
> > # bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling
> > git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb
> > # good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible
> > git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903
> > # bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume
> > git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b
> > # bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet
> > git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612
> > # first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet
> >
> > drivers/mmc/host/sdhci.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> > index 97158344b862..3563c3bc57c9 100644
> > --- a/drivers/mmc/host/sdhci.c
> > +++ b/drivers/mmc/host/sdhci.c
> > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
> > continue;
> >
> > if (sdhci_defer_done(host, mrq)) {
> > - result = IRQ_WAKE_THREAD;
> > + queue_work(host->complete_wq, &host->complete_work);
>
> The IRQ thread has a lot less latency than the work queue, which is why it
> is done that way.
>
> I am not sure why you say this change is equivalent to what was there
> before, nor why it fixes your problem.
>
> Can you explain some more?

What I meant by the equivalent change was that tasklet_schedule() used
to be called in this situation rather than returning IRQ_WAKE_THREAD.

I'm honestly not sure exactly what's going on yet. Without the patch I
sent out, wlan0 is not detected on the phone. Perhaps there is a subtle
race condition somewhere that is exposed with the reduction in latency
since I assume tasklet_schedule() is more expensive than doing the
processing in the bottom half?

I'll do some more digging and see if I can find more information. I sent
this out to get a discussion started.

Brian

2019-05-24 15:51:36

by Brian Masney

[permalink] [raw]
Subject: Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done()

On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote:
> On 24/05/19 2:10 PM, Brian Masney wrote:
> > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
> > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
> > moved from using a tasklet to a work queue. That patch also changed
> > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
> > sdhci_defer_done() is true. Change it to queue work to the complete work
> > queue if sdhci_defer_done() is true so that the functionality is
> > equilivent to what was there when the finish_tasklet was present. This
> > corrects the WiFi breakage on the Nexus 5 phone.
> >
> > Signed-off-by: Brian Masney <[email protected]>
> > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
> > ---
> > See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for
> > details about how the WiFi is wired into sdhci on this platform.
> >
> > bisect log:
> >
> > git bisect start
> > # bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
> > git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb
> > # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
> > git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
> > # bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux
> > git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf
> > # bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block
> > git bisect bad 67a242223958d628f0ba33283668e3ddd192d057
> > # good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3
> > # good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q
> > git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc
> > # bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> > git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38
> > # good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> > git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e
> > # good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
> > git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf
> > # bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400
> > git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d
> > # good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant
> > git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18
> > # good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data'
> > git bisect good ade024f130f742725da9219624b01666f04bc4a6
> > # bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling
> > git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb
> > # good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible
> > git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903
> > # bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume
> > git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b
> > # bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet
> > git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612
> > # first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet
> >
> > drivers/mmc/host/sdhci.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> > index 97158344b862..3563c3bc57c9 100644
> > --- a/drivers/mmc/host/sdhci.c
> > +++ b/drivers/mmc/host/sdhci.c
> > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
> > continue;
> >
> > if (sdhci_defer_done(host, mrq)) {
> > - result = IRQ_WAKE_THREAD;
> > + queue_work(host->complete_wq, &host->complete_work);
>
> The IRQ thread has a lot less latency than the work queue, which is why it
> is done that way.
>
> I am not sure why you say this change is equivalent to what was there
> before, nor why it fixes your problem.
>
> Can you explain some more?

drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
sdio_claim_host() and it appears to never return.

I'm not going to be able to look into this more today but I should have
time this weekend to dig in more with ftrace.

Brian

2019-05-26 12:23:51

by Brian Masney

[permalink] [raw]
Subject: Issue with Broadcom wireless in 5.2rc1 (was Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done())

+ Broadcom wireless maintainers

On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote:
> On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote:
> > On 24/05/19 2:10 PM, Brian Masney wrote:
> > > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
> > > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
> > > moved from using a tasklet to a work queue. That patch also changed
> > > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
> > > sdhci_defer_done() is true. Change it to queue work to the complete work
> > > queue if sdhci_defer_done() is true so that the functionality is
> > > equilivent to what was there when the finish_tasklet was present. This
> > > corrects the WiFi breakage on the Nexus 5 phone.
> > >
> > > Signed-off-by: Brian Masney <[email protected]>
> > > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
> > > ---
> > > [ ... ]
> > >
> > > drivers/mmc/host/sdhci.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> > > index 97158344b862..3563c3bc57c9 100644
> > > --- a/drivers/mmc/host/sdhci.c
> > > +++ b/drivers/mmc/host/sdhci.c
> > > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
> > > continue;
> > >
> > > if (sdhci_defer_done(host, mrq)) {
> > > - result = IRQ_WAKE_THREAD;
> > > + queue_work(host->complete_wq, &host->complete_work);
> >
> > The IRQ thread has a lot less latency than the work queue, which is why it
> > is done that way.
> >
> > I am not sure why you say this change is equivalent to what was there
> > before, nor why it fixes your problem.
> >
> > Can you explain some more?
>
> [ ... ]
>
> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
> sdio_claim_host() and it appears to never return.

When the brcmfmac driver is loaded, the firmware is requested from disk,
and that's when the deadlock occurs in 5.2rc1. Specifically:

1) brcmf_sdio_download_firmware() in
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
sdio_claim_host()

2) brcmf_sdio_firmware_callback() is called and brcmf_sdiod_ramrw()
tries to claim the host, but has to wait since its already claimed
in #1 and the deadlock occurs.

I tried to release the host before the firmware is requested, however
parts of brcmf_chip_set_active() needs the host to be claimed, and a
similar deadlock occurs in brcmf_sdiod_ramrw() if I claim the host
before calling brcmf_chip_set_active().

I started to look at moving the sdio_{claim,release}_host() calls out of
brcmf_sdiod_ramrw() but there's a fair number of callers, so I'd like to
get feedback about the best course of action here.

Brian

2019-05-26 18:44:22

by Arend Van Spriel

[permalink] [raw]
Subject: Re: Issue with Broadcom wireless in 5.2rc1 (was Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done())

On 5/26/2019 2:21 PM, Brian Masney wrote:
> + Broadcom wireless maintainers
>
> On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote:
>> On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote:
>>> On 24/05/19 2:10 PM, Brian Masney wrote:
>>>> WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
>>>> to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
>>>> moved from using a tasklet to a work queue. That patch also changed
>>>> sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
>>>> sdhci_defer_done() is true. Change it to queue work to the complete work
>>>> queue if sdhci_defer_done() is true so that the functionality is
>>>> equilivent to what was there when the finish_tasklet was present. This
>>>> corrects the WiFi breakage on the Nexus 5 phone.
>>>>
>>>> Signed-off-by: Brian Masney <[email protected]>
>>>> Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
>>>> ---
>>>> [ ... ]
>>>>
>>>> drivers/mmc/host/sdhci.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>>> index 97158344b862..3563c3bc57c9 100644
>>>> --- a/drivers/mmc/host/sdhci.c
>>>> +++ b/drivers/mmc/host/sdhci.c
>>>> @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
>>>> continue;
>>>>
>>>> if (sdhci_defer_done(host, mrq)) {
>>>> - result = IRQ_WAKE_THREAD;
>>>> + queue_work(host->complete_wq, &host->complete_work);
>>>
>>> The IRQ thread has a lot less latency than the work queue, which is why it
>>> is done that way.
>>>
>>> I am not sure why you say this change is equivalent to what was there
>>> before, nor why it fixes your problem.
>>>
>>> Can you explain some more?
>>
>> [ ... ]
>>
>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
>> sdio_claim_host() and it appears to never return.
>
> When the brcmfmac driver is loaded, the firmware is requested from disk,
> and that's when the deadlock occurs in 5.2rc1. Specifically:
>
> 1) brcmf_sdio_download_firmware() in
> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
> sdio_claim_host()
>
> 2) brcmf_sdio_firmware_callback() is called and brcmf_sdiod_ramrw()
> tries to claim the host, but has to wait since its already claimed
> in #1 and the deadlock occurs.

This does not make any sense to me. brcmf_sdio_download_firmware() is
called from brcmf_sdio_firmware_callback() so they are in the same
context. So #2 is not waiting for #1, but something else I would say.
Also #2 calls sdio_claim_host() after brcmf_sdio_download_firmware has
completed so definitely not waiting for #1.

> I tried to release the host before the firmware is requested, however
> parts of brcmf_chip_set_active() needs the host to be claimed, and a
> similar deadlock occurs in brcmf_sdiod_ramrw() if I claim the host
> before calling brcmf_chip_set_active().
>
> I started to look at moving the sdio_{claim,release}_host() calls out of
> brcmf_sdiod_ramrw() but there's a fair number of callers, so I'd like to
> get feedback about the best course of action here.

Long ago Franky reworked the sdio critical sections requiring sdio
claim/release and I am pretty sure they are correct.

Could you try with lockdep kernel and see if that brings any more
information. In the mean time I will update my dev branch to 5.2-rc1 and
see if I can find any clues.

Regards,
Arend

2019-05-27 07:50:06

by Adrian Hunter

[permalink] [raw]
Subject: Re: Issue with Broadcom wireless in 5.2rc1 (was Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done())

On 26/05/19 10:58 PM, Brian Masney wrote:
> On Sun, May 26, 2019 at 08:42:21PM +0200, Arend Van Spriel wrote:
>> On 5/26/2019 2:21 PM, Brian Masney wrote:
>>> + Broadcom wireless maintainers
>>>
>>> On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote:
>>>> On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote:
>>>>> On 24/05/19 2:10 PM, Brian Masney wrote:
>>>>>> WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
>>>>>> to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
>>>>>> moved from using a tasklet to a work queue. That patch also changed
>>>>>> sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
>>>>>> sdhci_defer_done() is true. Change it to queue work to the complete work
>>>>>> queue if sdhci_defer_done() is true so that the functionality is
>>>>>> equilivent to what was there when the finish_tasklet was present. This
>>>>>> corrects the WiFi breakage on the Nexus 5 phone.
>>>>>>
>>>>>> Signed-off-by: Brian Masney <[email protected]>
>>>>>> Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
>>>>>> ---
>>>>>> [ ... ]
>>>>>>
>>>>>> drivers/mmc/host/sdhci.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>>>>> index 97158344b862..3563c3bc57c9 100644
>>>>>> --- a/drivers/mmc/host/sdhci.c
>>>>>> +++ b/drivers/mmc/host/sdhci.c
>>>>>> @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
>>>>>> continue;
>>>>>> if (sdhci_defer_done(host, mrq)) {
>>>>>> - result = IRQ_WAKE_THREAD;
>>>>>> + queue_work(host->complete_wq, &host->complete_work);
>>>>>
>>>>> The IRQ thread has a lot less latency than the work queue, which is why it
>>>>> is done that way.
>>>>>
>>>>> I am not sure why you say this change is equivalent to what was there
>>>>> before, nor why it fixes your problem.
>>>>>
>>>>> Can you explain some more?
>>>>
>>>> [ ... ]
>>>>
>>>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
>>>> sdio_claim_host() and it appears to never return.

This is because SDHCI is using the IRQ thread to process the SDIO card
interrupt (sdio_run_irqs()). When the card driver tries to use the card, it
causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove
finish_tasklet") has moved the tasklet processing to the IRQ thread.

I would expect to be able to use the IRQ thread to complete requests, and it
is desirable to do so because it is lower latency.

Probably, SDHCI should use sdio_signal_irq() which queues a work item, and
is what other drivers are doing.

I will investigate some more and send a patch.

2019-05-28 20:13:52

by Ulf Hansson

[permalink] [raw]
Subject: Re: Issue with Broadcom wireless in 5.2rc1 (was Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done())

On Mon, 27 May 2019 at 14:50, Brian Masney <[email protected]> wrote:
>
> On Mon, May 27, 2019 at 03:08:07PM +0300, Adrian Hunter wrote:
> > On 27/05/19 12:37 PM, Brian Masney wrote:
> > > On Sun, May 26, 2019 at 03:58:19PM -0400, Brian Masney wrote:
> > >> I attached a patch that shows how I was able to determine what had
> > >> already claimed the host.
> > > On Mon, May 27, 2019 at 10:48:24AM +0300, Adrian Hunter wrote:
> > >> This is because SDHCI is using the IRQ thread to process the SDIO card
> > >> interrupt (sdio_run_irqs()). When the card driver tries to use the card, it
> > >> causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove
> > >> finish_tasklet") has moved the tasklet processing to the IRQ thread.
> > >>
> > >> I would expect to be able to use the IRQ thread to complete requests, and it
> > >> is desirable to do so because it is lower latency.
> > >>
> > >> Probably, SDHCI should use sdio_signal_irq() which queues a work item, and
> > >> is what other drivers are doing.
> > >>
> > >> I will investigate some more and send a patch.
> >
> > Please try the patch below:
> >
> > From: Adrian Hunter <[email protected]>
> > Date: Mon, 27 May 2019 14:45:55 +0300
> > Subject: [PATCH] mmc: sdhci: Fix SDIO IRQ thread deadlock
> >
> > Since commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet"), the IRQ
> > thread might be used to complete requests, but the IRQ thread is also used
> > to process SDIO card interrupts. This can cause a deadlock when the SDIO
> > processing tries to access the card since that would also require the IRQ
> > thread. Change SDHCI to use sdio_signal_irq() to schedule a work item
> > instead. That also requires implementing the ->ack_sdio_irq() mmc host op.
> >
> > Signed-off-by: Adrian Hunter <[email protected]>
> > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
>
> Yes, this fixes the issue for me. You can add my:
>
> Reported-by: Brian Masney <[email protected]>
> Tested-by: Brian Masney <[email protected]>
>

Applied for fixes, thanks!

Kind regards
Uffe

2019-06-04 11:35:00

by Arend Van Spriel

[permalink] [raw]
Subject: Re: Issue with Broadcom wireless in 5.2rc1 (was Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done())

On 5/27/2019 2:08 PM, Adrian Hunter wrote:
> On 27/05/19 12:37 PM, Brian Masney wrote:
>> On Sun, May 26, 2019 at 03:58:19PM -0400, Brian Masney wrote:
>>> I attached a patch that shows how I was able to determine what had
>>> already claimed the host.
>> On Mon, May 27, 2019 at 10:48:24AM +0300, Adrian Hunter wrote:
>>> This is because SDHCI is using the IRQ thread to process the SDIO card
>>> interrupt (sdio_run_irqs()). When the card driver tries to use the card, it
>>> causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove
>>> finish_tasklet") has moved the tasklet processing to the IRQ thread.
>>>
>>> I would expect to be able to use the IRQ thread to complete requests, and it
>>> is desirable to do so because it is lower latency.
>>>
>>> Probably, SDHCI should use sdio_signal_irq() which queues a work item, and
>>> is what other drivers are doing.
>>>
>>> I will investigate some more and send a patch.
>
> Please try the patch below:

Finally got time to update my kernel to 5.2-rc2. This patch indeed
resolves the issue.

Thanks,
Arend