2019-02-22 17:07:57

by Rafał Miłecki

[permalink] [raw]
Subject: [PATCH] brcmfmac: run firmware state watchdog on the host machine

From: Rafał Miłecki <[email protected]>

FullMAC firmware may happen to crash due to some potential bugs exposed
by e.g. a specific traffic or host-requested setup. It usually results
in various timeouts & running our of resources (e.g. ring slots).

Monitoring firmware state allows handling such a situation more
gracefully. At this point the watchdog:
1) Prints a clear error message about a firmware crash
2) Tries to dump a crash info data

That should be helpful for users & should allow providing a valuable
reports for the Broadcom developers. It obviously doesn't really fix
anything.

This watchdog is important for USB & SDIO devices which don't have
anything alike implemented yet. It's also there to complement PCIe with
its FWHALT signal which does not work in all situations.

It has been successfully tested with the recently released
brcmfmac4366c-pcie.bin.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201853
Signed-off-by: Rafał Miłecki <[email protected]>
---
.../broadcom/brcm80211/brcmfmac/core.c | 28 +++++++++++++++++++
.../broadcom/brcm80211/brcmfmac/core.h | 2 ++
2 files changed, 30 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 4fbe8791f674..ab7a54b2d831 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -40,6 +40,7 @@
#include "common.h"

#define MAX_WAIT_FOR_8021X_TX msecs_to_jiffies(950)
+#define BRCMF_FW_WATCHDOG_POLL msecs_to_jiffies(1000)

#define BRCMF_BSSIDX_INVALID -1

@@ -1084,6 +1085,28 @@ static int brcmf_revinfo_read(struct seq_file *s, void *data)
return 0;
}

+static void brcmf_fw_watchdog(struct work_struct *work)
+{
+ struct brcmf_pub *drvr = container_of(work, struct brcmf_pub,
+ fw_watchdog.work);
+
+ if (drvr->bus_if->state == BRCMF_BUS_UP) {
+ struct brcmf_if *ifp = drvr->iflist[0];
+ s32 ver;
+ int err;
+
+ err = brcmf_fil_cmd_int_get(ifp, BRCMF_C_GET_VERSION, &ver);
+ if (err) {
+ bphy_err(drvr, "Firmware has most likely crashed and didn't respond: %d\n",
+ err);
+ brcmf_dev_coredump(drvr->bus_if->dev);
+ return;
+ }
+ }
+
+ schedule_delayed_work(&drvr->fw_watchdog, BRCMF_FW_WATCHDOG_POLL);
+}
+
static int brcmf_bus_started(struct brcmf_pub *drvr, struct cfg80211_ops *ops)
{
int ret = -1;
@@ -1155,6 +1178,9 @@ static int brcmf_bus_started(struct brcmf_pub *drvr, struct cfg80211_ops *ops)
#endif
#endif /* CONFIG_INET */

+ INIT_DELAYED_WORK(&drvr->fw_watchdog, brcmf_fw_watchdog);
+ schedule_delayed_work(&drvr->fw_watchdog, BRCMF_FW_WATCHDOG_POLL);
+
/* populate debugfs */
brcmf_debugfs_add_entry(drvr, "revinfo", brcmf_revinfo_read);
brcmf_feat_debugfs_create(drvr);
@@ -1284,6 +1310,8 @@ void brcmf_detach(struct device *dev)
if (drvr == NULL)
return;

+ cancel_delayed_work_sync(&drvr->fw_watchdog);
+
#ifdef CONFIG_INET
unregister_inetaddr_notifier(&drvr->inetaddr_notifier);
#endif
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
index d8085ce579f4..1b44d243d05f 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
@@ -143,6 +143,8 @@ struct brcmf_pub {
struct notifier_block inet6addr_notifier;
struct brcmf_mp_device *settings;

+ struct delayed_work fw_watchdog;
+
u8 clmver[BRCMF_DCMD_SMLEN];
};

--
2.20.1



2019-02-22 23:06:34

by Arend Van Spriel

[permalink] [raw]
Subject: Re: [PATCH] brcmfmac: run firmware state watchdog on the host machine

On 2/22/2019 6:07 PM, Rafał Miłecki wrote:
> From: Rafał Miłecki <[email protected]>
>
> FullMAC firmware may happen to crash due to some potential bugs exposed
> by e.g. a specific traffic or host-requested setup. It usually results
> in various timeouts & running our of resources (e.g. ring slots).
>
> Monitoring firmware state allows handling such a situation more
> gracefully. At this point the watchdog:
> 1) Prints a clear error message about a firmware crash
> 2) Tries to dump a crash info data

Hi Rafał,

I like the idea of having firmware crashes detected, but not sure I am a
big fan of the watchdog mechanism.

> That should be helpful for users & should allow providing a valuable
> reports for the Broadcom developers. It obviously doesn't really fix
> anything.

Agree on the value of it and it is new functionality indeed and not a fix.

> This watchdog is important for USB & SDIO devices which don't have
> anything alike implemented yet. It's also there to complement PCIe with
> its FWHALT signal which does not work in all situations.

Actually, SDIO does already have a watchdog, but it serves different
purpose (get firmware console, idle detect). There is also a
brcmf_sdio_checkdied() function to detect a firmware crash. It is being
called in brcmf_sdio_bus_rxctl() when firwmare does not respond as
expected [1]. Similar thing can be done for PCIE and I actually started
working on that last week. USB is a bit different as it will switch to
bootloader image when firmware crashes.

Regards,
Arend

[1]
https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c#L3144