On this v2 I've added documenation over the taint flag, and updated our
script which parses existing taint flags to describe what has happened
when this taint flag is found. I've also updated the location of the
taint flag on the qed driver and updated the reviews.
The changes are based on linux-next tag next-20200515. You can find
these changes on my tree:
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20200515-taint-firmware-net
Luis Chamberlain (15):
taint: add module firmware crash taint support
ethernet/839: use new module_firmware_crashed()
bnx2x: use new module_firmware_crashed()
bnxt: use new module_firmware_crashed()
bna: use new module_firmware_crashed()
liquidio: use new module_firmware_crashed()
cxgb4: use new module_firmware_crashed()
ehea: use new module_firmware_crashed()
qed: use new module_firmware_crashed()
soc: qcom: ipa: use new module_firmware_crashed()
wimax/i2400m: use new module_firmware_crashed()
ath10k: use new module_firmware_crashed()
ath6kl: use new module_firmware_crashed()
brcm80211: use new module_firmware_crashed()
mwl8k: use new module_firmware_crashed()
Documentation/admin-guide/tainted-kernels.rst | 6 ++++++
drivers/net/ethernet/8390/axnet_cs.c | 4 +++-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 1 +
drivers/net/ethernet/brocade/bna/bfa_ioc.c | 1 +
drivers/net/ethernet/cavium/liquidio/lio_main.c | 1 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 +
drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 ++
drivers/net/ethernet/qlogic/qed/qed_mcp.c | 1 +
drivers/net/ipa/ipa_modem.c | 1 +
drivers/net/wimax/i2400m/rx.c | 1 +
drivers/net/wireless/ath/ath10k/pci.c | 2 ++
drivers/net/wireless/ath/ath10k/sdio.c | 2 ++
drivers/net/wireless/ath/ath10k/snoc.c | 1 +
drivers/net/wireless/ath/ath6kl/hif.c | 1 +
.../net/wireless/broadcom/brcm80211/brcmfmac/core.c | 1 +
drivers/net/wireless/marvell/mwl8k.c | 1 +
include/linux/kernel.h | 3 ++-
include/linux/module.h | 13 +++++++++++++
include/trace/events/module.h | 3 ++-
kernel/module.c | 5 +++--
kernel/panic.c | 1 +
tools/debugging/kernel-chktaint | 7 +++++++
23 files changed, 55 insertions(+), 5 deletions(-)
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Ariel Elior <[email protected]>
Cc: Sudarsana Kalluru <[email protected]>
CC: [email protected]
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index db5107e7937c..c38b8c9c8af0 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -909,6 +909,7 @@ void bnx2x_panic_dump(struct bnx2x *bp, bool disable_int)
bp->eth_stats.unrecoverable_error++;
DP(BNX2X_MSG_STATS, "stats_state - DISABLED\n");
+ module_firmware_crashed();
BNX2X_ERR("begin crash dump -----------------\n");
/* Indices */
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Derek Chickles <[email protected]>
Cc: Satanand Burla <[email protected]>
Cc: Felix Manlunas <[email protected]>
Reviewed-by: Derek Chickles <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/cavium/liquidio/lio_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 66d31c018c7e..f18085262982 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -801,6 +801,7 @@ static int liquidio_watchdog(void *param)
continue;
WRITE_ONCE(oct->cores_crashed, true);
+ module_firmware_crashed();
other_oct = get_other_octeon_device(oct);
if (other_oct)
WRITE_ONCE(other_oct->cores_crashed, true);
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Rasesh Mody <[email protected]>
Cc: Sudarsana Kalluru <[email protected]>
Cc: [email protected]
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/brocade/bna/bfa_ioc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/brocade/bna/bfa_ioc.c b/drivers/net/ethernet/brocade/bna/bfa_ioc.c
index e17bfc87da90..b3f44a912574 100644
--- a/drivers/net/ethernet/brocade/bna/bfa_ioc.c
+++ b/drivers/net/ethernet/brocade/bna/bfa_ioc.c
@@ -927,6 +927,7 @@ bfa_iocpf_sm_disabled(struct bfa_iocpf *iocpf, enum iocpf_event event)
static void
bfa_iocpf_sm_initfail_sync_entry(struct bfa_iocpf *iocpf)
{
+ module_firmware_crashed();
bfa_nw_ioc_debug_save_ftrc(iocpf->ioc);
bfa_ioc_hw_sem_get(iocpf->ioc);
}
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Douglas Miller <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index 0273fb7a9d01..6ae35067003f 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -3285,6 +3285,8 @@ static void ehea_crash_handler(void)
{
int i;
+ module_firmware_crashed();
+
if (ehea_fw_handles.arr)
for (i = 0; i < ehea_fw_handles.num_entries; i++)
ehea_h_free_resource(ehea_fw_handles.arr[i].adh,
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: [email protected]
Cc: [email protected]
Cc: Kalle Valo <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/wireless/ath/ath6kl/hif.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath6kl/hif.c b/drivers/net/wireless/ath/ath6kl/hif.c
index d1942537ea10..cfd838607544 100644
--- a/drivers/net/wireless/ath/ath6kl/hif.c
+++ b/drivers/net/wireless/ath/ath6kl/hif.c
@@ -120,6 +120,7 @@ static int ath6kl_hif_proc_dbg_intr(struct ath6kl_device *dev)
int ret;
ath6kl_warn("firmware crashed\n");
+ module_firmware_crashed();
/*
* read counter to clear the interrupt, the debug error interrupt is
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arend van Spriel <[email protected]>
Cc: Franky Lin <[email protected]>
Cc: Hante Meuleman <[email protected]>
Cc: Chi-Hsien Lin <[email protected]>
Cc: Wright Feng <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: "Rafał Miłecki" <[email protected]>
Cc: Pieter-Paul Giesberts <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index c88655acc78c..d623f83568b3 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -1393,6 +1393,7 @@ void brcmf_fw_crashed(struct device *dev)
struct brcmf_pub *drvr = bus_if->drvr;
bphy_err(drvr, "Firmware has halted or crashed\n");
+ module_firmware_crashed();
brcmf_dev_coredump(dev);
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: [email protected]
Cc: Inaky Perez-Gonzalez <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/wimax/i2400m/rx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wimax/i2400m/rx.c b/drivers/net/wimax/i2400m/rx.c
index c9fb619a9e01..307a62ca183f 100644
--- a/drivers/net/wimax/i2400m/rx.c
+++ b/drivers/net/wimax/i2400m/rx.c
@@ -712,6 +712,7 @@ void __i2400m_roq_queue(struct i2400m *i2400m, struct i2400m_roq *roq,
dev_err(dev, "SW BUG? failed to insert packet\n");
dev_err(dev, "ERX: roq %p [ws %u] skb %p nsn %d sn %u\n",
roq, roq->ws, skb, nsn, roq_data->sn);
+ module_firmware_crashed();
skb_queue_walk(&roq->queue, skb_itr) {
roq_data_itr = (struct i2400m_roq_data *) &skb_itr->cb;
nsn_itr = __i2400m_roq_nsn(roq, roq_data_itr->sn);
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Vishal Kulkarni <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index a70018f067aa..c67fc86c0e42 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3646,6 +3646,7 @@ void t4_fatal_err(struct adapter *adap)
* could be exposed to the adapter. RDMA MWs for example...
*/
t4_shutdown_adapter(adap);
+ module_firmware_crashed();
for_each_port(adap, port) {
struct net_device *dev = adap->port[port];
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Michael Chan <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index dd0c3f227009..5ba1bd0734e9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -3503,6 +3503,7 @@ static int bnxt_get_dump_data(struct net_device *dev, struct ethtool_dump *dump,
dump->flag = bp->dump_flag;
if (dump->flag == BNXT_DUMP_CRASH) {
+ module_firmware_crashed();
#ifdef CONFIG_TEE_BNXT_FW
return tee_bnxt_copy_coredump(buf, 0, dump->len);
#endif
--
2.26.2
Device driver firmware can crash, and sometimes, this can leave your
system in a state which makes the device or subsystem completely
useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
of scraping some magical words from the kernel log, which is driver
specific, is much easier. So instead provide a helper which lets drivers
annotate this.
Once this happens, scrapers can easily look for modules taint flags
for a firmware crash. This will taint both the kernel and respective
calling module.
The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this
fact should in no way shape or form affect lockdep. This taint is device
driver specific.
Signed-off-by: Luis Chamberlain <[email protected]>
---
Documentation/admin-guide/tainted-kernels.rst | 6 ++++++
include/linux/kernel.h | 3 ++-
include/linux/module.h | 13 +++++++++++++
include/trace/events/module.h | 3 ++-
kernel/module.c | 5 +++--
kernel/panic.c | 1 +
tools/debugging/kernel-chktaint | 7 +++++++
7 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 71e9184a9079..92530f1d60ae 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -100,6 +100,7 @@ Bit Log Number Reason that got the kernel tainted
15 _/K 32768 kernel has been live patched
16 _/X 65536 auxiliary taint, defined for and used by distros
17 _/T 131072 kernel was built with the struct randomization plugin
+ 18 _/Q 262144 driver firmware crash annotation
=== === ====== ========================================================
Note: The character ``_`` is representing a blank in this table to make reading
@@ -162,3 +163,8 @@ More detailed explanation for tainting
produce extremely unusual kernel structure layouts (even performance
pathological ones), which is important to know when debugging. Set at
build time.
+
+ 18) ``Q`` used by device drivers to annotate that the device driver's firmware
+ has crashed and the device's operation has been severely affected. The
+ device may be left in a crippled state, requiring full driver removal /
+ addition, system reboot, or it is unclear how long recovery will take.
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 04a5885cec1b..19e1541c82c7 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -601,7 +601,8 @@ extern enum system_states {
#define TAINT_LIVEPATCH 15
#define TAINT_AUX 16
#define TAINT_RANDSTRUCT 17
-#define TAINT_FLAGS_COUNT 18
+#define TAINT_FIRMWARE_CRASH 18
+#define TAINT_FLAGS_COUNT 19
struct taint_flag {
char c_true; /* character printed when tainted */
diff --git a/include/linux/module.h b/include/linux/module.h
index 2c2e988bcf10..221200078180 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module *mod)
bool is_module_sig_enforced(void);
void set_module_sig_enforced(void);
+void add_taint_module(struct module *mod, unsigned flag,
+ enum lockdep_ok lockdep_ok);
+
+static inline void module_firmware_crashed(void)
+{
+ add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
+}
+
#else /* !CONFIG_MODULES... */
static inline struct module *__module_address(unsigned long addr)
@@ -844,6 +852,11 @@ void *dereference_module_function_descriptor(struct module *mod, void *ptr)
return ptr;
}
+static inline void module_firmware_crashed(void)
+{
+ add_taint(TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
+}
+
#endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
diff --git a/include/trace/events/module.h b/include/trace/events/module.h
index 097485c73c01..b749ea25affd 100644
--- a/include/trace/events/module.h
+++ b/include/trace/events/module.h
@@ -26,7 +26,8 @@ struct module;
{ (1UL << TAINT_OOT_MODULE), "O" }, \
{ (1UL << TAINT_FORCED_MODULE), "F" }, \
{ (1UL << TAINT_CRAP), "C" }, \
- { (1UL << TAINT_UNSIGNED_MODULE), "E" })
+ { (1UL << TAINT_UNSIGNED_MODULE), "E" }, \
+ { (1UL << TAINT_FIRMWARE_CRASH), "Q" })
TRACE_EVENT(module_load,
diff --git a/kernel/module.c b/kernel/module.c
index 80faaf2116dd..f98e8c25c6b4 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -325,12 +325,13 @@ static inline int strong_try_module_get(struct module *mod)
return -ENOENT;
}
-static inline void add_taint_module(struct module *mod, unsigned flag,
- enum lockdep_ok lockdep_ok)
+void add_taint_module(struct module *mod, unsigned flag,
+ enum lockdep_ok lockdep_ok)
{
add_taint(flag, lockdep_ok);
set_bit(flag, &mod->taints);
}
+EXPORT_SYMBOL_GPL(add_taint_module);
/*
* A thread that wants to hold a reference to a module only while it
diff --git a/kernel/panic.c b/kernel/panic.c
index ec6d7d788ce7..504fb926947e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -384,6 +384,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
[ TAINT_LIVEPATCH ] = { 'K', ' ', true },
[ TAINT_AUX ] = { 'X', ' ', true },
[ TAINT_RANDSTRUCT ] = { 'T', ' ', true },
+ [ TAINT_FIRMWARE_CRASH ] = { 'Q', ' ', true },
};
/**
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index 2240cb56e6e5..c397c6aabea7 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -194,6 +194,13 @@ else
addout "T"
echo " * kernel was built with the struct randomization plugin (#17)"
fi
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+ addout " "
+else
+ addout "Q"
+ echo " * a device driver's firmware has crashed (#18)"
+fi
echo "For a more detailed explanation of the various taint flags see"
echo " Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel sources"
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Alex Elder <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ipa/ipa_modem.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
index ed10818dd99f..1790b87446ed 100644
--- a/drivers/net/ipa/ipa_modem.c
+++ b/drivers/net/ipa/ipa_modem.c
@@ -285,6 +285,7 @@ static void ipa_modem_crashed(struct ipa *ipa)
struct device *dev = &ipa->pdev->dev;
int ret;
+ module_firmware_crashed();
ipa_endpoint_modem_pause_all(ipa, true);
ipa_endpoint_modem_hol_block_clear_all(ipa);
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: Ariel Elior <[email protected]>
Cc: [email protected]
Reviewed-by: Igor Russkikh <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/qlogic/qed/qed_mcp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index 9624616806e7..aea200d465ef 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -566,6 +566,7 @@ _qed_mcp_cmd_and_union(struct qed_hwfn *p_hwfn,
DP_NOTICE(p_hwfn,
"The MFW failed to respond to command 0x%08x [param 0x%08x].\n",
p_mb_params->cmd, p_mb_params->param);
+ module_firmware_crashed();
qed_mcp_print_cpu_info(p_hwfn, p_ptt);
spin_lock_bh(&p_hwfn->mcp_info->cmd_lock);
--
2.26.2
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.
Using a taint flag allows us to annotate when this happens clearly.
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Shannon Nelson <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Heiner Kallweit <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
drivers/net/ethernet/8390/axnet_cs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/8390/axnet_cs.c b/drivers/net/ethernet/8390/axnet_cs.c
index aeae7966a082..8ad0200db8e9 100644
--- a/drivers/net/ethernet/8390/axnet_cs.c
+++ b/drivers/net/ethernet/8390/axnet_cs.c
@@ -1358,9 +1358,11 @@ static void ei_receive(struct net_device *dev)
*/
if ((netif_msg_rx_err(ei_local)) &&
this_frame != ei_local->current_page &&
- (this_frame != 0x0 || rxing_page != 0xFF))
+ (this_frame != 0x0 || rxing_page != 0xFF)) {
+ module_firmware_crashed();
netdev_err(dev, "mismatched read page pointers %2x vs %2x\n",
this_frame, ei_local->current_page);
+ }
if (this_frame == rxing_page) /* Read all the frames? */
break; /* Done for now */
--
2.26.2
On Fri, May 15, 2020 at 09:28:32PM +0000, Luis Chamberlain wrote:
> Device driver firmware can crash, and sometimes, this can leave your
> system in a state which makes the device or subsystem completely
> useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> of scraping some magical words from the kernel log, which is driver
> specific, is much easier. So instead provide a helper which lets drivers
> annotate this.
>
> Once this happens, scrapers can easily look for modules taint flags
> for a firmware crash. This will taint both the kernel and respective
> calling module.
>
> The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this
> fact should in no way shape or form affect lockdep. This taint is device
> driver specific.
>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> Documentation/admin-guide/tainted-kernels.rst | 6 ++++++
> include/linux/kernel.h | 3 ++-
> include/linux/module.h | 13 +++++++++++++
> include/trace/events/module.h | 3 ++-
> kernel/module.c | 5 +++--
> kernel/panic.c | 1 +
> tools/debugging/kernel-chktaint | 7 +++++++
> 7 files changed, 34 insertions(+), 4 deletions(-)
>
Reviewed-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:33PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Shannon Nelson <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Heiner Kallweit <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/8390/axnet_cs.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/8390/axnet_cs.c b/drivers/net/ethernet/8390/axnet_cs.c
> index aeae7966a082..8ad0200db8e9 100644
> --- a/drivers/net/ethernet/8390/axnet_cs.c
> +++ b/drivers/net/ethernet/8390/axnet_cs.c
> @@ -1358,9 +1358,11 @@ static void ei_receive(struct net_device *dev)
> */
> if ((netif_msg_rx_err(ei_local)) &&
> this_frame != ei_local->current_page &&
> - (this_frame != 0x0 || rxing_page != 0xFF))
> + (this_frame != 0x0 || rxing_page != 0xFF)) {
> + module_firmware_crashed();
> netdev_err(dev, "mismatched read page pointers %2x vs %2x\n",
> this_frame, ei_local->current_page);
> + }
>
> if (this_frame == rxing_page) /* Read all the frames? */
> break; /* Done for now */
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:34PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Ariel Elior <[email protected]>
> Cc: Sudarsana Kalluru <[email protected]>
> CC: [email protected]
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index db5107e7937c..c38b8c9c8af0 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -909,6 +909,7 @@ void bnx2x_panic_dump(struct bnx2x *bp, bool disable_int)
> bp->eth_stats.unrecoverable_error++;
> DP(BNX2X_MSG_STATS, "stats_state - DISABLED\n");
>
> + module_firmware_crashed();
> BNX2X_ERR("begin crash dump -----------------\n");
>
> /* Indices */
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:36PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Rasesh Mody <[email protected]>
> Cc: Sudarsana Kalluru <[email protected]>
> Cc: [email protected]
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/brocade/bna/bfa_ioc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/brocade/bna/bfa_ioc.c b/drivers/net/ethernet/brocade/bna/bfa_ioc.c
> index e17bfc87da90..b3f44a912574 100644
> --- a/drivers/net/ethernet/brocade/bna/bfa_ioc.c
> +++ b/drivers/net/ethernet/brocade/bna/bfa_ioc.c
> @@ -927,6 +927,7 @@ bfa_iocpf_sm_disabled(struct bfa_iocpf *iocpf, enum iocpf_event event)
> static void
> bfa_iocpf_sm_initfail_sync_entry(struct bfa_iocpf *iocpf)
> {
> + module_firmware_crashed();
> bfa_nw_ioc_debug_save_ftrc(iocpf->ioc);
> bfa_ioc_hw_sem_get(iocpf->ioc);
> }
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:37PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Derek Chickles <[email protected]>
> Cc: Satanand Burla <[email protected]>
> Cc: Felix Manlunas <[email protected]>
> Reviewed-by: Derek Chickles <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/cavium/liquidio/lio_main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> index 66d31c018c7e..f18085262982 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> @@ -801,6 +801,7 @@ static int liquidio_watchdog(void *param)
> continue;
>
> WRITE_ONCE(oct->cores_crashed, true);
> + module_firmware_crashed();
> other_oct = get_other_octeon_device(oct);
> if (other_oct)
> WRITE_ONCE(other_oct->cores_crashed, true);
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:35PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Michael Chan <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> index dd0c3f227009..5ba1bd0734e9 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> @@ -3503,6 +3503,7 @@ static int bnxt_get_dump_data(struct net_device *dev, struct ethtool_dump *dump,
>
> dump->flag = bp->dump_flag;
> if (dump->flag == BNXT_DUMP_CRASH) {
> + module_firmware_crashed();
> #ifdef CONFIG_TEE_BNXT_FW
> return tee_bnxt_copy_coredump(buf, 0, dump->len);
> #endif
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:38PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Vishal Kulkarni <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index a70018f067aa..c67fc86c0e42 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -3646,6 +3646,7 @@ void t4_fatal_err(struct adapter *adap)
> * could be exposed to the adapter. RDMA MWs for example...
> */
> t4_shutdown_adapter(adap);
> + module_firmware_crashed();
> for_each_port(adap, port) {
> struct net_device *dev = adap->port[port];
>
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:39PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Douglas Miller <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> index 0273fb7a9d01..6ae35067003f 100644
> --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> @@ -3285,6 +3285,8 @@ static void ehea_crash_handler(void)
> {
> int i;
>
> + module_firmware_crashed();
> +
> if (ehea_fw_handles.arr)
> for (i = 0; i < ehea_fw_handles.num_entries; i++)
> ehea_h_free_resource(ehea_fw_handles.arr[i].adh,
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:41PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Alex Elder <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ipa/ipa_modem.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
> index ed10818dd99f..1790b87446ed 100644
> --- a/drivers/net/ipa/ipa_modem.c
> +++ b/drivers/net/ipa/ipa_modem.c
> @@ -285,6 +285,7 @@ static void ipa_modem_crashed(struct ipa *ipa)
> struct device *dev = &ipa->pdev->dev;
> int ret;
>
> + module_firmware_crashed();
> ipa_endpoint_modem_pause_all(ipa, true);
>
> ipa_endpoint_modem_hol_block_clear_all(ipa);
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:42PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: [email protected]
> Cc: Inaky Perez-Gonzalez <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/wimax/i2400m/rx.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/wimax/i2400m/rx.c b/drivers/net/wimax/i2400m/rx.c
> index c9fb619a9e01..307a62ca183f 100644
> --- a/drivers/net/wimax/i2400m/rx.c
> +++ b/drivers/net/wimax/i2400m/rx.c
> @@ -712,6 +712,7 @@ void __i2400m_roq_queue(struct i2400m *i2400m, struct i2400m_roq *roq,
> dev_err(dev, "SW BUG? failed to insert packet\n");
> dev_err(dev, "ERX: roq %p [ws %u] skb %p nsn %d sn %u\n",
> roq, roq->ws, skb, nsn, roq_data->sn);
> + module_firmware_crashed();
> skb_queue_walk(&roq->queue, skb_itr) {
> roq_data_itr = (struct i2400m_roq_data *) &skb_itr->cb;
> nsn_itr = __i2400m_roq_nsn(roq, roq_data_itr->sn);
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:40PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Ariel Elior <[email protected]>
> Cc: [email protected]
> Reviewed-by: Igor Russkikh <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/qlogic/qed/qed_mcp.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> index 9624616806e7..aea200d465ef 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
> @@ -566,6 +566,7 @@ _qed_mcp_cmd_and_union(struct qed_hwfn *p_hwfn,
> DP_NOTICE(p_hwfn,
> "The MFW failed to respond to command 0x%08x [param 0x%08x].\n",
> p_mb_params->cmd, p_mb_params->param);
> + module_firmware_crashed();
> qed_mcp_print_cpu_info(p_hwfn, p_ptt);
>
> spin_lock_bh(&p_hwfn->mcp_info->cmd_lock);
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Fri, May 15, 2020 at 09:28:45PM +0000, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arend van Spriel <[email protected]>
> Cc: Franky Lin <[email protected]>
> Cc: Hante Meuleman <[email protected]>
> Cc: Chi-Hsien Lin <[email protected]>
> Cc: Wright Feng <[email protected]>
> Cc: Kalle Valo <[email protected]>
> Cc: "Rafał Miłecki" <[email protected]>
> Cc: Pieter-Paul Giesberts <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> index c88655acc78c..d623f83568b3 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> @@ -1393,6 +1393,7 @@ void brcmf_fw_crashed(struct device *dev)
> struct brcmf_pub *drvr = bus_if->drvr;
>
> bphy_err(drvr, "Firmware has halted or crashed\n");
> + module_firmware_crashed();
>
> brcmf_dev_coredump(dev);
>
> --
> 2.26.2
>
Acked-by: Rafael Aquini <[email protected]>
On Sat, May 16, 2020 at 3:00 AM Luis Chamberlain <[email protected]> wrote:
>
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Michael Chan <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> index dd0c3f227009..5ba1bd0734e9 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> @@ -3503,6 +3503,7 @@ static int bnxt_get_dump_data(struct net_device *dev, struct ethtool_dump *dump,
>
> dump->flag = bp->dump_flag;
> if (dump->flag == BNXT_DUMP_CRASH) {
> + module_firmware_crashed();
This is not the right place to annotate the taint flag.
Here the driver is just copying the dump after error recovery which is collected
by firmware to DDR, when firmware detects fatal conditions. Driver and firmware
will be healthy when the user calls this command.
Also, users can call this command a thousand times when there is no crash.
I will propose a patch to use this wrapper in the error recovery path,
where the driver
may not be able to recover.
> #ifdef CONFIG_TEE_BNXT_FW
> return tee_bnxt_copy_coredump(buf, 0, dump->len);
> #endif
> --
> 2.26.2
>
Nacked-by: Vasundhara Volam <[email protected]>
+++ Luis Chamberlain [15/05/20 21:28 +0000]:
>Device driver firmware can crash, and sometimes, this can leave your
>system in a state which makes the device or subsystem completely
>useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
>of scraping some magical words from the kernel log, which is driver
>specific, is much easier. So instead provide a helper which lets drivers
>annotate this.
>
>Once this happens, scrapers can easily look for modules taint flags
>for a firmware crash. This will taint both the kernel and respective
>calling module.
>
>The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this
>fact should in no way shape or form affect lockdep. This taint is device
>driver specific.
>
>Signed-off-by: Luis Chamberlain <[email protected]>
>---
> Documentation/admin-guide/tainted-kernels.rst | 6 ++++++
> include/linux/kernel.h | 3 ++-
> include/linux/module.h | 13 +++++++++++++
> include/trace/events/module.h | 3 ++-
> kernel/module.c | 5 +++--
> kernel/panic.c | 1 +
> tools/debugging/kernel-chktaint | 7 +++++++
> 7 files changed, 34 insertions(+), 4 deletions(-)
>
>diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
>index 71e9184a9079..92530f1d60ae 100644
>--- a/Documentation/admin-guide/tainted-kernels.rst
>+++ b/Documentation/admin-guide/tainted-kernels.rst
>@@ -100,6 +100,7 @@ Bit Log Number Reason that got the kernel tainted
> 15 _/K 32768 kernel has been live patched
> 16 _/X 65536 auxiliary taint, defined for and used by distros
> 17 _/T 131072 kernel was built with the struct randomization plugin
>+ 18 _/Q 262144 driver firmware crash annotation
> === === ====== ========================================================
>
> Note: The character ``_`` is representing a blank in this table to make reading
>@@ -162,3 +163,8 @@ More detailed explanation for tainting
> produce extremely unusual kernel structure layouts (even performance
> pathological ones), which is important to know when debugging. Set at
> build time.
>+
>+ 18) ``Q`` used by device drivers to annotate that the device driver's firmware
>+ has crashed and the device's operation has been severely affected. The
>+ device may be left in a crippled state, requiring full driver removal /
>+ addition, system reboot, or it is unclear how long recovery will take.
>diff --git a/include/linux/kernel.h b/include/linux/kernel.h
>index 04a5885cec1b..19e1541c82c7 100644
>--- a/include/linux/kernel.h
>+++ b/include/linux/kernel.h
>@@ -601,7 +601,8 @@ extern enum system_states {
> #define TAINT_LIVEPATCH 15
> #define TAINT_AUX 16
> #define TAINT_RANDSTRUCT 17
>-#define TAINT_FLAGS_COUNT 18
>+#define TAINT_FIRMWARE_CRASH 18
>+#define TAINT_FLAGS_COUNT 19
>
> struct taint_flag {
> char c_true; /* character printed when tainted */
>diff --git a/include/linux/module.h b/include/linux/module.h
>index 2c2e988bcf10..221200078180 100644
>--- a/include/linux/module.h
>+++ b/include/linux/module.h
>@@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module *mod)
> bool is_module_sig_enforced(void);
> void set_module_sig_enforced(void);
>
>+void add_taint_module(struct module *mod, unsigned flag,
>+ enum lockdep_ok lockdep_ok);
>+
>+static inline void module_firmware_crashed(void)
>+{
>+ add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
>+}
Just a nit: I think module_firmware_crashed() is a confusing name - it
doesn't really tell me what it's doing, and it's not really related to
the rest of the module_* symbols, which mostly have to do with module
loader/module specifics. Especially since a driver can be built-in, too.
How about taint_firmware_crashed() or something similar?
Also, I think we might crash in add_taint_module() if a driver is
built into the kernel, because THIS_MODULE will be null and there is
no null pointer check in add_taint_module(). We could unify the
CONFIG_MODULES and !CONFIG_MODULES stubs and either add an `if (mod)`
check in add_taint_module() or add an #ifdef MODULE check in the stub
itself to call add_taint() or add_taint_module() as appropriate. Hope
that makes sense.
Thanks!
Jessica
On 5/15/20 4:28 PM, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
I don't fully understand what this is meant to do, so I can't
fully assess whether it's the right thing to do.
But in this particular place in the IPA code, the *modem* has
crashed. And the IPA driver is not responsible for modem
firmware, remoteproc is.
The IPA driver *can* be responsible for loading some other
firmware, but even in that case, it only happens on initial
boot, and it's basically assumed to never crash.
So regardless of whether this module_firmware_crashed() call is
appropriate in some places, I believe it should not be used here.
-Alex
>
> Cc: Alex Elder <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> drivers/net/ipa/ipa_modem.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
> index ed10818dd99f..1790b87446ed 100644
> --- a/drivers/net/ipa/ipa_modem.c
> +++ b/drivers/net/ipa/ipa_modem.c
> @@ -285,6 +285,7 @@ static void ipa_modem_crashed(struct ipa *ipa)
> struct device *dev = &ipa->pdev->dev;
> int ret;
>
> + module_firmware_crashed();
> ipa_endpoint_modem_pause_all(ipa, true);
>
> ipa_endpoint_modem_hol_block_clear_all(ipa);
>
On Tue, May 19, 2020 at 06:42:31PM +0200, Jessica Yu wrote:
> +++ Luis Chamberlain [15/05/20 21:28 +0000]:
> > Device driver firmware can crash, and sometimes, this can leave your
> > system in a state which makes the device or subsystem completely
> > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> > of scraping some magical words from the kernel log, which is driver
> > specific, is much easier. So instead provide a helper which lets drivers
> > annotate this.
> >
> > Once this happens, scrapers can easily look for modules taint flags
> > for a firmware crash. This will taint both the kernel and respective
> > calling module.
> >
> > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this
> > fact should in no way shape or form affect lockdep. This taint is device
> > driver specific.
> >
> > Signed-off-by: Luis Chamberlain <[email protected]>
> > ---
> > Documentation/admin-guide/tainted-kernels.rst | 6 ++++++
> > include/linux/kernel.h | 3 ++-
> > include/linux/module.h | 13 +++++++++++++
> > include/trace/events/module.h | 3 ++-
> > kernel/module.c | 5 +++--
> > kernel/panic.c | 1 +
> > tools/debugging/kernel-chktaint | 7 +++++++
> > 7 files changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> > index 71e9184a9079..92530f1d60ae 100644
> > --- a/Documentation/admin-guide/tainted-kernels.rst
> > +++ b/Documentation/admin-guide/tainted-kernels.rst
> > @@ -100,6 +100,7 @@ Bit Log Number Reason that got the kernel tainted
> > 15 _/K 32768 kernel has been live patched
> > 16 _/X 65536 auxiliary taint, defined for and used by distros
> > 17 _/T 131072 kernel was built with the struct randomization plugin
> > + 18 _/Q 262144 driver firmware crash annotation
> > === === ====== ========================================================
> >
> > Note: The character ``_`` is representing a blank in this table to make reading
> > @@ -162,3 +163,8 @@ More detailed explanation for tainting
> > produce extremely unusual kernel structure layouts (even performance
> > pathological ones), which is important to know when debugging. Set at
> > build time.
> > +
> > + 18) ``Q`` used by device drivers to annotate that the device driver's firmware
> > + has crashed and the device's operation has been severely affected. The
> > + device may be left in a crippled state, requiring full driver removal /
> > + addition, system reboot, or it is unclear how long recovery will take.
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 04a5885cec1b..19e1541c82c7 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -601,7 +601,8 @@ extern enum system_states {
> > #define TAINT_LIVEPATCH 15
> > #define TAINT_AUX 16
> > #define TAINT_RANDSTRUCT 17
> > -#define TAINT_FLAGS_COUNT 18
> > +#define TAINT_FIRMWARE_CRASH 18
> > +#define TAINT_FLAGS_COUNT 19
> >
> > struct taint_flag {
> > char c_true; /* character printed when tainted */
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index 2c2e988bcf10..221200078180 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module *mod)
> > bool is_module_sig_enforced(void);
> > void set_module_sig_enforced(void);
> >
> > +void add_taint_module(struct module *mod, unsigned flag,
> > + enum lockdep_ok lockdep_ok);
> > +
> > +static inline void module_firmware_crashed(void)
> > +{
> > + add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
> > +}
>
> Just a nit: I think module_firmware_crashed() is a confusing name - it
> doesn't really tell me what it's doing, and it's not really related to
> the rest of the module_* symbols, which mostly have to do with module
> loader/module specifics. Especially since a driver can be built-in, too.
> How about taint_firmware_crashed() or something similar?
Sure.
> Also, I think we might crash in add_taint_module() if a driver is
> built into the kernel, because THIS_MODULE will be null and there is
> no null pointer check in add_taint_module(). We could unify the
> CONFIG_MODULES and !CONFIG_MODULES stubs and either add an `if (mod)`
> check in add_taint_module() or add an #ifdef MODULE check in the stub
> itself to call add_taint() or add_taint_module() as appropriate. Hope
> that makes sense.
I had to do something a bit different but I think you'll agree with it.
Will include it in my next iteration.
Luis
On Tue, May 19, 2020 at 05:34:13PM -0500, Alex Elder wrote:
> On 5/15/20 4:28 PM, Luis Chamberlain wrote:
> > This makes use of the new module_firmware_crashed() to help
> > annotate when firmware for device drivers crash. When firmware
> > crashes devices can sometimes become unresponsive, and recovery
> > sometimes requires a driver unload / reload and in the worst cases
> > a reboot.
> >
> > Using a taint flag allows us to annotate when this happens clearly.
>
> I don't fully understand what this is meant to do, so I can't
> fully assess whether it's the right thing to do.
It is meant to taint the kernel to ensure it is clear that something
critically bad has happened with the device firmware, it crashed, and
recovery may or may not happen, we are not 100% certain.
>
> But in this particular place in the IPA code, the *modem* has
> crashed. And the IPA driver is not responsible for modem
> firmware, remoteproc is.
Oi vei. So the device it depends on has crashed.
> The IPA driver *can* be responsible for loading some other
> firmware, but even in that case, it only happens on initial
> boot, and it's basically assumed to never crash.
OK is this an issue which we can recover from? If for the slightest bit
this can affect users it is something we should inform them over.
This patch set is missing uevents for these issues, but I just added
support for this.
> So regardless of whether this module_firmware_crashed() call is
> appropriate in some places, I believe it should not be used here.
OK thanks. Can the user be affected by this crash? If so how? Can
we recover ? Is that always guaranteed?
Luis
On 5/22/20 12:28 AM, Luis Chamberlain wrote:
> On Tue, May 19, 2020 at 05:34:13PM -0500, Alex Elder wrote:
>> On 5/15/20 4:28 PM, Luis Chamberlain wrote:
>>> This makes use of the new module_firmware_crashed() to help
>>> annotate when firmware for device drivers crash. When firmware
>>> crashes devices can sometimes become unresponsive, and recovery
>>> sometimes requires a driver unload / reload and in the worst cases
>>> a reboot.
>>>
>>> Using a taint flag allows us to annotate when this happens clearly.
>>
>> I don't fully understand what this is meant to do, so I can't
>> fully assess whether it's the right thing to do.
>
> It is meant to taint the kernel to ensure it is clear that something
> critically bad has happened with the device firmware, it crashed, and
> recovery may or may not happen, we are not 100% certain.
>>
>> But in this particular place in the IPA code, the *modem* has
>> crashed. And the IPA driver is not responsible for modem
>> firmware, remoteproc is.
>
> Oi vei. So the device it depends on has crashed.
Yes, more or less. It could be considered a little more
nuanced than even that, but I won't get into it here.
>> The IPA driver *can* be responsible for loading some other
>> firmware, but even in that case, it only happens on initial
>> boot, and it's basically assumed to never crash.
>
> OK is this an issue which we can recover from? If for the slightest bit
> this can affect users it is something we should inform them over.
If the IPA driver successfully loads this firmware, it should
be assumed to never crash. So in that respect, there is no
recovery required.
Again, the modem (with which the IPA hardware and driver
interacts) can crash, or it can be shut down intentionally.
And in either case it can recover, automatically or manually.
But all of that (and the modem's firmware loading) is the
responsibility of the remoteproc subsystem, not IPA.
> This patch set is missing uevents for these issues, but I just added
> support for this.
>
>> So regardless of whether this module_firmware_crashed() call is
>> appropriate in some places, I believe it should not be used here.
>
> OK thanks. Can the user be affected by this crash? If so how? Can
> we recover ? Is that always guaranteed?
We can't guarantee anything about recovering from a crash of
an independent entity. But by design, recovery from a modem
crash is possible and is supposed to leave Linux in a
consistent state. A modem crash is likely to be observable
to the user.
I'll repeat that I don't believe the new call you inserted
in the IPA driver belongs there.
-Alex
>
> Luis
>
On Fri, May 22, 2020 at 03:52:44PM -0500, Alex Elder wrote:
> On 5/22/20 12:28 AM, Luis Chamberlain wrote:
> > OK thanks. Can the user be affected by this crash? If so how? Can
> > we recover ? Is that always guaranteed?
>
> We can't guarantee anything about recovering from a crash of
> an independent entity. But by design, recovery from a modem
> crash is possible and is supposed to leave Linux in a
> consistent state. A modem crash is likely to be observable
> to the user.
Thanks this helps, I'll drop this patch!
Luis