2024-01-29 23:41:26

by Brett Creeley

[permalink] [raw]
Subject: [PATCH net 0/6] pds_core: Various fixes

This series includes the following changes:

There can be many users of the pds_core's adminq. This includes
pds_core's uses and any clients that depend on it. When the pds_core
device goes through a reset for any reason the adminq is freed
and reconfigured. There are some gaps in the current implementation
that will cause crashes during reset if any of the previously mentioned
users of the adminq attempt to use it after it's been freed.

Issues around how resets are handled, specifically regarding the driver's
error handlers.

Originally these patches were aimed at net-next, but it was requested to
push the fixes patches to net. The original patches can be found here:

https://lore.kernel.org/netdev/[email protected]/

Also, the Reviewed-by tags were left in place from net-next reviews as the
patches didn't change.

Brett Creeley (6):
pds_core: Prevent health thread from running during reset/remove
pds_core: Cancel AQ work on teardown
pds_core: Use struct pdsc for the pdsc_adminq_isr private data
pds_core: Prevent race issues involving the adminq
pds_core: Clear BARs on reset
pds_core: Rework teardown/setup flow to be more common

drivers/net/ethernet/amd/pds_core/adminq.c | 64 +++++++++++++++------
drivers/net/ethernet/amd/pds_core/core.c | 46 +++++++++++----
drivers/net/ethernet/amd/pds_core/core.h | 2 +-
drivers/net/ethernet/amd/pds_core/debugfs.c | 4 ++
drivers/net/ethernet/amd/pds_core/dev.c | 16 +++---
drivers/net/ethernet/amd/pds_core/devlink.c | 3 +-
drivers/net/ethernet/amd/pds_core/fw.c | 3 +
drivers/net/ethernet/amd/pds_core/main.c | 26 +++++++--
8 files changed, 124 insertions(+), 40 deletions(-)

--
2.17.1



2024-01-29 23:41:38

by Brett Creeley

[permalink] [raw]
Subject: [PATCH net 2/6] pds_core: Cancel AQ work on teardown

There is a small window where pdsc_work_thread()
calls pdsc_process_adminq() and pdsc_process_adminq()
passes the PDSC_S_STOPPING_DRIVER check and starts
to process adminq/notifyq work and then the driver
starts a fw_down cycle. This could cause some
undefined behavior if the notifyqcq/adminqcq are
free'd while pdsc_process_adminq() is running. Use
cancel_work_sync() on the adminqcq's work struct
to make sure any pending work items are cancelled
and any in progress work items are completed.

Also, make sure to not call cancel_work_sync() if
the work item has not be initialized. Without this,
traces will happen in cases where a reset fails and
teardown is called again or if reset fails and the
driver is removed.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
---
drivers/net/ethernet/amd/pds_core/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index 0d2091e9eb28..b582729331eb 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -464,6 +464,8 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)

if (!pdsc->pdev->is_virtfn)
pdsc_devcmd_reset(pdsc);
+ if (pdsc->adminqcq.work.func)
+ cancel_work_sync(&pdsc->adminqcq.work);
pdsc_qcq_free(pdsc, &pdsc->notifyqcq);
pdsc_qcq_free(pdsc, &pdsc->adminqcq);

--
2.17.1


2024-01-29 23:41:56

by Brett Creeley

[permalink] [raw]
Subject: [PATCH net 4/6] pds_core: Prevent race issues involving the adminq

There are multiple paths that can result in using the pdsc's
adminq.

[1] pdsc_adminq_isr and the resulting work from queue_work(),
i.e. pdsc_work_thread()->pdsc_process_adminq()

[2] pdsc_adminq_post()

When the device goes through reset via PCIe reset and/or
a fw_down/fw_up cycle due to bad PCIe state or bad device
state the adminq is destroyed and recreated.

A NULL pointer dereference can happen if [1] or [2] happens
after the adminq is already destroyed.

In order to fix this, add some further state checks and
implement reference counting for adminq uses. Reference
counting was used because multiple threads can attempt to
access the adminq at the same time via [1] or [2]. Additionally,
multiple clients (i.e. pds-vfio-pci) can be using [2]
at the same time.

The adminq_refcnt is initialized to 1 when the adminq has been
allocated and is ready to use. Users/clients of the adminq
(i.e. [1] and [2]) will increment the refcnt when they are using
the adminq. When the driver goes into a fw_down cycle it will
set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
any further adminq_refcnt increments. Waiting for the
adminq_refcnt to hit 1 allows for any current users of the adminq
to finish before the driver frees the adminq. Once the
adminq_refcnt hits 1 the driver clears the refcnt to signify that
the adminq is deleted and cannot be used. On the fw_up cycle the
driver will once again initialize the adminq_refcnt to 1 allowing
the adminq to be used again.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
---
drivers/net/ethernet/amd/pds_core/adminq.c | 31 +++++++++++++++++-----
drivers/net/ethernet/amd/pds_core/core.c | 21 +++++++++++++++
drivers/net/ethernet/amd/pds_core/core.h | 1 +
3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/amd/pds_core/adminq.c b/drivers/net/ethernet/amd/pds_core/adminq.c
index 68be5ea251fc..5edff33d56f3 100644
--- a/drivers/net/ethernet/amd/pds_core/adminq.c
+++ b/drivers/net/ethernet/amd/pds_core/adminq.c
@@ -63,6 +63,15 @@ static int pdsc_process_notifyq(struct pdsc_qcq *qcq)
return nq_work;
}

+static bool pdsc_adminq_inc_if_up(struct pdsc *pdsc)
+{
+ if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER) ||
+ pdsc->state & BIT_ULL(PDSC_S_FW_DEAD))
+ return false;
+
+ return refcount_inc_not_zero(&pdsc->adminq_refcnt);
+}
+
void pdsc_process_adminq(struct pdsc_qcq *qcq)
{
union pds_core_adminq_comp *comp;
@@ -75,9 +84,9 @@ void pdsc_process_adminq(struct pdsc_qcq *qcq)
int aq_work = 0;
int credits;

- /* Don't process AdminQ when shutting down */
- if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER)) {
- dev_err(pdsc->dev, "%s: called while PDSC_S_STOPPING_DRIVER\n",
+ /* Don't process AdminQ when it's not up */
+ if (!pdsc_adminq_inc_if_up(pdsc)) {
+ dev_err(pdsc->dev, "%s: called while adminq is unavailable\n",
__func__);
return;
}
@@ -124,6 +133,7 @@ void pdsc_process_adminq(struct pdsc_qcq *qcq)
pds_core_intr_credits(&pdsc->intr_ctrl[qcq->intx],
credits,
PDS_CORE_INTR_CRED_REARM);
+ refcount_dec(&pdsc->adminq_refcnt);
}

void pdsc_work_thread(struct work_struct *work)
@@ -138,9 +148,9 @@ irqreturn_t pdsc_adminq_isr(int irq, void *data)
struct pdsc *pdsc = data;
struct pdsc_qcq *qcq;

- /* Don't process AdminQ when shutting down */
- if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER)) {
- dev_err(pdsc->dev, "%s: called while PDSC_S_STOPPING_DRIVER\n",
+ /* Don't process AdminQ when it's not up */
+ if (!pdsc_adminq_inc_if_up(pdsc)) {
+ dev_err(pdsc->dev, "%s: called while adminq is unavailable\n",
__func__);
return IRQ_HANDLED;
}
@@ -148,6 +158,7 @@ irqreturn_t pdsc_adminq_isr(int irq, void *data)
qcq = &pdsc->adminqcq;
queue_work(pdsc->wq, &qcq->work);
pds_core_intr_mask(&pdsc->intr_ctrl[qcq->intx], PDS_CORE_INTR_MASK_CLEAR);
+ refcount_dec(&pdsc->adminq_refcnt);

return IRQ_HANDLED;
}
@@ -231,6 +242,12 @@ int pdsc_adminq_post(struct pdsc *pdsc,
int err = 0;
int index;

+ if (!pdsc_adminq_inc_if_up(pdsc)) {
+ dev_dbg(pdsc->dev, "%s: preventing adminq cmd %u\n",
+ __func__, cmd->opcode);
+ return -ENXIO;
+ }
+
wc.qcq = &pdsc->adminqcq;
index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp, &wc);
if (index < 0) {
@@ -286,6 +303,8 @@ int pdsc_adminq_post(struct pdsc *pdsc,
queue_work(pdsc->wq, &pdsc->health_work);
}

+ refcount_dec(&pdsc->adminq_refcnt);
+
return err;
}
EXPORT_SYMBOL_GPL(pdsc_adminq_post);
diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index 0356e56a6e99..f44333bd1256 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -450,6 +450,7 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
pdsc_debugfs_add_viftype(pdsc);
}

+ refcount_set(&pdsc->adminq_refcnt, 1);
clear_bit(PDSC_S_FW_DEAD, &pdsc->state);
return 0;

@@ -514,6 +515,24 @@ void pdsc_stop(struct pdsc *pdsc)
PDS_CORE_INTR_MASK_SET);
}

+static void pdsc_adminq_wait_and_dec_once_unused(struct pdsc *pdsc)
+{
+ /* The driver initializes the adminq_refcnt to 1 when the adminq is
+ * allocated and ready for use. Other users/requesters will increment
+ * the refcnt while in use. If the refcnt is down to 1 then the adminq
+ * is not in use and the refcnt can be cleared and adminq freed. Before
+ * calling this function the driver will set PDSC_S_FW_DEAD, which
+ * prevent subsequent attempts to use the adminq and increment the
+ * refcnt to fail. This guarantees that this function will eventually
+ * exit.
+ */
+ while (!refcount_dec_if_one(&pdsc->adminq_refcnt)) {
+ dev_dbg_ratelimited(pdsc->dev, "%s: adminq in use\n",
+ __func__);
+ cpu_relax();
+ }
+}
+
void pdsc_fw_down(struct pdsc *pdsc)
{
union pds_core_notifyq_comp reset_event = {
@@ -529,6 +548,8 @@ void pdsc_fw_down(struct pdsc *pdsc)
if (pdsc->pdev->is_virtfn)
return;

+ pdsc_adminq_wait_and_dec_once_unused(pdsc);
+
/* Notify clients of fw_down */
if (pdsc->fw_reporter)
devlink_health_report(pdsc->fw_reporter, "FW down reported", pdsc);
diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h
index e35d3e7006bf..cbd5716f46e6 100644
--- a/drivers/net/ethernet/amd/pds_core/core.h
+++ b/drivers/net/ethernet/amd/pds_core/core.h
@@ -184,6 +184,7 @@ struct pdsc {
struct mutex devcmd_lock; /* lock for dev_cmd operations */
struct mutex config_lock; /* lock for configuration operations */
spinlock_t adminq_lock; /* lock for adminq operations */
+ refcount_t adminq_refcnt;
struct pds_core_dev_info_regs __iomem *info_regs;
struct pds_core_dev_cmd_regs __iomem *cmd_regs;
struct pds_core_intr __iomem *intr_ctrl;
--
2.17.1


2024-01-29 23:42:14

by Brett Creeley

[permalink] [raw]
Subject: [PATCH net 3/6] pds_core: Use struct pdsc for the pdsc_adminq_isr private data

The initial design for the adminq interrupt was done based
on client drivers having their own adminq and adminq
interrupt. So, each client driver's adminq isr would use
their specific adminqcq for the private data struct. For the
time being the design has changed to only use a single
adminq for all clients. So, instead use the struct pdsc for
the private data to simplify things a bit.

This also has the benefit of not dereferencing the adminqcq
to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit
is set and the adminqcq has actually been cleared/freed.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
---
drivers/net/ethernet/amd/pds_core/adminq.c | 5 +++--
drivers/net/ethernet/amd/pds_core/core.c | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/amd/pds_core/adminq.c b/drivers/net/ethernet/amd/pds_core/adminq.c
index 5beadabc2136..68be5ea251fc 100644
--- a/drivers/net/ethernet/amd/pds_core/adminq.c
+++ b/drivers/net/ethernet/amd/pds_core/adminq.c
@@ -135,8 +135,8 @@ void pdsc_work_thread(struct work_struct *work)

irqreturn_t pdsc_adminq_isr(int irq, void *data)
{
- struct pdsc_qcq *qcq = data;
- struct pdsc *pdsc = qcq->pdsc;
+ struct pdsc *pdsc = data;
+ struct pdsc_qcq *qcq;

/* Don't process AdminQ when shutting down */
if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER)) {
@@ -145,6 +145,7 @@ irqreturn_t pdsc_adminq_isr(int irq, void *data)
return IRQ_HANDLED;
}

+ qcq = &pdsc->adminqcq;
queue_work(pdsc->wq, &qcq->work);
pds_core_intr_mask(&pdsc->intr_ctrl[qcq->intx], PDS_CORE_INTR_MASK_CLEAR);

diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index b582729331eb..0356e56a6e99 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -125,7 +125,7 @@ static int pdsc_qcq_intr_alloc(struct pdsc *pdsc, struct pdsc_qcq *qcq)

snprintf(name, sizeof(name), "%s-%d-%s",
PDS_CORE_DRV_NAME, pdsc->pdev->bus->number, qcq->q.name);
- index = pdsc_intr_alloc(pdsc, name, pdsc_adminq_isr, qcq);
+ index = pdsc_intr_alloc(pdsc, name, pdsc_adminq_isr, pdsc);
if (index < 0)
return index;
qcq->intx = index;
--
2.17.1


2024-01-29 23:42:41

by Brett Creeley

[permalink] [raw]
Subject: [PATCH net 5/6] pds_core: Clear BARs on reset

During reset the BARs might be accessed when they are
unmapped. This can cause unexpected issues, so fix it by
clearing the cached BAR values so they are not accessed
until they are re-mapped.

Also, make sure any places that can access the BARs
when they are NULL are prevented.

Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink")
Signed-off-by: Brett Creeley <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
---
drivers/net/ethernet/amd/pds_core/adminq.c | 28 +++++++++++++++------
drivers/net/ethernet/amd/pds_core/core.c | 8 +++++-
drivers/net/ethernet/amd/pds_core/dev.c | 9 ++++++-
drivers/net/ethernet/amd/pds_core/devlink.c | 3 ++-
drivers/net/ethernet/amd/pds_core/fw.c | 3 +++
drivers/net/ethernet/amd/pds_core/main.c | 5 ++++
6 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/amd/pds_core/adminq.c b/drivers/net/ethernet/amd/pds_core/adminq.c
index 5edff33d56f3..ea773cfa0af6 100644
--- a/drivers/net/ethernet/amd/pds_core/adminq.c
+++ b/drivers/net/ethernet/amd/pds_core/adminq.c
@@ -191,10 +191,16 @@ static int __pdsc_adminq_post(struct pdsc *pdsc,

/* Check that the FW is running */
if (!pdsc_is_fw_running(pdsc)) {
- u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
-
- dev_info(pdsc->dev, "%s: post failed - fw not running %#02x:\n",
- __func__, fw_status);
+ if (pdsc->info_regs) {
+ u8 fw_status =
+ ioread8(&pdsc->info_regs->fw_status);
+
+ dev_info(pdsc->dev, "%s: post failed - fw not running %#02x:\n",
+ __func__, fw_status);
+ } else {
+ dev_info(pdsc->dev, "%s: post failed - BARs not setup\n",
+ __func__);
+ }
ret = -ENXIO;

goto err_out_unlock;
@@ -266,10 +272,16 @@ int pdsc_adminq_post(struct pdsc *pdsc,
break;

if (!pdsc_is_fw_running(pdsc)) {
- u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
-
- dev_dbg(pdsc->dev, "%s: post wait failed - fw not running %#02x:\n",
- __func__, fw_status);
+ if (pdsc->info_regs) {
+ u8 fw_status =
+ ioread8(&pdsc->info_regs->fw_status);
+
+ dev_dbg(pdsc->dev, "%s: post wait failed - fw not running %#02x:\n",
+ __func__, fw_status);
+ } else {
+ dev_dbg(pdsc->dev, "%s: post wait failed - BARs not setup\n",
+ __func__);
+ }
err = -ENXIO;
break;
}
diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index f44333bd1256..65c8a7072e35 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -600,7 +600,13 @@ void pdsc_fw_up(struct pdsc *pdsc)

static void pdsc_check_pci_health(struct pdsc *pdsc)
{
- u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
+ u8 fw_status;
+
+ /* some sort of teardown already in progress */
+ if (!pdsc->info_regs)
+ return;
+
+ fw_status = ioread8(&pdsc->info_regs->fw_status);

/* is PCI broken? */
if (fw_status != PDS_RC_BAD_PCI)
diff --git a/drivers/net/ethernet/amd/pds_core/dev.c b/drivers/net/ethernet/amd/pds_core/dev.c
index 31940b857e0e..62a38e0a8454 100644
--- a/drivers/net/ethernet/amd/pds_core/dev.c
+++ b/drivers/net/ethernet/amd/pds_core/dev.c
@@ -57,6 +57,9 @@ int pdsc_err_to_errno(enum pds_core_status_code code)

bool pdsc_is_fw_running(struct pdsc *pdsc)
{
+ if (!pdsc->info_regs)
+ return false;
+
pdsc->fw_status = ioread8(&pdsc->info_regs->fw_status);
pdsc->last_fw_time = jiffies;
pdsc->last_hb = ioread32(&pdsc->info_regs->fw_heartbeat);
@@ -182,13 +185,17 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
{
int err;

+ if (!pdsc->cmd_regs)
+ return -ENXIO;
+
memcpy_toio(&pdsc->cmd_regs->cmd, cmd, sizeof(*cmd));
pdsc_devcmd_dbell(pdsc);
err = pdsc_devcmd_wait(pdsc, cmd->opcode, max_seconds);
- memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));

if ((err == -ENXIO || err == -ETIMEDOUT) && pdsc->wq)
queue_work(pdsc->wq, &pdsc->health_work);
+ else
+ memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));

return err;
}
diff --git a/drivers/net/ethernet/amd/pds_core/devlink.c b/drivers/net/ethernet/amd/pds_core/devlink.c
index e9948ea5bbcd..54864f27c87a 100644
--- a/drivers/net/ethernet/amd/pds_core/devlink.c
+++ b/drivers/net/ethernet/amd/pds_core/devlink.c
@@ -111,7 +111,8 @@ int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,

mutex_lock(&pdsc->devcmd_lock);
err = pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout * 2);
- memcpy_fromio(&fw_list, pdsc->cmd_regs->data, sizeof(fw_list));
+ if (!err)
+ memcpy_fromio(&fw_list, pdsc->cmd_regs->data, sizeof(fw_list));
mutex_unlock(&pdsc->devcmd_lock);
if (err && err != -EIO)
return err;
diff --git a/drivers/net/ethernet/amd/pds_core/fw.c b/drivers/net/ethernet/amd/pds_core/fw.c
index 90a811f3878a..fa626719e68d 100644
--- a/drivers/net/ethernet/amd/pds_core/fw.c
+++ b/drivers/net/ethernet/amd/pds_core/fw.c
@@ -107,6 +107,9 @@ int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,

dev_info(pdsc->dev, "Installing firmware\n");

+ if (!pdsc->cmd_regs)
+ return -ENXIO;
+
dl = priv_to_devlink(pdsc);
devlink_flash_update_status_notify(dl, "Preparing to flash",
NULL, 0, 0);
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index 5172a5ad8ec6..05fdeb235e5f 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -37,6 +37,11 @@ static void pdsc_unmap_bars(struct pdsc *pdsc)
struct pdsc_dev_bar *bars = pdsc->bars;
unsigned int i;

+ pdsc->info_regs = NULL;
+ pdsc->cmd_regs = NULL;
+ pdsc->intr_status = NULL;
+ pdsc->intr_ctrl = NULL;
+
for (i = 0; i < PDS_CORE_BARS_MAX; i++) {
if (bars[i].vaddr)
pci_iounmap(pdsc->pdev, bars[i].vaddr);
--
2.17.1


2024-01-29 23:43:01

by Brett Creeley

[permalink] [raw]
Subject: [PATCH net 6/6] pds_core: Rework teardown/setup flow to be more common

Currently the teardown/setup flow for driver probe/remove is quite
a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up().
One key piece that's missing are the calls to pci_alloc_irq_vectors()
and pci_free_irq_vectors(). The pcie reset case is calling
pci_free_irq_vectors() on reset_prepare, but not calling the
corresponding pci_alloc_irq_vectors() on reset_done. This is causing
unexpected/unwanted interrupt behavior due to the adminq interrupt
being accidentally put into legacy interrupt mode. Also, the
pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being
called directly in probe/remove respectively.

Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls
pci_alloc_irq_vectors() and get rid of the now unused
pds_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown()
since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown()
since pci_alloc_irq_vectors() will always be called in
pdsc_setup()->pdsc_dev_init() for both the probe/remove and
reset flows.
4. Make sure to only create the debugfs "identity" entry when it
doesn't already exist, which it will in the reset case because
it's already been created in the initial call to pdsc_dev_init().

Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
---
drivers/net/ethernet/amd/pds_core/core.c | 13 +++++--------
drivers/net/ethernet/amd/pds_core/core.h | 1 -
drivers/net/ethernet/amd/pds_core/debugfs.c | 4 ++++
drivers/net/ethernet/amd/pds_core/dev.c | 7 -------
drivers/net/ethernet/amd/pds_core/main.c | 2 --
5 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index 65c8a7072e35..7658a7286767 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -404,10 +404,7 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
int numdescs;
int err;

- if (init)
- err = pdsc_dev_init(pdsc);
- else
- err = pdsc_dev_reinit(pdsc);
+ err = pdsc_dev_init(pdsc);
if (err)
return err;

@@ -479,10 +476,9 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
for (i = 0; i < pdsc->nintrs; i++)
pdsc_intr_free(pdsc, i);

- if (removing) {
- kfree(pdsc->intr_info);
- pdsc->intr_info = NULL;
- }
+ kfree(pdsc->intr_info);
+ pdsc->intr_info = NULL;
+ pdsc->nintrs = 0;
}

if (pdsc->kern_dbpage) {
@@ -490,6 +486,7 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
pdsc->kern_dbpage = NULL;
}

+ pci_free_irq_vectors(pdsc->pdev);
set_bit(PDSC_S_FW_DEAD, &pdsc->state);
}

diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h
index cbd5716f46e6..110c4b826b22 100644
--- a/drivers/net/ethernet/amd/pds_core/core.h
+++ b/drivers/net/ethernet/amd/pds_core/core.h
@@ -281,7 +281,6 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
union pds_core_dev_comp *comp, int max_seconds);
int pdsc_devcmd_init(struct pdsc *pdsc);
int pdsc_devcmd_reset(struct pdsc *pdsc);
-int pdsc_dev_reinit(struct pdsc *pdsc);
int pdsc_dev_init(struct pdsc *pdsc);

void pdsc_reset_prepare(struct pci_dev *pdev);
diff --git a/drivers/net/ethernet/amd/pds_core/debugfs.c b/drivers/net/ethernet/amd/pds_core/debugfs.c
index 8ec392299b7d..4e8579ca1c8c 100644
--- a/drivers/net/ethernet/amd/pds_core/debugfs.c
+++ b/drivers/net/ethernet/amd/pds_core/debugfs.c
@@ -64,6 +64,10 @@ DEFINE_SHOW_ATTRIBUTE(identity);

void pdsc_debugfs_add_ident(struct pdsc *pdsc)
{
+ /* This file will already exist in the reset flow */
+ if (debugfs_lookup("identity", pdsc->dentry))
+ return;
+
debugfs_create_file("identity", 0400, pdsc->dentry,
pdsc, &identity_fops);
}
diff --git a/drivers/net/ethernet/amd/pds_core/dev.c b/drivers/net/ethernet/amd/pds_core/dev.c
index 62a38e0a8454..e65a1632df50 100644
--- a/drivers/net/ethernet/amd/pds_core/dev.c
+++ b/drivers/net/ethernet/amd/pds_core/dev.c
@@ -316,13 +316,6 @@ static int pdsc_identify(struct pdsc *pdsc)
return 0;
}

-int pdsc_dev_reinit(struct pdsc *pdsc)
-{
- pdsc_init_devinfo(pdsc);
-
- return pdsc_identify(pdsc);
-}
-
int pdsc_dev_init(struct pdsc *pdsc)
{
unsigned int nintrs;
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index 05fdeb235e5f..cdbf053b5376 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -438,7 +438,6 @@ static void pdsc_remove(struct pci_dev *pdev)
mutex_destroy(&pdsc->config_lock);
mutex_destroy(&pdsc->devcmd_lock);

- pci_free_irq_vectors(pdev);
pdsc_unmap_bars(pdsc);
pci_release_regions(pdev);
}
@@ -470,7 +469,6 @@ void pdsc_reset_prepare(struct pci_dev *pdev)
pdsc_stop_health_thread(pdsc);
pdsc_fw_down(pdsc);

- pci_free_irq_vectors(pdev);
pdsc_unmap_bars(pdsc);
pci_release_regions(pdev);
pci_disable_device(pdev);
--
2.17.1


2024-02-01 03:04:09

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH net 0/6] pds_core: Various fixes

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <[email protected]>:

On Mon, 29 Jan 2024 15:40:29 -0800 you wrote:
> This series includes the following changes:
>
> There can be many users of the pds_core's adminq. This includes
> pds_core's uses and any clients that depend on it. When the pds_core
> device goes through a reset for any reason the adminq is freed
> and reconfigured. There are some gaps in the current implementation
> that will cause crashes during reset if any of the previously mentioned
> users of the adminq attempt to use it after it's been freed.
>
> [...]

Here is the summary with links:
- [net,1/6] pds_core: Prevent health thread from running during reset/remove
https://git.kernel.org/netdev/net/c/d9407ff11809
- [net,2/6] pds_core: Cancel AQ work on teardown
https://git.kernel.org/netdev/net/c/d321067e2cfa
- [net,3/6] pds_core: Use struct pdsc for the pdsc_adminq_isr private data
https://git.kernel.org/netdev/net/c/951705151e50
- [net,4/6] pds_core: Prevent race issues involving the adminq
https://git.kernel.org/netdev/net/c/7e82a8745b95
- [net,5/6] pds_core: Clear BARs on reset
https://git.kernel.org/netdev/net/c/e96094c1d11c
- [net,6/6] pds_core: Rework teardown/setup flow to be more common
https://git.kernel.org/netdev/net/c/bc90fbe0c318

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html