2021-10-14 07:28:34

by Kelvin Cao

Subject: [PATCH v2 0/5] Switchtec Fixes and Improvements

From: Kelvin Cao <[email protected]>

Hi,

This patchset mainly improves the commit message of [PATCH 1/5] from
v1 [1] to elaborate on the root cause. It also includes a function
rename and some other tweaks.

This patchset is based on v5.15-rc5.

Thanks,
Kelvin

[1] https://lore.kernel.org/lkml/[email protected]/

Changes since v1:
  - Rebase on 5.15-rc5
  - Tweak some comment spacing (as suggested by Bjorn)
  - Update commit message to elaborate the root cause of MRPC execution
    hanging (as suggested by Bjorn)
  - Rename function check_access() to is_firmware_running()
    (as suggested by Logan) so the function name suggests the meaning of
    the return values (as suggested by Bjorn)
  - Add comment to function is_firmware_running() (as suggested by Logan)


Kelvin Cao (4):
  PCI/switchtec: Error out MRPC execution when MMIO reads fail
  PCI/switchtec: Fix a MRPC error status handling issue
  PCI/switchtec: Update the way of getting management VEP instance ID
  PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP

Logan Gunthorpe (1):
  PCI/switchtec: Add check of event support

 drivers/pci/switch/switchtec.c | 95 ++++++++++++++++++++++++++++------
 include/linux/switchtec.h      |  1 +
 2 files changed, 79 insertions(+), 17 deletions(-)

--
2.25.1


2021-10-14 07:29:03

by Kelvin Cao

Subject: [PATCH v2 2/5] PCI/switchtec: Fix a MRPC error status handling issue

From: Kelvin Cao <[email protected]>

If an error is encountered when executing an MRPC command, the firmware
will set the status register to SWITCHTEC_MRPC_STATUS_ERROR and return
the error code in the return value register.

Handle SWITCHTEC_MRPC_STATUS_ERROR in the status register when
completing an MRPC command.

Signed-off-by: Kelvin Cao <[email protected]>
---
 drivers/pci/switch/switchtec.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index e5bb2ac0e7bb..5c300ff3921d 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -238,7 +238,8 @@ static void mrpc_complete_cmd(struct switchtec_dev *stdev)
 	stuser_set_state(stuser, MRPC_DONE);
 	stuser->return_code = 0;

-	if (stuser->status != SWITCHTEC_MRPC_STATUS_DONE)
+	if (stuser->status != SWITCHTEC_MRPC_STATUS_DONE &&
+	    stuser->status != SWITCHTEC_MRPC_STATUS_ERROR)
 		goto out;

 	if (stdev->dma_mrpc)
@@ -622,7 +623,8 @@ static ssize_t switchtec_dev_read(struct file *filp, char __user *data,
 out:
 	mutex_unlock(&stdev->mrpc_mutex);

-	if (stuser->status == SWITCHTEC_MRPC_STATUS_DONE)
+	if (stuser->status == SWITCHTEC_MRPC_STATUS_DONE ||
+	    stuser->status == SWITCHTEC_MRPC_STATUS_ERROR)
 		return size;
 	else if (stuser->status == SWITCHTEC_MRPC_STATUS_INTERRUPTED)
 		return -ENXIO;
--
2.25.1
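
For readers following the status handling outside the driver, here is a
minimal userspace sketch of the read-side decision after this patch. The
enum, its values, the function name and the final fallback error are
illustrative stand-ins, not the driver's definitions; only the DONE/ERROR
and INTERRUPTED branches mirror the second hunk above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SWITCHTEC_MRPC_STATUS_* values. */
enum sketch_status {
	SKETCH_STATUS_INPROGRESS,
	SKETCH_STATUS_DONE,
	SKETCH_STATUS_ERROR,
	SKETCH_STATUS_INTERRUPTED,
};

/*
 * Models the tail of switchtec_dev_read() after the patch: DONE and ERROR
 * both hand the buffer back, so the caller can see the firmware's return
 * code; INTERRUPTED maps to -ENXIO.
 */
static long sketch_read_result(enum sketch_status status, size_t size)
{
	if (status == SKETCH_STATUS_DONE || status == SKETCH_STATUS_ERROR)
		return (long)size;
	else if (status == SKETCH_STATUS_INTERRUPTED)
		return -ENXIO;

	/* Placeholder only: the driver's final branch is not shown in the hunk. */
	return -EINVAL;
}
```

With this shape, a command that the firmware finished with an error status
still yields a full-size read instead of an error return.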

2021-10-14 07:29:23

by Kelvin Cao

Subject: [PATCH v2 4/5] PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP

From: Kelvin Cao <[email protected]>

ENOTSUPP is not a SUSV4 error code, and the following checkpatch.pl
warning will be given for new patches which still use ENOTSUPP.

WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP

See the link below for the discussion.

https://lore.kernel.org/netdev/[email protected]/

Replace ENOTSUPP with EOPNOTSUPP to align with future patches which will
be using EOPNOTSUPP.

Signed-off-by: Kelvin Cao <[email protected]>
---
 drivers/pci/switch/switchtec.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index 97a93c9b4629..236cb40cc7c5 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -376,7 +376,7 @@ static ssize_t field ## _show(struct device *dev, \
 		return io_string_show(buf, &si->gen4.field, \
 				      sizeof(si->gen4.field)); \
 	else \
-		return -ENOTSUPP; \
+		return -EOPNOTSUPP; \
 } \
 \
 static DEVICE_ATTR_RO(field)
@@ -668,7 +668,7 @@ static int ioctl_flash_info(struct switchtec_dev *stdev,
 		info.flash_length = ioread32(&fi->gen4.flash_length);
 		info.num_partitions = SWITCHTEC_NUM_PARTITIONS_GEN4;
 	} else {
-		return -ENOTSUPP;
+		return -EOPNOTSUPP;
 	}

 	if (copy_to_user(uinfo, &info, sizeof(info)))
@@ -876,7 +876,7 @@ static int ioctl_flash_part_info(struct switchtec_dev *stdev,
 		if (ret)
 			return ret;
 	} else {
-		return -ENOTSUPP;
+		return -EOPNOTSUPP;
 	}

 	if (copy_to_user(uinfo, &info, sizeof(info)))
@@ -1611,7 +1611,7 @@ static int switchtec_init_pci(struct switchtec_dev *stdev,
 	else if (stdev->gen == SWITCHTEC_GEN4)
 		part_id = &stdev->mmio_sys_info->gen4.partition_id;
 	else
-		return -ENOTSUPP;
+		return -EOPNOTSUPP;

 	stdev->partition = ioread8(part_id);
 	stdev->partition_count = ioread8(&stdev->mmio_ntb->partition_count);
--
2.25.1
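
As background for the checkpatch warning, a small userspace sketch of why
the distinction is visible to applications. It assumes a Linux C library:
ENOTSUPP is a kernel-internal code (524 in include/linux/errno.h) that
userspace <errno.h> does not define, while EOPNOTSUPP is a standard errno.
The macro and helper names below are made up for illustration.

```c
#include <errno.h>
#include <string.h>

/* Kernel-internal value of ENOTSUPP; userspace <errno.h> has no such name. */
#define KERNEL_ENOTSUPP 524

/*
 * Returns 1 when the C library has a real message for err, 0 when it falls
 * back to a generic one ("Unknown error N" on glibc, "No error information"
 * on musl).
 */
static int libc_has_message(int err)
{
	const char *msg = strerror(err);

	return strncmp(msg, "Unknown error", 13) != 0 &&
	       strcmp(msg, "No error information") != 0;
}
```

An application that receives 524 from a syscall can only print a generic
message, whereas EOPNOTSUPP decodes to a meaningful string.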

2021-10-14 07:31:17

by Kelvin Cao

Subject: [PATCH v2 1/5] PCI/switchtec: Error out MRPC execution when MMIO reads fail

From: Kelvin Cao <[email protected]>

A firmware hard reset may be initiated by various mechanisms including a
UART interface, TWI sideband interface from BMC, MRPC command from
userspace, etc. The switchtec management driver is unaware of these
resets.

The reset clears PCI state including the BARs and Memory Space Enable
bits, so the device no longer responds to the MMIO accesses the driver
uses to operate it.

MMIO reads to the device will fail with a PCIe error. When the root
complex handles that error, it typically fabricates ~0 data to complete
the CPU read.

Check for this sort of error by reading the device ID from MMIO space.
This ID can never be ~0, so if we see that value, it probably means the
PCIe Memory Read failed and we should return an error indication to the
application using the switchtec driver.

Signed-off-by: Kelvin Cao <[email protected]>
---
 drivers/pci/switch/switchtec.c | 67 ++++++++++++++++++++++++++++++----
 1 file changed, 60 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index 0b301f8be9ed..e5bb2ac0e7bb 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -45,6 +45,7 @@ enum mrpc_state {
 	MRPC_QUEUED,
 	MRPC_RUNNING,
 	MRPC_DONE,
+	MRPC_IO_ERROR,
 };

 struct switchtec_user {
@@ -66,6 +67,19 @@ struct switchtec_user {
 	int event_cnt;
 };

+/*
+ * The MMIO reads to the device_id register should always return the device ID
+ * of the device, otherwise the firmware is probably stuck or unreachable
+ * due to a firmware reset which clears PCI state including the BARs and Memory
+ * Space Enable bits.
+ */
+static int is_firmware_running(struct switchtec_dev *stdev)
+{
+	u32 device = ioread32(&stdev->mmio_sys_info->device_id);
+
+	return stdev->pdev->device == device;
+}
+
 static struct switchtec_user *stuser_create(struct switchtec_dev *stdev)
 {
 	struct switchtec_user *stuser;
@@ -113,6 +127,7 @@ static void stuser_set_state(struct switchtec_user *stuser,
 		[MRPC_QUEUED] = "QUEUED",
 		[MRPC_RUNNING] = "RUNNING",
 		[MRPC_DONE] = "DONE",
+		[MRPC_IO_ERROR] = "IO_ERROR",
 	};

 	stuser->state = state;
@@ -184,9 +199,26 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
 	return 0;
 }

+static void mrpc_cleanup_cmd(struct switchtec_dev *stdev)
+{
+	/* requires the mrpc_mutex to already be held when called */
+
+	struct switchtec_user *stuser = list_entry(stdev->mrpc_queue.next,
+						   struct switchtec_user, list);
+
+	stuser->cmd_done = true;
+	wake_up_interruptible(&stuser->cmd_comp);
+	list_del_init(&stuser->list);
+	stuser_put(stuser);
+	stdev->mrpc_busy = 0;
+
+	mrpc_cmd_submit(stdev);
+}
+
 static void mrpc_complete_cmd(struct switchtec_dev *stdev)
 {
 	/* requires the mrpc_mutex to already be held when called */
+
 	struct switchtec_user *stuser;

 	if (list_empty(&stdev->mrpc_queue))
@@ -223,13 +255,7 @@ static void mrpc_complete_cmd(struct switchtec_dev *stdev)
 		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
 			      stuser->read_len);
 out:
-	stuser->cmd_done = true;
-	wake_up_interruptible(&stuser->cmd_comp);
-	list_del_init(&stuser->list);
-	stuser_put(stuser);
-	stdev->mrpc_busy = 0;
-
-	mrpc_cmd_submit(stdev);
+	mrpc_cleanup_cmd(stdev);
 }

 static void mrpc_event_work(struct work_struct *work)
@@ -246,6 +272,23 @@ static void mrpc_event_work(struct work_struct *work)
 	mutex_unlock(&stdev->mrpc_mutex);
 }

+static void mrpc_error_complete_cmd(struct switchtec_dev *stdev)
+{
+	/* requires the mrpc_mutex to already be held when called */
+
+	struct switchtec_user *stuser;
+
+	if (list_empty(&stdev->mrpc_queue))
+		return;
+
+	stuser = list_entry(stdev->mrpc_queue.next,
+			    struct switchtec_user, list);
+
+	stuser_set_state(stuser, MRPC_IO_ERROR);
+
+	mrpc_cleanup_cmd(stdev);
+}
+
 static void mrpc_timeout_work(struct work_struct *work)
 {
 	struct switchtec_dev *stdev;
@@ -257,6 +300,11 @@ static void mrpc_timeout_work(struct work_struct *work)

 	mutex_lock(&stdev->mrpc_mutex);

+	if (!is_firmware_running(stdev)) {
+		mrpc_error_complete_cmd(stdev);
+		goto out;
+	}
+
 	if (stdev->dma_mrpc)
 		status = stdev->dma_mrpc->status;
 	else
@@ -544,6 +592,11 @@ static ssize_t switchtec_dev_read(struct file *filp, char __user *data,
 	if (rc)
 		return rc;

+	if (stuser->state == MRPC_IO_ERROR) {
+		mutex_unlock(&stdev->mrpc_mutex);
+		return -EIO;
+	}
+
 	if (stuser->state != MRPC_DONE) {
 		mutex_unlock(&stdev->mrpc_mutex);
 		return -EBADE;
--
2.25.1
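
The detection idea in this patch can be tried outside the kernel with a
small model. This is only a sketch under stated assumptions: struct
fake_mmio, fake_ioread32() and the 0x4000 ID used in the usage note are
hypothetical, and the "responding" flag stands in for whether the BARs and
Memory Space Enable bit are still set. The comparison itself follows
is_firmware_running() above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device's MMIO window; not a kernel type. */
struct fake_mmio {
	bool responding;    /* false models a device whose PCI state was reset */
	uint32_t device_id; /* value a healthy device returns from device_id */
};

/* Models an MMIO read: a failed read is completed with all-ones data. */
static uint32_t fake_ioread32(const struct fake_mmio *mmio)
{
	return mmio->responding ? mmio->device_id : UINT32_MAX;
}

/*
 * Same check as is_firmware_running(): the ID read back over MMIO must match
 * the ID recorded at enumeration time. A 16-bit ID can never equal ~0 as a
 * 32-bit value, so a fabricated all-ones read always fails the comparison.
 */
static bool fake_is_firmware_running(const struct fake_mmio *mmio,
				     uint16_t expected_id)
{
	return fake_ioread32(mmio) == expected_id;
}
```

For example, a struct fake_mmio with responding set to true and device_id
0x4000 passes the check against 0x4000, while the same struct with
responding set to false does not.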

2021-10-14 16:24:45

by Bjorn Helgaas

Subject: Re: [PATCH v2 0/5] Switchtec Fixes and Improvements

On Thu, Oct 14, 2021 at 02:18:54PM +0000, [email protected] wrote:
> From: Kelvin Cao <[email protected]>
>
> Hi,
>
> This patchset is mainly for improving the commit message of [PATCH 1/5]
> in v1[1] to elaborate the root cause, together with a function renaming
> and some other tweaks.
>
> This patchset is based on v5.15-rc5.

Applied, replacing the previous set, thanks!

> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Changes since v1:
> - Rebase on 5.15-rc5
> - Tweak some comment spacing (as suggested by Bjorn)
> - Update commit message to elaborate the root cause of MRPC execution
> hanging (as suggested by Bjorn)
> - Rename function check_access() to is_firmware_running()
> (as suggested by Logan) so the function name suggests the meaning of
> the return values (as suggested by Bjorn)
> - Add comment to function is_firmware_running() (as suggested by Logan)
>
>
> Kelvin Cao (4):
> PCI/switchtec: Error out MRPC execution when MMIO reads fail
> PCI/switchtec: Fix a MRPC error status handling issue
> PCI/switchtec: Update the way of getting management VEP instance ID
> PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP
>
> Logan Gunthorpe (1):
> PCI/switchtec: Add check of event support
>
> drivers/pci/switch/switchtec.c | 95 ++++++++++++++++++++++++++++------
> include/linux/switchtec.h | 1 +
> 2 files changed, 79 insertions(+), 17 deletions(-)
>
> --
> 2.25.1
>

2021-10-15 02:53:07

by Kelvin Cao

Subject: Re: [PATCH v2 0/5] Switchtec Fixes and Improvements

On Thu, 2021-10-14 at 09:47 -0500, Bjorn Helgaas wrote:
> On Thu, Oct 14, 2021 at 02:18:54PM +0000, [email protected]
> wrote:
> > From: Kelvin Cao <[email protected]>
> >
> > Hi,
> >
> > This patchset is mainly for improving the commit message of [PATCH
> > 1/5]
> > in v1[1] to elaborate the root cause, together with a function
> > renaming
> > and some other tweaks.
> >
> > This patchset is based on v5.15-rc5.
>
> Applied, replacing the previous set, thanks!

Thank you Bjorn!
>
> > [1]
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes since v1:
> > - Rebase on 5.15-rc5
> > - Tweak some comment spacing (as suggested by Bjorn)
> > - Update commit message to elaborate the root cause of MRPC
> > execution
> > hanging (as suggested by Bjorn)
> > - Rename function check_access() to is_firmware_running()
> > (as suggested by Logan) so the function name suggests the meaning
> > of
> > the return values (as suggested by Bjorn)
> > - Add comment to function is_firmware_running() (as suggested by
> > Logan)
> >
> >
> > Kelvin Cao (4):
> > PCI/switchtec: Error out MRPC execution when MMIO reads fail
> > PCI/switchtec: Fix a MRPC error status handling issue
> > PCI/switchtec: Update the way of getting management VEP instance
> > ID
> > PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP
> >
> > Logan Gunthorpe (1):
> > PCI/switchtec: Add check of event support
> >
> > drivers/pci/switch/switchtec.c | 95 ++++++++++++++++++++++++++++
> > ------
> > include/linux/switchtec.h | 1 +
> > 2 files changed, 79 insertions(+), 17 deletions(-)
> >
> > --
> > 2.25.1
> >

2021-10-15 10:22:17

by Krzysztof Wilczyński

Subject: Re: [PATCH v2 1/5] PCI/switchtec: Error out MRPC execution when MMIO reads fail

Hi Kelvin,

Thank you for sending the series over!

I am terribly sorry for a very late comment, especially since Bjorn already
accepted this series to be included, but allow me a small question
below.

[...]
> @@ -113,6 +127,7 @@ static void stuser_set_state(struct switchtec_user *stuser,
> [MRPC_QUEUED] = "QUEUED",
> [MRPC_RUNNING] = "RUNNING",
> [MRPC_DONE] = "DONE",
> + [MRPC_IO_ERROR] = "IO_ERROR",

Looking at the above, and then looking at stuser_set_state(), which
contains the following local array definition:

const char * const state_names[] = {
[MRPC_IDLE] = "IDLE",
[MRPC_QUEUED] = "QUEUED",
[MRPC_RUNNING] = "RUNNING",
[MRPC_DONE] = "DONE",
};

I was wondering if there might be a small benefit of declaring this array
state_names[], or list of states if you wish, as static so that we avoid
having to allocate space and fill it in with values every time this
function runs?

The function itself is referenced in a few places, as per:

Index File                           Line Content
1     drivers/pci/switch/switchtec.c 159  stuser_set_state(stuser, MRPC_RUNNING);
2     drivers/pci/switch/switchtec.c 178  stuser_set_state(stuser, MRPC_QUEUED);
3     drivers/pci/switch/switchtec.c 206  stuser_set_state(stuser, MRPC_DONE);
4     drivers/pci/switch/switchtec.c 567  stuser_set_state(stuser, MRPC_IDLE);

Even though the string representation of the state is only ever printed if
debug logging is requested, we would allocate and populate this array
every time anyway, regardless of whether we print any debug information or
not.

What do you think?

Krzysztof
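
A minimal standalone sketch of the suggestion, not taken from the driver:
the enum is re-declared locally so the example compiles on its own, and
state_name() is a hypothetical helper. The point is the static qualifier,
which places the table in read-only data once instead of building it on
the stack at every call.

```c
#include <stddef.h>

/* Local copy of the states for illustration; the driver has its own enum. */
enum mrpc_state {
	MRPC_IDLE = 0,
	MRPC_QUEUED,
	MRPC_RUNNING,
	MRPC_DONE,
	MRPC_IO_ERROR,
};

/* Hypothetical helper: maps a state to its name for debug output. */
static const char *state_name(enum mrpc_state state)
{
	/* static: initialized at build time, shared by every call */
	static const char * const state_names[] = {
		[MRPC_IDLE] = "IDLE",
		[MRPC_QUEUED] = "QUEUED",
		[MRPC_RUNNING] = "RUNNING",
		[MRPC_DONE] = "DONE",
		[MRPC_IO_ERROR] = "IO_ERROR",
	};

	/* Guard against values outside the table. */
	if ((size_t)state >= sizeof(state_names) / sizeof(state_names[0]))
		return "UNKNOWN";

	return state_names[state];
}
```

Whether a compiler already hoists a non-static const table out of the
function depends on the compiler and optimization level, so the explicit
static mainly makes the intent unambiguous.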

2021-10-15 10:42:38

by Krzysztof Wilczyński

Subject: Re: [PATCH v2 4/5] PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP

[+cc Jakub for visibility]

Hi,

> ENOTSUPP is not a SUSV4 error code, and the following checkpatch.pl
> warning will be given for new patches which still use ENOTSUPP.
>
> WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP
>
> See the link below for the discussion.
>
> https://lore.kernel.org/netdev/[email protected]/

This makes me wonder whether we should fix other occurrences of ENOTSUPP we
have in the PCI tree, as per:

Index File                                       Line Content
0     drivers/pci/msi.c                          1304 * -ENOTSUPP otherwise
1     drivers/pci/msi.c                          1316 return -ENOTSUPP;
2     drivers/pci/pcie/dpc.c                     355  return -ENOTSUPP;
3     drivers/pci/setup-res.c                    421  return -ENOTSUPP;
4     drivers/pci/setup-res.c                    433  return -ENOTSUPP;
5     drivers/pci/pci.c                          3600 * Returns -ENOTSUPP if resizable BARs are not supported at all.
6     drivers/pci/pci.c                          3610 return -ENOTSUPP;
7     drivers/pci/switch/switchtec.c             330  return -ENOTSUPP; \
8     drivers/pci/switch/switchtec.c             616  return -ENOTSUPP;
9     drivers/pci/switch/switchtec.c             824  return -ENOTSUPP;
10    drivers/pci/switch/switchtec.c             1559 return -ENOTSUPP;
11    drivers/pci/controller/dwc/pcie-tegra194.c 2244 return -ENOTSUPP;

What do you think Bjorn? Jakub?

Krzysztof

2021-10-18 03:14:14

by Kelvin Cao

Subject: Re: [PATCH v2 1/5] PCI/switchtec: Error out MRPC execution when MMIO reads fail

On Fri, 2021-10-15 at 03:21 +0200, Krzysztof Wilczyński wrote:
> Hi Kelvin,
>
> Thank you for sending the series over!
>
> I am terribly sorry for a very late comment, especially since Bjorn
> already
> accepted this series to be included, but allow me a small
> question
> below.
>
> [...]
> > @@ -113,6 +127,7 @@ static void stuser_set_state(struct
> > switchtec_user *stuser,
> > [MRPC_QUEUED] = "QUEUED",
> > [MRPC_RUNNING] = "RUNNING",
> > [MRPC_DONE] = "DONE",
> > + [MRPC_IO_ERROR] = "IO_ERROR",
>
> Looking at the above, and then looking at stuser_set_state(), which
> contains the following local array definition:
>
> const char * const state_names[] = {
> [MRPC_IDLE] = "IDLE",
> [MRPC_QUEUED] = "QUEUED",
> [MRPC_RUNNING] = "RUNNING",
> [MRPC_DONE] = "DONE",
> };
>
> I was wondering if there might be a small benefit of declaring this
> array
> state_names[], or list of states if you wish, as static so that we
> avoid
> having to allocate space and fill it in with values every time this
> function runs?
>
> The function itself is referenced in a few places, as per:
>
> Index File Line Content
> 1 drivers/pci/switch/switchtec.c 159 stuser_set_state(stuser,
> MRPC_RUNNING);
> 2 drivers/pci/switch/switchtec.c 178 stuser_set_state(stuser,
> MRPC_QUEUED);
> 3 drivers/pci/switch/switchtec.c 206 stuser_set_state(stuser,
> MRPC_DONE);
> 4 drivers/pci/switch/switchtec.c 567 stuser_set_state(stuser,
> MRPC_IDLE);
>
> Even though the string representation of the state is only ever
> printed if
> debug logging is requested, we would allocate and populate this
> array
> every time anyway, regardless of whether we print any debug
> information or
> not.
>
> What do you think?

Thank you Krzysztof. That will be an improvement. I can probably tweak
it in the next patchset (coming soon).

Kelvin
>
> Krzysztof