2024-04-02 08:14:23

by Kwangjin Ko

[permalink] [raw]
Subject: [PATCH v2 0/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

This patch fixes the issue of accessing incorrect values when getting
multiple event records in cxl_mem_get_records_log().

Changes from v1:
1. Rewrite the commit message to indicate that it can read an
incorrect value, rather than causing a buffer overflow.
(feedbacked by Ira Weiny)
2. Remove the following commit that has already been added into
the 'fixes' branch of the CXL repository.
(feedbacked by Alison Schofield)

cxl/core: Fix incorrect array index in cxl_clear_event_record()

Kwangjin Ko (1):
cxl/core: Fix initialization of mbox_cmd.size_out in get event

drivers/cxl/core/mbox.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)


base-commit: 39cd87c4eb2b893354f3b850f916353f2658ae6f
--
2.34.1



2024-04-02 08:14:37

by Kwangjin Ko

[permalink] [raw]
Subject: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

Since mbox_cmd.size_out is overwritten with the actual output size in
the function below, it needs to be initialized every time.

cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd

Problem scenario:

1) The size_out variable is initially set to the size of the mailbox.
2) Read an event.
- size_out is set to 160 bytes(header 32B + one event 128B).
- Two event are created while reading.
3) Read the new *two* events.
- size_out is still set to 160 bytes.
- Although the value of out_len is 288 bytes, only 160 bytes are
copied from the mailbox register to the local variable.
- record_count is set to 2.
- Accessing records[1] will result in reading incorrect data.

Signed-off-by: Kwangjin Ko <[email protected]>
---
drivers/cxl/core/mbox.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..a38531a055c8 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
.payload_in = &log_type,
.size_in = sizeof(log_type),
.payload_out = payload,
- .size_out = mds->payload_size,
.min_out = struct_size(payload, records, 0),
};

do {
int rc, i;

+ mbox_cmd.size_out = mds->payload_size;
+
rc = cxl_internal_send_cmd(mds, &mbox_cmd);
if (rc) {
dev_err_ratelimited(dev,
--
2.34.1


2024-04-02 14:48:03

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

On Tue, 2 Apr 2024 17:14:03 +0900
Kwangjin Ko <[email protected]> wrote:

> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
>
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
>
> Problem scenario:
>
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
> - size_out is set to 160 bytes(header 32B + one event 128B).
> - Two event are created while reading.
> 3) Read the new *two* events.
> - size_out is still set to 160 bytes.
> - Although the value of out_len is 288 bytes, only 160 bytes are
> copied from the mailbox register to the local variable.
> - record_count is set to 2.
> - Accessing records[1] will result in reading incorrect data.
>
> Signed-off-by: Kwangjin Ko <[email protected]>

Looks correct to me.

Reviewed-by: Jonathan Cameron <[email protected]>

> ---
> drivers/cxl/core/mbox.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> .payload_in = &log_type,
> .size_in = sizeof(log_type),
> .payload_out = payload,
> - .size_out = mds->payload_size,
> .min_out = struct_size(payload, records, 0),
> };
>
> do {
> int rc, i;
>
> + mbox_cmd.size_out = mds->payload_size;
> +
> rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> if (rc) {
> dev_err_ratelimited(dev,


2024-04-02 16:26:35

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

Kwangjin Ko wrote:
> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
>
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
>
> Problem scenario:
>
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
> - size_out is set to 160 bytes(header 32B + one event 128B).
> - Two event are created while reading.
> 3) Read the new *two* events.
> - size_out is still set to 160 bytes.
> - Although the value of out_len is 288 bytes, only 160 bytes are
> copied from the mailbox register to the local variable.
> - record_count is set to 2.
> - Accessing records[1] will result in reading incorrect data.
>
> Signed-off-by: Kwangjin Ko <[email protected]>


Tested-by: Ira Weiny <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>

> ---
> drivers/cxl/core/mbox.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> .payload_in = &log_type,
> .size_in = sizeof(log_type),
> .payload_out = payload,
> - .size_out = mds->payload_size,
> .min_out = struct_size(payload, records, 0),
> };
>
> do {
> int rc, i;
>
> + mbox_cmd.size_out = mds->payload_size;
> +
> rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> if (rc) {
> dev_err_ratelimited(dev,
> --
> 2.34.1
>



2024-04-03 15:00:46

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

Kwangjin Ko wrote:
> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
>
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
>
> Problem scenario:
>
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
> - size_out is set to 160 bytes(header 32B + one event 128B).
> - Two event are created while reading.
> 3) Read the new *two* events.
> - size_out is still set to 160 bytes.
> - Although the value of out_len is 288 bytes, only 160 bytes are
> copied from the mailbox register to the local variable.
> - record_count is set to 2.
> - Accessing records[1] will result in reading incorrect data.
>
> Signed-off-by: Kwangjin Ko <[email protected]>
> ---
> drivers/cxl/core/mbox.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> .payload_in = &log_type,
> .size_in = sizeof(log_type),
> .payload_out = payload,
> - .size_out = mds->payload_size,
> .min_out = struct_size(payload, records, 0),
> };
>
> do {
> int rc, i;
>
> + mbox_cmd.size_out = mds->payload_size;
> +

Fix looks correct, but I am concerned it is a band-aid for a more
general problem. For example, if I am not mistaken, we have a similar
bug in cxl_mem_get_poison().

So perhaps a convention to always define @mbox_cmd immediately before
cxl_internal_send_cmd() like this:

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..5d44b5c095b7 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -946,24 +946,22 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
struct device *dev = mds->cxlds.dev;
struct cxl_get_event_payload *payload;
- struct cxl_mbox_cmd mbox_cmd;
u8 log_type = type;
u16 nr_rec;

mutex_lock(&mds->event.log_lock);
payload = mds->event.buf;

- mbox_cmd = (struct cxl_mbox_cmd) {
- .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
- .payload_in = &log_type,
- .size_in = sizeof(log_type),
- .payload_out = payload,
- .size_out = mds->payload_size,
- .min_out = struct_size(payload, records, 0),
- };
-
do {
int rc, i;
+ struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
+ .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
+ .payload_in = &log_type,
+ .size_in = sizeof(log_type),
+ .payload_out = payload,
+ .size_out = mds->payload_size,
+ .min_out = struct_size(payload, records, 0),
+ };

rc = cxl_internal_send_cmd(mds, &mbox_cmd);
if (rc) {
@@ -1296,7 +1294,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
struct cxl_mbox_poison_out *po;
struct cxl_mbox_poison_in pi;
- struct cxl_mbox_cmd mbox_cmd;
int nr_records = 0;
int rc;

@@ -1308,16 +1305,16 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
pi.offset = cpu_to_le64(offset);
pi.length = cpu_to_le64(len / CXL_POISON_LEN_MULT);

- mbox_cmd = (struct cxl_mbox_cmd) {
- .opcode = CXL_MBOX_OP_GET_POISON,
- .size_in = sizeof(pi),
- .payload_in = &pi,
- .size_out = mds->payload_size,
- .payload_out = po,
- .min_out = struct_size(po, record, 0),
- };
-
do {
+ struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
+ .opcode = CXL_MBOX_OP_GET_POISON,
+ .size_in = sizeof(pi),
+ .payload_in = &pi,
+ .size_out = mds->payload_size,
+ .payload_out = po,
+ .min_out = struct_size(po, record, 0),
+ };
+
rc = cxl_internal_send_cmd(mds, &mbox_cmd);
if (rc)
break;

2024-04-04 13:56:58

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

On Wed, 3 Apr 2024 07:53:24 -0700
Dan Williams <[email protected]> wrote:

> Kwangjin Ko wrote:
> > Since mbox_cmd.size_out is overwritten with the actual output size in
> > the function below, it needs to be initialized every time.
> >
> > cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> >
> > Problem scenario:
> >
> > 1) The size_out variable is initially set to the size of the mailbox.
> > 2) Read an event.
> > - size_out is set to 160 bytes(header 32B + one event 128B).
> > - Two event are created while reading.
> > 3) Read the new *two* events.
> > - size_out is still set to 160 bytes.
> > - Although the value of out_len is 288 bytes, only 160 bytes are
> > copied from the mailbox register to the local variable.
> > - record_count is set to 2.
> > - Accessing records[1] will result in reading incorrect data.
> >
> > Signed-off-by: Kwangjin Ko <[email protected]>
> > ---
> > drivers/cxl/core/mbox.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 9adda4795eb7..a38531a055c8 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > .payload_in = &log_type,
> > .size_in = sizeof(log_type),
> > .payload_out = payload,
> > - .size_out = mds->payload_size,
> > .min_out = struct_size(payload, records, 0),
> > };
> >
> > do {
> > int rc, i;
> >
> > + mbox_cmd.size_out = mds->payload_size;
> > +
>
> Fix looks correct, but I am concerned it is a band-aid for a more
> general problem. For example, if I am not mistaken, we have a similar
> bug in cxl_mem_get_poison().
>
> So perhaps a convention to always define @mbox_cmd immediately before
> cxl_internal_send_cmd() like this:

Makes sense to me. These aren't hot paths, so safe code is worth the
possible extra writes.

>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..5d44b5c095b7 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -946,24 +946,22 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> struct device *dev = mds->cxlds.dev;
> struct cxl_get_event_payload *payload;
> - struct cxl_mbox_cmd mbox_cmd;
> u8 log_type = type;
> u16 nr_rec;
>
> mutex_lock(&mds->event.log_lock);
> payload = mds->event.buf;
>
> - mbox_cmd = (struct cxl_mbox_cmd) {
> - .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
> - .payload_in = &log_type,
> - .size_in = sizeof(log_type),
> - .payload_out = payload,
> - .size_out = mds->payload_size,
> - .min_out = struct_size(payload, records, 0),
> - };
> -
> do {
> int rc, i;
> + struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
> + .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
> + .payload_in = &log_type,
> + .size_in = sizeof(log_type),
> + .payload_out = payload,
> + .size_out = mds->payload_size,
> + .min_out = struct_size(payload, records, 0),
> + };
>
> rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> if (rc) {
> @@ -1296,7 +1294,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> struct cxl_mbox_poison_out *po;
> struct cxl_mbox_poison_in pi;
> - struct cxl_mbox_cmd mbox_cmd;
> int nr_records = 0;
> int rc;
>
> @@ -1308,16 +1305,16 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> pi.offset = cpu_to_le64(offset);
> pi.length = cpu_to_le64(len / CXL_POISON_LEN_MULT);
>
> - mbox_cmd = (struct cxl_mbox_cmd) {
> - .opcode = CXL_MBOX_OP_GET_POISON,
> - .size_in = sizeof(pi),
> - .payload_in = &pi,
> - .size_out = mds->payload_size,
> - .payload_out = po,
> - .min_out = struct_size(po, record, 0),
> - };
> -
> do {
> + struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
> + .opcode = CXL_MBOX_OP_GET_POISON,
> + .size_in = sizeof(pi),
> + .payload_in = &pi,
> + .size_out = mds->payload_size,
> + .payload_out = po,
> + .min_out = struct_size(po, record, 0),
> + };
> +
> rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> if (rc)
> break;
>


2024-04-04 17:36:07

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

Jonathan Cameron wrote:
[..]
> >
> > Fix looks correct, but I am concerned it is a band-aid for a more
> > general problem. For example, if I am not mistaken, we have a similar
> > bug in cxl_mem_get_poison().
> >
> > So perhaps a convention to always define @mbox_cmd immediately before
> > cxl_internal_send_cmd() like this:
>
> Makes sense to me. These aren't hot paths, so safe code is worth the
> possible extra writes.

Yeah, the before and after size wise is small:

text data bss dec hex filename
15407 2129 49 17585 44b1 drivers/cxl/core/mbox.o

text data bss dec hex filename
15461 2129 49 17639 44e7 drivers/cxl/core/mbox.o

..which I think is worth the peace of mind, and it matches what is
currently done in cxl_xfer_log()

I will send a follow on with this since this patch is already in
cxl-fixes.

2024-04-04 20:21:04

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

Dan Williams wrote:
> Jonathan Cameron wrote:
> [..]
> > >
> > > Fix looks correct, but I am concerned it is a band-aid for a more
> > > general problem. For example, if I am not mistaken, we have a similar
> > > bug in cxl_mem_get_poison().
> > >
> > > So perhaps a convention to always define @mbox_cmd immediately before
> > > cxl_internal_send_cmd() like this:
> >
> > Makes sense to me. These aren't hot paths, so safe code is worth the
> > possible extra writes.
>
> Yeah, the before and after size wise is small:
>
> text data bss dec hex filename
> 15407 2129 49 17585 44b1 drivers/cxl/core/mbox.o
>
> text data bss dec hex filename
> 15461 2129 49 17639 44e7 drivers/cxl/core/mbox.o
>
> ...which I think is worth the peace of mind, and it matches what is
> currently done in cxl_xfer_log()
>
> I will send a follow on with this since this patch is already in
> cxl-fixes.

Thanks Dan.
Ira

2024-04-05 16:04:35

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

On Tue, Apr 02, 2024 at 05:14:03PM +0900, Kwangjin Ko wrote:
> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
>
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
>
> Problem scenario:
>
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
> - size_out is set to 160 bytes(header 32B + one event 128B).
> - Two event are created while reading.
> 3) Read the new *two* events.
> - size_out is still set to 160 bytes.
> - Although the value of out_len is 288 bytes, only 160 bytes are
> copied from the mailbox register to the local variable.
> - record_count is set to 2.
> - Accessing records[1] will result in reading incorrect data.

Agree with the other comments on need to set .out_size when doing
cxl_internal_send_cmd() in a loop. Poison list retrieval can hit
this case if the MORE flag is set and a follow on read of the list
delivers more records than the previous read. ie. device gives one
record, sets the _MORE flag, then gives 5.

2 other things appeared to me while looking at this:

First, it seems that there is another cleanup wrt accessing records
with invalid data. Still focusing on get_events and get_poison
since those loop through output data based on a device supplied
record count. The min_out check means the driver at least gets a
count of records to expect. That's good. The problem occurs::

if (mbox.size_out != struct_size(payload, records, 'record_count'))

The driver will log garbage trace events, and that could lead to
bad actions based on bad data. (like a needless scan of device based
on a false overflow flag). So, checking that size.out is the proper
multiple of record_count protects driver from bad device behavior.

I think that can be combined w the patch Dan is suggesting to
reset mbox.size_out on each loop.

Second thing is the pci-driver quiet handling of PAYLOAD LENGTH
values reported by the device. It seems like at a minimum the
pci-driver could emit an info or debug message when the device
is reporting payload lengths that exceed what the driver can
copy in. I'm referring to the mbox.size_out adjustment in
__cxl_pci_mbox_send_cmd(). Or, if it's not the pci-drivers job
to judge, pass that actual payload length value back in the
mbox structure (new field) so that the cxl-driver can use it.
The pci driver would still do it's "#8 Sanitize the copy" work,
but it would allow the cxl-driver to clearly see why it got the
size_out it got, and squawk about it if needed.

--Alison

>
> Signed-off-by: Kwangjin Ko <[email protected]>
> ---
> drivers/cxl/core/mbox.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> .payload_in = &log_type,
> .size_in = sizeof(log_type),
> .payload_out = payload,
> - .size_out = mds->payload_size,
> .min_out = struct_size(payload, records, 0),
> };
>
> do {
> int rc, i;
>
> + mbox_cmd.size_out = mds->payload_size;
> +
> rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> if (rc) {
> dev_err_ratelimited(dev,
> --
> 2.34.1
>

2024-04-05 16:41:14

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

On Fri, 5 Apr 2024 09:04:16 -0700
Alison Schofield <[email protected]> wrote:

> On Tue, Apr 02, 2024 at 05:14:03PM +0900, Kwangjin Ko wrote:
> > Since mbox_cmd.size_out is overwritten with the actual output size in
> > the function below, it needs to be initialized every time.
> >
> > cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> >
> > Problem scenario:
> >
> > 1) The size_out variable is initially set to the size of the mailbox.
> > 2) Read an event.
> > - size_out is set to 160 bytes(header 32B + one event 128B).
> > - Two event are created while reading.
> > 3) Read the new *two* events.
> > - size_out is still set to 160 bytes.
> > - Although the value of out_len is 288 bytes, only 160 bytes are
> > copied from the mailbox register to the local variable.
> > - record_count is set to 2.
> > - Accessing records[1] will result in reading incorrect data.
>
> Agree with the other comments on need to set .out_size when doing
> cxl_internal_send_cmd() in a loop. Poison list retrieval can hit
> this case if the MORE flag is set and a follow on read of the list
> delivers more records than the previous read. ie. device gives one
> record, sets the _MORE flag, then gives 5.
>
> 2 other things appeared to me while looking at this:
>
> First, it seems that there is another cleanup wrt accessing records
> with invalid data. Still focusing on get_events and get_poison
> since those loop through output data based on a device supplied
> record count. The min_out check means the driver at least gets a
> count of records to expect. That's good. The problem occurs::
>
> if (mbox.size_out != struct_size(payload, records, 'record_count'))
>
> The driver will log garbage trace events, and that could lead to
> bad actions based on bad data. (like a needless scan of device based
> on a false overflow flag). So, checking that size.out is the proper
> multiple of record_count protects driver from bad device behavior.
>
> I think that can be combined w the patch Dan is suggesting to
> reset mbox.size_out on each loop.
Hi Alison,

I'd split it. Dan's one is a bug fix, this is hardening against
a device bug. Good to have but not really backport material unless
we think there are devices like this out there.

>
> Second thing is the pci-driver quiet handling of PAYLOAD LENGTH
> values reported by the device. It seems like at a minimum the
> pci-driver could emit an info or debug message when the device
> is reporting payload lengths that exceed what the driver can
> copy in.

When does this happen?
1. New fields on end of a fixed length message.
Correct to silently eat it as the spec is buggy if we don't
have backwards compatibility.
I don't think the spec has had that particular type of bug yet,
but maybe I'm forgetting one.
2. Device bug. Can't tell that is different from 1.

So maybe dev_dbg(). I'm not sure why the cxl-driver would ever want to
know.

> I'm referring to the mbox.size_out adjustment in
> __cxl_pci_mbox_send_cmd(). Or, if it's not the pci-drivers job
> to judge, pass that actual payload length value back in the
> mbox structure (new field) so that the cxl-driver can use it.
> The pci driver would still do it's "#8 Sanitize the copy" work,
> but it would allow the cxl-driver to clearly see why it got the
> .size_out it got, and squawk about it if needed.
>
> --Alison
>
> >
> > Signed-off-by: Kwangjin Ko <[email protected]>
> > ---
> > drivers/cxl/core/mbox.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 9adda4795eb7..a38531a055c8 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > .payload_in = &log_type,
> > .size_in = sizeof(log_type),
> > .payload_out = payload,
> > - .size_out = mds->payload_size,
> > .min_out = struct_size(payload, records, 0),
> > };
> >
> > do {
> > int rc, i;
> >
> > + mbox_cmd.size_out = mds->payload_size;
> > +
> > rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> > if (rc) {
> > dev_err_ratelimited(dev,
> > --
> > 2.34.1
> >


2024-04-05 17:47:09

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

On Fri, Apr 05, 2024 at 05:40:56PM +0100, Jonathan Cameron wrote:
> On Fri, 5 Apr 2024 09:04:16 -0700
> Alison Schofield <[email protected]> wrote:
>
> > On Tue, Apr 02, 2024 at 05:14:03PM +0900, Kwangjin Ko wrote:
> > > Since mbox_cmd.size_out is overwritten with the actual output size in
> > > the function below, it needs to be initialized every time.
> > >
> > > cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> > >
> > > Problem scenario:
> > >
> > > 1) The size_out variable is initially set to the size of the mailbox.
> > > 2) Read an event.
> > > - size_out is set to 160 bytes(header 32B + one event 128B).
> > > - Two event are created while reading.
> > > 3) Read the new *two* events.
> > > - size_out is still set to 160 bytes.
> > > - Although the value of out_len is 288 bytes, only 160 bytes are
> > > copied from the mailbox register to the local variable.
> > > - record_count is set to 2.
> > > - Accessing records[1] will result in reading incorrect data.
> >
> > Agree with the other comments on need to set .out_size when doing
> > cxl_internal_send_cmd() in a loop. Poison list retrieval can hit
> > this case if the MORE flag is set and a follow on read of the list
> > delivers more records than the previous read. ie. device gives one
> > record, sets the _MORE flag, then gives 5.
> >
> > 2 other things appeared to me while looking at this:
> >
> > First, it seems that there is another cleanup wrt accessing records
> > with invalid data. Still focusing on get_events and get_poison
> > since those loop through output data based on a device supplied
> > record count. The min_out check means the driver at least gets a
> > count of records to expect. That's good. The problem occurs::
> >
> > if (mbox.size_out != struct_size(payload, records, 'record_count'))
> >
> > The driver will log garbage trace events, and that could lead to
> > bad actions based on bad data. (like a needless scan of device based
> > on a false overflow flag). So, checking that size.out is the proper
> > multiple of record_count protects driver from bad device behavior.
> >
> > I think that can be combined w the patch Dan is suggesting to
> > reset mbox.size_out on each loop.
> Hi Alison,
>
> I'd split it. Dan's one is a bug fix, this is hardening against
> a device bug. Good to have but not really backport material unless
> we think there are devices like this out there.

Agree, not aware of actual device behavior.

>
> >
> > Second thing is the pci-driver quiet handling of PAYLOAD LENGTH
> > values reported by the device. It seems like at a minimum the
> > pci-driver could emit an info or debug message when the device
> > is reporting payload lengths that exceed what the driver can
> > copy in.
>
> When does this happen?
> 1. New fields on end of a fixed length message.
> Correct to silently eat it as the spec is buggy if we don't
> have backwards compatibility.
> I don't think the spec has had that particular type of bug yet,
> but maybe I'm forgetting one.
> 2. Device bug. Can't tell that is different from 1.
>

My thought was 2) device bug. Bad device is returning payload length
that exceeds what pci/cxl-driver can consume, so gets ignored. Am I
worrying about debugging the hardware needlessly?

--Alison

> So maybe dev_dbg(). I'm not sure why the cxl-driver would ever want to
> know.
>
> > I'm referring to the mbox.size_out adjustment in
> > __cxl_pci_mbox_send_cmd(). Or, if it's not the pci-drivers job
> > to judge, pass that actual payload length value back in the
> > mbox structure (new field) so that the cxl-driver can use it.
> > The pci driver would still do it's "#8 Sanitize the copy" work,
> > but it would allow the cxl-driver to clearly see why it got the
> > .size_out it got, and squawk about it if needed.
> >
> > --Alison
> >
> > >
> > > Signed-off-by: Kwangjin Ko <[email protected]>
> > > ---
> > > drivers/cxl/core/mbox.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 9adda4795eb7..a38531a055c8 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > > .payload_in = &log_type,
> > > .size_in = sizeof(log_type),
> > > .payload_out = payload,
> > > - .size_out = mds->payload_size,
> > > .min_out = struct_size(payload, records, 0),
> > > };
> > >
> > > do {
> > > int rc, i;
> > >
> > > + mbox_cmd.size_out = mds->payload_size;
> > > +
> > > rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> > > if (rc) {
> > > dev_err_ratelimited(dev,
> > > --
> > > 2.34.1
> > >
>

2024-04-05 17:52:44

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

Alison Schofield wrote:
[..]
> My thought was 2) device bug. Bad device is returning payload length
> that exceeds what pci/cxl-driver can consume, so gets ignored. Am I
> worrying about debugging the hardware needlessly?

I would not go so far as to say "needlessly", but the number of fun and
interesting ways that hardware can violate software expectations is
myriad, so it will always be game of after-the-fact quirks and fixups.

A payload truncation would seem to fail safely in the sense of no buffer
overrun and no stale data exposure. Still a bug, but no riskier than all
the other potential hardware bugs / spec violations.