2022-12-15 17:34:09

by Fan Ni

Subject: [PATCH] cxl/region: Fix null pointer dereference for resetting decoder

Not all decoders have a reset callback.

The CXL specification allows a host bridge with a single root port to
have no explicit HDM decoders. Currently the region driver assumes there
are none. As such the CXL core creates a special pass through decoder
instance without a commit/reset callback.

Prior to this patch, the ->reset() callback was called unconditionally in
cxl_region_decode_reset(). Thus a configuration with one host bridge, one
root port, and either one directly attached CXL type 3 device or multiple
CXL type 3 devices attached to the downstream ports of a switch can cause a
null pointer dereference.

Before the fix, a kernel crash was observed when destroying a region caused
a pass through decoder to be reset.

The issue can be reproduced as follows:
1) Create a region on a CXL setup that includes a host bridge with a
single root port under which a memdev is attached directly.
2) Destroy the region with cxl destroy-region regionX -f.

Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Fan Ni <[email protected]>
---
drivers/cxl/core/region.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f9ae5ad284ff..3931793a13ac 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
 		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 		struct cxl_port *iter = cxled_to_port(cxled);
 		struct cxl_ep *ep;
-		int rc;
+		int rc = 0;

 		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
 			iter = to_cxl_port(iter->dev.parent);
@@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)

 			cxl_rr = cxl_rr_load(iter, cxlr);
 			cxld = cxl_rr->decoder;
-			rc = cxld->reset(cxld);
+			if (cxld->reset)
+				rc = cxld->reset(cxld);
 			if (rc)
 				return rc;
 		}
@@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
 			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
 				cxl_rr = cxl_rr_load(iter, cxlr);
 				cxld = cxl_rr->decoder;
-				cxld->reset(cxld);
+				if (cxld->reset)
+					cxld->reset(cxld);
 			}

 			cxled->cxld.reset(&cxled->cxld);
--
2.25.1
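
For readers following along outside the kernel tree, the guarded teardown
walk in this patch can be modeled in plain user-space C. This is only a
sketch under stated assumptions: struct decoder, hw_reset() and reset_chain()
are hypothetical stand-ins, not the kernel's struct cxl_decoder or
cxl_region_decode_reset().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a decoder; only the callback matters here. */
struct decoder {
	const char *name;
	int (*reset)(struct decoder *d);	/* NULL for a pass through decoder */
	int reset_calls;			/* bookkeeping for the example */
};

/* Hypothetical reset callback for a decoder backed by real registers. */
static int hw_reset(struct decoder *d)
{
	d->reset_calls++;
	return 0;
}

/*
 * Walk the chain and reset every decoder that provides a callback.
 * rc starts at 0 so that a skipped decoder is not mistaken for a failure.
 */
static int reset_chain(struct decoder **chain, size_t n)
{
	int rc = 0;
	size_t i;

	assert(chain != NULL);

	for (i = 0; i < n; i++) {
		struct decoder *d = chain[i];

		if (d->reset)
			rc = d->reset(d);
		if (rc)
			return rc;
	}
	return 0;
}
```

With a two-entry chain in which one decoder points reset at hw_reset() and
the other leaves it NULL, reset_chain() returns 0 and only the first
decoder's callback runs. Without the rc = 0 initialization, a chain whose
first entry has no callback would test an uninitialized rc.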


2023-01-13 11:52:55

by Jonathan Cameron

Subject: Re: [PATCH] cxl/region: Fix null pointer dereference for resetting decoder

On Thu, 15 Dec 2022 17:09:14 +0000
Fan Ni <[email protected]> wrote:

> Not all decoders have a reset callback.
>
> The CXL specification allows a host bridge with a single root port to
> have no explicit HDM decoders. Currently the region driver assumes there
> are none. As such the CXL core creates a special pass through decoder
> instance without a commit/reset callback.
>
> Prior to this patch, the ->reset() callback was called unconditionally when
> calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> type 3 devices attached to downstream ports of a switch can cause a null
> pointer dereference.
>
> Before the fix, a kernel crash was observed when we destroy the region, and
> a pass through decoder is reset.
>
> The issue can be reproduced as below,
> 1) create a region with a CXL setup which includes a HB with a
> single root port under which a memdev is attached directly.
> 2) destroy the region with cxl destroy-region regionX -f.
>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Signed-off-by: Fan Ni <[email protected]>

Explanation seems correct to me. Only question (and it's one for the
Maintainers) is whether they prefer optionality here or a stub reset()
implementation for the pass through decoder.

either way
Reviewed-by: Jonathan Cameron <[email protected]>
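
The two alternatives raised above can be put side by side in a small
user-space sketch. All names here (struct decoder, decoder_reset_optional(),
passthrough_reset(), decoder_init_passthrough()) are hypothetical and only
model the trade-off; they are not taken from drivers/cxl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder with a single callback. */
struct decoder {
	int (*reset)(struct decoder *d);
};

/* Option 1: the callback is optional, so every call site must check it. */
static int decoder_reset_optional(struct decoder *d)
{
	assert(d != NULL);
	return d->reset ? d->reset(d) : 0;
}

/* Option 2: a no-op stub, so call sites may invoke ->reset() directly. */
static int passthrough_reset(struct decoder *d)
{
	(void)d;	/* nothing to tear down for a pass through decoder */
	return 0;
}

/* Install the stub when the pass through decoder is set up. */
static void decoder_init_passthrough(struct decoder *d)
{
	assert(d != NULL);
	d->reset = passthrough_reset;
}
```

The optional form keeps the decoder setup untouched but relies on each
caller remembering the check. The stub form moves that burden to one
initialization site, at the cost of an extra function.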

> ---
> drivers/cxl/core/region.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..3931793a13ac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  		struct cxl_port *iter = cxled_to_port(cxled);
>  		struct cxl_ep *ep;
> -		int rc;
> +		int rc = 0;
>
>  		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
>  			iter = to_cxl_port(iter->dev.parent);
> @@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>
>  			cxl_rr = cxl_rr_load(iter, cxlr);
>  			cxld = cxl_rr->decoder;
> -			rc = cxld->reset(cxld);
> +			if (cxld->reset)
> +				rc = cxld->reset(cxld);
>  			if (rc)
>  				return rc;
>  		}
> @@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
>  			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
>  				cxl_rr = cxl_rr_load(iter, cxlr);
>  				cxld = cxl_rr->decoder;
> -				cxld->reset(cxld);
> +				if (cxld->reset)
> +					cxld->reset(cxld);
>  			}
>
>  			cxled->cxld.reset(&cxled->cxld);

2023-01-17 18:32:00

by Dave Jiang

Subject: Re: [PATCH] cxl/region: Fix null pointer dereference for resetting decoder



On 12/15/22 10:09 AM, Fan Ni wrote:
> Not all decoders have a reset callback.
>
> The CXL specification allows a host bridge with a single root port to
> have no explicit HDM decoders. Currently the region driver assumes there
> are none. As such the CXL core creates a special pass through decoder
> instance without a commit/reset callback.
>
> Prior to this patch, the ->reset() callback was called unconditionally when
> calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> type 3 devices attached to downstream ports of a switch can cause a null
> pointer dereference.
>
> Before the fix, a kernel crash was observed when we destroy the region, and
> a pass through decoder is reset.
>
> The issue can be reproduced as below,
> 1) create a region with a CXL setup which includes a HB with a
> single root port under which a memdev is attached directly.
> 2) destroy the region with cxl destroy-region regionX -f.
>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Signed-off-by: Fan Ni <[email protected]>

Makes sense, especially with the emulated decoders coming w/o ->reset().

Reviewed-by: Dave Jiang <[email protected]>

> ---
> drivers/cxl/core/region.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..3931793a13ac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  		struct cxl_port *iter = cxled_to_port(cxled);
>  		struct cxl_ep *ep;
> -		int rc;
> +		int rc = 0;
>
>  		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
>  			iter = to_cxl_port(iter->dev.parent);
> @@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>
>  			cxl_rr = cxl_rr_load(iter, cxlr);
>  			cxld = cxl_rr->decoder;
> -			rc = cxld->reset(cxld);
> +			if (cxld->reset)
> +				rc = cxld->reset(cxld);
>  			if (rc)
>  				return rc;
>  		}
> @@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
>  			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
>  				cxl_rr = cxl_rr_load(iter, cxlr);
>  				cxld = cxl_rr->decoder;
> -				cxld->reset(cxld);
> +				if (cxld->reset)
> +					cxld->reset(cxld);
>  			}
>
>  			cxled->cxld.reset(&cxled->cxld);

2023-02-01 16:24:51

by Davidlohr Bueso

Subject: Re: [PATCH] cxl/region: Fix null pointer dereference for resetting decoder

On Thu, 15 Dec 2022, Fan Ni wrote:

>Not all decoders have a reset callback.
>
>The CXL specification allows a host bridge with a single root port to
>have no explicit HDM decoders. Currently the region driver assumes there
>are none. As such the CXL core creates a special pass through decoder
>instance without a commit/reset callback.
>
>Prior to this patch, the ->reset() callback was called unconditionally when
>calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
>1 Root Port, and one directly attached CXL type 3 device or multiple CXL
>type 3 devices attached to downstream ports of a switch can cause a null
>pointer dereference.
>
>Before the fix, a kernel crash was observed when we destroy the region, and
>a pass through decoder is reset.
>
>The issue can be reproduced as below,
> 1) create a region with a CXL setup which includes a HB with a
> single root port under which a memdev is attached directly.
> 2) destroy the region with cxl destroy-region regionX -f.
>
>Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
>Signed-off-by: Fan Ni <[email protected]>

Reviewed-by: Davidlohr Bueso <[email protected]>

2023-02-01 17:58:51

by Dan Williams

Subject: Re: [PATCH] cxl/region: Fix null pointer dereference for resetting decoder

Jonathan Cameron wrote:
> On Thu, 15 Dec 2022 17:09:14 +0000
> Fan Ni <[email protected]> wrote:
>
> > Not all decoders have a reset callback.
> >
> > The CXL specification allows a host bridge with a single root port to
> > have no explicit HDM decoders. Currently the region driver assumes there
> > are none. As such the CXL core creates a special pass through decoder
> > instance without a commit/reset callback.
> >
> > Prior to this patch, the ->reset() callback was called unconditionally when
> > calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> > 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> > type 3 devices attached to downstream ports of a switch can cause a null
> > pointer dereference.
> >
> > Before the fix, a kernel crash was observed when we destroy the region, and
> > a pass through decoder is reset.
> >
> > The issue can be reproduced as below,
> > 1) create a region with a CXL setup which includes a HB with a
> > single root port under which a memdev is attached directly.
> > 2) destroy the region with cxl destroy-region regionX -f.
> >
> > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> > Signed-off-by: Fan Ni <[email protected]>
>
> Explanation seems correct to me. Only question (and it's one for the
> Maintainers) is whether they prefer optionality here or a stub reset()
> implementation for the pass through decoder.

Yeah, I think this fix as is works for the purposes of the -stable
backport and then a follow-on can add the optionality.

2023-02-06 15:48:17

by Gregory Price

Subject: Re: [PATCH] cxl/region: Fix null pointer dereference for resetting decoder

On Thu, Dec 15, 2022 at 05:09:14PM +0000, Fan Ni wrote:
> Not all decoders have a reset callback.
>
> The CXL specification allows a host bridge with a single root port to
> have no explicit HDM decoders. Currently the region driver assumes there
> are none. As such the CXL core creates a special pass through decoder
> instance without a commit/reset callback.
>
> Prior to this patch, the ->reset() callback was called unconditionally when
> calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> type 3 devices attached to downstream ports of a switch can cause a null
> pointer dereference.
>
> Before the fix, a kernel crash was observed when we destroy the region, and
> a pass through decoder is reset.
>
> The issue can be reproduced as below,
> 1) create a region with a CXL setup which includes a HB with a
> single root port under which a memdev is attached directly.
> 2) destroy the region with cxl destroy-region regionX -f.
>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Signed-off-by: Fan Ni <[email protected]>
> ---
> drivers/cxl/core/region.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..3931793a13ac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  		struct cxl_port *iter = cxled_to_port(cxled);
>  		struct cxl_ep *ep;
> -		int rc;
> +		int rc = 0;
>
>  		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
>  			iter = to_cxl_port(iter->dev.parent);
> @@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>
>  			cxl_rr = cxl_rr_load(iter, cxlr);
>  			cxld = cxl_rr->decoder;
> -			rc = cxld->reset(cxld);
> +			if (cxld->reset)
> +				rc = cxld->reset(cxld);
>  			if (rc)
>  				return rc;
>  		}
> @@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
>  			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
>  				cxl_rr = cxl_rr_load(iter, cxlr);
>  				cxld = cxl_rr->decoder;
> -				cxld->reset(cxld);
> +				if (cxld->reset)
> +					cxld->reset(cxld);
>  			}
>
>  			cxled->cxld.reset(&cxled->cxld);
> --
> 2.25.1


Should we try to get this upstreamed in 6.2-final? Seems like a good
stable addition. Probably doesn't affect real hardware, but it certainly
affects QEMU.


Tested-by: Gregory Price <[email protected]>
Reviewed-by: Gregory Price <[email protected]>

2023-02-06 19:18:25

by Dan Williams

Subject: Re: [PATCH] cxl/region: Fix null pointer dereference for resetting decoder

Gregory Price wrote:
[..]
> Should we try to get this upstreamed in 6.2-final? Seems like a good
> stable addition. Probably doesn't affect real hardware, but it certainly
> affects QEMU.

Yes, that's the plan.

https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/commit/?h=fixes&id=01d2cb2593b1