2022-02-02 12:51:12

by Ben Widawsky

[permalink] [raw]
Subject: Re: [PATCH V6 09/10] cxl/mem: Retry reading CDAT on failure

On 22-01-31 23:19:51, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> The CDAT read may fail for a number of reasons but mainly it is possible
> to get different parts of a valid state. The checksum in the CDAT table
> protects against this.
>
> Now that the checksum is validated issue a retry if the CDAT read fails.
> For now 2 retries are implemented.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> NOTE: Is 2 enough? Should this just be delayed until the time when the
> data is actually needed and not there?

I can't speak to retries at all, but one small issue below. It might make sense
if we keep this to make it a modparam.

>
> Changes from V5:
> New patch -- easy to push off or drop.
> ---
> drivers/cxl/core/memdev.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index a01068e98333..11d721c56f08 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -356,7 +356,8 @@ static const struct file_operations cxl_memdev_fops = {
> .llseek = noop_llseek,
> };
>
> -static int read_cdat_data(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
> +static int __read_cdat_data(struct cxl_memdev *cxlmd,
> + struct cxl_dev_state *cxlds)
> {
> struct device *dev = &cxlmd->dev;
> size_t cdat_length;
> @@ -371,6 +372,20 @@ static int read_cdat_data(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
> return cxl_mem_cdat_read_table(cxlds, &cxlmd->cdat);
> }
>
> +static int read_cdat_data(struct cxl_memdev *cxlmd,
> + struct cxl_dev_state *cxlds)
> +{
> + int retries = 2;
> + int rc;
> +
> + while (--retries) {

You either want retries--, or retries = 3...

> + rc = __read_cdat_data(cxlmd, cxlds);
> + if (!rc)
> + break;
> + }
> + return rc;
> +}
> +
> struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> {
> struct cxl_memdev *cxlmd;
> --
> 2.31.1
>


2022-02-03 21:08:55

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V6 09/10] cxl/mem: Retry reading CDAT on failure

On Tue, Feb 01, 2022 at 10:59:28AM -0800, Widawsky, Ben wrote:
> On 22-01-31 23:19:51, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > The CDAT read may fail for a number of reasons but mainly it is possible
> > to get different parts of a valid state. The checksum in the CDAT table
> > protects against this.
> >
> > Now that the checksum is validated issue a retry if the CDAT read fails.
> > For now 2 retries are implemented.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > NOTE: Is 2 enough? Should this just be delayed until the time when the
> > data is actually needed and not there?
>
> I can't speak to retries at all, but one small issue below. It might make sense
> if we keep this to make it a modparam.

Not a bad idea.

>
> >
> > Changes from V5:
> > New patch -- easy to push off or drop.
> > ---
> > drivers/cxl/core/memdev.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index a01068e98333..11d721c56f08 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -356,7 +356,8 @@ static const struct file_operations cxl_memdev_fops = {
> > .llseek = noop_llseek,
> > };
> >
> > -static int read_cdat_data(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
> > +static int __read_cdat_data(struct cxl_memdev *cxlmd,
> > + struct cxl_dev_state *cxlds)
> > {
> > struct device *dev = &cxlmd->dev;
> > size_t cdat_length;
> > @@ -371,6 +372,20 @@ static int read_cdat_data(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
> > return cxl_mem_cdat_read_table(cxlds, &cxlmd->cdat);
> > }
> >
> > +static int read_cdat_data(struct cxl_memdev *cxlmd,
> > + struct cxl_dev_state *cxlds)
> > +{
> > + int retries = 2;
> > + int rc;
> > +
> > + while (--retries) {
>
> You either want retries--, or retries = 3...

Opps yea.

Thanks,
Ira

>
> > + rc = __read_cdat_data(cxlmd, cxlds);
> > + if (!rc)
> > + break;
> > + }
> > + return rc;
> > +}
> > +
> > struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> > {
> > struct cxl_memdev *cxlmd;
> > --
> > 2.31.1
> >

2022-02-07 16:36:02

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V6 09/10] cxl/mem: Retry reading CDAT on failure

On Tue, 1 Feb 2022 14:31:59 -0800
Ira Weiny <[email protected]> wrote:

> On Tue, Feb 01, 2022 at 10:59:28AM -0800, Widawsky, Ben wrote:
> > On 22-01-31 23:19:51, [email protected] wrote:
> > > From: Ira Weiny <[email protected]>
> > >
> > > The CDAT read may fail for a number of reasons but mainly it is possible
> > > to get different parts of a valid state. The checksum in the CDAT table
> > > protects against this.
> > >
> > > Now that the checksum is validated issue a retry if the CDAT read fails.
> > > For now 2 retries are implemented.
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > > ---
> > > NOTE: Is 2 enough? Should this just be delayed until the time when the
> > > data is actually needed and not there?
> >
> > I can't speak to retries at all, but one small issue below. It might make sense
> > if we keep this to make it a modparam.
>
> Not a bad idea.

Ah. Here is the retry - I should have read the rest of the thread :)

This whole cycle isn't in a hot path and is fairly cheap. I'd just
go with c. 5 and assume that is enough for anyone. If we need a module
parameter later because this race turns out to be something that
actually happens then it is easy enough to add then.




>
> >
> > >
> > > Changes from V5:
> > > New patch -- easy to push off or drop.
> > > ---
> > > drivers/cxl/core/memdev.c | 17 ++++++++++++++++-
> > > 1 file changed, 16 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > > index a01068e98333..11d721c56f08 100644
> > > --- a/drivers/cxl/core/memdev.c
> > > +++ b/drivers/cxl/core/memdev.c
> > > @@ -356,7 +356,8 @@ static const struct file_operations cxl_memdev_fops = {
> > > .llseek = noop_llseek,
> > > };
> > >
> > > -static int read_cdat_data(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
> > > +static int __read_cdat_data(struct cxl_memdev *cxlmd,
> > > + struct cxl_dev_state *cxlds)
> > > {
> > > struct device *dev = &cxlmd->dev;
> > > size_t cdat_length;
> > > @@ -371,6 +372,20 @@ static int read_cdat_data(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
> > > return cxl_mem_cdat_read_table(cxlds, &cxlmd->cdat);
> > > }
> > >
> > > +static int read_cdat_data(struct cxl_memdev *cxlmd,
> > > + struct cxl_dev_state *cxlds)
> > > +{
> > > + int retries = 2;
> > > + int rc;
> > > +
> > > + while (--retries) {
> >
> > You either want retries--, or retries = 3...
>
> Opps yea.
>
> Thanks,
> Ira
>
> >
> > > + rc = __read_cdat_data(cxlmd, cxlds);
> > > + if (!rc)
> > > + break;
> > > + }
> > > + return rc;
> > > +}
> > > +
> > > struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> > > {
> > > struct cxl_memdev *cxlmd;
> > > --
> > > 2.31.1
> > >