2019-01-17 08:52:31

by Ching Huang

Subject: [PATCH 2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

From Ching Huang <[email protected]>

Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.

Signed-off-by: Ching Huang <[email protected]>
---

diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
index a94c513..b98c632 100755
--- a/drivers/scsi/arcmsr/arcmsr.h
+++ b/drivers/scsi/arcmsr/arcmsr.h
@@ -508,9 +508,9 @@ struct MessageUnit_A
struct MessageUnit_B
{
uint32_t post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
- uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
+ volatile uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
uint32_t postq_index;
- uint32_t doneq_index;
+ volatile uint32_t doneq_index;
uint32_t __iomem *drv2iop_doorbell;
uint32_t __iomem *drv2iop_doorbell_mask;
uint32_t __iomem *iop2drv_doorbell;
diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
index 5736434..88053b1 100755
--- a/drivers/scsi/arcmsr/arcmsr_hba.c
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c
@@ -1113,7 +1113,11 @@ static int arcmsr_resume(struct pci_dev *pdev)
switch (acb->adapter_type) {
case ACB_ADAPTER_TYPE_B: {
struct MessageUnit_B *reg = acb->pmuB;
- reg->post_qbuffer[0] = 0;
+ uint32_t i;
+ for (i = 0; i < ARCMSR_MAX_HBB_POSTQUEUE; i++) {
+ reg->post_qbuffer[i] = 0;
+ reg->done_qbuffer[i] = 0;
+ }
reg->postq_index = 0;
reg->doneq_index = 0;
break;
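
For readers following the hunk above, here is a minimal userspace sketch of the reset it performs on resume for a type B adapter: every slot of both queues is zeroed and both indices go back to 0. The names msg_unit_b_sketch, reset_queues_on_resume and SKETCH_QUEUE_LEN are illustrative assumptions, not driver code; the real driver works on struct MessageUnit_B and ARCMSR_MAX_HBB_POSTQUEUE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ARCMSR_MAX_HBB_POSTQUEUE; the real value
 * lives in arcmsr.h. */
#define SKETCH_QUEUE_LEN 8

/* Hypothetical, trimmed-down model of struct MessageUnit_B. */
struct msg_unit_b_sketch {
	uint32_t post_qbuffer[SKETCH_QUEUE_LEN];
	uint32_t done_qbuffer[SKETCH_QUEUE_LEN];
	uint32_t postq_index;
	uint32_t doneq_index;
};

/* Zero every slot of both queues and rewind both indices, mirroring what
 * the patched ACB_ADAPTER_TYPE_B branch of arcmsr_resume() does. */
static void reset_queues_on_resume(struct msg_unit_b_sketch *mu)
{
	assert(mu != NULL);
	memset(mu->post_qbuffer, 0, sizeof(mu->post_qbuffer));
	memset(mu->done_qbuffer, 0, sizeof(mu->done_qbuffer));
	mu->postq_index = 0;
	mu->doneq_index = 0;
}
```

memset() is used here only for brevity; the patch itself clears the slots with an explicit loop, which has the same effect.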




2019-01-17 09:43:06

by Ching Huang

Subject: Re: [PATCH 2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > From Ching Huang <[email protected]>
> >
> > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> >
>
> What does this look like from a user perspective? Does it fail every
> time or does it only fail sometimes?
>
> What's the bug exactly?
>
> There is no Fixes tag...
From the user's perspective, hibernate and resume work fine.
But subsequent I/O may sometimes cause 'isr get an illegal ccb command'
to appear in log/messages.
>
> > Signed-off-by: Ching Huang <[email protected]>
> > ---
> >
> > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > index a94c513..b98c632 100755
> > --- a/drivers/scsi/arcmsr/arcmsr.h
> > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > struct MessageUnit_B
> > {
> > uint32_t post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > - uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > + volatile uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
>
> There is a well known rule of thumb that when someone uses "volatile"
> in the kernel it means there is a locking problem... Is this __iomem or
> something?
The done_qbuffer is a command completion queue: an area written
by the I/O processor and read by the device driver. So, ...
>
> > uint32_t postq_index;
> > - uint32_t doneq_index;
> > + volatile uint32_t doneq_index;
> > uint32_t __iomem *drv2iop_doorbell;
> > uint32_t __iomem *drv2iop_doorbell_mask;
> > uint32_t __iomem *iop2drv_doorbell;
> > diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
> > index 5736434..88053b1 100755
> > --- a/drivers/scsi/arcmsr/arcmsr_hba.c
> > +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
> > @@ -1113,7 +1113,11 @@ static int arcmsr_resume(struct pci_dev *pdev)
> > switch (acb->adapter_type) {
> > case ACB_ADAPTER_TYPE_B: {
> > struct MessageUnit_B *reg = acb->pmuB;
> > - reg->post_qbuffer[0] = 0;
> > + uint32_t i;
> > + for (i = 0; i < ARCMSR_MAX_HBB_POSTQUEUE; i++) {
> > + reg->post_qbuffer[i] = 0;
> > + reg->done_qbuffer[i] = 0;
> > + }
>
> Is this caused by patch 1 changing the zalloc to regular alloc? If so
> then it should be folded into that patch instead of sent separately.
Fully clearing the delivery and completion queues is what fixes
'isr get an illegal ccb command'. It is not related to zalloc or alloc.
>
> regards,
> dan carpenter
>
>



2019-01-17 09:49:05

by Dan Carpenter

Subject: Re: [PATCH 2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > From Ching Huang <[email protected]>
> > >
> > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > >
> >
> > What does this look like from a user perspective? Does it fail every
> > time or does it only fail sometimes?
> >
> > What's the bug exactly?
> >
> > There is no Fixes tag...
> From the user's perspective, hibernate and resume work fine.
> But subsequent I/O may sometimes cause 'isr get an illegal ccb command'
> to appear in log/messages.
> >


You will need to resend with that information included in the commit
message.

> > > Signed-off-by: Ching Huang <[email protected]>
> > > ---
> > >
> > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > index a94c513..b98c632 100755
> > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > > struct MessageUnit_B
> > > {
> > > uint32_t post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > - uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > + volatile uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> >
> > There is a well known rule of thumb that when someone uses "volatile"
> > in the kernel it means there is a locking problem... Is this __iomem or
> > something?
> The done_qbuffer is a command completion queue: an area written
> by the I/O processor and read by the device driver. So, ...

I'm not totally positive I understand this sentence. I can find a bunch
of places which read from this buffer, but I haven't immediately found
which place writes to it. Can you give me a function name that I should
read?

> >
> > > uint32_t postq_index;
> > > - uint32_t doneq_index;
> > > + volatile uint32_t doneq_index;

The volatile here is not right. It's just normal memory.

regards,
dan carpenter

2019-01-17 09:55:56

by Ching Huang

Subject: Re: [PATCH 2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

On Thu, 2019-01-17 at 12:16 +0300, Dan Carpenter wrote:
> On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> > On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > > From Ching Huang <[email protected]>
> > > >
> > > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > > >
> > >
> > > What does this look like from a user perspective? Does it fail every
> > > time or does it only fail sometimes?
> > >
> > > What's the bug exactly?
> > >
> > > There is no Fixes tag...
> > From the user's perspective, hibernate and resume work fine.
> > But subsequent I/O may sometimes cause 'isr get an illegal ccb command'
> > to appear in log/messages.
> > >
>
>
> You will need to resend with that information included in the commit
> message.
OK. I will resend this patch later.
>
> > > > Signed-off-by: Ching Huang <[email protected]>
> > > > ---
> > > >
> > > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > > index a94c513..b98c632 100755
> > > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > > > struct MessageUnit_B
> > > > {
> > > > uint32_t post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > - uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > + volatile uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > >
> > > There is a well known rule of thumb that when someone uses "volatile"
> > > in the kernel it means there is a locking problem... Is this __iomem or
> > > something?
> > The done_qbuffer is a command completion queue: an area written
> > by the I/O processor and read by the device driver. So, ...
>
> I'm not totally positive I understand this sentence. I can find a bunch
> of places which read from this buffer, but I haven't immediately found
> which place writes to it. Can you give me a function name that I should
> read?
Well, we allocate memory for struct MessageUnit_B in
arcmsr_alloc_ccb_pool(), by assigning to acb->dma_coherent_handle2.
Then we tell the I/O controller its DMA address in arcmsr_iop_confirm().
When a command completes, the controller's firmware writes a
completion ccb into done_qbuffer through DMA. So you won't see any driver
function writing to it.
>
> > >
> > > > uint32_t postq_index;
> > > > - uint32_t doneq_index;
> > > > + volatile uint32_t doneq_index;
>
> The volatile here is not right. It's just normal memory.
Right. This volatile is not necessary.
>
> regards,
> dan carpenter



2019-01-17 11:43:06

by Dan Carpenter

Subject: Re: [PATCH 2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> From Ching Huang <[email protected]>
>
> Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
>

What does this look like from a user perspective? Does it fail every
time or does it only fail sometimes?

What's the bug exactly?

There is no Fixes tag...

> Signed-off-by: Ching Huang <[email protected]>
> ---
>
> diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> index a94c513..b98c632 100755
> --- a/drivers/scsi/arcmsr/arcmsr.h
> +++ b/drivers/scsi/arcmsr/arcmsr.h
> @@ -508,9 +508,9 @@ struct MessageUnit_A
> struct MessageUnit_B
> {
> uint32_t post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> - uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> + volatile uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];

There is a well known rule of thumb that when someone uses "volatile"
in the kernel it means there is a locking problem... Is this __iomem or
something?

> uint32_t postq_index;
> - uint32_t doneq_index;
> + volatile uint32_t doneq_index;
> uint32_t __iomem *drv2iop_doorbell;
> uint32_t __iomem *drv2iop_doorbell_mask;
> uint32_t __iomem *iop2drv_doorbell;
> diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
> index 5736434..88053b1 100755
> --- a/drivers/scsi/arcmsr/arcmsr_hba.c
> +++ b/drivers/scsi/arcmsr/arcmsr_hba.c
> @@ -1113,7 +1113,11 @@ static int arcmsr_resume(struct pci_dev *pdev)
> switch (acb->adapter_type) {
> case ACB_ADAPTER_TYPE_B: {
> struct MessageUnit_B *reg = acb->pmuB;
> - reg->post_qbuffer[0] = 0;
> + uint32_t i;
> + for (i = 0; i < ARCMSR_MAX_HBB_POSTQUEUE; i++) {
> + reg->post_qbuffer[i] = 0;
> + reg->done_qbuffer[i] = 0;
> + }

Is this caused by patch 1 changing the zalloc to regular alloc? If so
then it should be folded into that patch instead of sent separately.

regards,
dan carpenter



2019-01-22 07:52:16

by Dan Carpenter

Subject: Re: [PATCH 2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

On Thu, Jan 17, 2019 at 05:52:28PM +0800, Ching Huang wrote:
> On Thu, 2019-01-17 at 12:16 +0300, Dan Carpenter wrote:
> > On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> > > On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > > > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > > > From Ching Huang <[email protected]>
> > > > >
> > > > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > > > >
> > > >
> > > > What does this look like from a user perspective? Does it fail every
> > > > time or does it only fail sometimes?
> > > >
> > > > What's the bug exactly?
> > > >
> > > > There is no Fixes tag...
> > > From the user's perspective, hibernate and resume work fine.
> > > But subsequent I/O may sometimes cause 'isr get an illegal ccb command'
> > > to appear in log/messages.
> > > >
> >
> >
> > You will need to resend with that information included in the commit
> > message.
> OK. I will resend this patch later.
> >
> > > > > Signed-off-by: Ching Huang <[email protected]>
> > > > > ---
> > > > >
> > > > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > > > index a94c513..b98c632 100755
> > > > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > > > > struct MessageUnit_B
> > > > > {
> > > > > uint32_t post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > - uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > + volatile uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > >
> > > > There is a well known rule of thumb that when someone uses "volatile"
> > > > in the kernel it means there is a locking problem... Is this __iomem or
> > > > something?
> > > The done_qbuffer is a command completion queue: an area written
> > > by the I/O processor and read by the device driver. So, ...
> >
> > I'm not totally positive I understand this sentence. I can find a bunch
> > of places which read from this buffer, but I haven't immediately found
> > which place writes to it. Can you give me a function name that I should
> > read?
> Well, we allocate memory for struct MessageUnit_B in
> arcmsr_alloc_ccb_pool(), by assigning to acb->dma_coherent_handle2.
> Then we tell the I/O controller its DMA address in arcmsr_iop_confirm().
> When a command completes, the controller's firmware writes a
> completion ccb into done_qbuffer through DMA. So you won't see any driver
> function writing to it.

DMA memory doesn't need to be marked as volatile.

regards,
dan carpenter
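
As an illustration of the point about volatile, here is a userspace sketch of draining a completion ring whose storage is declared without the qualifier. The one load the compiler must not cache or elide goes through a volatile-qualified cast at the point of use, which is the same idea as the kernel's READ_ONCE(). Everything named sketch_* or SKETCH_* is hypothetical, the ring is filled by ordinary code rather than by a device, and this is not the driver's actual interrupt handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue length; the driver uses ARCMSR_MAX_HBB_POSTQUEUE. */
#define SKETCH_QUEUE_LEN 8

/* Hypothetical model of the completion side of struct MessageUnit_B.
 * Note that neither member is declared volatile. */
struct sketch_done_ring {
	uint32_t done_qbuffer[SKETCH_QUEUE_LEN];
	uint32_t doneq_index;
};

/* Force exactly one real load of *p, in the spirit of READ_ONCE(). */
static inline uint32_t sketch_read_once(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/* Consume non-zero entries starting at doneq_index, clearing each slot
 * and wrapping at the end of the ring. Returns the number consumed. */
static unsigned int sketch_drain(struct sketch_done_ring *ring)
{
	unsigned int consumed = 0;
	uint32_t index = ring->doneq_index;

	assert(index < SKETCH_QUEUE_LEN);
	while (consumed < SKETCH_QUEUE_LEN &&
	       sketch_read_once(&ring->done_qbuffer[index]) != 0) {
		ring->done_qbuffer[index] = 0;	/* hand the slot back */
		index = (index + 1) % SKETCH_QUEUE_LEN;
		consumed++;
	}
	ring->doneq_index = index;
	return consumed;
}
```

In a real driver, ordering against the device's writes would come from the DMA API and explicit barriers rather than from a type qualifier on the buffer.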


2019-01-22 08:17:38

by Ching Huang

Subject: Re: [PATCH 2/3] scsi: arcmsr: Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2

On Tue, 2019-01-22 at 10:48 +0300, Dan Carpenter wrote:
> On Thu, Jan 17, 2019 at 05:52:28PM +0800, Ching Huang wrote:
> > On Thu, 2019-01-17 at 12:16 +0300, Dan Carpenter wrote:
> > > On Thu, Jan 17, 2019 at 04:47:07PM +0800, Ching Huang wrote:
> > > > On Thu, 2019-01-17 at 10:59 +0300, Dan Carpenter wrote:
> > > > > On Thu, Jan 17, 2019 at 11:45:03AM +0800, Ching Huang wrote:
> > > > > > From Ching Huang <[email protected]>
> > > > > >
> > > > > > Fix suspend/resume of ACB_ADAPTER_TYPE_B part 2.
> > > > > >
> > > > >
> > > > > What does this look like from a user perspective? Does it fail every
> > > > > time or does it only fail sometimes?
> > > > >
> > > > > What's the bug exactly?
> > > > >
> > > > > There is no Fixes tag...
> > > > From the user's perspective, hibernate and resume work fine.
> > > > But subsequent I/O may sometimes cause 'isr get an illegal ccb command'
> > > > to appear in log/messages.
> > > > >
> > >
> > >
> > > You will need to resend with that information included in the commit
> > > message.
> > OK. I will resend this patch later.
> > >
> > > > > > Signed-off-by: Ching Huang <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > diff --git a/drivers/scsi/arcmsr/arcmsr.h b/drivers/scsi/arcmsr/arcmsr.h
> > > > > > index a94c513..b98c632 100755
> > > > > > --- a/drivers/scsi/arcmsr/arcmsr.h
> > > > > > +++ b/drivers/scsi/arcmsr/arcmsr.h
> > > > > > @@ -508,9 +508,9 @@ struct MessageUnit_A
> > > > > > struct MessageUnit_B
> > > > > > {
> > > > > > uint32_t post_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > > - uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > > > + volatile uint32_t done_qbuffer[ARCMSR_MAX_HBB_POSTQUEUE];
> > > > >
> > > > > There is a well known rule of thumb that when someone uses "volatile"
> > > > > in the kernel it means there is a locking problem... Is this __iomem or
> > > > > something?
> > > > The done_qbuffer is a command completion queue: an area written
> > > > by the I/O processor and read by the device driver. So, ...
> > >
> > > I'm not totally positive I understand this sentence. I can find a bunch
> > > of places which read from this buffer, but I haven't immediately found
> > > which place writes to it. Can you give me a function name that I should
> > > read?
> > Well, we allocate memory for struct MessageUnit_B in
> > arcmsr_alloc_ccb_pool(), by assigning to acb->dma_coherent_handle2.
> > Then we tell the I/O controller its DMA address in arcmsr_iop_confirm().
> > When a command completes, the controller's firmware writes a
> > completion ccb into done_qbuffer through DMA. So you won't see any driver
> > function writing to it.
>
> DMA memory doesn't need to be marked as volatile.
I see. So I have removed the volatile in patch v2.
>
> regards,
> dan carpenter
>