2009-07-02 09:36:47

by Hannes Reinecke

Subject: [PATCH] cciss: Ignore stale commands after reboot


After an unexpected shutdown such as a kexec, the cciss
firmware might still have some commands in flight, which
it is trying to complete.
The driver does its best to reset the HBA, but sadly a
firmware issue causes the firmware _not_ to abort or
drop the old commands.
So the firmware hands us commands which we haven't
accounted for, causing the driver to panic.

With this patch we simply ignore these commands, as
there is nothing we could do with them anyway.

Signed-off-by: Hannes Reinecke <[email protected]>
---
drivers/block/cciss.c | 15 +++++++++++++--
drivers/block/cciss_cmd.h | 1 +
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index c7a527c..65a0655 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -226,8 +226,18 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
 
 static inline void removeQ(CommandList_struct *c)
 {
-	if (WARN_ON(hlist_unhashed(&c->list)))
+	/*
+	 * After kexec/dump some commands might still
+	 * be in flight, which the firmware will try
+	 * to complete. Resetting the firmware doesn't work
+	 * with old fw revisions, so we have to mark
+	 * them off as 'stale' to prevent the driver from
+	 * falling over.
+	 */
+	if (WARN_ON(hlist_unhashed(&c->list))) {
+		c->cmd_type = CMD_MSG_STALE;
 		return;
+	}
 
 	hlist_del_init(&c->list);
 }
@@ -4246,7 +4256,8 @@ static void fail_all_cmds(unsigned long ctlr)
 	while (!hlist_empty(&h->cmpQ)) {
 		c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
 		removeQ(c);
-		c->err_info->CommandStatus = CMD_HARDWARE_ERR;
+		if (c->cmd_type != CMD_MSG_STALE)
+			c->err_info->CommandStatus = CMD_HARDWARE_ERR;
 		if (c->cmd_type == CMD_RWREQ) {
 			complete_command(h, c, 0);
 		} else if (c->cmd_type == CMD_IOCTL_PEND)
diff --git a/drivers/block/cciss_cmd.h b/drivers/block/cciss_cmd.h
index cd665b0..dbaed1e 100644
--- a/drivers/block/cciss_cmd.h
+++ b/drivers/block/cciss_cmd.h
@@ -274,6 +274,7 @@ typedef struct _ErrorInfo_struct {
 #define CMD_SCSI	0x03
 #define CMD_MSG_DONE	0x04
 #define CMD_MSG_TIMEOUT	0x05
+#define CMD_MSG_STALE	0xff
 
 /* This structure needs to be divisible by 8 for new
  * indexing method.
--
1.5.3.2
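
The idea behind the patch can be sketched outside the driver as a small
user-space C fragment. This is only an illustration under stated
assumptions, not the driver code: struct cmd, add_q(), remove_q(),
fail_cmd() and the status field are hypothetical stand-ins, the list is a
hand-rolled imitation of the kernel's hlist, and only the value of
CMD_MSG_STALE (0xff) is taken from the patch. The other constants are
placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CMD_RWREQ        0x00   /* placeholder value for this sketch */
#define CMD_MSG_STALE    0xff   /* value taken from the patch */
#define CMD_HARDWARE_ERR 0x03   /* placeholder value for this sketch */

/* Hypothetical stand-in for CommandList_struct. */
struct cmd {
	struct cmd *next;
	struct cmd **pprev;   /* NULL while the command is on no queue */
	int cmd_type;
	int status;           /* stands in for err_info->CommandStatus */
};

/* Plays the role of hlist_unhashed(): true if the command was never queued. */
static inline bool cmd_unhashed(const struct cmd *c)
{
	return c->pprev == NULL;
}

/* Put a command at the head of a queue before handing it to the hardware. */
static inline void add_q(struct cmd **head, struct cmd *c)
{
	c->next = *head;
	if (*head)
		(*head)->pprev = &c->next;
	*head = c;
	c->pprev = head;
}

/*
 * Take a command off its queue. A command that is on no queue was not
 * issued by this kernel, so it is marked stale instead of being unlinked.
 */
static inline void remove_q(struct cmd *c)
{
	if (cmd_unhashed(c)) {
		c->cmd_type = CMD_MSG_STALE;
		return;
	}
	*c->pprev = c->next;
	if (c->next)
		c->next->pprev = c->pprev;
	c->next = NULL;
	c->pprev = NULL;
}

/*
 * Fail one command. Stale commands are left untouched and reported as
 * ignored (false); tracked commands get an error status (true).
 */
static inline bool fail_cmd(struct cmd *c)
{
	assert(c != NULL);
	remove_q(c);
	if (c->cmd_type == CMD_MSG_STALE)
		return false;
	c->status = CMD_HARDWARE_ERR;
	return true;
}
```

In this sketch a caller queues every command with add_q() before
submitting it. A command that was never queued comes back from fail_cmd()
as false, with cmd_type set to CMD_MSG_STALE and its status untouched, so
the caller can drop it instead of acting on it.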


2009-07-02 19:00:55

by Jens Axboe

Subject: Re: [PATCH] cciss: Ignore stale commands after reboot

On Thu, Jul 02 2009, Hannes Reinecke wrote:
>
> When doing an unexpected shutdown like kexec the cciss
> firmware might still have some commands in flight, which
> it is trying to complete.
> The driver is doing it's best on resetting the HBA,
> but sadly there's a firmware issue causing the firmware
> _not_ to abort or drop old commands.
> So the firmware will send us commands which we haven't
> accounted for, causing the driver to panic.
>
> With this patch we're just ignoring these commands as
> there is nothing we could be doing with them anyway.

Looks good to me. Mike, Stephen?

>
> Signed-off-by: Hannes Reinecke <[email protected]>
> ---
> drivers/block/cciss.c | 15 +++++++++++++--
> drivers/block/cciss_cmd.h | 1 +
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> index c7a527c..65a0655 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -226,8 +226,18 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
>
> static inline void removeQ(CommandList_struct *c)
> {
> - if (WARN_ON(hlist_unhashed(&c->list)))
> + /*
> + * After kexec/dump some commands might still
> + * be in flight, which the firmware will try
> + * to complete. Resetting the firmware doesn't work
> + * with old fw revisions, so we have to mark
> + * them off as 'stale' to prevent the driver from
> + * falling over.
> + */
> + if (WARN_ON(hlist_unhashed(&c->list))) {
> + c->cmd_type = CMD_MSG_STALE;
> return;
> + }
>
> hlist_del_init(&c->list);
> }
> @@ -4246,7 +4256,8 @@ static void fail_all_cmds(unsigned long ctlr)
> while (!hlist_empty(&h->cmpQ)) {
> c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
> removeQ(c);
> - c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> + if (c->cmd_type != CMD_MSG_STALE)
> + c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> if (c->cmd_type == CMD_RWREQ) {
> complete_command(h, c, 0);
> } else if (c->cmd_type == CMD_IOCTL_PEND)
> diff --git a/drivers/block/cciss_cmd.h b/drivers/block/cciss_cmd.h
> index cd665b0..dbaed1e 100644
> --- a/drivers/block/cciss_cmd.h
> +++ b/drivers/block/cciss_cmd.h
> @@ -274,6 +274,7 @@ typedef struct _ErrorInfo_struct {
> #define CMD_SCSI 0x03
> #define CMD_MSG_DONE 0x04
> #define CMD_MSG_TIMEOUT 0x05
> +#define CMD_MSG_STALE 0xff
>
> /* This structure needs to be divisible by 8 for new
> * indexing method.
> --
> 1.5.3.2
>

--
Jens Axboe

2009-07-02 19:59:36

by Mike Miller

Subject: RE: Re: [PATCH] cciss: Ignore stale commands after reboot



> -----Original Message-----
> From: Andrew Morton [mailto:[email protected]]
> Sent: Thursday, July 02, 2009 2:51 PM
> To: Miller, Mike (OS Dev)
> Subject: Fw: Re: [PATCH] cciss: Ignore stale commands after reboot
>
>
> oh, Jens already did it.
>
> Begin forwarded message:
>
> Date: Thu, 2 Jul 2009 21:00:49 +0200
> From: Jens Axboe <[email protected]>
> To: Hannes Reinecke <[email protected]>
> Cc: [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH] cciss: Ignore stale commands after reboot
>
>
> On Thu, Jul 02 2009, Hannes Reinecke wrote:
> >
> > When doing an unexpected shutdown like kexec the cciss firmware might
> > still have some commands in flight, which it is trying to complete.
> > The driver is doing it's best on resetting the HBA, but sadly there's
> > a firmware issue causing the firmware _not_ to abort or drop old
> > commands.
> > So the firmware will send us commands which we haven't accounted for,
> > causing the driver to panic.
> >
> > With this patch we're just ignoring these commands as there is nothing
> > we could be doing with them anyway.
>
> Looks good to me. Mike, Stephen?

Sorry I haven't seen this before. The beardog addresses are no longer valid. We moved into a dungeon and into a new domain. The good folks in IT have yet to assign another IP address/domain name or an MX record for the mail servers. I hope that by next week that will be corrected. Until then all Steve and I have to use is some form of OutHouse mail client.

Acked-by: Mike Miller <[email protected]>

>
> >
> > Signed-off-by: Hannes Reinecke <[email protected]>
> > ---
> > drivers/block/cciss.c | 15 +++++++++++++--
> > drivers/block/cciss_cmd.h | 1 +
> > 2 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> > index c7a527c..65a0655 100644
> > --- a/drivers/block/cciss.c
> > +++ b/drivers/block/cciss.c
> > @@ -226,8 +226,18 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
> >
> > static inline void removeQ(CommandList_struct *c)
> > {
> > - if (WARN_ON(hlist_unhashed(&c->list)))
> > + /*
> > + * After kexec/dump some commands might still
> > + * be in flight, which the firmware will try
> > + * to complete. Resetting the firmware doesn't work
> > + * with old fw revisions, so we have to mark
> > + * them off as 'stale' to prevent the driver from
> > + * falling over.
> > + */
> > + if (WARN_ON(hlist_unhashed(&c->list))) {
> > + c->cmd_type = CMD_MSG_STALE;
> > return;
> > + }
> >
> > hlist_del_init(&c->list);
> > }
> > @@ -4246,7 +4256,8 @@ static void fail_all_cmds(unsigned long ctlr)
> > while (!hlist_empty(&h->cmpQ)) {
> > c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
> > removeQ(c);
> > - c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> > + if (c->cmd_type != CMD_MSG_STALE)
> > + c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> > if (c->cmd_type == CMD_RWREQ) {
> > complete_command(h, c, 0);
> > } else if (c->cmd_type == CMD_IOCTL_PEND)
> > diff --git a/drivers/block/cciss_cmd.h b/drivers/block/cciss_cmd.h
> > index cd665b0..dbaed1e 100644
> > --- a/drivers/block/cciss_cmd.h
> > +++ b/drivers/block/cciss_cmd.h
> > @@ -274,6 +274,7 @@ typedef struct _ErrorInfo_struct {
> > #define CMD_SCSI 0x03
> > #define CMD_MSG_DONE 0x04
> > #define CMD_MSG_TIMEOUT 0x05
> > +#define CMD_MSG_STALE 0xff
> >
> > /* This structure needs to be divisible by 8 for new
> > * indexing method.
> > --
> > 1.5.3.2
> >
>
> --
> Jens Axboe
>

2009-07-02 20:01:18

by Jens Axboe

Subject: Re: Re: [PATCH] cciss: Ignore stale commands after reboot

On Thu, Jul 02 2009, Miller, Mike (OS Dev) wrote:
>
>
> > -----Original Message-----
> > From: Andrew Morton [mailto:[email protected]]
> > Sent: Thursday, July 02, 2009 2:51 PM
> > To: Miller, Mike (OS Dev)
> > Subject: Fw: Re: [PATCH] cciss: Ignore stale commands after reboot
> >
> >
> > oh, Jens already did it.
> >
> > Begin forwarded message:
> >
> > Date: Thu, 2 Jul 2009 21:00:49 +0200
> > From: Jens Axboe <[email protected]>
> > To: Hannes Reinecke <[email protected]>
> > Cc: [email protected],
> > [email protected], [email protected]
> > Subject: Re: [PATCH] cciss: Ignore stale commands after reboot
> >
> >
> > On Thu, Jul 02 2009, Hannes Reinecke wrote:
> > >
> > > When doing an unexpected shutdown like kexec the cciss firmware might
> > > still have some commands in flight, which it is trying to complete.
> > > The driver is doing it's best on resetting the HBA, but sadly there's
> > > a firmware issue causing the firmware _not_ to abort or drop old
> > > commands.
> > > So the firmware will send us commands which we haven't accounted for,
> > > causing the driver to panic.
> > >
> > > With this patch we're just ignoring these commands as there is nothing
> > > we could be doing with them anyway.
> >
> > Looks good to me. Mike, Stephen?
>
> Sorry I haven't seen this before. The beardog addresses are no longer
> valid. We moved into a dungeon and into a new domain. The good folks
> in IT have yet to assign another IP address/domain name or an MX
> record for the mail servers. I hope that by next week that will be
> corrected. Until then all Steve and I have to use is some form of
> OutHouse mail client.
>
> Acked-by: Mike Miller <[email protected]>

OK, I'll add the patch with your ack.

--
Jens Axboe

2009-07-02 20:11:14

by Mike Miller

Subject: RE: Re: [PATCH] cciss: Ignore stale commands after reboot

Jens wrote:

> > Acked-by: Mike Miller <[email protected]>
>
> OK, I'll add the patch with your ack.
>
> --
> Jens Axboe

Thanks, Jens.
