Linus - will you please stop merging plain dangerous "lets pretend we
never have errors" patches. I've got proper fixes for megaraid to use
the new_eh if you want them, but merging stuff so that people don't even
notice the problem is *WRONG*
Alan
On Wed, 2002-10-30 at 19:11, Linux Kernel Mailing List wrote:
> ChangeSet 1.844.14.2, 2002/10/30 13:11:37-06:00, [email protected]
>
> megaraid: cleanups so it builds again
>
>
> # This patch includes the following deltas:
> # ChangeSet 1.844.14.1 -> 1.844.14.2
> # drivers/scsi/megaraid.c 1.25 -> 1.26
> # drivers/scsi/megaraid.h 1.11 -> 1.12
> #
>
> megaraid.c | 1 -
> megaraid.h | 2 --
> 2 files changed, 3 deletions(-)
>
>
> diff -Nru a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
> --- a/drivers/scsi/megaraid.c Wed Nov 6 22:03:41 2002
> +++ b/drivers/scsi/megaraid.c Wed Nov 6 22:03:41 2002
> @@ -4123,7 +4123,6 @@
> struct partition *p, *largest = NULL;
> int i, largest_cyl;
> int heads, cyls, sectors;
> - int capacity = capacity;
> unsigned char *buf;
>
> if (!(buf = scsi_bios_ptable(bdev)))
> diff -Nru a/drivers/scsi/megaraid.h b/drivers/scsi/megaraid.h
> --- a/drivers/scsi/megaraid.h Wed Nov 6 22:03:41 2002
> +++ b/drivers/scsi/megaraid.h Wed Nov 6 22:03:41 2002
> @@ -214,8 +214,6 @@
> info: megaraid_info, /* Driver Info Function */\
> command: megaraid_command, /* Command Function */\
> queuecommand: megaraid_queue, /* Queue Command Function */\
> - abort: megaraid_abort, /* Abort Command Function */\
> - reset: megaraid_reset, /* Reset Command Function */\
> bios_param: megaraid_biosparam, /* Disk BIOS Parameters */\
> can_queue: MAX_COMMANDS, /* Can Queue */\
> this_id: 7, /* HBA Target ID */\
> -
> To unsubscribe from this list: send the line "unsubscribe bk-commits-head" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Linus - will you please stop merging plain dangerous "lets pretend we
> never have errors" patches. I've got proper fixes for megaraid to use
> the new_eh if you want them, but merging stuff so that people
> don't even notice the problem is *WRONG*
That one's my mistake, it made to Linus via my sending patches at linux-scsi
and James. Alan, I saw your megaraid updates after I posted these to l-s,
and was in the process of merging the two, which I didn't finish on Friday.
The megaraid driver, especially in 2.5.x, has been suffering from neglect
from the official maintainer, LSI. Atul has returned from lengthy hiatus,
and claims to be ready to get megaraid back into working shape. I *really*
want Atul to get 2.4.x straightened out as a first priority this week, and
then tackle 2.5.x ASAP thereafter. Ideally for 2.5, after a short
stabilization period for the current 1.18 driver and in parallel a merging
of megaraid2, that the 1.18 driver could be dropped in favor of a cleaner
(and proper error-handling) megaraid 2.00 driver. There's even hope that
some megaraid cards can actually do board resets (not all can), and that
logic could be incorporated into the error handling routines. As it stands
now, they are no-ops once a command had been sent to the card.
-Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer, Architect
Dell Linux Solutions http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
> Linus - will you please stop merging plain dangerous "lets pretend we
> never have errors" patches. I've got proper fixes for megaraid to use
> the new_eh if you want them, but merging stuff so that people don't
> even notice the problem is *WRONG*
This one came in through my scsi tree. How about we fix the issue in a
different way: I can add an option in SCSI to check for the new eh methods
and if it doesn't find any at all, panic the system saying that this driver
has no error handling but if you reboot with the scsi_no_error_handling option
it won't panic again?
James
On Sun, 2002-11-10 at 14:53, J.E.J. Bottomley wrote:
> > Linus - will you please stop merging plain dangerous "lets pretend we
> > never have errors" patches. I've got proper fixes for megaraid to use
> > the new_eh if you want them, but merging stuff so that people don't
> > even notice the problem is *WRONG*
>
> This one came in through my scsi tree. How about we fix the issue in a
> different way: I can add an option in SCSI to check for the new eh methods
> and if it doesn't find any at all, panic the system saying that this driver
> has no error handling but if you reboot with the scsi_no_error_handling option
> it won't panic again?
I dont think panic() is needed bug a loud printk and maybe an error
return from scsi_register() would do the trick. We do however have a
couple of drivers where "pray, the firmware does all the work" is the
right answer, but they really should be setting their own handler that
delays rather than aborting/resetting/kicking offline
[email protected] said:
> I dont think panic() is needed bug a loud printk and maybe an error
> return from scsi_register() would do the trick. We do however have a
> couple of drivers where "pray, the firmware does all the work" is the
> right answer, but they really should be setting their own handler that
> delays rather than aborting/resetting/kicking offline
How about this? It doesn't panic, just refuses to attach the driver (although
this will still eventually cause a panic if your root fs is on it).
James
===== hosts.c 1.23 vs edited =====
--- 1.23/drivers/scsi/hosts.c Tue Nov 5 11:18:22 2002
+++ edited/hosts.c Sun Nov 10 10:17:32 2002
@@ -51,6 +51,22 @@
static int scsi_host_next_hn; /* host_no for next new host */
static int scsi_hosts_registered; /* cnt of registered scsi hosts */
+static int scsi_ignore_no_error_handling = 0;
+
+#ifdef MODULE
+MODULE_PARM(scsi_ignore_no_error_handling, "i");
+#else
+static int __init
+scsi_ignore_no_error_handling_setup(char *str)
+{
+ scsi_ignore_no_error_handling = 1;
+
+ return 1;
+}
+
+__setup("scsi_ignore_no_error_handling=", scsi_ignore_no_error_handling_setup)
;
+#endif
+
/**
* scsi_tp_for_each_host - call function for each scsi host off a template
@@ -345,6 +361,25 @@
int gfp_mask;
DECLARE_MUTEX_LOCKED(sem);
+ /*
+ * Determine host number. Check reserved first before allocating
+ * new one
+ */
+ hname = (shost_tp->proc_name) ? shost_tp->proc_name : "";
+ hname_len = strlen(hname);
+
+ /* Check to see if this host has any error handling facilities */
+ if(shost_tp->eh_strategy_handler == NULL &&
+ shost_tp->eh_abort_handler == NULL &&
+ shost_tp->eh_device_reset_handler == NULL &&
+ shost_tp->eh_bus_reset_handler == NULL &&
+ shost_tp->eh_host_reset_handler == NULL) {
+ printk(KERN_ERR "ERROR: SCSI host `%s' has no error
handling\nERROR: This is not a safe way to run your SCSI host\nERROR: The
error handling must be added to this driver\n", hname);
+ if(!scsi_ignore_no_error_handling) {
+ printk(KERN_ERR "ERROR: not attaching this
host.\n\nERROR: to force attachment, boot with the
scsi_ignore_no_error_handling flag");
+ return NULL;
+ }
+ }
gfp_mask = GFP_KERNEL;
if (shost_tp->unchecked_isa_dma && xtr_bytes)
gfp_mask |= __GFP_DMA;
@@ -356,13 +391,6 @@
}
memset(shost, 0, sizeof(struct Scsi_Host) + xtr_bytes);
-
- /*
- * Determine host number. Check reserved first before allocating
- * new one
- */
- hname = (shost_tp->proc_name) ? shost_tp->proc_name : "";
- hname_len = strlen(hname);
if (hname_len)
list_for_each(lh, &scsi_host_hn_list) {
On Sun, 2002-11-10 at 16:32, J.E.J. Bottomley wrote:
> [email protected] said:
> > I dont think panic() is needed bug a loud printk and maybe an error
> > return from scsi_register() would do the trick. We do however have a
> > couple of drivers where "pray, the firmware does all the work" is the
> > right answer, but they really should be setting their own handler that
> > delays rather than aborting/resetting/kicking offline
>
> How about this? It doesn't panic, just refuses to attach the driver (although
> this will still eventually cause a panic if your root fs is on it).
Looks a good starting point so that people can use the drivers but do
get warned
On Sun, 10 Nov 2002, J.E.J. Bottomley wrote:
>
> How about this? It doesn't panic, just refuses to attach the driver (although
> this will still eventually cause a panic if your root fs is on it).
I think this is worse than the current state of art. We don't want to
screw with users who can't do anything about it, which is 99.9% of them.
TESTING is about as important as anything else, and inconveniencing users
is BAD BAD BAD.
Right now drivers with old EH handling will warn at compile time (except
when people explicitly disable it, like in megaraid, which is in the
process of getting fixed anyway). That's a lot better than irritating
users. Especially since this printk() will possibly have scrolled off the
screen by the time the "cannot load root" happens.
I've said this before, I'll say it again: anything that breaks _working_
is BAD. Don't do it. Don't make up new ways to screw with people who want
to test. Don't add features that have _no_ meaning except to irritate
people.
The compile-time warning is _plenty_ good enough.
Linus
[email protected] said:
> I've said this before, I'll say it again: anything that breaks
> _working_ is BAD. Don't do it. Don't make up new ways to screw with
> people who want to test. Don't add features that have _no_ meaning
> except to irritate people.
OK, what about the runtime warning with no requirement for a special flag to
enable attachment.
> The compile-time warning is _plenty_ good enough.
I don't necessarily agree. It's easy to miss in all the build noise (most
average users don't do make -s). And the warning isn't that fierce (it
complains about a prototype mismatch), so even if it's noticed, it might get
ignored. At least if we have a run time warning, it's in the logs for all to
see when a problem gets posted to any given mailing list.
James
On Sun, 10 Nov 2002, J.E.J. Bottomley wrote:
>
> OK, what about the runtime warning with no requirement for a special flag to
> enable attachment.
That certainly works for me - informational but not irritating. I just
suspect it will scroll past without being seen that much, though.
But along with a stack_dump() or something to make it a bit more
noticeable it might actually be visible.
> I don't necessarily agree. It's easy to miss in all the build noise (most
> average users don't do make -s). And the warning isn't that fierce (it
> complains about a prototype mismatch), so even if it's noticed, it might get
> ignored. At least if we have a run time warning, it's in the logs for all to
> see when a problem gets posted to any given mailing list.
I dunno - I personally think more people look at the compile output than
at the dmessages when they scroll past. But maybe it's just me (I don't
use -s, but I tend to do "make -j4 bzImage > ../makes", so the _only_
thing I see are warnings and errors).
Linus
[email protected] said:
> That certainly works for me - informational but not irritating. I just
> suspect it will scroll past without being seen that much, though.
Hopefully we have a winner...
> But along with a stack_dump() or something to make it a bit more
> noticeable it might actually be visible.
OK, I'll do it this way and put it into the next SCSI merge.
James
On Sun, Nov 10, 2002 at 02:04:27PM -0500, J.E.J. Bottomley wrote:
> I don't necessarily agree. It's easy to miss in all the build noise (most
> average users don't do make -s). And the warning isn't that fierce (it
> complains about a prototype mismatch), so even if it's noticed, it might get
> ignored.
I take this opportunity to advertise for the quiet facility of kbuild:
Sample run:
$> make KBUILD_VERBOSE=0
CC kernel/rcupdate.o
CC kernel/dma.o
kernel/dma.c:102: warning: `proc_dma_show' defined but not used
CC kernel/uid16.o
LD kernel/built-in.o
Warnings really hurts the eye when using this.
And the filename supplied should be good enough for emacs and the like.
I have seen only a few persons actually using this and I wonder if people
feel they loose some information when presented with this condensed format?
Compared to "make -s" you are able to follow the progress.
Sam
On Sun, 2002-11-10 at 19:04, J.E.J. Bottomley wrote:
> > The compile-time warning is _plenty_ good enough.
>
> I don't necessarily agree. It's easy to miss in all the build noise (most
Seconded - not only that people _keep_ removing the old entries without
fixing the bug - those bogon megaraid changes are far from unique