16-pci-error-recovery_header.patch
PCI Error Recovery: header file patch
Various PCI bus errors can be signaled by newer PCI controllers. Recovering
from those errors requires an infrastructure to notify affected device drivers
of the error, and a way of walking through a reset sequence. This patch adds
a set of callbacks to be used by error recovery routines to notify device
drivers of the various stages of recovery.
Signed-off-by: Linas Vepstas <[email protected]>
--
include/linux/pci.h | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 49 insertions(+)
Index: linux-2.6.14-git3/include/linux/pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/linux/pci.h 2005-11-02 14:29:18.856338553 -0600
+++ linux-2.6.14-git3/include/linux/pci.h 2005-11-02 14:34:32.272401512 -0600
@@ -78,6 +78,16 @@
#define PCI_UNKNOWN ((pci_power_t __force) 5)
#define PCI_POWER_ERROR ((pci_power_t __force) -1)
+/** The pci_channel state describes connectivity between the CPU and
+ * the pci device. If some PCI bus between here and the pci device
+ * has crashed or locked up, this info is reflected here.
+ */
+enum pci_channel_state {
+ pci_channel_io_normal = 0, /* I/O channel is in normal state */
+ pci_channel_io_frozen = 1, /* I/O to channel is blocked */
+ pci_channel_io_perm_failure, /* PCI card is dead */
+};
+
/*
* The pci_dev structure is used to describe PCI devices.
*/
@@ -110,6 +120,7 @@
this is D0-D3, D0 being fully functional,
and D3 being off. */
+ enum pci_channel_state error_state; /* current connectivity state */
struct device dev; /* Generic device interface */
/* device is compatible with these IDs */
@@ -232,6 +243,43 @@
unsigned int use_driver_data:1; /* pci_driver->driver_data is used */
};
+/* ---------------------------------------------------------------- */
+/** PCI error recovery infrastructure. If a PCI device driver provides
+ * a set fof callbacks in struct pci_error_handlers, then that device driver
+ * will be notified of PCI bus errors, and will be driven to recovery
+ * when an error occurs.
+ */
+
+enum pcierr_result {
+ PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */
+ PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
+ PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
+ PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
+ PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
+};
+
+/* PCI bus error event callbacks */
+struct pci_error_handlers
+{
+ /* PCI bus error detected on this device */
+ int (*error_detected)(struct pci_dev *dev,
+ enum pci_channel_state error);
+
+ /* MMIO has been re-enabled, but not DMA */
+ int (*mmio_enabled)(struct pci_dev *dev);
+
+ /* PCI Express link has been reset */
+ int (*link_reset)(struct pci_dev *dev);
+
+ /* PCI slot has been reset */
+ int (*slot_reset)(struct pci_dev *dev);
+
+ /* Device driver may resume normal operations */
+ void (*resume)(struct pci_dev *dev);
+};
+
+/* ---------------------------------------------------------------- */
+
struct module;
struct pci_driver {
struct list_head node;
@@ -245,6 +293,7 @@
int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */
void (*shutdown) (struct pci_dev *dev);
+ struct pci_error_handlers *err_handler;
struct device_driver driver;
struct pci_dynids dynids;
};
On Thu, Nov 03, 2005 at 06:50:35PM -0600, Linas Vepstas wrote:
> +/* ---------------------------------------------------------------- */
> +/** PCI error recovery infrastructure. If a PCI device driver provides
> + * a set fof callbacks in struct pci_error_handlers, then that device driver
> + * will be notified of PCI bus errors, and will be driven to recovery
> + * when an error occurs.
> + */
> +
> +enum pcierr_result {
> + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */
> + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> +};
No, do not create new types of error or return codes. Use the standard
-EFOO values. You can document what they should each return, and mean,
but do not create new codes.
Also, you create an enum, but yet do not use it in your function
callback definition, which means you really didn't want to create it in
the first place...
I'll add 15 and 16 to my tree for now, so they will show up in -mm, but
I want to see updated versions before sending them off to Linus.
thanks,
greg k-h
Greg KH writes:
> > +enum pcierr_result {
> > + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */
> > + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> > + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> > + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> > +};
>
> No, do not create new types of error or return codes. Use the standard
> -EFOO values. You can document what they should each return, and mean,
> but do not create new codes.
Actually, these are not error or return codes, but rather requested
actions (maybe somewhat misnamed). We can map them on to -EFOO values
but it will be rather strained (-ECONNRESET for "please reset the
slot", anyone? :).
> Also, you create an enum, but yet do not use it in your function
> callback definition, which means you really didn't want to create it in
> the first place...
Yes, they could be #defines.
Paul.
On Mon, Nov 07, 2005 at 10:25:39AM +1100, Paul Mackerras was heard to remark:
> Greg KH writes:
>
> > > +enum pcierr_result {
> > > + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */
> > > + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > > + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> > > + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> > > + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> > > +};
> >
> > No, do not create new types of error or return codes. Use the standard
> > -EFOO values. You can document what they should each return, and mean,
> > but do not create new codes.
>
> Actually, these are not error or return codes, but rather requested
> actions
Yes.
> (maybe somewhat misnamed).
As to naming, my mind went blank on coming up with a good name,
and the results was a poor name.
I now note that "EDAC" ("Error Detection ad Correction") is now taken.
How about "PECS" ("PCI Error Correction System") ?
I guess "PCI Error Detection And Recovery System" (PEDERAST) might
have an inappropriate set of connotations.
> We can map them on to -EFOO values
> but it will be rather strained (-ECONNRESET for "please reset the
> slot", anyone? :).
Yes, that would only lead to confusion.
> > Also, you create an enum, but yet do not use it in your function
> > callback definition, which means you really didn't want to create it in
> > the first place...
>
> Yes, they could be #defines.
In one incarnation, they were #defines. The enum was supposed to be
the return value of the error notification callbacks.
I can prepare a new patch: would you prefer:
1) lose typing: #defines and int return value?
2) strong typing: enum and enum return value?
I often prefer strong typing.
And do you want a patch now, or later?
--linas
On Mon, Nov 07, 2005 at 11:55:41AM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:25:39AM +1100, Paul Mackerras was heard to remark:
> > Greg KH writes:
> >
> > > > +enum pcierr_result {
> > > > + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */
> > > > + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > > > + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> > > > + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> > > > + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> > > > +};
> > >
> > > No, do not create new types of error or return codes. Use the standard
> > > -EFOO values. You can document what they should each return, and mean,
> > > but do not create new codes.
> >
> > Actually, these are not error or return codes, but rather requested
> > actions
>
> Yes.
Ok, then make them be stronger, and not return an int, as everyone will
get that wrong.
> In one incarnation, they were #defines. The enum was supposed to be
> the return value of the error notification callbacks.
>
> I can prepare a new patch: would you prefer:
>
> 1) lose typing: #defines and int return value?
>
> 2) strong typing: enum and enum return value?
3) realy strong typing that sparse can detect.
enums don't really work, as you can get away with using an integer and
the compiler will never complain. Please use a typedef (yeah, I said
typedef) in the way that sparse will catch any bad users of the code.
> I often prefer strong typing.
>
> And do you want a patch now, or later?
Depends on when you want to see this make it into mainline :)
thanks,
greg k-h
On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
>
> 3) realy strong typing that sparse can detect.
Am compiling now.
> enums don't really work, as you can get away with using an integer and
> the compiler will never complain. Please use a typedef (yeah, I said
> typedef) in the way that sparse will catch any bad users of the code.
How about typedef'ing structs?
I'm not to clear on what "sparse" can do; however, in the good old days,
gcc allowed you to commit great sins when passing "struct blah *" to
subroutines, whereas it stoped you cold if you tried the same trick
with a typedef'ed "blah_t *". This got me into the habit of turning
all structs into typedefs in my personal projects. Can we expect
something similar for the kernel, and in particular, should we start
typedefing structs now?
(Documentation/CodingStyle doesn't mention typedef at all).
--linas
On Mon, Nov 07, 2005 at 12:56:21PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> >
> > 3) realy strong typing that sparse can detect.
>
> Am compiling now.
>
> > enums don't really work, as you can get away with using an integer and
> > the compiler will never complain. Please use a typedef (yeah, I said
> > typedef) in the way that sparse will catch any bad users of the code.
>
> How about typedef'ing structs?
No. Use __bitwise. See the lkml archives for how to do this properly.
> I'm not to clear on what "sparse" can do; however, in the good old days,
> gcc allowed you to commit great sins when passing "struct blah *" to
> subroutines, whereas it stoped you cold if you tried the same trick
> with a typedef'ed "blah_t *". This got me into the habit of turning
> all structs into typedefs in my personal projects. Can we expect
> something similar for the kernel, and in particular, should we start
> typedefing structs now?
No, never typedef a struct. That's just wrong. gcc should warn you
just the same if you pass the wrong struct pointer (and all of your code
builds without warnings, right?)
> (Documentation/CodingStyle doesn't mention typedef at all).
If it does, it should say not to use it at all :)
Except for this case, it's special...
thanks,
greg k-h
On Mon, 7 Nov 2005, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> >
> > 3) realy strong typing that sparse can detect.
>
> Am compiling now.
>
> > enums don't really work, as you can get away with using an integer and
> > the compiler will never complain. Please use a typedef (yeah, I said
> > typedef) in the way that sparse will catch any bad users of the code.
>
> How about typedef'ing structs?
No no no. (I feel sure that you will get plenty of responses.)
> I'm not to clear on what "sparse" can do; however, in the good old days,
> gcc allowed you to commit great sins when passing "struct blah *" to
> subroutines, whereas it stoped you cold if you tried the same trick
> with a typedef'ed "blah_t *". This got me into the habit of turning
> all structs into typedefs in my personal projects. Can we expect
> something similar for the kernel, and in particular, should we start
> typedefing structs now?
No no no.
> (Documentation/CodingStyle doesn't mention typedef at all).
We can submit patches for that.
Basically (generally) we never want a struct to be typedef-ed.
(There may be a couple of exceptions to this.)
We do allow a very few basic types to be typedef-ed, as long as
the basic type (e.g., pid_t) is also a C language basic type or
the typedef is useful for strong type checking.
--
~Randy
On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
>
> > I'm not to clear on what "sparse" can do; however, in the good old days,
> > gcc allowed you to commit great sins when passing "struct blah *" to
> > subroutines, whereas it stoped you cold if you tried the same trick
> > with a typedef'ed "blah_t *". This got me into the habit of turning
> > all structs into typedefs in my personal projects. Can we expect
> > something similar for the kernel, and in particular, should we start
> > typedefing structs now?
>
> No, never typedef a struct. That's just wrong.
Its a defacto convention for most C-language apps, see, for
example Xlib, gtk and gnome. Also, "grep typedef include/linux/*"
shows that many kernel device drivers use this convention.
> gcc should warn you
> just the same if you pass the wrong struct pointer
There were many cases where it did not warn (I don't remember
the case of subr calls). I beleive this had to do with ANSI-C spec
issues dating to the 1990's; traditional C is weakly typed.
Its not just gcc; anyoe who coded for a while eventually discovered
that tyedefs where strongly typed, but "struct blah *" were not.
> (and all of your code
> builds without warnings, right?)
:-/ Yes, of course.
--linas
On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> 3) realy strong typing that sparse can detect.
PCI Error Recovery: header file patch
Change enums and subroutine signatures to be strongly typed, per recent
discussion with GregKH. Also, change the acronym to the more unique,
less generic "PERS" "PCI Error Recovery System".
Greg, Please apply.
Signed-off-by: Linas Vepstas <[email protected]>
--
Index: linux-2.6.14-mm1/include/linux/pci.h
===================================================================
--- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600
+++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 13:55:35.745830682 -0600
@@ -82,11 +82,11 @@
* the pci device. If some PCI bus between here and the pci device
* has crashed or locked up, this info is reflected here.
*/
-enum pci_channel_state {
+typedef enum {
pci_channel_io_normal = 0, /* I/O channel is in normal state */
pci_channel_io_frozen = 1, /* I/O to channel is blocked */
pci_channel_io_perm_failure, /* PCI card is dead */
-};
+} pci_channel_state_t;
/*
* The pci_dev structure is used to describe PCI devices.
@@ -121,7 +121,7 @@
this is D0-D3, D0 being fully functional,
and D3 being off. */
- enum pci_channel_state error_state; /* current connectivity state */
+ pci_channel_state_t error_state; /* current connectivity state */
struct device dev; /* Generic device interface */
/* device is compatible with these IDs */
@@ -245,35 +245,35 @@
};
/* ---------------------------------------------------------------- */
-/** PCI error recovery infrastructure. If a PCI device driver provides
+/** PCI Error Recovery System (PERS). If a PCI device driver provides
* a set fof callbacks in struct pci_error_handlers, then that device driver
* will be notified of PCI bus errors, and will be driven to recovery
* when an error occurs.
*/
-enum pcierr_result {
- PCIERR_RESULT_NONE = 0, /* no result/none/not supported in device driver */
- PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
- PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
- PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
- PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
-};
+typedef enum {
+ PERS_RESULT_NONE = 0, /* no result/none/not supported in device driver */
+ PERS_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
+ PERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
+ PERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
+ PERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
+} pers_result_t;
/* PCI bus error event callbacks */
struct pci_error_handlers
{
/* PCI bus error detected on this device */
- int (*error_detected)(struct pci_dev *dev,
- enum pci_channel_state error);
+ pers_result_t (*error_detected)(struct pci_dev *dev,
+ pci_channel_state_t error);
/* MMIO has been re-enabled, but not DMA */
- int (*mmio_enabled)(struct pci_dev *dev);
+ pers_result_t (*mmio_enabled)(struct pci_dev *dev);
/* PCI Express link has been reset */
- int (*link_reset)(struct pci_dev *dev);
+ pers_result_t (*link_reset)(struct pci_dev *dev);
/* PCI slot has been reset */
- int (*slot_reset)(struct pci_dev *dev);
+ pers_result_t (*slot_reset)(struct pci_dev *dev);
/* Device driver may resume normal operations */
void (*resume)(struct pci_dev *dev);
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
>
>
> PCI Error Recovery: header file patch
>
> Change enums and subroutine signatures to be strongly typed, per recent
> discussion with GregKH. Also, change the acronym to the more unique,
> less generic "PERS" "PCI Error Recovery System".
>
> Greg, Please apply.
>
> Signed-off-by: Linas Vepstas <[email protected]>
>
> --
> Index: linux-2.6.14-mm1/include/linux/pci.h
> ===================================================================
> --- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600
> +++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 13:55:35.745830682 -0600
> @@ -82,11 +82,11 @@
> * the pci device. If some PCI bus between here and the pci device
> * has crashed or locked up, this info is reflected here.
> */
> -enum pci_channel_state {
> +typedef enum {
> pci_channel_io_normal = 0, /* I/O channel is in normal state */
> pci_channel_io_frozen = 1, /* I/O to channel is blocked */
> pci_channel_io_perm_failure, /* PCI card is dead */
> -};
> +} pci_channel_state_t;
this is not strongly typed, just a completely useless typedef.
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
>
>
> PCI Error Recovery: header file patch
>
> Change enums and subroutine signatures to be strongly typed, per recent
> discussion with GregKH. Also, change the acronym to the more unique,
> less generic "PERS" "PCI Error Recovery System".
>
> Greg, Please apply.
>
> Signed-off-by: Linas Vepstas <[email protected]>
>
> --
> Index: linux-2.6.14-mm1/include/linux/pci.h
> ===================================================================
> --- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600
> +++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 13:55:35.745830682 -0600
> @@ -82,11 +82,11 @@
> * the pci device. If some PCI bus between here and the pci device
> * has crashed or locked up, this info is reflected here.
> */
> -enum pci_channel_state {
> +typedef enum {
> pci_channel_io_normal = 0, /* I/O channel is in normal state */
> pci_channel_io_frozen = 1, /* I/O to channel is blocked */
> pci_channel_io_perm_failure, /* PCI card is dead */
> -};
> +} pci_channel_state_t;
No, this doesn't help out at all. Please go look at the __bitwise
documentation.
Good luck,
greg k-h
On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
> >
> > > I'm not to clear on what "sparse" can do; however, in the good old days,
> > > gcc allowed you to commit great sins when passing "struct blah *" to
> > > subroutines, whereas it stoped you cold if you tried the same trick
> > > with a typedef'ed "blah_t *". This got me into the habit of turning
> > > all structs into typedefs in my personal projects. Can we expect
> > > something similar for the kernel, and in particular, should we start
> > > typedefing structs now?
> >
> > No, never typedef a struct. That's just wrong.
>
> Its a defacto convention for most C-language apps, see, for
> example Xlib, gtk and gnome.
The kernel is not those projects.
> Also, "grep typedef include/linux/*" shows that many kernel device
> drivers use this convention.
They are wrong and should be fixed.
See my old OLS paper on all about the problems of using typedefs in
kernel code.
> > gcc should warn you
> > just the same if you pass the wrong struct pointer
>
> There were many cases where it did not warn (I don't remember
> the case of subr calls). I beleive this had to do with ANSI-C spec
> issues dating to the 1990's; traditional C is weakly typed.
>
> Its not just gcc; anyoe who coded for a while eventually discovered
> that tyedefs where strongly typed, but "struct blah *" were not.
Sorry, but you are using a broken compiler if it doesn't complain about
this.
thanks,
greg k-h
On Mon, Nov 07, 2005 at 12:02:57PM -0800, Greg KH was heard to remark:
> On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote:
> > On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
> > >
> > > No, never typedef a struct. That's just wrong.
> >
> > Its a defacto convention for most C-language apps, see, for
> > example Xlib, gtk and gnome.
>
> The kernel is not those projects.
!!
> > Also, "grep typedef include/linux/*" shows that many kernel device
> > drivers use this convention.
>
> They are wrong and should be fixed.
What, precisely, is wrong?
> See my old OLS paper on all about the problems of using typedefs in
> kernel code.
Is this on the web somewhere? Google is having trouble finding it.
I understand that old code bases often choke on typedefs;
forward declarations are a big problem. Not to be rude,
but choking for forward decl's is often a symptom of
poorly-designed code.
> > > gcc should warn you
> > > just the same if you pass the wrong struct pointer
> >
> > There were many cases where it did not warn (I don't remember
> > the case of subr calls). I beleive this had to do with ANSI-C spec
> > issues dating to the 1990's; traditional C is weakly typed.
> >
> > Its not just gcc; anyoe who coded for a while eventually discovered
> > that tyedefs where strongly typed, but "struct blah *" were not.
>
> Sorry, but you are using a broken compiler if it doesn't complain about
> this.
Uhh, gcc?
Maybe I've just got more mileage under my wheels. Of all of the
compilers I've used, gcc has always had the strictest checking,
and was the most verbose about warnings. There's a trick that pros
use when they inherit crufty old code: run it through gcc first, and
clean it up, even if the project requires using some other compiler.
I was simply stating a fact about gcc and about standard ANSI-C
type-checking that is "well known" to anyone who's been around the
block. I was not trying to start an argument.
--linas
On Mon, Nov 07, 2005 at 02:41:36PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 12:02:57PM -0800, Greg KH was heard to remark:
> > On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote:
> > > On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
> > > >
> > > > No, never typedef a struct. That's just wrong.
> > >
> > > Its a defacto convention for most C-language apps, see, for
> > > example Xlib, gtk and gnome.
> >
> > The kernel is not those projects.
>
> !!
Yeah, anyone who thinks that Xlib is the paradigm for coding style...
> > > Also, "grep typedef include/linux/*" shows that many kernel device
> > > drivers use this convention.
> >
> > They are wrong and should be fixed.
>
> What, precisely, is wrong?
>
> > See my old OLS paper on all about the problems of using typedefs in
> > kernel code.
>
> Is this on the web somewhere? Google is having trouble finding it.
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_paper/codingstyle.ps
and the presentation is at:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
> > > > gcc should warn you
> > > > just the same if you pass the wrong struct pointer
> > >
> > > There were many cases where it did not warn (I don't remember
> > > the case of subr calls). I beleive this had to do with ANSI-C spec
> > > issues dating to the 1990's; traditional C is weakly typed.
> > >
> > > Its not just gcc; anyoe who coded for a while eventually discovered
> > > that tyedefs where strongly typed, but "struct blah *" were not.
> >
> > Sorry, but you are using a broken compiler if it doesn't complain about
> > this.
>
> Uhh, gcc?
Try it in the kernel today. You will get a warning if you pass in a
pointer to a different structure type than it was defined as.
> I was simply stating a fact about gcc and about standard ANSI-C
> type-checking that is "well known" to anyone who's been around the
> block. I was not trying to start an argument.
Then let's end it here...
thanks,
greg k-h
On Mon, Nov 07, 2005 at 12:03:52PM -0800, Greg KH was heard to remark:
> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote:
> > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > > 3) realy strong typing that sparse can detect.
> Please go look at the __bitwise documentation.
PCI Error Recovery: header file patch
Change enums and subroutine signatures to be strongly typed, per recent
discussion with GregKH. Also, change the acronym to the more unique,
less generic "PERS" "PCI Error Recovery System".
Greg, Please apply.
Signed-off-by: Linas Vepstas <[email protected]>
--
Index: linux-2.6.14-mm1/include/linux/pci.h
===================================================================
--- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600
+++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 14:56:04.917367579 -0600
@@ -82,10 +82,12 @@
* the pci device. If some PCI bus between here and the pci device
* has crashed or locked up, this info is reflected here.
*/
+typedef int __bitwise pci_channel_state_t;
+
enum pci_channel_state {
- pci_channel_io_normal = 0, /* I/O channel is in normal state */
- pci_channel_io_frozen = 1, /* I/O to channel is blocked */
- pci_channel_io_perm_failure, /* PCI card is dead */
+ pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */
+ pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */
+ pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */
};
/*
@@ -121,7 +123,7 @@
this is D0-D3, D0 being fully functional,
and D3 being off. */
- enum pci_channel_state error_state; /* current connectivity state */
+ pci_channel_state_t error_state; /* current connectivity state */
struct device dev; /* Generic device interface */
/* device is compatible with these IDs */
@@ -245,35 +247,37 @@
};
/* ---------------------------------------------------------------- */
-/** PCI error recovery infrastructure. If a PCI device driver provides
+/** PCI Error Recovery System (PERS). If a PCI device driver provides
* a set fof callbacks in struct pci_error_handlers, then that device driver
* will be notified of PCI bus errors, and will be driven to recovery
* when an error occurs.
*/
-enum pcierr_result {
- PCIERR_RESULT_NONE = 0, /* no result/none/not supported in device driver */
- PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
- PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
- PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
- PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
+typedef int __bitwise pers_result_t;
+
+enum pers_result {
+ PERS_RESULT_NONE = (__force pers_result_t) 0, /* no result/none/not supported in device driver */
+ PERS_RESULT_CAN_RECOVER = (__force pers_result_t) 1, /* Device driver can recover without slot reset */
+ PERS_RESULT_NEED_RESET = (__force pers_result_t) 2, /* Device driver wants slot to be reset. */
+ PERS_RESULT_DISCONNECT = (__force pers_result_t) 3, /* Device has completely failed, is unrecoverable */
+ PERS_RESULT_RECOVERED = (__force pers_result_t) 4, /* Device driver is fully recovered and operational */
};
/* PCI bus error event callbacks */
struct pci_error_handlers
{
/* PCI bus error detected on this device */
- int (*error_detected)(struct pci_dev *dev,
- enum pci_channel_state error);
+ pers_result_t (*error_detected)(struct pci_dev *dev,
+ pci_channel_state_t error);
/* MMIO has been re-enabled, but not DMA */
- int (*mmio_enabled)(struct pci_dev *dev);
+ pers_result_t (*mmio_enabled)(struct pci_dev *dev);
/* PCI Express link has been reset */
- int (*link_reset)(struct pci_dev *dev);
+ pers_result_t (*link_reset)(struct pci_dev *dev);
/* PCI slot has been reset */
- int (*slot_reset)(struct pci_dev *dev);
+ pers_result_t (*slot_reset)(struct pci_dev *dev);
/* Device driver may resume normal operations */
void (*resume)(struct pci_dev *dev);
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
Various PCI bus errors can be signaled by newer PCI controllers. This
patch adds the PCI error recovery callbacks to the IPR SCSI device driver.
The patch has been tested, and appears to work well.
Please apply.
Signed-off-by: Linas Vepstas <[email protected]>
Signed-off-by: Brian King <[email protected]>
--
Index: linux-2.6.14-mm1/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/ipr.c 2005-11-07 13:55:27.986920072 -0600
+++ linux-2.6.14-mm1/drivers/scsi/ipr.c 2005-11-07 15:02:00.639392946 -0600
@@ -5328,6 +5328,94 @@
shutdown_type);
}
+/* --------------- PCI Error Recovery infrastructure ----------- */
+/** If the PCI slot is frozen, hold off all i/o
+ * activity; then, as soon as the slot is available again,
+ * initiate an adapter reset.
+ */
+static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd)
+{
+ /* Disallow new interrupts, avoid loop */
+ ipr_cmd->ioa_cfg->allow_interrupts = 0;
+ list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q);
+ ipr_cmd->done = ipr_reset_ioa_job;
+ return IPR_RC_JOB_RETURN;
+}
+
+/** ipr_eeh_frozen -- called when slot has experience PCI bus error.
+ * This routine is called to tell us that the PCI bus is down.
+ * Can't do anything here, except put the device driver into a
+ * holding pattern, waiting for the PCI bus to come back.
+ */
+static void ipr_eeh_frozen (struct pci_dev *pdev)
+{
+ unsigned long flags = 0;
+ struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+ spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+ _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE);
+ spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+}
+
+/** ipr_eeh_slot_reset - called when pci slot has been reset.
+ *
+ * This routine is called by the pci error recovery recovery
+ * code after the PCI slot has been reset, just before we
+ * should resume normal operations.
+ */
+static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev)
+{
+ unsigned long flags = 0;
+ struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+ // pci_enable_device(pdev);
+ // pci_set_master(pdev);
+ spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+ _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space,
+ IPR_SHUTDOWN_NONE);
+ spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+
+ return PERS_RESULT_RECOVERED;
+}
+
+/** This routine is called when the PCI bus has permanently
+ * failed. This routine should purge all pending I/O and
+ * shut down the device driver (close and unload).
+ */
+static void ipr_eeh_perm_failure(struct pci_dev *pdev)
+{
+ unsigned long flags = 0;
+ struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+ spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+ if (ioa_cfg->sdt_state == WAIT_FOR_DUMP)
+ ioa_cfg->sdt_state = ABORT_DUMP;
+ ioa_cfg->reset_retries = IPR_NUM_RESET_RELOAD_RETRIES;
+ ioa_cfg->in_ioa_bringdown = 1;
+ ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_NONE);
+ spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+}
+
+static pers_result_t ipr_eeh_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ switch (state) {
+ case pci_channel_io_frozen:
+ ipr_eeh_frozen (pdev);
+ return PERS_RESULT_NEED_RESET;
+
+ case pci_channel_io_perm_failure:
+ ipr_eeh_perm_failure (pdev);
+ return PERS_RESULT_DISCONNECT;
+ break;
+ default:
+ break;
+ }
+ return PERS_RESULT_NEED_RESET;
+}
+
+/* ------------- end of PCI Error Recovery suport ----------- */
+
/**
* ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..)
* @ioa_cfg: ioa cfg struct
@@ -6065,12 +6153,18 @@
};
MODULE_DEVICE_TABLE(pci, ipr_pci_table);
+static struct pci_error_handlers ipr_err_handler = {
+ .error_detected = ipr_eeh_error_detected,
+ .slot_reset = ipr_eeh_slot_reset,
+};
+
static struct pci_driver ipr_driver = {
.name = IPR_NAME,
.id_table = ipr_pci_table,
.probe = ipr_probe,
.remove = ipr_remove,
.shutdown = ipr_shutdown,
+ .err_handler = &ipr_err_handler,
};
/**
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
Various PCI bus errors can be signaled by newer PCI controllers. This
patch adds the PCI error recovery callbacks to the Symbios SCSI device driver.
The patch has been tested, and appears to work well.
Signed-off-by: Linas Vepstas <[email protected]>
--
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 13:55:26.839081234 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 15:02:08.152337375 -0600
@@ -686,6 +686,10 @@
if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("[");
+ /* Avoid spinloop trying to handle interrupts on frozen device */
+ if (np->s.io_state != pci_channel_io_normal)
+ return IRQ_HANDLED;
+
spin_lock_irqsave(np->s.host->host_lock, flags);
sym_interrupt(np);
spin_unlock_irqrestore(np->s.host->host_lock, flags);
@@ -759,6 +763,25 @@
*/
static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); }
+static void sym_eeh_timeout(u_long p)
+{
+ struct sym_eh_wait *ep = (struct sym_eh_wait *) p;
+ if (!ep)
+ return;
+ complete(&ep->done);
+}
+
+static void sym_eeh_done(struct sym_eh_wait *ep)
+{
+ if (!ep)
+ return;
+ ep->timed_out = 0;
+ if (!del_timer(&ep->timer))
+ return;
+
+ complete(&ep->done);
+}
+
/*
* Generic method for our eh processing.
* The 'op' argument tells what we have to do.
@@ -799,6 +822,35 @@
/* Try to proceed the operation we have been asked for */
sts = -1;
+
+ /* We may be in an error condition because the PCI bus
+ * went down. In this case, we need to wait until the
+ * PCI bus is reset, the card is reset, and only then
+ * proceed with the scsi error recovery. We'll wait
+ * for 15 seconds for this to happen.
+ */
+#define WAIT_FOR_PCI_RECOVERY 15
+ if (np->s.io_state != pci_channel_io_normal) {
+ struct sym_eh_wait eeh, *eep = &eeh;
+ np->s.io_reset_wait = eep;
+ init_completion(&eep->done);
+ init_timer(&eep->timer);
+ eep->to_do = SYM_EH_DO_WAIT;
+ eep->timer.expires = jiffies + (WAIT_FOR_PCI_RECOVERY*HZ);
+ eep->timer.function = sym_eeh_timeout;
+ eep->timer.data = (u_long)eep;
+ eep->timed_out = 1; /* Be pessimistic for once :) */
+ add_timer(&eep->timer);
+ spin_unlock_irq(np->s.host->host_lock);
+ wait_for_completion(&eep->done);
+ spin_lock_irq(np->s.host->host_lock);
+ if (eep->timed_out) {
+ printk (KERN_ERR "%s: Timed out waiting for PCI reset\n",
+ sym_name(np));
+ }
+ np->s.io_reset_wait = NULL;
+ }
+
switch(op) {
case SYM_EH_ABORT:
sts = sym_abort_scsiio(np, cmd, 1);
@@ -1584,6 +1636,8 @@
np->maxoffs = dev->chip.offset_max;
np->maxburst = dev->chip.burst_max;
np->myaddr = dev->host_id;
+ np->s.io_state = pci_channel_io_normal;
+ np->s.io_reset_wait = NULL;
/*
* Edit its name.
@@ -1916,6 +1970,58 @@
return 1;
}
+/* ------------- PCI Error Recovery infrastructure -------------- */
+/** sym2_io_error_detected() is called when PCI error is detected */
+static pers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
+{
+ struct sym_hcb *np = pci_get_drvdata(pdev);
+
+ np->s.io_state = state;
+ // XXX If slot is permanently frozen, then what?
+ // Should we scsi_remove_host() maybe ??
+
+ /* Request a slot slot reset. */
+ return PERS_RESULT_NEED_RESET;
+}
+
+/** sym2_io_slot_reset is called when the pci bus has been reset.
+ * Restart the card from scratch. */
+static pers_result_t sym2_io_slot_reset (struct pci_dev *pdev)
+{
+ struct sym_hcb *np = pci_get_drvdata(pdev);
+
+ printk (KERN_INFO "%s: recovering from a PCI slot reset\n",
+ sym_name(np));
+
+ if (pci_enable_device(pdev))
+ printk (KERN_ERR "%s: device setup failed most egregiously\n",
+ sym_name(np));
+
+ pci_set_master(pdev);
+ enable_irq (pdev->irq);
+
+ /* Perform host reset only on one instance of the card */
+ if (0 == PCI_FUNC (pdev->devfn))
+ sym_reset_scsi_bus(np, 0);
+
+ return PERS_RESULT_RECOVERED;
+}
+
+/** sym2_io_resume is called when the error recovery driver
+ * tells us that its OK to resume normal operation.
+ */
+static void sym2_io_resume (struct pci_dev *pdev)
+{
+ struct sym_hcb *np = pci_get_drvdata(pdev);
+
+ /* Perform device startup only once for this card. */
+ if (0 == PCI_FUNC (pdev->devfn))
+ sym_start_up (np, 1);
+
+ np->s.io_state = pci_channel_io_normal;
+ sym_eeh_done (np->s.io_reset_wait);
+}
+
/*
* Driver host template.
*/
@@ -2169,11 +2275,18 @@
MODULE_DEVICE_TABLE(pci, sym2_id_table);
+static struct pci_error_handlers sym2_err_handler = {
+ .error_detected = sym2_io_error_detected,
+ .slot_reset = sym2_io_slot_reset,
+ .resume = sym2_io_resume,
+};
+
static struct pci_driver sym2_driver = {
.name = NAME53C8XX,
.id_table = sym2_id_table,
.probe = sym2_probe,
.remove = __devexit_p(sym2_remove),
+ .err_handler = &sym2_err_handler,
};
static int __init sym2_init(void)
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.h
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-07 13:55:26.839081234 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-07 15:02:08.154337094 -0600
@@ -181,6 +181,10 @@
char chip_name[8];
struct pci_dev *device;
+ /* pci bus i/o state; waiter for clearing of i/o state */
+ pci_channel_state_t io_state;
+ struct sym_eh_wait *io_reset_wait;
+
struct Scsi_Host *host;
void __iomem * ioaddr; /* MMIO kernel io address */
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_hipd.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-07 13:55:26.840081093 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-07 15:02:08.162335970 -0600
@@ -2810,6 +2810,7 @@
u_char istat, istatc;
u_char dstat;
u_short sist;
+ u_int icnt;
/*
* interrupt on the fly ?
@@ -2851,6 +2852,7 @@
sist = 0;
dstat = 0;
istatc = istat;
+ icnt = 0;
do {
if (istatc & SIP)
sist |= INW(np, nc_sist);
@@ -2858,6 +2860,19 @@
dstat |= INB(np, nc_dstat);
istatc = INB(np, nc_istat);
istat |= istatc;
+
+ /* Prevent deadlock waiting on a condition that may never clear. */
+ /* XXX this is a temporary kludge; the correct to detect
+ * a PCI bus error would be to use the io_check interfaces
+ * proposed by Hidetoshi Seto <[email protected]>
+ * Problem with polling like that is the state flag might not
+ * be set.
+ */
+ icnt ++;
+ if (100 < icnt) {
+ if (np->s.device->error_state != pci_channel_io_normal)
+ return;
+ }
} while (istatc & (SIP|DIP));
if (DEBUG_FLAGS & DEBUG_TINY)
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
Various PCI bus errors can be signaled by newer PCI controllers. This
patch adds the PCI error recovery callbacks to the intel ethernet e100
device driver. The patch has been tested, and appears to work well.
Please apply.
Signed-off-by: Linas Vepstas <[email protected]>
--
Index: linux-2.6.14-mm1/drivers/net/e100.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e100.c 2005-11-07 13:55:26.363148057 -0600
+++ linux-2.6.14-mm1/drivers/net/e100.c 2005-11-07 15:02:11.120920287 -0600
@@ -2465,6 +2465,75 @@
}
+/* ------------------ PCI Error Recovery infrastructure -------------- */
+/** e100_io_error_detected() is called when PCI error is detected */
+static pers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+
+ /* Same as calling e100_down(netdev_priv(netdev)), but generic */
+ netdev->stop(netdev);
+
+ /* Is a detach needed ?? */
+ // netif_device_detach(netdev);
+
+ /* Request a slot reset. */
+ return PERS_RESULT_NEED_RESET;
+}
+
+/** e100_io_slot_reset is called after the pci bus has been reset.
+ * Restart the card from scratch. */
+static pers_result_t e100_io_slot_reset(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct nic *nic = netdev_priv(netdev);
+
+ if(pci_enable_device(pdev)) {
+ printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
+ return PERS_RESULT_DISCONNECT;
+ }
+ pci_set_master(pdev);
+
+ /* Only one device per card can do a reset */
+ if (0 != PCI_FUNC (pdev->devfn))
+ return PERS_RESULT_RECOVERED;
+
+ e100_hw_reset(nic);
+ e100_phy_init(nic);
+
+ if(e100_hw_init(nic)) {
+ DPRINTK(HW, ERR, "e100_hw_init failed\n");
+ return PERS_RESULT_DISCONNECT;
+ }
+
+ return PERS_RESULT_RECOVERED;
+}
+
+/** e100_io_resume is called when the error recovery driver
+ * tells us that its OK to resume normal operation.
+ */
+static void e100_io_resume(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct nic *nic = netdev_priv(netdev);
+
+ /* ack any pending wake events, disable PME */
+ pci_enable_wake(pdev, 0, 0);
+
+ netif_device_attach(netdev);
+ if(netif_running(netdev)) {
+ e100_open (netdev);
+ mod_timer(&nic->watchdog, jiffies);
+ }
+}
+
+static struct pci_error_handlers e100_err_handler = {
+ .error_detected = e100_io_error_detected,
+ .slot_reset = e100_io_slot_reset,
+ .resume = e100_io_resume,
+};
+
+
static struct pci_driver e100_driver = {
.name = DRV_NAME,
.id_table = e100_id_table,
@@ -2475,6 +2544,7 @@
.resume = e100_resume,
#endif
.shutdown = e100_shutdown,
+ .err_handler = &e100_err_handler,
};
static int __init e100_init_module(void)
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
Various PCI bus errors can be signaled by newer PCI controllers. This
patch adds the PCI error recovery callbacks to the intel gigabit
ethernet e1000 device driver. The patch has been tested, and appears
to work well.
Please apply.
Signed-off-by: Linas Vepstas <[email protected]>
--
Index: linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e1000/e1000_main.c 2005-11-07 13:55:25.948206317 -0600
+++ linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c 2005-11-07 15:02:12.811682734 -0600
@@ -206,6 +206,16 @@
void e1000_rx_schedule(void *data);
#endif
+static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state);
+static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
+static void e1000_io_resume(struct pci_dev *pdev);
+
+static struct pci_error_handlers e1000_err_handler = {
+ .error_detected = e1000_io_error_detected,
+ .slot_reset = e1000_io_slot_reset,
+ .resume = e1000_io_resume,
+};
+
/* Exported from other modules */
extern void e1000_check_options(struct e1000_adapter *adapter);
@@ -218,8 +228,9 @@
/* Power Managment Hooks */
#ifdef CONFIG_PM
.suspend = e1000_suspend,
- .resume = e1000_resume
+ .resume = e1000_resume,
#endif
+ .err_handler = &e1000_err_handler,
};
MODULE_AUTHOR("Intel Corporation, <[email protected]>");
@@ -2938,6 +2949,10 @@
#define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
+ /* Prevent stats update while adapter is being reset */
+ if (adapter->link_speed == 0)
+ return;
+
spin_lock_irqsave(&adapter->stats_lock, flags);
/* these counters are modified from e1000_adjust_tbi_stats,
@@ -4359,4 +4374,88 @@
}
#endif
+/* --------------- PCI Error Recovery infrastructure ------------ */
+/** e1000_io_error_detected() is called when PCI error is detected */
+static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct e1000_adapter *adapter = netdev->priv;
+
+ if (netif_running(netdev))
+ e1000_down(adapter);
+
+ /* Request a slot slot reset. */
+ return PERS_RESULT_NEED_RESET;
+}
+
+/** e1000_io_slot_reset is called after the pci bus has been reset.
+ * Restart the card from scratch.
+ * Implementation resembles the first-half of the
+ * e1000_resume routine.
+ */
+static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct e1000_adapter *adapter = netdev->priv;
+
+ if (pci_enable_device(pdev)) {
+ printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n");
+ return PERS_RESULT_DISCONNECT;
+ }
+ pci_set_master(pdev);
+
+ pci_enable_wake(pdev, 3, 0);
+ pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */
+
+ /* Perform card reset only on one instance of the card */
+ if(0 != PCI_FUNC (pdev->devfn))
+ return PERS_RESULT_RECOVERED;
+
+ e1000_reset(adapter);
+ E1000_WRITE_REG(&adapter->hw, WUS, ~0);
+
+ return PERS_RESULT_RECOVERED;
+}
+
+/** e1000_io_resume is called when the error recovery driver
+ * tells us that its OK to resume normal operation.
+ * Implementation resembles the second-half of the
+ * e1000_resume routine.
+ */
+static void e1000_io_resume(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct e1000_adapter *adapter = netdev->priv;
+ uint32_t manc, swsm;
+
+ if(netif_running(netdev)) {
+ if (e1000_up(adapter)) {
+ printk("e1000: can't bring device back up after reset\n");
+ return;
+ }
+ }
+
+ netif_device_attach(netdev);
+
+ if(adapter->hw.mac_type >= e1000_82540 &&
+ adapter->hw.media_type == e1000_media_type_copper) {
+ manc = E1000_READ_REG(&adapter->hw, MANC);
+ manc &= ~(E1000_MANC_ARP_EN);
+ E1000_WRITE_REG(&adapter->hw, MANC, manc);
+ }
+
+ switch(adapter->hw.mac_type) {
+ case e1000_82573:
+ swsm = E1000_READ_REG(&adapter->hw, SWSM);
+ E1000_WRITE_REG(&adapter->hw, SWSM,
+ swsm | E1000_SWSM_DRV_LOAD);
+ break;
+ default:
+ break;
+ }
+
+ if(netif_running(netdev))
+ mod_timer(&adapter->watchdog_timer, jiffies);
+}
+
/* e1000_main.c */
On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote:
> +typedef int __bitwise pci_channel_state_t;
Closer but...
> enum pci_channel_state {
> - pci_channel_io_normal = 0, /* I/O channel is in normal state */
> - pci_channel_io_frozen = 1, /* I/O to channel is blocked */
> - pci_channel_io_perm_failure, /* PCI card is dead */
> + pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */
> + pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */
> + pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */
> };
You don't have to use an enum anymore, just use a #define.
Sparse developers, I see code in the kernel that that does both
(__force foo_t) and (foo_t __force). Which one is correct?
> +typedef int __bitwise pers_result_t;
Ugh, I don't like that name, but I can't think of anything better right
now. You should at least keep "pci" at the beginning to make it make
more sense to people looking at it for the first time.
thanks,
greg k-h
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
Replace-Subject: PCI Error Recovery: ixgb network device driver
Various PCI bus errors can be signaled by newer PCI controllers. This
patch adds the PCI error recovery callbacks to the intel ten-gigabit
ethernet ixgb device driver. The patch has been tested, and appears
to work well.
Signed-off-by: Linas Vepstas <[email protected]>
--
Index: linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/ixgb/ixgb_main.c 2005-11-07 13:55:25.431278896 -0600
+++ linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c 2005-11-07 15:02:14.779406268 -0600
@@ -132,6 +132,16 @@
static void ixgb_netpoll(struct net_device *dev);
#endif
+static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state);
+static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev);
+static void ixgb_io_resume (struct pci_dev *pdev);
+
+static struct pci_error_handlers ixgb_err_handler = {
+ .error_detected = ixgb_io_error_detected,
+ .slot_reset = ixgb_io_slot_reset,
+ .resume = ixgb_io_resume,
+};
+
/* Exported from other modules */
extern void ixgb_check_options(struct ixgb_adapter *adapter);
@@ -141,6 +151,8 @@
.id_table = ixgb_pci_tbl,
.probe = ixgb_probe,
.remove = __devexit_p(ixgb_remove),
+ .err_handler = &ixgb_err_handler,
+
};
MODULE_AUTHOR("Intel Corporation, <[email protected]>");
@@ -1654,8 +1666,16 @@
unsigned int i;
#endif
+#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY
+ if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) {
+ if (eeh_slot_is_isolated (adapter->pdev))
+ // disable_irq_nosync (adapter->pdev->irq);
+ return IRQ_NONE; /* Not our interrupt */
+ }
+#else
if(unlikely(!icr))
return IRQ_NONE; /* Not our interrupt */
+#endif /* CONFIG_IXGB_EEH_RECOVERY */
if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) {
mod_timer(&adapter->watchdog_timer, jiffies);
@@ -2125,4 +2145,70 @@
}
#endif
+/* -------------- PCI Error Recovery infrastructure ---------------- */
+/** ixgb_io_error_detected() is called when PCI error is detected */
+static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct ixgb_adapter *adapter = netdev->priv;
+
+ if(netif_running(netdev))
+ ixgb_down(adapter, TRUE);
+
+ /* Request a slot reset. */
+ return PERS_RESULT_NEED_RESET;
+}
+
+/** ixgb_io_slot_reset is called after the pci bus has been reset.
+ * Restart the card from scratch.
+ * Implementation resembles the first-half of the
+ * ixgb_resume routine.
+ */
+static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct ixgb_adapter *adapter = netdev->priv;
+
+ if(pci_enable_device(pdev)) {
+ printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n");
+ return PERS_RESULT_DISCONNECT;
+ }
+ pci_set_master(pdev);
+
+ /* Perform card reset only on one instance of the card */
+ if (0 != PCI_FUNC (pdev->devfn))
+ return PERS_RESULT_RECOVERED;
+
+ ixgb_reset(adapter);
+
+ return PERS_RESULT_RECOVERED;
+}
+
+/** ixgb_io_resume is called when the error recovery driver
+ * tells us that its OK to resume normal operation.
+ * Implementation resembles the second-half of the
+ * ixgb_resume routine.
+ */
+static void ixgb_io_resume (struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct ixgb_adapter *adapter = netdev->priv;
+
+ if(netif_running(netdev)) {
+ if(ixgb_up(adapter)) {
+ printk ("ixgb: can't bring device back up after reset\n");
+ return;
+ }
+ }
+
+ netif_device_attach(netdev);
+ if(netif_running(netdev))
+ mod_timer(&adapter->watchdog_timer, jiffies);
+
+ /* Reading all-ff's from the adapter will completely hose
+ * the counts and statistics. So just clear them out */
+ memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats));
+ ixgb_update_stats(adapter);
+}
+
/* ixgb_main.c */
On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
This OPTIONAL/RFC patch adds ifdef's around the PCI error recovery code in the
various device drivers. This patch is "optional" in that its a little bit
messy, but it does solve a little problem.
-- The good news: this gives some users (e.g. embeddd systems) the option
of not compiling in this code, thus making thier device drivers a tiny
bit smaller.
-- The bad news: This also clutters up the drivers with extraneous markup
and the config process with yet another config.
Please apply if you agree with the need for this patch :)
Signed-off-by: Linas Vepstas <[email protected]>
Index: linux-2.6.14-mm1/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/ipr.c 2005-11-07 15:02:00.639392946 -0600
+++ linux-2.6.14-mm1/drivers/scsi/ipr.c 2005-11-07 15:02:20.029668601 -0600
@@ -5329,6 +5329,8 @@
}
/* --------------- PCI Error Recovery infrastructure ----------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
+
/** If the PCI slot is frozen, hold off all i/o
* activity; then, as soon as the slot is available again,
* initiate an adapter reset.
@@ -5414,6 +5416,7 @@
return PERS_RESULT_NEED_RESET;
}
+#endif /* CONFIG_PCI_ERR_RECOVERY */
/* ------------- end of PCI Error Recovery suport ----------- */
/**
@@ -6153,10 +6156,12 @@
};
MODULE_DEVICE_TABLE(pci, ipr_pci_table);
+#ifdef CONFIG_PCI_ERR_RECOVERY
static struct pci_error_handlers ipr_err_handler = {
.error_detected = ipr_eeh_error_detected,
.slot_reset = ipr_eeh_slot_reset,
};
+#endif /* CONFIG_PCI_ERR_RECOVERY */
static struct pci_driver ipr_driver = {
.name = IPR_NAME,
@@ -6164,7 +6169,9 @@
.probe = ipr_probe,
.remove = ipr_remove,
.shutdown = ipr_shutdown,
+#ifdef CONFIG_PCI_ERR_RECOVERY
.err_handler = &ipr_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
};
/**
Index: linux-2.6.14-mm1/drivers/pci/Kconfig
===================================================================
--- linux-2.6.14-mm1.orig/drivers/pci/Kconfig 2005-11-07 13:55:23.869498177 -0600
+++ linux-2.6.14-mm1/drivers/pci/Kconfig 2005-11-07 15:02:20.030668460 -0600
@@ -13,6 +13,21 @@
If you don't know what to do here, say N.
+config PCI_ERR_RECOVERY
+ bool "PCI Error Recovery support"
+ depends on PCI
+ depends on PPC_PSERIES
+ default y
+ help
+ PCI Error Recovery is a mechanism by which crashed/hung
+ PCI adapters are automatically detected and rebooted without
+ otherwise disturbing the operation of the system. Support
+ for this recovery requires special PCI bridge chips (some
+ PCI-E chips may have this support) as well as support in
+ the device drivers (not all device drivers can handle this).
+
+ When in doubt, say Y.
+
config PCI_LEGACY_PROC
bool "Legacy /proc/pci interface"
depends on PCI
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 15:02:08.152337375 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 15:02:20.034667898 -0600
@@ -763,6 +763,7 @@
*/
static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); }
+#ifdef CONFIG_PCI_ERR_RECOVERY
static void sym_eeh_timeout(u_long p)
{
struct sym_eh_wait *ep = (struct sym_eh_wait *) p;
@@ -781,6 +782,7 @@
complete(&ep->done);
}
+#endif /* CONFIG_PCI_ERR_RECOVERY */
/*
* Generic method for our eh processing.
@@ -823,6 +825,7 @@
/* Try to proceed the operation we have been asked for */
sts = -1;
+#ifdef CONFIG_PCI_ERR_RECOVERY
/* We may be in an error condition because the PCI bus
* went down. In this case, we need to wait until the
* PCI bus is reset, the card is reset, and only then
@@ -850,6 +853,7 @@
}
np->s.io_reset_wait = NULL;
}
+#endif /* CONFIG_PCI_ERR_RECOVERY */
switch(op) {
case SYM_EH_ABORT:
@@ -1971,6 +1975,7 @@
}
/* ------------- PCI Error Recovery infrastructure -------------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
/** sym2_io_error_detected() is called when PCI error is detected */
static pers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
{
@@ -2021,6 +2026,7 @@
np->s.io_state = pci_channel_io_normal;
sym_eeh_done (np->s.io_reset_wait);
}
+#endif /* CONFIG_PCI_ERR_RECOVERY */
/*
* Driver host template.
@@ -2275,18 +2281,22 @@
MODULE_DEVICE_TABLE(pci, sym2_id_table);
+#ifdef CONFIG_PCI_ERR_RECOVERY
static struct pci_error_handlers sym2_err_handler = {
.error_detected = sym2_io_error_detected,
.slot_reset = sym2_io_slot_reset,
.resume = sym2_io_resume,
};
+#endif /* CONFIG_PCI_ERR_RECOVERY */
static struct pci_driver sym2_driver = {
.name = NAME53C8XX,
.id_table = sym2_id_table,
.probe = sym2_probe,
.remove = __devexit_p(sym2_remove),
+#ifdef CONFIG_PCI_ERR_RECOVERY
.err_handler = &sym2_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
};
static int __init sym2_init(void)
Index: linux-2.6.14-mm1/drivers/net/e100.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e100.c 2005-11-07 15:02:11.120920287 -0600
+++ linux-2.6.14-mm1/drivers/net/e100.c 2005-11-07 15:02:20.038667336 -0600
@@ -2466,6 +2466,7 @@
/* ------------------ PCI Error Recovery infrastructure -------------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
/** e100_io_error_detected() is called when PCI error is detected */
static pers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
{
@@ -2532,6 +2533,7 @@
.slot_reset = e100_io_slot_reset,
.resume = e100_io_resume,
};
+#endif /* CONFIG_PCI_ERR_RECOVERY */
static struct pci_driver e100_driver = {
@@ -2544,7 +2546,9 @@
.resume = e100_resume,
#endif
.shutdown = e100_shutdown,
+#ifdef CONFIG_PCI_ERR_RECOVERY
.err_handler = &e100_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
};
static int __init e100_init_module(void)
Index: linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e1000/e1000_main.c 2005-11-07 15:02:12.811682734 -0600
+++ linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c 2005-11-07 15:02:20.071662701 -0600
@@ -206,6 +206,7 @@
void e1000_rx_schedule(void *data);
#endif
+#ifdef CONFIG_PCI_ERR_RECOVERY
static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state);
static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
static void e1000_io_resume(struct pci_dev *pdev);
@@ -215,6 +216,7 @@
.slot_reset = e1000_io_slot_reset,
.resume = e1000_io_resume,
};
+#endif /* CONFIG_PCI_ERR_RECOVERY */
/* Exported from other modules */
@@ -230,7 +232,9 @@
.suspend = e1000_suspend,
.resume = e1000_resume,
#endif
+#ifdef CONFIG_PCI_ERR_RECOVERY
.err_handler = &e1000_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
};
MODULE_AUTHOR("Intel Corporation, <[email protected]>");
@@ -4375,6 +4379,7 @@
#endif
/* --------------- PCI Error Recovery infrastructure ------------ */
+#ifdef CONFIG_PCI_ERR_RECOVERY
/** e1000_io_error_detected() is called when PCI error is detected */
static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
{
@@ -4457,5 +4462,6 @@
if(netif_running(netdev))
mod_timer(&adapter->watchdog_timer, jiffies);
}
+#endif /* CONFIG_PCI_ERR_RECOVERY */
/* e1000_main.c */
Index: linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/ixgb/ixgb_main.c 2005-11-07 15:02:14.779406268 -0600
+++ linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c 2005-11-07 15:02:20.075662139 -0600
@@ -132,6 +132,7 @@
static void ixgb_netpoll(struct net_device *dev);
#endif
+#ifdef CONFIG_PCI_ERR_RECOVERY
static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state);
static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev);
static void ixgb_io_resume (struct pci_dev *pdev);
@@ -141,6 +142,7 @@
.slot_reset = ixgb_io_slot_reset,
.resume = ixgb_io_resume,
};
+#endif /* CONFIG_PCI_ERR_RECOVERY */
/* Exported from other modules */
@@ -151,8 +153,9 @@
.id_table = ixgb_pci_tbl,
.probe = ixgb_probe,
.remove = __devexit_p(ixgb_remove),
+#ifdef CONFIG_PCI_ERR_RECOVERY
.err_handler = &ixgb_err_handler,
-
+#endif /* CONFIG_PCI_ERR_RECOVERY */
};
MODULE_AUTHOR("Intel Corporation, <[email protected]>");
@@ -2146,6 +2149,7 @@
#endif
/* -------------- PCI Error Recovery infrastructure ---------------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
/** ixgb_io_error_detected() is called when PCI error is detected */
static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
{
@@ -2210,5 +2214,6 @@
memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats));
ixgb_update_stats(adapter);
}
+#endif /* CONFIG_PCI_ERR_RECOVERY */
/* ixgb_main.c */
linas wrote:
> +/** ipr_eeh_slot_reset - called when pci slot has been reset.
> + *
> + * This routine is called by the pci error recovery recovery
> + * code after the PCI slot has been reset, just before we
> + * should resume normal operations.
> + */
> +static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev)
> +{
> + unsigned long flags = 0;
> + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
> +
> + // pci_enable_device(pdev);
> + // pci_set_master(pdev);
I assume you want remove these two lines... The pci config space
restore in ipr's reset handling should cover them.
> + spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
> + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space,
> + IPR_SHUTDOWN_NONE);
> + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
> +
> + return PERS_RESULT_RECOVERED;
> +}
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
On Mon, 7 Nov 2005, Greg KH wrote:
>
> > enum pci_channel_state {
> > - pci_channel_io_normal = 0, /* I/O channel is in normal state */
> > - pci_channel_io_frozen = 1, /* I/O to channel is blocked */
> > - pci_channel_io_perm_failure, /* PCI card is dead */
> > + pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */
> > + pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */
> > + pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */
> > };
>
> You don't have to use an enum anymore, just use a #define.
The enum works fine, though, and has less namespace pollution than a
#define, so sometimes an enum can be preferred.
HOWEVER. For sanity, if possible please avoid using the value "0". It's
magic for __bitwise, in that a zero is always acceptable as a bitwise
thing (which makes sense if you think of bitwise as being about bits: the
zero representation is totally independent of any bit ordering).
So it's better to start counting from 1 if possible.
> Sparse developers, I see code in the kernel that that does both
> (__force foo_t) and (foo_t __force). Which one is correct?
sparse doesn't care. Whatever scans better for humans. Attributes like
"force" parse the same way things like "const" and "volatile" parses, and
while most people _tend_ to write "const int", it's not incorrect to write
"int const". Same with "__attribute__((force))", aka __force.
Linus
On Mon, Nov 07, 2005 at 03:40:32PM -0600, Brian King was heard to remark:
> linas wrote:
> > +/** ipr_eeh_slot_reset - called when pci slot has been reset.
> > + *
> > + * This routine is called by the pci error recovery recovery
> > + * code after the PCI slot has been reset, just before we
> > + * should resume normal operations.
> > + */
> > +static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev)
> > +{
> > + unsigned long flags = 0;
> > + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
> > +
> > + // pci_enable_device(pdev);
> > + // pci_set_master(pdev);
>
> I assume you want remove these two lines... The pci config space
> restore in ipr's reset handling should cover them.
Yes, I do. Its cruft left over from old test and debug cycles. :(
On Mon, Nov 07, 2005 at 01:37:29PM -0800, Greg KH was heard to remark:
> On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote:
> > +typedef int __bitwise pci_channel_state_t;
>
> You don't have to use an enum anymore, just use a #define.
Per Linus's remarks about namespace pollution, I've kept the enums.
> > +typedef int __bitwise pers_result_t;
>
> You should at least keep "pci" at the beginning to make it make
> more sense to people looking at it for the first time.
PCI_ERS and pci_ers, then.
I'm feeling like a blinkin' spammer, splatting out all these emails.
--linas
PCI Error Recovery: header file patch
Change enums and subroutine signatures to be strongly typed, per recent
discussion with GregKH. Also, change the acronym to the more unique,
less generic "PCI-ERS" "PCI Error Recovery System".
Please apply.
Signed-off-by: Linas Vepstas <[email protected]>
--
Index: linux-2.6.14-mm1/include/linux/pci.h
===================================================================
--- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.000000000 -0600
+++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 16:34:29.790592784 -0600
@@ -82,10 +82,12 @@
* the pci device. If some PCI bus between here and the pci device
* has crashed or locked up, this info is reflected here.
*/
+typedef int __bitwise pci_channel_state_t;
+
enum pci_channel_state {
- pci_channel_io_normal = 0, /* I/O channel is in normal state */
- pci_channel_io_frozen = 1, /* I/O to channel is blocked */
- pci_channel_io_perm_failure, /* PCI card is dead */
+ pci_channel_io_normal = (__force pci_channel_state_t) 1, /* I/O channel is in normal state */
+ pci_channel_io_frozen = (__force pci_channel_state_t) 2, /* I/O to channel is blocked */
+ pci_channel_io_perm_failure = (__force pci_channel_state_t) 3, /* PCI card is dead */
};
/*
@@ -121,7 +123,7 @@
this is D0-D3, D0 being fully functional,
and D3 being off. */
- enum pci_channel_state error_state; /* current connectivity state */
+ pci_channel_state_t error_state; /* current connectivity state */
struct device dev; /* Generic device interface */
/* device is compatible with these IDs */
@@ -245,35 +247,46 @@
};
/* ---------------------------------------------------------------- */
-/** PCI error recovery infrastructure. If a PCI device driver provides
+/** PCI Error Recovery System (PCI-ERS). If a PCI device driver provides
* a set fof callbacks in struct pci_error_handlers, then that device driver
* will be notified of PCI bus errors, and will be driven to recovery
* when an error occurs.
*/
-enum pcierr_result {
- PCIERR_RESULT_NONE = 0, /* no result/none/not supported in device driver */
- PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
- PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
- PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
- PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
+typedef int __bitwise pci_ers_result_t;
+
+enum pci_ers_result {
+ /* no result/none/not supported in device driver */
+ PCI_ERS_RESULT_NONE = (__force pci_ers_result_t) 1,
+
+ /* Device driver can recover without slot reset */
+ PCI_ERS_RESULT_CAN_RECOVER = (__force pci_ers_result_t) 2,
+
+ /* Device driver wants slot to be reset. */
+ PCI_ERS_RESULT_NEED_RESET = (__force pci_ers_result_t) 3,
+
+ /* Device has completely failed, is unrecoverable */
+ PCI_ERS_RESULT_DISCONNECT = (__force pci_ers_result_t) 4,
+
+ /* Device driver is fully recovered and operational */
+ PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
};
/* PCI bus error event callbacks */
struct pci_error_handlers
{
/* PCI bus error detected on this device */
- int (*error_detected)(struct pci_dev *dev,
- enum pci_channel_state error);
+ pci_ers_result_t (*error_detected)(struct pci_dev *dev,
+ pci_channel_state_t error);
/* MMIO has been re-enabled, but not DMA */
- int (*mmio_enabled)(struct pci_dev *dev);
+ pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev);
/* PCI Express link has been reset */
- int (*link_reset)(struct pci_dev *dev);
+ pci_ers_result_t (*link_reset)(struct pci_dev *dev);
/* PCI slot has been reset */
- int (*slot_reset)(struct pci_dev *dev);
+ pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
/* Device driver may resume normal operations */
void (*resume)(struct pci_dev *dev);
On Mon, Nov 07, 2005 at 04:43:38PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 01:37:29PM -0800, Greg KH was heard to remark:
> > On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote:
> > > +typedef int __bitwise pci_channel_state_t;
> >
> > You don't have to use an enum anymore, just use a #define.
>
> Per Linus's remarks about namespace pollution, I've kept the enums.
That's fine.
> > > +typedef int __bitwise pers_result_t;
> >
> > You should at least keep "pci" at the beginning to make it make
> > more sense to people looking at it for the first time.
>
> PCI_ERS and pci_ers, then.
Sounds good.
> I'm feeling like a blinkin' spammer, splatting out all these emails.
Care to just resend the whole series over again? No "patch on top of
patch" stuff is needed here.
thanks,
greg k-h
On Mon, Nov 07, 2005 at 01:54:35PM -0800, Linus Torvalds wrote:
> On Mon, 7 Nov 2005, Greg KH wrote:
> > > enum pci_channel_state {
> > > - pci_channel_io_normal = 0, /* I/O channel is in normal state */
> > > - pci_channel_io_frozen = 1, /* I/O to channel is blocked */
> > > - pci_channel_io_perm_failure, /* PCI card is dead */
> > > + pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */
> > > + pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */
> > > + pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */
> > > };
> >
> > You don't have to use an enum anymore, just use a #define.
>
> The enum works fine, though, and has less namespace pollution than a
> #define, so sometimes an enum can be preferred.
Good point.
> > Sparse developers, I see code in the kernel that that does both
> > (__force foo_t) and (foo_t __force). Which one is correct?
>
> sparse doesn't care. Whatever scans better for humans. Attributes like
> "force" parse the same way things like "const" and "volatile" parses, and
> while most people _tend_ to write "const int", it's not incorrect to write
> "int const". Same with "__attribute__((force))", aka __force.
Ok, thanks for clearing this up.
greg k-h
On Mon, Nov 07, 2005 at 02:53:08PM -0800, Greg KH was heard to remark:
> > I'm feeling like a blinkin' spammer, splatting out all these emails.
>
> Care to just resend the whole series over again? No "patch on top of
> patch" stuff is needed here.
So that I can avoid that spammin' feelin' ...
I'll send patches against -git10, then, so as to start with a clean
slate; unless you wanted something aginst -mm1?
"The whole series": do you want all 42 patches? Or just the seven
discussed today?
-----
In the series-of-42, the staging of some of the patches in the
middle require simultaneous update to both the drivers/pci/hotplug
and the arch/powerpc/xxx; otherwise, build breaks result. I am
not sure how to handle that: the obvious solution is to split these
up... but that will probably result in a bigger series, and was
not a step I wanted to take unless someone asked...
--linas
On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
>
> > > Also, "grep typedef include/linux/*" shows that many kernel device
> > > drivers use this convention.
> >
> > They are wrong and should be fixed.
>
> What, precisely, is wrong?
I can't seem to find it on google, but IIRC Linus stated that he didn't
want any more structures defined with typedefs. If it is a structure,
simple keep it one, and don't use typedef to get rid of "struct".
This was for the simple reason, too many developers were passing
structures by value instead of by reference, just because they were
using a type that they didn't realize was a structure. And to make
things worse, these structures started to get bigger.
So in my every day programming, I switched to not typedef structures
anymore, and I even found some places that I passed structures by value
when it would have been much more efficient by reference.
The only exceptions that I can see where you typedef a structure is for
use with arch dependent types, like atomic_t.
-- Steve
On Monday November 7, [email protected] wrote:
>
> This was for the simple reason, too many developers were passing
> structures by value instead of by reference, just because they were
> using a type that they didn't realize was a structure. And to make
> things worse, these structures started to get bigger.
>
Another reason for not using typedefs is that if you do, and you want
to refer to the structure in some other include file, you have to
#include the include file that devices the structure.
If you don't use typedefs, you can just say:
struct foo;
and the compiler will happily wait for the complete definition later
(providing it doesn't need the size in the meanwhile).
So avoiding typedef means that you can sometimes avoid excess
#includes, which means faster compiling.
NeilBrown
On Mon, Nov 07, 2005 at 05:19:55PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 02:53:08PM -0800, Greg KH was heard to remark:
> > > I'm feeling like a blinkin' spammer, splatting out all these emails.
> >
> > Care to just resend the whole series over again? No "patch on top of
> > patch" stuff is needed here.
>
> So that I can avoid that spammin' feelin' ...
>
> I'll send patches against -git10, then, so as to start with a clean
> slate; unless you wanted something aginst -mm1?
-git10 would be great.
> "The whole series": do you want all 42 patches? Or just the seven
> discussed today?
Just the 7 discussed. The others should go to their proper maintainers
(which I am not.)
> -----
>
> In the series-of-42, the staging of some of the patches in the
> middle require simultaneous update to both the drivers/pci/hotplug
> and the arch/powerpc/xxx; otherwise, build breaks result. I am
> not sure how to handle that: the obvious solution is to split these
> up... but that will probably result in a bigger series, and was
> not a step I wanted to take unless someone asked...
The drivers/pci/hotplug/ stuff only touches the rpaphp driver, right?
If so, I don't have a problem with Paul/Ben sending those on with the
other PPC64 changes to keep everything building properly for your arch.
thanks,
greg k-h
On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
>
> don't use typedef to get rid of "struct".
>
> This was for the simple reason, too many developers were passing
> structures by value instead of by reference, just because they were
> using a type that they didn't realize was a structure.
That's a rather bizarre mistake to make, since, in order to
access a values in such a beast, you have to use a dot . instead
of an arrow -> and so it hits ou in the face that you passed a value
instead of a reference.
----
Off-topic: There's actually a neat little trick in C++ that can
help avoid accidentally passing null pointers. One can declare
function declarations as:
int func (sturct blah &v) {
v.a ++;
return v.b;
}
The ampersand says "pass argument by reference (so as to get arg passing
efficiency) but force coder to write code as if they were passing by value"
As a result, it gets difficult to pass null pointers (for reasons
similar to the difficulty of passing null pointers in Java (and yes,
I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick
only; I wish it was in C so I could experiment more and find out if I
like it or hate it.
--linas
On Tue, 2005-11-08 at 17:23 -0600, linas wrote:
> On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> > On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
> >
> > don't use typedef to get rid of "struct".
> >
> > This was for the simple reason, too many developers were passing
> > structures by value instead of by reference, just because they were
> > using a type that they didn't realize was a structure.
>
> That's a rather bizarre mistake to make, since, in order to
> access a values in such a beast, you have to use a dot . instead
> of an arrow -> and so it hits ou in the face that you passed a value
> instead of a reference.
It happens when you access the variable via macros and other routines
that you notice that takes and address of the variable, so you just pass
in the address of the current local variable.
>
> ----
> Off-topic: There's actually a neat little trick in C++ that can
> help avoid accidentally passing null pointers. One can declare
> function declarations as:
>
> int func (sturct blah &v) {
> v.a ++;
> return v.b;
> }
>
> The ampersand says "pass argument by reference (so as to get arg passing
> efficiency) but force coder to write code as if they were passing by value"
> As a result, it gets difficult to pass null pointers (for reasons
> similar to the difficulty of passing null pointers in Java (and yes,
> I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick
> only; I wish it was in C so I could experiment more and find out if I
> like it or hate it.
>
Actually, the true pass by reference (not by pointer) is one of the
things that C++ has, that I wish C had.
-- Steve
On Tue, Nov 08, 2005 at 12:18:42PM +1100, Neil Brown was heard to remark:
>
> Another reason for not using typedefs is that if you do, and you want
> to refer to the structure in some other include file, you have to
> #include the include file that devices the structure.
> If you don't use typedefs, you can just say:
>
> struct foo;
>
> and the compiler will happily wait for the complete definition later
> (providing it doesn't need the size in the meanwhile).
Yes, this is the "forward declaration" problem I was refering to.
Its unavoidable if structs have circular references to each other.
However, I've learned, by experience, several things by trying to
eliminate such forward declarations (and the related #include hell):
-- Its really, really hard, and right in the middle, you think,
"gosh this is a stupid idea, why am I bothering?"
-- When you get done, you think: "wow this new code structure
is so insanely better than the old code! The guy who wrote
the old code should be hung from a yardarm as an example!"
So having a mechanism that prevents coders from declaring
"struct foo" whenever they feel like it can be a good thing.
Of course, your milage may vary.
--linas
On Nov 8, 2005, at 18:23:27, linas wrote:
> Off-topic: There's actually a neat little trick in C++ that can
> help avoid accidentally passing null pointers. One can declare
> function declarations as:
>
> int func (sturct blah &v) {
> v.a ++;
> return v.b;
> }
>
> The ampersand says "pass argument by reference (so as to get arg
> passing efficiency) but force coder to write code as if they were
> passing by value" As a result, it gets difficult to pass null
> pointers (for reasons similar to the difficulty of passing null
> pointers in Java (and yes, I loathe Java, sorry to subject you to
> that)) Anyway, that's a C++ trick only; I wish it was in C so I
> could experiment more and find out if I like it or hate it.
That technique tends to cause more problems than it solves. If I
write the following code:
struct foo the_leftmost_foo = get_leftmost_foo();
do_some_stuff(the_leftmost_foo);
How do I know what it is going to do? Will it modify
the_leftmost_foo, or is it a pass-by-value as it appears? This is
just as bad as defining a macro some_macro(foo,bar) that does (foo =
bar), it's _really_ hard to tell what it does, especially when you
aren't all that familiar with the code. A much better solution is this:
void do_some_stuff(struct foo *the_foo) __attribute__((__nonnull__(1)));
do_some_stuff(&the_leftmost_foo);
That ensures that the first argument cannot be explicitly passed as
null, while still being quite obvious to the programmer what it's doing.
Cheers,
Kyle Moffett
--
They _will_ find opposing experts to say it isn't, if you push hard
enough the wrong way. Idiots with a PhD aren't hard to buy.
-- Rob Landley
On Tue, Nov 08, 2005 at 05:23:27PM -0600, Linas Vepstas wrote:
> On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> > On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
> >
> > don't use typedef to get rid of "struct".
> >
> > This was for the simple reason, too many developers were passing
> > structures by value instead of by reference, just because they were
> > using a type that they didn't realize was a structure.
>
> That's a rather bizarre mistake to make, since, in order to
> access a values in such a beast, you have to use a dot . instead
> of an arrow -> and so it hits ou in the face that you passed a value
> instead of a reference.
>
> ----
> Off-topic: There's actually a neat little trick in C++ that can
> help avoid accidentally passing null pointers. One can declare
> function declarations as:
>
> int func (sturct blah &v) {
> v.a ++;
> return v.b;
> }
>
> The ampersand says "pass argument by reference (so as to get arg passing
> efficiency) but force coder to write code as if they were passing by value"
> As a result, it gets difficult to pass null pointers (for reasons
> similar to the difficulty of passing null pointers in Java (and yes,
> I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick
> only; I wish it was in C so I could experiment more and find out if I
> like it or hate it.
I hate it: it obscures the fact that it's a pass-by-reference at the
callsite, which is useful information. Although this is, admittedly,
the least confusing use of C++ reference types.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
On Wed, 2005-11-09 at 10:57 +1100, David Gibson wrote:
> On Tue, Nov 08, 2005 at 05:23:27PM -0600, Linas Vepstas wrote:
[snip]
> > The ampersand says "pass argument by reference (so as to get arg passing
> > efficiency) but force coder to write code as if they were passing by value"
> > As a result, it gets difficult to pass null pointers (for reasons
> > similar to the difficulty of passing null pointers in Java (and yes,
> > I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick
> > only; I wish it was in C so I could experiment more and find out if I
> > like it or hate it.
>
> I hate it: it obscures the fact that it's a pass-by-reference at the
> callsite, which is useful information. Although this is, admittedly,
> the least confusing use of C++ reference types.
I agree with you about that one. It's yet another thing for C
programmers to have to learn to watch for C++ doing behind your back.
However, it isn't any worse than having an ordinary C pointer to some
struct. If the pointer was passed to the current function from above,
and you're passing it to another function below, you really don't know
what's going to happen to the structure unless you go look. Just like
the C++ reference, the C pointer doesn't get an address-of operator to
remind you.
--
Zan Lynx <[email protected]>
On Tue, Nov 08, 2005 at 06:57:11PM -0500, Kyle Moffett was heard to remark:
> On Nov 8, 2005, at 18:23:27, linas wrote:
> >Off-topic: There's actually a neat little trick in C++ that can
> >help avoid accidentally passing null pointers. One can declare
> >function declarations as:
> >
> > int func (sturct blah &v) {
> > v.a ++;
> > return v.b;
> > }
> >
> >The ampersand says "pass argument by reference (so as to get arg
> >passing efficiency) but force coder to write code as if they were
> >passing by value" As a result, it gets difficult to pass null
> >pointers (for reasons similar to the difficulty of passing null
> >pointers in Java (and yes, I loathe Java, sorry to subject you to
> >that)) Anyway, that's a C++ trick only; I wish it was in C so I
> >could experiment more and find out if I like it or hate it.
>
> That technique tends to cause more problems than it solves. If I
> write the following code:
>
> struct foo the_leftmost_foo = get_leftmost_foo();
> do_some_stuff(the_leftmost_foo);
>
> How do I know what it is going to do?
It depends on how do_some_stuff() was declared. If its declared as
do_some_stuff (struct foo &x)
then it will be a pass by reference.
> A much better solution is this:
>
> void do_some_stuff(struct foo *the_foo) __attribute__((__nonnull__(1)));
Think of it as "syntactic sugar": the compiler "does the right thing"
without all the grungy extra markup such as __atribute.
(Remember that at the dawn of time, C++ was just a bunch of pre-processor
markup that did nothing but hide grunge like __attribute__((whatever))
from the programmer. Only later did it become a language. Doing markup
like what you're suggesting is only a tiny step away from inventing a new
language, esp if you come up with some clever, unobtrusive markup for
it.)
> That ensures that the first argument cannot be explicitly passed as
> null,
Well, this misses the point. No one intentionally passes null pointers.
Its just that "shit happens". Pass-by-reference changes your coding
style. You tend to alloc on stack instead of malloc. And then, since
its on stack, you know it would be very wrong to keep a pointer to it,
and so you don't, you design code differently. Usually, you discover
you never really needed to hold a pointer to it anyway; you just did so
out of some ingrained habit.
And since its on stack, you can't leak memory, you don't need to
reference count it. Much fewer mallocs & frees, so less likely to have
errors there. Better performance, and less memory fragmentation, for
what that's worth.
I dunno, I did this once on a larger, year-long project, and rather
liked it (I otherwise don't much like C++, since people tend to use
it in bad, horrible ways). I won't say this is the greatest
coding style in the world, but it does change the way you think about
designing code, mostly for the better.
--linas
linas <[email protected]> writes:
> On Tue, Nov 08, 2005 at 06:57:11PM -0500, Kyle Moffett was heard to remark:
>> That technique tends to cause more problems than it solves. If I
>> write the following code:
>>
>> struct foo the_leftmost_foo = get_leftmost_foo();
>> do_some_stuff(the_leftmost_foo);
>>
>> How do I know what it is going to do?
>
> It depends on how do_some_stuff() was declared. If its declared as
>
> do_some_stuff (struct foo &x)
>
> then it will be a pass by reference.
Yeah, but if you're trying to read that code, you have to go look up
the declaration to figure out whether it might affect 'foo' or not.
And if you get it wrong, you get silent data corruption.
I'd rather pass a pointer explicitly and crash with a segfault if
someone passes NULL--at least then it's pellucidly clear what went
wrong.
-Doug
On Tue, Nov 08, 2005 at 05:13:48PM -0700, Zan Lynx was heard to remark:
> On Wed, 2005-11-09 at 10:57 +1100, David Gibson wrote:
> >
> > I hate it: it obscures the fact that it's a pass-by-reference at the
> > callsite, which is useful information. Although this is, admittedly,
> > the least confusing use of C++ reference types.
>
> I agree with you about that one. It's yet another thing for C
> programmers to have to learn to watch for C++ doing behind your back.
I think you're rushing to judgement on something you've never tried.
It fundamentally changes coding style; you'd have to try it on some
mid-size project for at least a few months or longer to get into the
mindset. To make it all work, you also have to do other things, like
avoid mallocs and allocing on stack, which forces major changes of
style (because of the lifetime of things on stack). If you don't change
style to go with it, then you'll just end up in debug hell, in which
case you'd be right: it would be a (very) bad idea.
(Disclaimer: I've moved away from C++ because of all the other
opportunities for misuse that it offers and encourages.)
--linas
On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark:
>
> Yeah, but if you're trying to read that code, you have to go look up
> the declaration to figure out whether it might affect 'foo' or not.
> And if you get it wrong, you get silent data corruption.
No, that is not what "pass by reference" means. You are thinking of
"const", maybe, or "pass by value"; this is neither. The arg is not
declared const, the subroutine can (and usually will) modify the contents
of the structure, and so the caller will be holding a modified structure
when the callee returns (just like it would if a pointer was passed).
--linas
linas <[email protected]> writes:
> On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark:
>>
>> Yeah, but if you're trying to read that code, you have to go look up
>> the declaration to figure out whether it might affect 'foo' or not.
>> And if you get it wrong, you get silent data corruption.
>
> No, that is not what "pass by reference" means. You are thinking of
> "const", maybe, or "pass by value"; this is neither. The arg is not
> declared const, the subroutine can (and usually will) modify the contents
> of the structure, and so the caller will be holding a modified structure
> when the callee returns (just like it would if a pointer was passed).
Right. My point is only that it's not clear from looking at the call
site whether a struct passed by reference will be modified by the
callee (some people pass by reference just for "efficiency"). And if
the called function modifies the data without the caller's knowledge,
it leads to obscure bugs. Whereas if you pass a pointer, it's
immediately clear that the called function can modify the pointed-to
object.
-Doug
On Nov 8, 2005, at 19:48:08, linas wrote:
> On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was
> heard to remark:
>> Yeah, but if you're trying to read that code, you have to go look
>> up the declaration to figure out whether it might affect 'foo' or
>> not. And if you get it wrong, you get silent data corruption.
>
> No, that is not what "pass by reference" means. You are thinking of
> "const", maybe, or "pass by value"; this is neither. The arg is
> not declared const, the subroutine can (and usually will) modify
> the contents of the structure, and so the caller will be holding a
> modified structure when the callee returns (just like it would if a
> pointer was passed).
Pass by value in C:
do_some_stuff(arg1, arg2);
Pass by reference in C:
do_some_stuff(&arg1, &arg2);
This is very obvious what it does. The compiler does type-checks to
make sure you don't get it wrong. There are tools to check stack
usage of functions too. This is inherently obvious what the code
does without looking at a completely different file where the
function is defined.
Pass by value in C++:
do_some_stuff(arg1, arg2);
Pass by reference in C++:
do_some_stuff(arg1, arg2);
This is C++ being clever and hiding stuff from the programmer, which
is Not Good(TM) for a kernel. C++ may be an excellent language for
userspace programmers (I say "may" here because some disagree,
including myself), however, many of the features are extremely
problematic for a kernel.
Cheers,
Kyle Moffett
--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
-- Brian Kernighan
On Tuesday 08 November 2005 19:59, Douglas McNaught wrote:
> linas <[email protected]> writes:
>
> > On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark:
> >>
> >> Yeah, but if you're trying to read that code, you have to go look up
> >> the declaration to figure out whether it might affect 'foo' or not.
> >> And if you get it wrong, you get silent data corruption.
> >
> > No, that is not what "pass by reference" means. You are thinking of
> > "const", maybe, or "pass by value"; this is neither. The arg is not
> > declared const, the subroutine can (and usually will) modify the contents
> > of the structure, and so the caller will be holding a modified structure
> > when the callee returns (just like it would if a pointer was passed).
>
> Right. My point is only that it's not clear from looking at the call
> site whether a struct passed by reference will be modified by the
> callee (some people pass by reference just for "efficiency"). And if
> the called function modifies the data without the caller's knowledge,
> it leads to obscure bugs. Whereas if you pass a pointer, it's
> immediately clear that the called function can modify the pointed-to
> object.
>
A structure is almost never passed by value, no matter whether it is C
or C++. So both languages require you either use descriptive naming or
look up declaration/implementation:
C:
struct str {
char buf[1024];
int count;
};
struct str s;
do_something_with_s(&s);
do_something_else_with_s(&s);
Which one modufies s?
--
Dmitry
On Tue, 2005-11-08 at 18:42 -0600, linas wrote:
[ C vs C++ ]
> It fundamentally changes coding style; you'd have to try it on some
> mid-size project for at least a few months or longer to get into the
> mindset. To make it all work, you also have to do other things, like
> avoid mallocs and allocing on stack, which forces major changes of
> style (because of the lifetime of things on stack). If you don't change
The lifetime of the stack is AFAIK the same on C and C++. So there can't
be a significant difference.
> style to go with it, then you'll just end up in debug hell, in which
> case you'd be right: it would be a (very) bad idea.
>
> (Disclaimer: I've moved away from C++ because of all the other
> opportunities for misuse that it offers and encourages.)
You that opportunities in all programming languages - in some more (perl
being probably the leader here), in some less (I don't know one).
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
On Tue, 2005-11-08 at 18:33 -0500, Steven Rostedt wrote:
> On Tue, 2005-11-08 at 17:23 -0600, linas wrote:
> > On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> > > On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
> > >
> > > don't use typedef to get rid of "struct".
> > >
> > > This was for the simple reason, too many developers were passing
> > > structures by value instead of by reference, just because they were
> > > using a type that they didn't realize was a structure.
> >
> > That's a rather bizarre mistake to make, since, in order to
> > access a values in such a beast, you have to use a dot . instead
> > of an arrow -> and so it hits ou in the face that you passed a value
> > instead of a reference.
And for every access of a field with a . you also look if it is not a
locally declared (small) struct?
> It happens when you access the variable via macros and other routines
> that you notice that takes and address of the variable, so you just pass
> in the address of the current local variable.
> > ----
> > Off-topic: There's actually a neat little trick in C++ that can
> > help avoid accidentally passing null pointers. One can declare
And if you want a NULL-pointer equivalent, you declared a defined
null_blah object just to have a reference.
I've seen that often enough.
If want to avoid accidental NULL pointers, use "splint" or similar
tools. Or add an BUG_ON().
> > function declarations as:
> >
> > int func (sturct blah &v) {
> > v.a ++;
> > return v.b;
> > }
> >
> > The ampersand says "pass argument by reference (so as to get arg passing
> > efficiency) but force coder to write code as if they were passing by value"
> > As a result, it gets difficult to pass null pointers (for reasons
> > similar to the difficulty of passing null pointers in Java (and yes,
See above for NULL-pointer equivalents.
> > I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick
> > only; I wish it was in C so I could experiment more and find out if I
No, it's not a trick butt an ordinary language feature.
And no, it was already in several Pascal's decades ago.
> > like it or hate it.
>
> Actually, the true pass by reference (not by pointer) is one of the
> things that C++ has, that I wish C had.
No, you probably don't because if you forget one of the & in a call
chain (or even worse: it is removed for whatever bizarre reason), you
might get interesting bugs to hunt. And C++ is also much more creative
with temporaries which are simply thrown away afterwards.
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
On Tue, 8 Nov 2005 20:51:25 -0500, Kyle Moffett <[email protected]> wrote:
>
> Pass by value in C:
> do_some_stuff(arg1, arg2);
>
> Pass by reference in C:
> do_some_stuff(&arg1, &arg2);
>
> This is very obvious what it does. The compiler does type-checks to
> make sure you don't get it wrong. There are tools to check stack
> usage of functions too. This is inherently obvious what the code
> does without looking at a completely different file where the
> function is defined.
>
>
> Pass by value in C++:
> do_some_stuff(arg1, arg2);
>
> Pass by reference in C++:
> do_some_stuff(arg1, arg2);
>
> This is C++ being clever and hiding stuff from the programmer, which
> is Not Good(TM) for a kernel. C++ may be an excellent language for
> userspace programmers (I say "may" here because some disagree,
> including myself), however, many of the features are extremely
> problematic for a kernel.
>
Why is it not good for kernel ?
You want to pass an struct to a function in the best way you can.
Reference just pases a pointer instead of copying, but you don't
realize.
If you want the funcion to be able to modify the struct, code it as
void do_some_stuff(T& arg1,T& arg2)
If you DO NOT want the funcion to be able to modify the struct, code it as
void do_some_stuff(const T& arg1,const T& arg2)
This is far better than in C,. because you get the benefits from
reference pass without the problems of accidental modification of
pointer contents. And get rid of arrows -> ;).
If the function modifies the struct it should be obvious from its name,
not depending if you put an & in the call or not.
And you stop worrying about argument pass methods.
The person who programs the function decides and can even change it without
you user even noticing.
And gcc does nice optimizations when you mix const& and inlining...
--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
On Wed, 9 Nov 2005, J.A. Magallon wrote:
> On Tue, 8 Nov 2005 20:51:25 -0500, Kyle Moffett <[email protected]> wrote:
>
> >
> > Pass by value in C:
> > do_some_stuff(arg1, arg2);
> >
> > Pass by reference in C:
> > do_some_stuff(&arg1, &arg2);
> >
> > This is very obvious what it does. The compiler does type-checks to
> > make sure you don't get it wrong. There are tools to check stack
> > usage of functions too. This is inherently obvious what the code
> > does without looking at a completely different file where the
> > function is defined.
> >
> >
> > Pass by value in C++:
> > do_some_stuff(arg1, arg2);
> >
> > Pass by reference in C++:
> > do_some_stuff(arg1, arg2);
> >
> > This is C++ being clever and hiding stuff from the programmer, which
> > is Not Good(TM) for a kernel. C++ may be an excellent language for
> > userspace programmers (I say "may" here because some disagree,
> > including myself), however, many of the features are extremely
> > problematic for a kernel.
> >
>
> Why is it not good for kernel ?
> You want to pass an struct to a function in the best way you can.
> Reference just pases a pointer instead of copying, but you don't
> realize.
> If you want the funcion to be able to modify the struct, code it as
>
> void do_some_stuff(T& arg1,T& arg2)
>
> If you DO NOT want the funcion to be able to modify the struct, code it as
>
> void do_some_stuff(const T& arg1,const T& arg2)
A diligent C programmer would write this as follows:
void do_some_stuff (struct T * a, struct T * b);
versus
void do_more_stuff (const struct T * a, const struct T * b);
So I don't see C++ winning at all here.
> This is far better than in C,. because you get the benefits from
> reference pass without the problems of accidental modification of
> pointer contents. And get rid of arrows -> ;).
>
> If the function modifies the struct it should be obvious from its name,
> not depending if you put an & in the call or not.
> And you stop worrying about argument pass methods.
I think I'll call this my rule #1:
The moment you stop worrying about something is the moment it bites you
in the butt. :-) Much firsthand experience.
> The person who programs the function decides and can even change it without
> you user even noticing.
And if the caller is passing in something that's not meant to be
modified, then the modification causes much badness. Happens with both
languages, too.
> And gcc does nice optimizations when you mix const& and inlining...
As far as I know, nothing stops GCC from doing the exact same
optimizations in the function prototypes given above.
>
> --
> J.A. Magallon <jamagallon()able!es> \ Software is like sex:
> werewolf!able!es \ It's better when it's free
> Mandriva Linux release 2006.1 (Cooker) for i586
> Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
>
-Vadim Lobanov
On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> On Wed, 9 Nov 2005, J.A. Magallon wrote:
>
> > void do_some_stuff(T& arg1,T& arg2)
>
> A diligent C programmer would write this as follows:
> void do_some_stuff (struct T * a, struct T * b);
> So I don't see C++ winning at all here.
I guess the real point that I'd wanted to make, and seems
to have gotten lost, was that by avoiding using pointers,
you end up designing code in a very different way, and you
can find out that often/usually, you don't need structs
filled with a zoo of pointers.
Minimizing pointers is good: less ref counting is needed,
fewer mallocs are needed, fewer locks are needed
(because of local/private scope!!), and null pointer
deref errors are less likely.
There are even performance implications: on modern CPU's
there's a very long pipeline to memory (hundreds of cycles
for a cache miss! Really! Worse if you have run out of
TLB entries!). So walking a long linked list chasing
pointers can really really hurt performance.
By using refs instead of pointers, it helps you focus
on the issue of "do I really need to store this pointer
somewhere? Will I really need it later, or can I be done
with it now?".
I don't know if the idea of "using fewer pointers" can
actually be carried out in the kernel. For starters,
the stack is way too short to be able to put much on it.
--linas
On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote:
> I guess the real point that I'd wanted to make, and seems
> to have gotten lost, was that by avoiding using pointers,
> you end up designing code in a very different way, and you
> can find out that often/usually, you don't need structs
> filled with a zoo of pointers.
Umm, references are implemented as pointers. Instead of a "zoo of
pointers" you have a "zoo of references". No functional difference.
> Minimizing pointers is good: less ref counting is needed,
> fewer mallocs are needed, fewer locks are needed
> (because of local/private scope!!), and null pointer
> deref errors are less likely.
Not true at all! If you're storing references you absolutley still need
reference counting. Allocation non-trivial things on the stack is Bad
Idea in kernel land.
On Wed, Nov 09, 2005 at 11:36:25AM -0800, [email protected] was heard to remark:
> On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote:
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
>
> Umm, references are implemented as pointers. Instead of a "zoo of
> pointers" you have a "zoo of references". No functional difference.
Sigh.
I think you are confusing references and pointers. By definition
you cannot "store a reference"; however, you can "dereference"
an object and store a pointer to it.
The C programming language conflates these two different ideas;
that is why they seem to be "the same thing" to you.
> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed
> > (because of local/private scope!!), and null pointer
> > deref errors are less likely.
>
> Not true at all!
Which part isn't true?
--linas
On Wed, 9 Nov 2005, linas wrote:
> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
>> On Wed, 9 Nov 2005, J.A. Magallon wrote:
>>
>>> void do_some_stuff(T& arg1,T& arg2)
>>
>> A diligent C programmer would write this as follows:
>> void do_some_stuff (struct T * a, struct T * b);
>> So I don't see C++ winning at all here.
>
> I guess the real point that I'd wanted to make, and seems
> to have gotten lost, was that by avoiding using pointers,
> you end up designing code in a very different way, and you
> can find out that often/usually, you don't need structs
> filled with a zoo of pointers.
>
But you can't avoid pointers unless you make your entire
program have global scope. That may be great for performance,
but a killer if for have any bugs.
Procedures that get pointers to variables (including structures)
are a way of isolating faults. Without them, you can't test
the procedures in a working environment.
Also, without pointers, you are severely limited on the kinds
of libraries you can share. You certainly wouldn't want
to compile an entire C runtime library into your code so
that all the buffers have local scope.
> Minimizing pointers is good: less ref counting is needed,
> fewer mallocs are needed, fewer locks are needed
> (because of local/private scope!!), and null pointer
> deref errors are less likely.
>
No. Minimizing pointers should not be an objective. Properly
using the components of your tool-set should be. This means
that you use the correct access mode for various object
types. "Correct" depends upon the instant context, not upon
some company or personal rule.
> There are even performance implications: on modern CPU's
> there's a very long pipeline to memory (hundreds of cycles
> for a cache miss! Really! Worse if you have run out of
> TLB entries!). So walking a long linked list chasing
> pointers can really really hurt performance.
>
Linked lists are some of the necessary elements when one
doesn't know ahead of time the number of objects that must
be manipulated. They are just programming tools. You use
them when they are necessary. The fact that they use pointers
to make the links is not relevant.
> By using refs instead of pointers, it helps you focus
> on the issue of "do I really need to store this pointer
> somewhere? Will I really need it later, or can I be done
> with it now?".
>
Huh? References (at the opcode level) are pointers. There
is no difference whatsoever. For memory references, you
Get:
direct, direct+displacement, register_indirect,
register_indirect+displacement.
There isn't anything else. Some processors let you sum
displacements over several registers. Nevertheless, that's
all you have. Accessing a variable by reference is an
old artifact of FORTRAN. It can be efficient if the
architecture is flat and global so the compiler can
substitute direct access. In other words, no parameter
or pointer actually gets passed to the routine. The
compiler just remembers what the parameters actually
were and substitutes code to directly access the
parameters.
Not so in C++. With C++, "reference" is a user-shorthand
where the compiler actually accesses variables with pointers.
The rules don't prohibit C++ compilers from using FORTRAN-
like conventions for passing-by-reference. It's just that
nobody seems to do so.
> I don't know if the idea of "using fewer pointers" can
> actually be carried out in the kernel. For starters,
> the stack is way too short to be able to put much on it.
>
> --linas
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.
Thank you.
On Wed, Nov 09, 2005 at 01:38:28PM -0600, linas wrote:
> On Wed, Nov 09, 2005 at 11:36:25AM -0800, [email protected] was heard to remark:
> > Umm, references are implemented as pointers. Instead of a "zoo of
> > pointers" you have a "zoo of references". No functional difference.
>
> Sigh.
>
> I think you are confusing references and pointers. By definition
> you cannot "store a reference"; however, you can "dereference"
> an object and store a pointer to it.
Sigh, That's funny - I've written C++ code which has references as members
of objects. You absolutely *can* store a reference.
References are simply a syntactic simplification to eliminate the
different pointer-dereference notation. If they make you think about a
problem differently, that's fine, but they are really just pointers in
disguise.
On Wed, Nov 09, 2005 at 01:38:28PM -0600, linas wrote:
> On Wed, Nov 09, 2005 at 11:36:25AM -0800, [email protected] was heard to remark:
> > On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote:
SHUT UP! SHUT UP ALL OF YOU!!
Or at least stop cc'ing linux-pci on this stupid wanking.
Thanks.
On Wed, 9 Nov 2005, linas wrote:
> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> > On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >
> > > void do_some_stuff(T& arg1,T& arg2)
> >
> > A diligent C programmer would write this as follows:
> > void do_some_stuff (struct T * a, struct T * b);
> > So I don't see C++ winning at all here.
>
> I guess the real point that I'd wanted to make, and seems
> to have gotten lost, was that by avoiding using pointers,
> you end up designing code in a very different way, and you
> can find out that often/usually, you don't need structs
> filled with a zoo of pointers.
>
> Minimizing pointers is good: less ref counting is needed,
> fewer mallocs are needed, fewer locks are needed
> (because of local/private scope!!), and null pointer
> deref errors are less likely.
>
> There are even performance implications: on modern CPU's
> there's a very long pipeline to memory (hundreds of cycles
> for a cache miss! Really! Worse if you have run out of
> TLB entries!). So walking a long linked list chasing
> pointers can really really hurt performance.
>
> By using refs instead of pointers, it helps you focus
> on the issue of "do I really need to store this pointer
> somewhere? Will I really need it later, or can I be done
> with it now?".
>
> I don't know if the idea of "using fewer pointers" can
> actually be carried out in the kernel. For starters,
> the stack is way too short to be able to put much on it.
I really see the two issues at hand as being very much orthogonal to
each other.
Namely, you put data on the stack when you need it in the local
'context' only, whereas you put data globally when it needs to be
available globally. The C++ references are nothing more than syntactic
sugar (and we all know what they say about that and semicolons) for
pointers, and so I don't see how they would affect the choices at all.
Choosing where the data goes should be done according to the data's
lifetime, not the specifics of how functions are declared.
</soapbox>
> --linas
>
>
-Vadim Lobanov
[email protected] writes:
> Sigh, That's funny - I've written C++ code which has references as members
> of objects. You absolutely *can* store a reference.
You can _initialize_, but not _modify_ (reseat) it.
Andreas.
--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
On Wed, 2005-11-09 at 22:53 +0100, Andreas Schwab wrote:
> [email protected] writes:
>
> > Sigh, That's funny - I've written C++ code which has references as members
> > of objects. You absolutely *can* store a reference.
>
> You can _initialize_, but not _modify_ (reseat) it.
reset?
As in:
---- snip ----
struct x {
struct y * const p;
};
---- snip ----
We assume that no one casts the "const" away.
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
>
> On Wed, 9 Nov 2005, linas wrote:
>
> > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> >> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >>
> >>> void do_some_stuff(T& arg1,T& arg2)
> >>
> >> A diligent C programmer would write this as follows:
> >> void do_some_stuff (struct T * a, struct T * b);
> >> So I don't see C++ winning at all here.
> >
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
> >
>
> But you can't avoid pointers unless you make your entire
> program have global scope. That may be great for performance,
> but a killer if for have any bugs.
Just to extract some useful technical knowledge from the current ongoing
"flamewar"...
I'm not entirely sure if the above statement regarding performance is
correct. Some enlightenment would be appreciated.
Suppose you have the following code:
int myvar;
void foo (void) {
printf("%d\n", myvar);
bar();
printf("%d\n", myvar);
}
If bar is declared in _another_ file as
void bar (void);
then I believe the compiler has to reread the global 'myvar' from memory
for the second printf().
However, if the code is as follows:
void foo (void) {
int myvar = 0;
printf("%d\n", myvar);
bar(&myvar);
printf("%d\n", myvar);
}
If bar is declared in _another_ file as
void bar (const int * var);
then I think the compiler can validly cache the value of 'myvar' for the
second printf without re-reading it. Correct/incorrect?
-Vadim Lobanov
On Wed, 9 Nov 2005, Vadim Lobanov wrote:
> On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
>
>>
>> On Wed, 9 Nov 2005, linas wrote:
>>
>>> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
>>>> On Wed, 9 Nov 2005, J.A. Magallon wrote:
>>>>
>>>>> void do_some_stuff(T& arg1,T& arg2)
>>>>
>>>> A diligent C programmer would write this as follows:
>>>> void do_some_stuff (struct T * a, struct T * b);
>>>> So I don't see C++ winning at all here.
>>>
>>> I guess the real point that I'd wanted to make, and seems
>>> to have gotten lost, was that by avoiding using pointers,
>>> you end up designing code in a very different way, and you
>>> can find out that often/usually, you don't need structs
>>> filled with a zoo of pointers.
>>>
>>
>> But you can't avoid pointers unless you make your entire
>> program have global scope. That may be great for performance,
>> but a killer if for have any bugs.
>
> Just to extract some useful technical knowledge from the current ongoing
> "flamewar"...
> I'm not entirely sure if the above statement regarding performance is
> correct. Some enlightenment would be appreciated.
>
> Suppose you have the following code:
> int myvar;
> void foo (void) {
> printf("%d\n", myvar);
> bar();
> printf("%d\n", myvar);
> }
> If bar is declared in _another_ file as
> void bar (void);
> then I believe the compiler has to reread the global 'myvar' from memory
> for the second printf().
>
Correct because bar() could have modified (it's global).
> However, if the code is as follows:
> void foo (void) {
> int myvar = 0;
> printf("%d\n", myvar);
> bar(&myvar);
> printf("%d\n", myvar);
> }
> If bar is declared in _another_ file as
> void bar (const int * var);
> then I think the compiler can validly cache the value of 'myvar' for the
> second printf without re-reading it. Correct/incorrect?
>
Maybe you tried to trick me by showing the variable was not going
to be changed (const *). In that case, the compiler may not re-read
the variable. However, it can re-read the variable.
A "smart" compiler might just do: write(1, "0\n", 2);
... for the first printf() as well. Such compilers make
debugging difficult.
> -Vadim Lobanov
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.
Thank you.
Trimmed linux-pci so as not to annoy those who don't want to listen to
all of this. Anyone else who wants off the CC list should yell also. :-)
On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
>
> On Wed, 9 Nov 2005, Vadim Lobanov wrote:
>
> > On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
> >
> >>
> >> On Wed, 9 Nov 2005, linas wrote:
> >>
> >>> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> >>>> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >>>>
> >>>>> void do_some_stuff(T& arg1,T& arg2)
> >>>>
> >>>> A diligent C programmer would write this as follows:
> >>>> void do_some_stuff (struct T * a, struct T * b);
> >>>> So I don't see C++ winning at all here.
> >>>
> >>> I guess the real point that I'd wanted to make, and seems
> >>> to have gotten lost, was that by avoiding using pointers,
> >>> you end up designing code in a very different way, and you
> >>> can find out that often/usually, you don't need structs
> >>> filled with a zoo of pointers.
> >>>
> >>
> >> But you can't avoid pointers unless you make your entire
> >> program have global scope. That may be great for performance,
> >> but a killer if for have any bugs.
> >
>
> Maybe you tried to trick me by showing the variable was not going
> to be changed (const *). In that case, the compiler may not re-read
> the variable. However, it can re-read the variable.
>
> A "smart" compiler might just do: write(1, "0\n", 2);
> ... for the first printf() as well. Such compilers make
> debugging difficult.
It wasn't meant to be a trick. In fact, I _want_ the compiler to cache
the myvar value in the second case -- I was merely wondering if there
was some fact that I overlooked that would prevent such an optimization.
The ultimate point of this was to show that globals can actually be
slower than locals (imagine the int in the example replaced by a
gigantic struct), contrary to the implication of your original
statement. :-)
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
> Warning : 98.36% of all statistics are fiction.
> .
>
> ****************************************************************
> The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.
>
> Thank you.
>
-Vadim Lobanov
Yes, trim off bluesmoke mailing list
thanks
doug thompson
> bluesmoke-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bluesmoke-devel
Vadim Lobanov <[email protected]> writes:
> However, if the code is as follows:
> void foo (void) {
> int myvar = 0;
> printf("%d\n", myvar);
> bar(&myvar);
> printf("%d\n", myvar);
> }
> If bar is declared in _another_ file as
> void bar (const int * var);
> then I think the compiler can validly cache the value of 'myvar' for the
> second printf without re-reading it. Correct/incorrect?
Incorrect. bar() may cast away const. In C const does not mean readonly.
Andreas.
--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
On Wed, Nov 09, 2005 at 03:26:10PM -0500, linux-os (Dick Johnson) was heard to remark:
>
> On Wed, 9 Nov 2005, linas wrote:
>
> > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> >> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >>
> >>> void do_some_stuff(T& arg1,T& arg2)
> >>
> >> A diligent C programmer would write this as follows:
> >> void do_some_stuff (struct T * a, struct T * b);
> >> So I don't see C++ winning at all here.
> >
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
>
> But you can't avoid pointers unless you make your entire
> program have global scope.
I didn't say you can avoid all pointers. I did say that
for many projects, one can often avoid many pointers.
And I certainly did not say that one needs global scope
to do so. In fact, I said the opposite.
> Also, without pointers, you are severely limited on the kinds
> of libraries you can share.
I think you don't understand what a reference is.
A reference is just like a pointer, except that the
signature is different. It has nothing to do with the
ability to create or use libraries, or to create/use
modular code.
I was trying to say that by focusing on the concept
of a "reference" as opposed to the concept of a
"pointer", you can write code that is *more* modular,
not less.
> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed
> > (because of local/private scope!!), and null pointer
> > deref errors are less likely.
>
> No. Minimizing pointers should not be an objective.
Why not?
I've fixed hundreds of kernel bugs (which you don't
see on this list because my fixes mostly go to the
distros or other users) and nine out of ten of these
are null-pointer derefs.
Maybe I'm naive for thinking that "fewer pointers ==
fewer pointer bugs" but, hey its worth a shot.
> Properly
> using the components of your tool-set should be.
What tool set are you refering to? I am assuming
that the code is 100% malleable: that one has
complete authority to redesign the way the system
works, from the ground up. If you do not have this
freedom, but are forced to use someone-else's tool set,
then yes, you are SOL. Furthermore, I would agree
that mixing two different styles of coding in one
project can lead to some nasty, ugly code.
> > By using refs instead of pointers, it helps you focus
> > on the issue of "do I really need to store this pointer
> > somewhere? Will I really need it later, or can I be done
> > with it now?".
>
> Huh? References (at the opcode level) are pointers. There
> is no difference whatsoever.
Yes, that is right. I'm not talking about opcodes.
That's not what the conversation is about.
What I am trying to say is that many people design
code in such a way that they need to store lots of
pointers in an assortment of structs.
I wanted to emphasize that there are other ways of
designing code, which has smaller needs for pointers
(and that this can be done without loosing modularity,
testability, debugability, and it can be done without
resorting to global variables.)
--linas
On Thu, 10 Nov 2005, Andreas Schwab wrote:
> Vadim Lobanov <[email protected]> writes:
>
> > However, if the code is as follows:
> > void foo (void) {
> > int myvar = 0;
> > printf("%d\n", myvar);
> > bar(&myvar);
> > printf("%d\n", myvar);
> > }
> > If bar is declared in _another_ file as
> > void bar (const int * var);
> > then I think the compiler can validly cache the value of 'myvar' for the
> > second printf without re-reading it. Correct/incorrect?
>
> Incorrect. bar() may cast away const. In C const does not mean readonly.
In that case, I stand corrected.
Is there any real reason to apply const to pointer targets, aside from
giving yourself a warning in the case you try to write the pointer
target directly? Seems to be a missed opportunity for optimizations
where the coder designates that it's okay to do so.
> Andreas.
>
> --
> Andreas Schwab, SuSE Labs, [email protected]
> SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
> PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
>
-Vadim Lobanov
On Wed, Nov 09, 2005 at 01:43:10PM -0800, Vadim Lobanov was heard to remark:
> On Wed, 9 Nov 2005, linas wrote:
>
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
> >
> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed
> > (because of local/private scope!!), and null pointer
> > deref errors are less likely.
> >
> > There are even performance implications: on modern CPU's
> > there's a very long pipeline to memory (hundreds of cycles
> > for a cache miss! Really! Worse if you have run out of
> > TLB entries!). So walking a long linked list chasing
> > pointers can really really hurt performance.
> >
> > By using refs instead of pointers, it helps you focus
> > on the issue of "do I really need to store this pointer
> > somewhere? Will I really need it later, or can I be done
> > with it now?".
> >
> > I don't know if the idea of "using fewer pointers" can
> > actually be carried out in the kernel. For starters,
> > the stack is way too short to be able to put much on it.
>
> I really see the two issues at hand as being very much orthogonal to
> each other.
Yes. I accidentally linked them, see below.
> Namely, you put data on the stack when you need it in the local
> 'context' only, whereas you put data globally when it needs to be
> available globally.
Yes. But there's some flexibility.
> The C++ references are nothing more than syntactic
> sugar (and we all know what they say about that and semicolons) for
> pointers,
Yes.
> and so I don't see how they would affect the choices at all.
> Choosing where the data goes should be done according to the data's
> lifetime, not the specifics of how functions are declared.
My apologies for linking the idea of references to fewer pointers.
They're not linked, except in how I discovered them.
I once had a project (that used threads, so it was "kernel-like",
in that race conditions had to be dealt with). One day, for the
the hell of it, I decided to create a struct and keep it on the
stack, instead of mallocing it. Since this struct was accessed
only by a few small, well-defined routines that did not keep any
pointers to it, this worked just fine. And skipping the malloc/free
felt good, so I liked it.
Then I thought that maybe I could push the idea, see how far I
could go. Well, of course, the code was filled with various
objects, all of which *seemed* to be (or seemed to need to be)
long-lifetime objects. And they all stored pointers to one-another,
since they all needed to get access to one-another at various points,
for various reasons.
Well, I really wanted to alloc objects on stack, and so that forced
me to think about how to get rid of pointers (since a pointer to
an object on stack is deadly). And that forced me to think about
lifetime. And some of this thinking was quite hard. But encouraged
by some modest success at first, I found that I was able to
eliminate almost all the pointers, and almost all the mallocs
(maybe several dozen of each, scattered accross maybe several dozen
structs). And I was flabbergasted, since the resulting program
actually got smaller in the process, and faster. And the
null-pointer derefs vanished.
Now, maybe this was specific to the project, and can't be replicated
elsewhere. But this was a communcations daemon: it basically
was a pool of threads, each thread handling a long-lived,
stateful "session" of requests and responses from some remote server,
and so while its not the kernel, that's a reasonably complex thing.
I'm not crazy enough to suggest that one could do the same thing
in the Linux kernel, since one probably can't, but now that we're
here and all, it does make me wonder. FWIW, the two designs of
the commo daemon were radically different; things that were sliced
one way got reworked to flow and be handled in a completely
different order. You can't just get rid of pointers with some
trivial restructuring; you have to figure out how not to need them.
--linas
On Wed, 2005-11-09 at 15:40 -0800, Vadim Lobanov wrote:
> On Thu, 10 Nov 2005, Andreas Schwab wrote:
>
> > Vadim Lobanov <[email protected]> writes:
> >
> > > However, if the code is as follows:
> > > void foo (void) {
> > > int myvar = 0;
> > > printf("%d\n", myvar);
> > > bar(&myvar);
> > > printf("%d\n", myvar);
> > > }
> > > If bar is declared in _another_ file as
> > > void bar (const int * var);
> > > then I think the compiler can validly cache the value of 'myvar' for the
> > > second printf without re-reading it. Correct/incorrect?
> >
> > Incorrect. bar() may cast away const. In C const does not mean readonly.
>
> In that case, I stand corrected.
>
> Is there any real reason to apply const to pointer targets, aside from
> giving yourself a warning in the case you try to write the pointer
> target directly? Seems to be a missed opportunity for optimizations
> where the coder designates that it's okay to do so.
Actually, where are you going to cache it? In a register? but calling
bar() may use that register, so it would be stored on the stack anyway.
I doubt that this is a problem with the compiler, since if bar _is_
small, then myvar is most likely already in the processor's cache to
begin with, so it wouldn't need to go back out to memory, unless it was
modified.
-- Steve
On Wed, 9 Nov 2005, Steven Rostedt wrote:
> On Wed, 2005-11-09 at 15:40 -0800, Vadim Lobanov wrote:
> > On Thu, 10 Nov 2005, Andreas Schwab wrote:
> >
> > > Vadim Lobanov <[email protected]> writes:
> > >
> > > > However, if the code is as follows:
> > > > void foo (void) {
> > > > int myvar = 0;
> > > > printf("%d\n", myvar);
> > > > bar(&myvar);
> > > > printf("%d\n", myvar);
> > > > }
> > > > If bar is declared in _another_ file as
> > > > void bar (const int * var);
> > > > then I think the compiler can validly cache the value of 'myvar' for the
> > > > second printf without re-reading it. Correct/incorrect?
> > >
> > > Incorrect. bar() may cast away const. In C const does not mean readonly.
> >
> > In that case, I stand corrected.
> >
> > Is there any real reason to apply const to pointer targets, aside from
> > giving yourself a warning in the case you try to write the pointer
> > target directly? Seems to be a missed opportunity for optimizations
> > where the coder designates that it's okay to do so.
>
> Actually, where are you going to cache it? In a register? but calling
> bar() may use that register, so it would be stored on the stack anyway.
May, but not necessarily will.
> I doubt that this is a problem with the compiler, since if bar _is_
> small, then myvar is most likely already in the processor's cache to
> begin with, so it wouldn't need to go back out to memory, unless it was
> modified.
You're right, however. There's very few cases where such an optimization
would be useful, due to register constraints.
> -- Steve
>
>
-Vadim Lobanov
On Wed, 9 Nov 2005 14:12:38 -0800 (PST), Vadim Lobanov <[email protected]> wrote:
> On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
>
> >
> > On Wed, 9 Nov 2005, linas wrote:
> >
> > > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> > >> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> > >>
> > >>> void do_some_stuff(T& arg1,T& arg2)
> > >>
> > >> A diligent C programmer would write this as follows:
> > >> void do_some_stuff (struct T * a, struct T * b);
> > >> So I don't see C++ winning at all here.
> > >
> > > I guess the real point that I'd wanted to make, and seems
> > > to have gotten lost, was that by avoiding using pointers,
> > > you end up designing code in a very different way, and you
> > > can find out that often/usually, you don't need structs
> > > filled with a zoo of pointers.
> > >
> >
> > But you can't avoid pointers unless you make your entire
> > program have global scope. That may be great for performance,
> > but a killer if for have any bugs.
>
> Just to extract some useful technical knowledge from the current ongoing
> "flamewar"...
> I'm not entirely sure if the above statement regarding performance is
> correct. Some enlightenment would be appreciated.
>
> Suppose you have the following code:
> int myvar;
> void foo (void) {
> printf("%d\n", myvar);
> bar();
> printf("%d\n", myvar);
> }
> If bar is declared in _another_ file as
> void bar (void);
> then I believe the compiler has to reread the global 'myvar' from memory
> for the second printf().
>
> However, if the code is as follows:
> void foo (void) {
> int myvar = 0;
> printf("%d\n", myvar);
> bar(&myvar);
> printf("%d\n", myvar);
> }
> If bar is declared in _another_ file as
> void bar (const int * var);
> then I think the compiler can validly cache the value of 'myvar' for the
> second printf without re-reading it. Correct/incorrect?
>
Nope. You can't trust bar() not doing something like
bar(const int* local_var)
{
... use local_var as ro...
extern int myvar;
myvar = 7;
}
For the compiler to do that, you must tag bar() with attribute(pure).
--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
J.A. Magallon writes:
[...]
> >
> > However, if the code is as follows:
> > void foo (void) {
> > int myvar = 0;
> > printf("%d\n", myvar);
> > bar(&myvar);
> > printf("%d\n", myvar);
> > }
> > If bar is declared in _another_ file as
> > void bar (const int * var);
> > then I think the compiler can validly cache the value of 'myvar' for the
> > second printf without re-reading it. Correct/incorrect?
> >
>
> Nope. You can't trust bar() not doing something like
>
> bar(const int* local_var)
> {
> ... use local_var as ro...
> extern int myvar;
> myvar = 7;
> }
>
> For the compiler to do that, you must tag bar() with attribute(pure).
extern declaration in your version of bar() cannot refer to the
automatic variable myvar in foo().
>
> --
> J.A. Magallon <jamagallon()able!es> \ Software is like sex:
> werewolf!able!es \ It's better when it's free
> Mandriva Linux release 2006.1 (Cooker) for i586
> Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
Nikita.
On Thu, 10 Nov 2005, Nikita Danilov wrote:
> J.A. Magallon writes:
>
> [...]
>
> > >
> > > However, if the code is as follows:
> > > void foo (void) {
> > > int myvar = 0;
> > > printf("%d\n", myvar);
> > > bar(&myvar);
> > > printf("%d\n", myvar);
> > > }
> > > If bar is declared in _another_ file as
> > > void bar (const int * var);
> > > then I think the compiler can validly cache the value of 'myvar' for the
> > > second printf without re-reading it. Correct/incorrect?
> > >
> >
> > Nope. You can't trust bar() not doing something like
> >
> > bar(const int* local_var)
> > {
> > ... use local_var as ro...
> > extern int myvar;
> > myvar = 7;
> > }
> >
> > For the compiler to do that, you must tag bar() with attribute(pure).
>
> extern declaration in your version of bar() cannot refer to the
> automatic variable myvar in foo().
Also, just because some called function may use casts to write
to a variable behind a void pointer doesn't mean that the function's
code must respect the potential of such buggy code.
The compiler must properly assume that the local variable, 'myvar'
will never be modified through the void pointer, even though some
jerk's buggy code may actually do that. Infortunately most protection
is page-based so there isn't any way for an OS to seg-fault somebody
who writes through void pointers.
The only hint that somebody did the wrong thing is that the wrong
value got printed. Because 'C' allows the use of casts, which
allows anything to be cast to anything else (may have to cheat by
first casting to void), the 'C' language not only allows you to
shoot yourself in the foot, but also hands you all the tools
necessary to do that, including loading and cocking the trigger.
Code inspection is sometimes necessary to avoid such problems.
In particular one must be on the lookout for 'excessive' casts.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.
Thank you.
On Nov 10, 2005, at 08:27:18, Nikita Danilov wrote:
> extern declaration in your version of bar() cannot refer to the
> automatic variable myvar in foo().
int foo;
void bar(const int *local_var) {
foo = *local_var + 1;
}
void show(void) {
printf("%d\n", foo);
bar(&foo);
printf("%d\n", foo);
}
If GCC thought it could arbitrarily cache anything it wanted to, then
code like this would die. There is a whole mess of code in GCC
designed specifically to watch for and avoid aliasing issues like these.
Cheers,
Kyle Moffett
--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
-- Brian Kernighan
On Thu, 10 Nov 2005, Kyle Moffett wrote:
> On Nov 10, 2005, at 08:27:18, Nikita Danilov wrote:
> > extern declaration in your version of bar() cannot refer to the
> > automatic variable myvar in foo().
>
> int foo;
Except foo is not an automatic variable within show(). When the variable
is global, then there's no argument. It was a question of when foo is
local and bar() would get a const pointer to the local.
> void bar(const int *local_var) {
> foo = *local_var + 1;
> }
>
> void show(void) {
> printf("%d\n", foo);
> bar(&foo);
> printf("%d\n", foo);
> }
>
> If GCC thought it could arbitrarily cache anything it wanted to, then
> code like this would die. There is a whole mess of code in GCC
> designed specifically to watch for and avoid aliasing issues like these.
>
> Cheers,
> Kyle Moffett
>
> --
> Debugging is twice as hard as writing the code in the first place.
> Therefore, if you write the code as cleverly as possible, you are, by
> definition, not smart enough to debug it.
> -- Brian Kernighan
>
>
-Vadim Lobanov
On Thu, 10 Nov 2005 11:28:27 -0800 (PST), Vadim Lobanov <[email protected]> wrote:
> On Thu, 10 Nov 2005, Kyle Moffett wrote:
>
> > On Nov 10, 2005, at 08:27:18, Nikita Danilov wrote:
> > > extern declaration in your version of bar() cannot refer to the
> > > automatic variable myvar in foo().
> >
> > int foo;
>
> Except foo is not an automatic variable within show(). When the variable
> is global, then there's no argument. It was a question of when foo is
> local and bar() would get a const pointer to the local.
>
> > void bar(const int *local_var) {
> > foo = *local_var + 1;
> > }
> >
> > void show(void) {
> > printf("%d\n", foo);
> > bar(&foo);
> > printf("%d\n", foo);
> > }
> >
You can tweak it all as you want:
int* foo_p;
void bar(const int *local_var) {
*foo_p = *local_var + 1;
}
int main() {
int foo;
foo_p = &foo;
printf("%d\n", foo);
bar(&foo);
printf("%d\n", foo);
return 0;
}
The fact is that gcc can't suppose anything unless you say explicitly it can.
That is why 'pure' and 'const' __attributes__ exist, it supposes aliasing
by default and so on.
--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
On Tuesday 08 November 2005 03:18, Neil Brown wrote:
> On Monday November 7, [email protected] wrote:
> >
> > This was for the simple reason, too many developers were passing
> > structures by value instead of by reference, just because they were
> > using a type that they didn't realize was a structure. And to make
> > things worse, these structures started to get bigger.
> >
>
> Another reason for not using typedefs is that if you do, and you want
> to refer to the structure in some other include file, you have to
> #include the include file that devices the structure.
> If you don't use typedefs, you can just say:
>
> struct foo;
Forward decl for typedef works too:
typedef struct foo foo_t;
is ok even before struct foo is defined. Not sure that standards
allow thing, but gcc does.
> and the compiler will happily wait for the complete definition later
> (providing it doesn't need the size in the meanwhile).
> So avoiding typedef means that you can sometimes avoid excess
> #includes, which means faster compiling.
--
vda
On Fri, Dec 16, 2005 at 03:09:01PM +0200, Denis Vlasenko wrote:
>
> Forward decl for typedef works too:
>
> typedef struct foo foo_t;
>
> is ok even before struct foo is defined. Not sure that standards
> allow thing, but gcc does.
Forward declarations of typedefs don't work in at least one case that
do for struct definitions:
$ cat foo.c
typedef struct foo foo_t;
typedef struct foo foo_t;
$ gcc -Wall -o foo.o -c foo.c
foo.c:2: error: redefinition of typedef 'foo_t'
foo.c:1: error: previous declaration of 'foo_t' was here
and if you don't believe we do that, take another look at our headers
sometime.