2003-05-30 17:09:52

by Bernd Jendrissek

Subject: Re: Problem Installing Linux Kernel Module compiled with gcc-3.2.x

Not *exactly* on-topic for [email protected] I suppose, but here goes.

[Cc'ed to [email protected]]

On Fri, May 30, 2003 at 09:26:51AM -0600, Kendrick Hamilton wrote:
> I have a module for a custom developed PCI card. The device
> driver is written for the Linux 2.4 series kernels. When I build the
> module and the Linux kernel with gcc-2.95.3, the module installs
> correctly. When I build the module and the Linux kernel with gcc-3.2.3
> (also other gcc-3.2.x), the module installs but the Linux kernel crashes
> in random places outside of the module. Do you have any suggestions of
> what to look for? I can email you the complete module source code. I have
> not tried gcc-3.3 because I cannot compile the current Linux kernel with
> it (there is a known bug that is being fixed and should be out in
> Linux-2.4.21).

Been there, done that, got the T-shirt. I was lucky: while my module
installed, it broke in a fairly harmless way. (It just didn't work; it
didn't screw with my system.)

If you look at linux/include/linux/spinlock.h, you'll see:

/*
 * Your basic spinlocks, allowing only a single CPU anywhere
 *
 * Most gcc versions have a nasty bug with empty initializers.
 */
#if (__GNUC__ > 2)
  typedef struct { } spinlock_t;
  #define SPIN_LOCK_UNLOCKED (spinlock_t) { }
#else
  typedef struct { int gcc_is_buggy; } spinlock_t;
  #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
#endif

There are a couple of spinlock_t's (directly or through other structs) in
the task_struct, so a module built with the other GCC major version sees
every field beyond the first spinlock_t at a different offset than the
kernel does. When your module touches those parts of the "current"
task_struct, you'd better hope it's reading and not writing (which was the
case with my module).
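
To make the offset problem concrete, here is a minimal sketch in GNU C. The
names lock_gcc3, lock_gcc2, task_gcc3, task_gcc2 and the pid_offset_* helpers
are made up for illustration; the real task_struct is far larger. Empty
structs are a GNU extension, so this assumes gcc or a compatible compiler.

```c
#include <stddef.h>

/* Stand-ins for the two spinlock_t variants quoted above. */
typedef struct { } lock_gcc3;                    /* GNU C extension: size 0 */
typedef struct { int gcc_is_buggy; } lock_gcc2;  /* one int of padding */

/* Hypothetical miniature of a task_struct-like layout, once per variant. */
struct task_gcc3 { long state; lock_gcc3 lock; int pid; };
struct task_gcc2 { long state; lock_gcc2 lock; int pid; };

/* Byte offset of 'pid' as each build would see it. */
size_t pid_offset_gcc3(void) { return offsetof(struct task_gcc3, pid); }
size_t pid_offset_gcc2(void) { return offsetof(struct task_gcc2, pid); }

/* How far the field moves between the two layouts. */
size_t pid_offset_shift(void)
{
    return pid_offset_gcc2() - pid_offset_gcc3();
}
```

On typical i386 and x86-64 builds pid_offset_shift() comes out as
sizeof(int): code compiled against one layout reads and writes different
bytes than code compiled against the other, for the same field name.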

I bet your module modifies "current".

Hmm, actually I thought the kernel had a mechanism to prevent a GCC 3.x
module from being loaded into a GCC 2.x kernel and vice versa?


2003-05-30 17:19:03

by Kendrick Hamilton

Subject: Re: Problem Installing Linux Kernel Module compiled with gcc-3.2.x

I have been manually recompiling the module and kernel to ensure they are
both compiled with the same version of gcc. When I do switch gcc versions,
I cp .config to config, make mrproper, cp config .config, make dep, make
all modules modules_install install; reboot; then make clean on my driver
and make it.

--
Kendrick Hamilton E.I.T.
SED Systems, a division of Calian Ltd.
18 Innovation Blvd.
PO Box 1464
Saskatoon, Saskatchewan
Canada
S7N 3R1

[email protected]
Tel: (306) 933-1453
Fax: (306) 933-1486

2003-05-30 17:21:26

by Joe Buck

Subject: Re: Problem Installing Linux Kernel Module compiled with gcc-3.2.x

On Fri, May 30, 2003 at 07:22:40PM +0200, Bernd Jendrissek wrote:

> If you look at linux/include/linux/spinlock.h, you'll see:
>
> /*
> * Your basic spinlocks, allowing only a single CPU anywhere
> *
> * Most gcc versions have a nasty bug with empty initializers.
> */
> #if (__GNUC__ > 2)
> typedef struct { } spinlock_t;
> #define SPIN_LOCK_UNLOCKED (spinlock_t) { }
> #else
> typedef struct { int gcc_is_buggy; } spinlock_t;
> #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
> #endif

Yuk! What is the benefit of introducing this incompatibility? #ifdefs
are harmful to maintenance, and it's only one word, so why not always
put in the dummy struct member?

> Hmm, actually I thought the kernel had a mechanism to prevent a GCC 3.x
> module from being loaded into a GCC 2.x kernel and vice versa?

Is there any reason, other than the above-described bit of evil, for doing
this (forbidding mixing)? It prevents the bug-finding approach I
described earlier (a binary search for finding miscompiled code) from
working.

2003-05-30 17:48:59

by Bernd Jendrissek

Subject: Re: Problem Installing Linux Kernel Module compiled with gcc-3.2.x

On Fri, May 30, 2003 at 11:31:57AM -0600, Kendrick Hamilton wrote:
> I have been manually recompiling the module and kernel to ensure they are
> both compiled with the same version of gcc. When I do switch gcc versions,
> I cp .config to config, make mrproper, cp config .config, make dep, make
> all modules modules_install install; reboot; then make clean on my driver
> and make it.

Aargh. Now if I had actually *read* your message I'd have picked that up.

Well, could it be some *other* module that got left behind in
/lib/modules/$VERSION? No, that doesn't make much sense: it doesn't gel
with the crashes starting from the time you load *your* module.

Uh, could it maybe be (gasp!) a *bug* in your module? Maybe some
assumption your code is making is being invalidated by a new! improved!
optimization in GCC 3.x? I know my module ha[ds] bugs...
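
One classic assumption of that kind, offered as a guess and not a diagnosis
of this driver: polling a device status word through a plain pointer. The
sketch below is ordinary user-space C; READY_BIT and wait_ready() are
hypothetical names, and real kernel code would normally go through readl()
and friends instead of dereferencing a pointer.

```c
#include <stdint.h>

#define READY_BIT 0x1u  /* hypothetical status bit */

/*
 * Poll a status word until READY_BIT is set or max_spins runs out.
 * The volatile qualifier forces a fresh load on every iteration; without
 * it an optimizer may legally hoist the load out of the loop and spin on
 * a stale value.
 * Returns 0 when the bit was seen, -1 on timeout.
 */
int wait_ready(volatile const uint32_t *status, unsigned max_spins)
{
    while (max_spins--) {
        if (*status & READY_BIT)
            return 0;
    }
    return -1;
}
```

Code that happens to work without the qualifier under one compiler can stop
working under another whose optimizer is more aggressive, which would look
exactly like "it broke when I switched GCC versions".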

Although... I must say that ever since I recompiled 2.4.18 with 3.2.x
(now 3.2.3), my machine seems somewhat less stable. (I think) I *had* to
reboot yesterday after just 16 days' uptime after X or something else
with the keyboard went berserk. But I'm not quite ready yet to "blame"
GCC for that.

2003-05-30 18:30:48

by Bernd Jendrissek

Subject: Re: Problem Installing Linux Kernel Module compiled with gcc-3.2.x

On Fri, May 30, 2003 at 10:33:29AM -0700, Joe Buck wrote:
> On Fri, May 30, 2003 at 07:22:40PM +0200, Bernd Jendrissek wrote:
>
> > If you look at linux/include/linux/spinlock.h, you'll see:
> >
> > /*
> > * Your basic spinlocks, allowing only a single CPU anywhere
> > *
> > * Most gcc versions have a nasty bug with empty initializers.
> > */
> > #if (__GNUC__ > 2)
> > typedef struct { } spinlock_t;
> > #define SPIN_LOCK_UNLOCKED (spinlock_t) { }
> > #else
> > typedef struct { int gcc_is_buggy; } spinlock_t;
> > #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
> > #endif
>
> Yuk! What is the benefit of introducing this incompatibility? #ifdefs
> are harmful to maintenance, and it's only one word, so why not always
> put in the dummy struct member?

I don't speak for the kernel people, but...

I suppose some people just insist on squeezing every last cycle out of
their machines. For my home PC (a 486 with 5MB RAM running linux 2.0.30),
I am quite grateful for such cycle and bit saving. Believe me, I notice
whether I have apache running or not. :)

Hmm, yes, it does seem to be just one word. grep -r spinlock_t . | wc -l
says 1013 here, and that's across *all* architectures. IOW 4052 bytes - just
under *one page* - on i386!
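
The arithmetic, spelled out as a small C sketch. The constant and function
names are made up here, and 4096 is the usual i386 page size. Note that the
estimate counts grep hits (lines mentioning the type), not spinlock_t objects
that exist at run time, so it is only a rough figure.

```c
#include <stddef.h>

/* Figures quoted in the message, plus the usual i386 page size. */
enum {
    SPINLOCK_MENTIONS = 1013,  /* output of grep -r spinlock_t . | wc -l */
    I386_INT_BYTES    = 4,     /* size of the dummy int member on i386 */
    I386_PAGE_BYTES   = 4096
};

/* Rough estimate: one int of padding per grep hit. */
size_t padding_bytes(void)
{
    return (size_t)SPINLOCK_MENTIONS * I386_INT_BYTES;
}

/* Bytes left over in a single i386 page after that padding. */
size_t page_slack(void)
{
    return (size_t)I386_PAGE_BYTES - padding_bytes();
}
```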

Never mind what definition tcc will give to __GNUC__.

So there I thought I was going to justify the kernel. Instead I mostly
agree with Joe! I'm also sure there have been flamewars about this...

> > Hmm, actually I thought the kernel had a mechanism to prevent a GCC 3.x
> > module from being loaded into a GCC 2.x kernel and vice versa?
>
> Is there any reason, other than the above-described bit of evil, for doing
> this (forbidding mixing)? It prevents the bug-finding approach I
> described earlier (a binary search for finding miscompiled code) from
> working.

Between GCC 2.x and 3.x the *major* version changed (duh). I would
imagine that people are/were (justifiably?) concerned that ABI's might
have changed. From your response, I assume there are no ABI changes
for C at least? I suppose a gratuitous ABI change would constitute a
bug, though...

BTW I said "I thought" - it appears there is in fact no such mechanism.

Okay, so here's a PR (Public Relations, not Problem Report) patch just
for you, Joe: <with a fistful of smileys :)>

(It also gets rid of some of that crazy 2-space indentation.)

diff -u linux/include/linux/spinlock.h.borig linux/include/linux/spinlock.h
--- linux/include/linux/spinlock.h.borig Tue May 13 17:05:57 2003
+++ linux/include/linux/spinlock.h Fri May 30 20:29:42 2003
@@ -53,13 +53,8 @@
  *
  * Most gcc versions have a nasty bug with empty initializers.
  */
-#if (__GNUC__ > 2)
-  typedef struct { } spinlock_t;
-  #define SPIN_LOCK_UNLOCKED (spinlock_t) { }
-#else
-  typedef struct { int gcc_is_buggy; } spinlock_t;
-  #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
-#endif
+typedef struct { int gcc_was_buggy; } spinlock_t;
+#define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }

 #define spin_lock_init(lock) do { } while(0)
 #define spin_lock(lock) (void)(lock) /* Not "unused variable". */

2003-05-30 18:49:29

by Joe Buck

Subject: Re: Problem Installing Linux Kernel Module compiled with gcc-3.2.x

On Fri, May 30, 2003 at 08:43:32PM +0200, Bernd Jendrissek wrote:
> > Is there any reason, other than the above-described bit of evil, for doing
> > this (forbidding mixing)? It prevents the bug-finding approach I
> > described earlier (a binary search for finding miscompiled code) from
> > working.
>
> Between GCC 2.x and 3.x the *major* version changed (duh). I would
> imagine that people are/were (justifiably?) concerned that ABI's might
> have changed. From your response, I assume there are no ABI changes
> for C at least? I suppose a gratuitous ABI change would constitute a
> bug, though...

There are no ABI changes for C.

2003-05-30 20:21:24

by Alan

Subject: Re: Problem Installing Linux Kernel Module compiled with gcc-3.2.x

On Fri, 2003-05-30 at 20:02, Joe Buck wrote:
> > Between GCC 2.x and 3.x the *major* version changed (duh). I would
> > imagine that people are/were (justifiably?) concerned that ABI's might
> > have changed. From your response, I assume there are no ABI changes
> > for C at least? I suppose a gratuitous ABI change would constitute a
> > bug, though...
>
> There are no ABI changes for C.

Not quite true, but close. The padding in gcc 2.96, however, is actually to
work around a compiler bug.