When the .*.exit sections are discarded from vmlinux, any dangling
references from the rest of the code or data to the discarded sections
are potential problems. Newer versions of binutils detect these
dangling references and complain. Unfortunately we are getting some
false positives that I cannot see an easy way of fixing, I'm looking
for ideas.
References from one discarded section to another are not an issue,
binutils is smart enough to cope with those. Pointer references to
__devexit code can be wrapped in __devexit_p, a bit of a kludge but it
works. What is killing us at the moment is the out of line spinlock
code.
As an example, net/ipv4/netfilter/ip_nat_snmp_basic.c
static void __exit fini(void)
{
ip_nat_helper_unregister(&snmp);
ip_nat_helper_unregister(&snmp_trap);
br_write_lock_bh(BR_NETPROTO_LOCK);
br_write_unlock_bh(BR_NETPROTO_LOCK);
}
The lock operations generate a branch to out of line code in section
.text.lock which then branches back to fini. When ip_nat_snmp_basic is
built into vmlinux, the fini section is discarded but the .text.lock
code is kept. That has two problems, unused code in .text.lock (minor)
and an unresolved reference from .text.lock to .text.exit which makes
binutils complain (major).
I have several options, none of which I like :-
(1) Drop the ld check for discarded sections.
I don't want to lose the ld check, it has already found several
bits of buggy code. For example, usb_uhci.c calls the exit routine
from the init code on error, but the exit code has been discarded
in vmlinux - oops. New binutils flagged that bug and others.
(2) Tell ld which sections matter and which ones it can ignore, then
ignore dangling refernces in .text.lock.
Maybe, but it makes ld specific to vmlinux's design. If this were
done by an environment variable then it might be acceptable.
(3) Add _exit versions of locks that do noop when the code is
discarded.
br_write_lock_bh_exit(BR_NETPROTO_LOCK). Barf-o-meter alert!
(4) Add #define/#undef EXIT_CODE around functions, EXIT_CODE tells
the lock functions to become noop.
#define EXIT_CODE
static void __exit fini(void)
{
ip_nat_helper_unregister(&snmp);
ip_nat_helper_unregister(&snmp_trap);
br_write_lock_bh(BR_NETPROTO_LOCK);
br_write_unlock_bh(BR_NETPROTO_LOCK);
}
#undef EXIT_CODE
Barf-o-meter overflow!
(5) Post process the objects before ld sees them, remove the dangling
references in safe sections.
Will probably mess up timestamps on objects, as well as requiring
yet another program for kernel build. Cross compiling would be
"interesting".
Number (2) is the least objectionable but I am hoping for any better
ideas.
--On Sunday, 23 December, 2001 8:17 PM +1100 Keith Owens <[email protected]>
wrote:
> (5) Post process the objects before ld sees them, remove the dangling
> references in safe sections.
>
> Will probably mess up timestamps on objects, as well as requiring
> yet another program for kernel build. Cross compiling would be
> "interesting".
>
1+5) (seeing as you seem to have already written some perl); would it
be possible to run our own perl code to check for whichever dangling
references we are concerned about (and not those we aren't), then do
> (1) Drop the ld check for discarded sections.
>
> I don't want to lose the ld check, it has already found several
> bits of buggy code. For example, usb_uhci.c calls the exit routine
> from the init code on error, but the exit code has been discarded
> in vmlinux - oops. New binutils flagged that bug and others.
This would seem to have the advantage of better readability of errors
too.
--
Alex Bligh
On Sun, 23 Dec 2001, Keith Owens wrote:
> As an example, net/ipv4/netfilter/ip_nat_snmp_basic.c
> static void __exit fini(void)
> {
> ip_nat_helper_unregister(&snmp);
> ip_nat_helper_unregister(&snmp_trap);
> br_write_lock_bh(BR_NETPROTO_LOCK);
> br_write_unlock_bh(BR_NETPROTO_LOCK);
> }
>
> The lock operations generate a branch to out of line code in section
> .text.lock which then branches back to fini. When ip_nat_snmp_basic is
> built into vmlinux, the fini section is discarded but the .text.lock
> code is kept. That has two problems, unused code in .text.lock (minor)
> and an unresolved reference from .text.lock to .text.exit which makes
> binutils complain (major).
The logical fix would be to have the out-of-line code put into
.text.exit.lock in this case, and then discard .text.exit.lock during the
final link. Now, the problem is to have the spinlock code automatically
chose the right section.
But I think, I found a solution: Don't use .text.lock for the out-of-line
code, use .text 1 (subsection 1) instead. This will put the out-of-line
code out of line, but into the same section - case solved ;-)
I'll patch my kernel now - since I don't have the new binutils yet, it'll
be a bit hard to verify it really works, though.
--Kai
On Sun, 23 Dec 2001, Kai Germaschewski wrote:
> But I think, I found a solution: Don't use .text.lock for the out-of-line
> code, use .text 1 (subsection 1) instead. This will put the out-of-line
> code out of line, but into the same section - case solved ;-)
So here's the patch (i386 only). Appears to work.
diff -X excl -urN linux-2.5.2-pre1.patches/arch/i386/vmlinux.lds linux-2.5.2-pre1.work/arch/i386/vmlinux.lds
--- linux-2.5.2-pre1.patches/arch/i386/vmlinux.lds Mon Jul 2 23:40:14 2001
+++ linux-2.5.2-pre1.work/arch/i386/vmlinux.lds Sun Dec 23 23:41:51 2001
@@ -13,7 +13,6 @@
*(.fixup)
*(.gnu.warning)
} = 0x9090
- .text.lock : { *(.text.lock) } /* out-of-line lock text */
_etext = .; /* End of text section */
diff -X excl -urN linux-2.5.2-pre1.patches/include/asm-i386/rwlock.h linux-2.5.2-pre1.work/include/asm-i386/rwlock.h
--- linux-2.5.2-pre1.patches/include/asm-i386/rwlock.h Fri Sep 22 23:07:43 2000
+++ linux-2.5.2-pre1.work/include/asm-i386/rwlock.h Sun Dec 23 23:08:22 2001
@@ -24,7 +24,7 @@
asm volatile(LOCK "subl $1,(%0)\n\t" \
"js 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
"2:\tcall " helper "\n\t" \
"jmp 1b\n" \
".previous" \
@@ -34,7 +34,7 @@
asm volatile(LOCK "subl $1,%0\n\t" \
"js 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
"2:\tpushl %%eax\n\t" \
"leal %0,%%eax\n\t" \
"call " helper "\n\t" \
@@ -54,7 +54,7 @@
asm volatile(LOCK "subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
"jnz 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
"2:\tcall " helper "\n\t" \
"jmp 1b\n" \
".previous" \
@@ -64,7 +64,7 @@
asm volatile(LOCK "subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
"jnz 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
"2:\tpushl %%eax\n\t" \
"leal %0,%%eax\n\t" \
"call " helper "\n\t" \
diff -X excl -urN linux-2.5.2-pre1.patches/include/asm-i386/rwsem.h linux-2.5.2-pre1.work/include/asm-i386/rwsem.h
--- linux-2.5.2-pre1.patches/include/asm-i386/rwsem.h Sun Dec 23 23:03:21 2001
+++ linux-2.5.2-pre1.work/include/asm-i386/rwsem.h Sun Dec 23 23:39:05 2001
@@ -101,7 +101,7 @@
LOCK_PREFIX " incl (%%eax)\n\t" /* adds 0x00000001, returns the old value */
" js 2f\n\t" /* jump if we weren't granted the lock */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\n\t"
" pushl %%ecx\n\t"
" pushl %%edx\n\t"
@@ -130,7 +130,7 @@
" testl %0,%0\n\t" /* was the count 0 before? */
" jnz 2f\n\t" /* jump if we weren't granted the lock */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\n\t"
" pushl %%ecx\n\t"
" call rwsem_down_write_failed\n\t"
@@ -154,7 +154,7 @@
LOCK_PREFIX " xadd %%edx,(%%eax)\n\t" /* subtracts 1, returns the old value */
" js 2f\n\t" /* jump if the lock is being waited upon */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\n\t"
" decw %%dx\n\t" /* do nothing if still outstanding active readers */
" jnz 1b\n\t"
@@ -180,7 +180,7 @@
LOCK_PREFIX " xaddl %%edx,(%%eax)\n\t" /* tries to transition 0xffff0001 -> 0x00000000 */
" jnz 2f\n\t" /* jump if the lock is being waited upon */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\n\t"
" decw %%dx\n\t" /* did the active count reduce to 0? */
" jnz 1b\n\t" /* jump back if not */
diff -X excl -urN linux-2.5.2-pre1.patches/include/asm-i386/semaphore.h linux-2.5.2-pre1.work/include/asm-i386/semaphore.h
--- linux-2.5.2-pre1.patches/include/asm-i386/semaphore.h Sun Dec 23 23:03:21 2001
+++ linux-2.5.2-pre1.work/include/asm-i386/semaphore.h Sun Dec 23 23:39:05 2001
@@ -122,7 +122,7 @@
LOCK "decl %0\n\t" /* --sem->count */
"js 2f\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\tcall __down_failed\n\t"
"jmp 1b\n"
".previous"
@@ -149,7 +149,7 @@
"js 2f\n\t"
"xorl %0,%0\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\tcall __down_failed_interruptible\n\t"
"jmp 1b\n"
".previous"
@@ -177,7 +177,7 @@
"js 2f\n\t"
"xorl %0,%0\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\tcall __down_failed_trylock\n\t"
"jmp 1b\n"
".previous"
@@ -203,7 +203,7 @@
LOCK "incl %0\n\t" /* ++sem->count */
"jle 2f\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
"2:\tcall __up_wakeup\n\t"
"jmp 1b\n"
".previous"
diff -X excl -urN linux-2.5.2-pre1.patches/include/asm-i386/softirq.h linux-2.5.2-pre1.work/include/asm-i386/softirq.h
--- linux-2.5.2-pre1.patches/include/asm-i386/softirq.h Sun Dec 23 23:03:22 2001
+++ linux-2.5.2-pre1.work/include/asm-i386/softirq.h Sun Dec 23 23:39:05 2001
@@ -33,7 +33,7 @@
"jnz 2f;" \
"1:;" \
\
- ".section .text.lock,\"ax\";" \
+ ".subsection 1;" \
"2: pushl %%eax; pushl %%ecx; pushl %%edx;" \
"call %c1;" \
"popl %%edx; popl %%ecx; popl %%eax;" \
diff -X excl -urN linux-2.5.2-pre1.patches/include/asm-i386/spinlock.h linux-2.5.2-pre1.work/include/asm-i386/spinlock.h
--- linux-2.5.2-pre1.patches/include/asm-i386/spinlock.h Thu Dec 20 23:36:25 2001
+++ linux-2.5.2-pre1.work/include/asm-i386/spinlock.h Sun Dec 23 23:39:04 2001
@@ -56,7 +56,7 @@
"\n1:\t" \
"lock ; decb %0\n\t" \
"js 2f\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
"2:\t" \
"cmpb $0,%0\n\t" \
"rep;nop\n\t" \
Kai Germaschewski wrote:
>
> asm volatile(LOCK "subl $1,(%0)\n\t" \
> "js 2f\n" \
> "1:\n" \
> - ".section .text.lock,\"ax\"\n" \
> + ".subsection 1\n" \
> "2:\tcall " helper "\n\t" \
> "jmp 1b\n" \
> ".previous" \
Don't we want `.subsection 0' here, rather than .previous?
Apart from that, it looks like a winner.
-
On Sun, 23 Dec 2001 16:15:47 -0800,
Andrew Morton <[email protected]> wrote:
>Kai Germaschewski wrote:
>>
>> asm volatile(LOCK "subl $1,(%0)\n\t" \
>> "js 2f\n" \
>> "1:\n" \
>> - ".section .text.lock,\"ax\"\n" \
>> + ".subsection 1\n" \
>> "2:\tcall " helper "\n\t" \
>> "jmp 1b\n" \
>> ".previous" \
>
>Don't we want `.subsection 0' here, rather than .previous?
>
>Apart from that, it looks like a winner.
One downside, subsections mess up backtrace. The out of line code
shows up as belonging to the last function in subsection 0, which will
confuse users. It will also break kdb backtrace which knows that the
out of line code in section .text.lock is special.
Easily fixed, add label .text.lock.basename at the start of each chunk
of subsection 1 code. As an added bonus, oops backtrace and ksymoops
will now say .text.lock.floppy+offset instead of stext_lock+some large
number, IOW the traces will show which source the failing lock is in.
Not 100% unambiguous because some sources have the same base name, but
better than we already have, the rest of the trace should help
disambiguate the problem source.
gcc/as generates worse code for the local branches in the out of line
subsection. With .text.lock we get
0: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
7: f3 90 repz nop
9: 7e f5 jle 0 <.text.lock> <=== 2 bytes
b: e9 ca 01 00 00 jmp 1da <.text.lock+0x1da>
With .subsection 1 it generates
.text.lock.es1371:
6387: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
638e: f3 90 repz nop
6390: 0f 8e f1 ff ff ff jle 6387 <.text.lock.es1371> <=== 6 bytes
6396: e9 33 9e ff ff jmp 1ce <set_adc_rate+0x8e>
The inline code is unchanged, it is only the out of line code that is
bigger. IMHO the subsection difference is a gcc/as bug which should
not stop us using this fix.
Unless anybody objects, I will send this patch to Marcelo and, once it
is in 2.4.18-pre, do separate patches for each architecture. Normally
I would apply this to 2.5 first then backport to 2.4 but 2.5 is not
taking patches at the moment (it does not even have the __devexit fix)
and this problem is already affecting 2.4 users.
============ patch 2.4.17
Drop section .text.lock, use subsections in the current section instead
to avoid dangling references to discarded sections and give better
debugging on oops. Add a standard __stringify macro instead of
everybody writing their own macros. Kai Germaschewski, Keith Owens.
Index: 17.1/include/asm-i386/rwlock.h
--- 17.1/include/asm-i386/rwlock.h Fri, 05 Jan 2001 13:42:29 +1100 kaos (linux-2.4/T/20_rwlock.h 1.1 644)
+++ 17.1(w)/include/asm-i386/rwlock.h Mon, 24 Dec 2001 15:26:16 +1100 kaos (linux-2.4/T/20_rwlock.h 1.1 644)
@@ -17,6 +17,8 @@
#ifndef _ASM_I386_RWLOCK_H
#define _ASM_I386_RWLOCK_H
+#include <linux/stringify.h>
+
#define RW_LOCK_BIAS 0x01000000
#define RW_LOCK_BIAS_STR "0x01000000"
@@ -24,23 +26,29 @@
asm volatile(LOCK "subl $1,(%0)\n\t" \
"js 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n" \
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n" \
+ ".endif\n" \
"2:\tcall " helper "\n\t" \
"jmp 1b\n" \
- ".previous" \
+ ".subsection 0\n" \
::"a" (rw) : "memory")
#define __build_read_lock_const(rw, helper) \
asm volatile(LOCK "subl $1,%0\n\t" \
"js 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n" \
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n" \
+ ".endif\n" \
"2:\tpushl %%eax\n\t" \
"leal %0,%%eax\n\t" \
"call " helper "\n\t" \
"popl %%eax\n\t" \
"jmp 1b\n" \
- ".previous" \
+ ".subsection 0\n" \
:"=m" (*(volatile int *)rw) : : "memory")
#define __build_read_lock(rw, helper) do { \
@@ -54,23 +62,29 @@
asm volatile(LOCK "subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
"jnz 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n" \
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n" \
+ ".endif\n" \
"2:\tcall " helper "\n\t" \
"jmp 1b\n" \
- ".previous" \
+ ".subsection 0\n" \
::"a" (rw) : "memory")
#define __build_write_lock_const(rw, helper) \
asm volatile(LOCK "subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
"jnz 2f\n" \
"1:\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n" \
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n" \
+ ".endif\n" \
"2:\tpushl %%eax\n\t" \
"leal %0,%%eax\n\t" \
"call " helper "\n\t" \
"popl %%eax\n\t" \
"jmp 1b\n" \
- ".previous" \
+ ".subsection 0\n" \
:"=m" (*(volatile int *)rw) : : "memory")
#define __build_write_lock(rw, helper) do { \
Index: 17.1/include/asm-i386/spinlock.h
--- 17.1/include/asm-i386/spinlock.h Fri, 26 Oct 2001 15:50:03 +1000 kaos (linux-2.4/T/50_spinlock.h 1.1.2.2 644)
+++ 17.1(w)/include/asm-i386/spinlock.h Mon, 24 Dec 2001 15:34:15 +1100 kaos (linux-2.4/T/50_spinlock.h 1.1.2.2 644)
@@ -5,6 +5,7 @@
#include <asm/rwlock.h>
#include <asm/page.h>
#include <linux/config.h>
+#include <linux/stringify.h>
extern int printk(const char * fmt, ...)
__attribute__ ((format (printf, 1, 2)));
@@ -56,7 +57,10 @@ typedef struct {
"\n1:\t" \
"lock ; decb %0\n\t" \
"js 2f\n" \
- ".section .text.lock,\"ax\"\n" \
+ ".subsection 1\n" \
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n" \
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n" \
+ ".endif\n" \
"2:\t" \
"cmpb $0,%0\n\t" \
"rep;nop\n\t" \
Index: 17.1/include/asm-i386/softirq.h
--- 17.1/include/asm-i386/softirq.h Sun, 09 Sep 2001 19:22:07 +1000 kaos (linux-2.4/T/51_softirq.h 1.8.1.1 644)
+++ 17.1(w)/include/asm-i386/softirq.h Mon, 24 Dec 2001 15:46:39 +1100 kaos (linux-2.4/T/51_softirq.h 1.8.1.1 644)
@@ -3,6 +3,7 @@
#include <asm/atomic.h>
#include <asm/hardirq.h>
+#include <linux/stringify.h>
#define __cpu_bh_enable(cpu) \
do { barrier(); local_bh_count(cpu)--; } while (0)
@@ -33,12 +34,15 @@ do { \
"jnz 2f;" \
"1:;" \
\
- ".section .text.lock,\"ax\";" \
+ ".subsection 1;" \
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n" \
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n" \
+ ".endif\n" \
"2: pushl %%eax; pushl %%ecx; pushl %%edx;" \
"call %c1;" \
"popl %%edx; popl %%ecx; popl %%eax;" \
"jmp 1b;" \
- ".previous;" \
+ ".subsection 0;" \
\
: /* no output */ \
: "r" (ptr), "i" (do_softirq) \
Index: 17.1/include/asm-i386/semaphore.h
--- 17.1/include/asm-i386/semaphore.h Fri, 14 Sep 2001 12:20:01 +1000 kaos (linux-2.4/U/13_semaphore. 1.1.1.3 644)
+++ 17.1(w)/include/asm-i386/semaphore.h Mon, 24 Dec 2001 15:46:39 +1100 kaos (linux-2.4/U/13_semaphore. 1.1.1.3 644)
@@ -40,6 +40,7 @@
#include <asm/atomic.h>
#include <linux/wait.h>
#include <linux/rwsem.h>
+#include <linux/stringify.h>
struct semaphore {
atomic_t count;
@@ -122,10 +123,13 @@ static inline void down(struct semaphore
LOCK "decl %0\n\t" /* --sem->count */
"js 2f\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\tcall __down_failed\n\t"
"jmp 1b\n"
- ".previous"
+ ".subsection 0\n"
:"=m" (sem->count)
:"c" (sem)
:"memory");
@@ -149,10 +153,13 @@ static inline int down_interruptible(str
"js 2f\n\t"
"xorl %0,%0\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\tcall __down_failed_interruptible\n\t"
"jmp 1b\n"
- ".previous"
+ ".subsection 0\n"
:"=a" (result), "=m" (sem->count)
:"c" (sem)
:"memory");
@@ -177,10 +184,13 @@ static inline int down_trylock(struct se
"js 2f\n\t"
"xorl %0,%0\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\tcall __down_failed_trylock\n\t"
"jmp 1b\n"
- ".previous"
+ ".subsection 0\n"
:"=a" (result), "=m" (sem->count)
:"c" (sem)
:"memory");
@@ -203,10 +213,13 @@ static inline void up(struct semaphore *
LOCK "incl %0\n\t" /* ++sem->count */
"jle 2f\n"
"1:\n"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\tcall __up_wakeup\n\t"
"jmp 1b\n"
- ".previous"
+ ".subsection 0\n"
:"=m" (sem->count)
:"c" (sem)
:"memory");
Index: 17.1/arch/i386/vmlinux.lds
--- 17.1/arch/i386/vmlinux.lds Tue, 03 Jul 2001 11:11:12 +1000 kaos (linux-2.4/R/c/35_vmlinux.ld 1.1.4.1 644)
+++ 17.1(w)/arch/i386/vmlinux.lds Mon, 24 Dec 2001 13:01:08 +1100 kaos (linux-2.4/R/c/35_vmlinux.ld 1.1.4.1 644)
@@ -13,7 +13,6 @@ SECTIONS
*(.fixup)
*(.gnu.warning)
} = 0x9090
- .text.lock : { *(.text.lock) } /* out-of-line lock text */
_etext = .; /* End of text section */
Index: 17.1/arch/i386/boot/compressed/Makefile
--- 17.1/arch/i386/boot/compressed/Makefile Fri, 14 Sep 2001 12:20:01 +1000 kaos (linux-2.4/T/c/42_Makefile 1.2 644)
+++ 17.1(w)/arch/i386/boot/compressed/Makefile Mon, 24 Dec 2001 15:28:45 +1100 kaos (linux-2.4/T/c/42_Makefile 1.2 644)
@@ -33,7 +33,7 @@ head.o: head.S
$(CC) $(AFLAGS) -traditional -c head.S
misc.o: misc.c
- $(CC) $(CFLAGS) -c misc.c
+ $(CC) $(CFLAGS) -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) -c misc.c
piggy.o: $(SYSTEM)
tmppiggy=_tmp_$$$$piggy; \
Index: 17.1/Rules.make
--- 17.1/Rules.make Wed, 07 Mar 2001 23:04:43 +1100 kaos (linux-2.4/T/c/47_Rules.make 1.1.2.2 644)
+++ 17.1(w)/Rules.make Mon, 24 Dec 2001 15:29:41 +1100 kaos (linux-2.4/T/c/47_Rules.make 1.1.2.2 644)
@@ -31,6 +31,8 @@ unexport subdir-m
unexport subdir-n
unexport subdir-
+comma := ,
+
#
# Get things started.
#
@@ -54,7 +56,7 @@ ALL_SUB_DIRS := $(sort $(subdir-y) $(sub
$(CPP) $(CFLAGS) $(EXTRA_CFLAGS) $(CFLAGS_$@) $< > $@
%.o: %.c
- $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(CFLAGS_$@) -c -o $@ $<
+ $(CC) $(CFLAGS) $(EXTRA_CFLAGS) -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) $(CFLAGS_$@) -c -o $@ $<
@ ( \
echo 'ifeq ($(strip $(subst $(comma),:,$(CFLAGS) $(EXTRA_CFLAGS) $(CFLAGS_$@))),$$(strip $$(subst $$(comma),:,$$(CFLAGS) $$(EXTRA_CFLAGS) $$(CFLAGS_$@))))' ; \
echo 'FILES_FLAGS_UP_TO_DATE += $@' ; \
@@ -270,7 +272,7 @@ endif # CONFIG_MODVERSIONS
ifneq "$(strip $(export-objs))" ""
$(export-objs): $(export-objs:.o=.c) $(TOPDIR)/include/linux/modversions.h
- $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(CFLAGS_$@) -DEXPORT_SYMTAB -c $(@:.o=.c)
+ $(CC) $(CFLAGS) $(EXTRA_CFLAGS) -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) $(CFLAGS_$@) -DEXPORT_SYMTAB -c $(@:.o=.c)
@ ( \
echo 'ifeq ($(strip $(subst $(comma),:,$(CFLAGS) $(EXTRA_CFLAGS) $(CFLAGS_$@) -DEXPORT_SYMTAB)),$$(strip $$(subst $$(comma),:,$$(CFLAGS) $$(EXTRA_CFLAGS) $$(CFLAGS_$@) -DEXPORT_SYMTAB)))' ; \
echo 'FILES_FLAGS_UP_TO_DATE += $@' ; \
Index: 17.1/Makefile
--- 17.1/Makefile Sat, 22 Dec 2001 12:56:52 +1100 kaos (linux-2.4/T/c/50_Makefile 1.1.2.15.1.2.2.25.2.2.1.17.1.4.1.29.1.40 644)
+++ 17.1(w)/Makefile Mon, 24 Dec 2001 15:29:22 +1100 kaos (linux-2.4/T/c/50_Makefile 1.1.2.15.1.2.2.25.2.2.1.17.1.4.1.29.1.40 644)
@@ -330,10 +330,10 @@ include/linux/version.h: ./Makefile
@mv -f .ver $@
init/version.o: init/version.c include/linux/compile.h include/config/MARKER
- $(CC) $(CFLAGS) $(CFLAGS_KERNEL) -DUTS_MACHINE='"$(ARCH)"' -c -o init/version.o init/version.c
+ $(CC) $(CFLAGS) $(CFLAGS_KERNEL) -DUTS_MACHINE='"$(ARCH)"' -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) -c -o init/version.o init/version.c
init/main.o: init/main.c include/config/MARKER
- $(CC) $(CFLAGS) $(CFLAGS_KERNEL) $(PROFILING) -c -o $*.o $<
+ $(CC) $(CFLAGS) $(CFLAGS_KERNEL) $(PROFILING) -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) -c -o $*.o $<
fs lib mm ipc kernel drivers net: dummy
$(MAKE) CFLAGS="$(CFLAGS) $(CFLAGS_KERNEL)" $(subst $@, _dir_$@, $@)
Index: 17.1/include/asm-i386/rwsem.h
--- 17.1/include/asm-i386/rwsem.h Thu, 26 Apr 2001 12:48:22 +1000 kaos (linux-2.4/K/d/10_rwsem.h 1.5 644)
+++ 17.1(w)/include/asm-i386/rwsem.h Mon, 24 Dec 2001 15:46:39 +1100 kaos (linux-2.4/K/d/10_rwsem.h 1.5 644)
@@ -40,6 +40,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
+#include <linux/stringify.h>
struct rwsem_waiter;
@@ -101,7 +102,10 @@ static inline void __down_read(struct rw
LOCK_PREFIX " incl (%%eax)\n\t" /* adds 0x00000001, returns the old value */
" js 2f\n\t" /* jump if we weren't granted the lock */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\n\t"
" pushl %%ecx\n\t"
" pushl %%edx\n\t"
@@ -109,7 +113,7 @@ LOCK_PREFIX " incl (%%eax)\n\t" /*
" popl %%edx\n\t"
" popl %%ecx\n\t"
" jmp 1b\n"
- ".previous"
+ ".subsection 0\n"
"# ending down_read\n\t"
: "+m"(sem->count)
: "a"(sem)
@@ -130,13 +134,16 @@ LOCK_PREFIX " xadd %0,(%%eax)\n\t"
" testl %0,%0\n\t" /* was the count 0 before? */
" jnz 2f\n\t" /* jump if we weren't granted the lock */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\n\t"
" pushl %%ecx\n\t"
" call rwsem_down_write_failed\n\t"
" popl %%ecx\n\t"
" jmp 1b\n"
- ".previous\n"
+ ".subsection 0\n"
"# ending down_write"
: "+d"(tmp), "+m"(sem->count)
: "a"(sem)
@@ -154,7 +161,10 @@ static inline void __up_read(struct rw_s
LOCK_PREFIX " xadd %%edx,(%%eax)\n\t" /* subtracts 1, returns the old value */
" js 2f\n\t" /* jump if the lock is being waited upon */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\n\t"
" decw %%dx\n\t" /* do nothing if still outstanding active readers */
" jnz 1b\n\t"
@@ -162,7 +172,7 @@ LOCK_PREFIX " xadd %%edx,(%%eax)\n
" call rwsem_wake\n\t"
" popl %%ecx\n\t"
" jmp 1b\n"
- ".previous\n"
+ ".subsection 0\n"
"# ending __up_read\n"
: "+m"(sem->count), "+d"(tmp)
: "a"(sem)
@@ -180,7 +190,10 @@ static inline void __up_write(struct rw_
LOCK_PREFIX " xaddl %%edx,(%%eax)\n\t" /* tries to transition 0xffff0001 -> 0x00000000 */
" jnz 2f\n\t" /* jump if the lock is being waited upon */
"1:\n\t"
- ".section .text.lock,\"ax\"\n"
+ ".subsection 1\n"
+ ".ifndef .text.lock." __stringify(KBUILD_BASENAME) "\n"
+ ".text.lock." __stringify(KBUILD_BASENAME) ":\n"
+ ".endif\n"
"2:\n\t"
" decw %%dx\n\t" /* did the active count reduce to 0? */
" jnz 1b\n\t" /* jump back if not */
@@ -188,7 +201,7 @@ LOCK_PREFIX " xaddl %%edx,(%%eax)\n
" call rwsem_wake\n\t"
" popl %%ecx\n\t"
" jmp 1b\n"
- ".previous\n"
+ ".subsection 0\n"
"# ending __up_write\n"
: "+m"(sem->count)
: "a"(sem), "i"(-RWSEM_ACTIVE_WRITE_BIAS)
Index: 17.1/include/linux/stringify.h
--- 17.1/include/linux/stringify.h Mon, 24 Dec 2001 15:49:02 +1100 kaos ()
+++ 17.1(w)/include/linux/stringify.h Mon, 24 Dec 2001 15:48:13 +1100 kaos (linux-2.4/O/f/50_stringify. 644)
@@ -0,0 +1,12 @@
+#ifndef __LINUX_STRINGIFY_H
+#define __LINUX_STRINGIFY_H
+
+/* Indirect stringification. Doing two levels allows the parameter to be a
+ * macro itself. For example, compile with -DFOO=bar, __stringify(FOO)
+ * converts to "bar".
+ */
+
+#define __stringify_1(x) #x
+#define __stringify(x) __stringify_1(x)
+
+#endif /* !__LINUX_STRINGIFY_H */
Keith Owens wrote:
>
> gcc/as generates worse code for the local branches in the out of line
> subsection. With .text.lock we get
>
> 0: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
> 7: f3 90 repz nop
> 9: 7e f5 jle 0 <.text.lock> <=== 2 bytes
> b: e9 ca 01 00 00 jmp 1da <.text.lock+0x1da>
>
> With .subsection 1 it generates
>
> .text.lock.es1371:
> 6387: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
> 638e: f3 90 repz nop
> 6390: 0f 8e f1 ff ff ff jle 6387 <.text.lock.es1371> <=== 6 bytes
> 6396: e9 33 9e ff ff jmp 1ce <set_adc_rate+0x8e>
>
> The inline code is unchanged, it is only the out of line code that is
> bigger. IMHO the subsection difference is a gcc/as bug which should
> not stop us using this fix.
I don't see this. With egcs-1.1.2 and assembler 2.11.90.0.25,
your patch actually (and mysteriously) shrunk the kernel by 500
bytes - the new .text is a little smaller than the sum of the
old .text and .text.lock.
As you say, if the assembler is generating long-form branches for
the `jle' in the below sequence, we have a problem.
#APP
1: lock ; decb timerlist_lock
js 2f
.subsection 1
.ifndef .text.lock.bust_spinlocks
.text.lock.bust_spinlocks:
.endif
2: cmpb $0,timerlist_lock
rep;nop
jle 2b
jmp 1b
.previous
Keith, perhaps you could ship a .s file and a version number to HJ?
-
On Sun, 23 Dec 2001 21:44:32 -0800,
Andrew Morton <[email protected]> wrote:
>Keith Owens wrote:
>> gcc/as generates worse code for the local branches in the out of line
>> subsection. With .text.lock we get
>>
>> 0: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
>> 7: f3 90 repz nop
>> 9: 7e f5 jle 0 <.text.lock> <=== 2 bytes
>> b: e9 ca 01 00 00 jmp 1da <.text.lock+0x1da>
>>
>> With .subsection 1 it generates
>>
>> .text.lock.es1371:
>> 6387: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
>> 638e: f3 90 repz nop
>> 6390: 0f 8e f1 ff ff ff jle 6387 <.text.lock.es1371> <=== 6 bytes
>> 6396: e9 33 9e ff ff jmp 1ce <set_adc_rate+0x8e>
>>
>> The inline code is unchanged, it is only the out of line code that is
>> bigger. IMHO the subsection difference is a gcc/as bug which should
>> not stop us using this fix.
>
>I don't see this. With egcs-1.1.2 and assembler 2.11.90.0.25,
>your patch actually (and mysteriously) shrunk the kernel by 500
>bytes - the new .text is a little smaller than the sum of the
>old .text and .text.lock.
That is what I expected, the new code might be smaller because an intra
section branch can use a small PC relative operand, inter section
branches must use full sized branches, worst case it should generate
the same size code. I was surprised that it generates larger code,
this definitely looks like a gas bug. The system is RH 7.1, according
to Documentation/Changes this version of gcc 2.96 and binutils should
be OK.
NOTE: The patch had one .previous left in include/asm-i386/spinlock.h,
which needs to be changed to .subsection 0\n. But that change
makes no difference.
cd drivers/sound
gcc -v --save-temps -D__KERNEL__ -I../../include -Wall \
-Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer \
-fno-strict-aliasing -fno-common -mpreferred-stack-boundary=2 \
-march=i686 -DMODULE -DKBUILD_BASENAME=es1371 \
-c -o es1371.o es1371.c
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81)
/usr/lib/gcc-lib/i386-redhat-linux/2.96/cpp0 -lang-c -v -I../../include -D__GNUC__=2 -D__GNUC_MINOR__=96 -D__GNUC_PATCHLEVEL__=0 -D__ELF__ -Dunix -Dlinux -D__ELF__ -D__unix__ -D__linux__ -D__unix -D__linux -Asystem(posix) -D__OPTIMIZE__ -Wall -Wstrict-prototypes -Wno-trigraphs -Acpu(i386) -Amachine(i386) -Di386 -D__i386 -D__i386__ -D__pentiumpro -D__pentiumpro__ -D__tune_pentiumpro__ -D__KERNEL__ -DMODULE -DKBUILD_BASENAME=es1371 es1371.c es1371.i
GNU CPP version 2.96 20000731 (Red Hat Linux 7.1 2.96-81) (cpplib) (i386 Linux/ELF)
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
../../include
/usr/lib/gcc-lib/i386-redhat-linux/2.96/include
/usr/include
End of search list.
/usr/lib/gcc-lib/i386-redhat-linux/2.96/cc1 es1371.i -quiet -dumpbase es1371.c -mpreferred-stack-boundary=2 -march=i686 -O2 -Wall -Wstrict-prototypes -Wno-trigraphs -version -fomit-frame-pointer -fno-strict-aliasing -fno-common -o es1371.s
GNU C version 2.96 20000731 (Red Hat Linux 7.1 2.96-81) (i386-redhat-linux) compiled by GNU C version 2.96 20000731 (Red Hat Linux 7.1 2.96-81).
as -V -Qy -o es1371.o es1371.s
GNU assembler version 2.10.91 (i386-redhat-linux) using BFD version 2.10.91.0.2
drivers/sound/es1371.s, first reference to .text.lock
1: lock ; decb 248(%ebp)
js 2f
.subsection 1
.ifndef .text.lock.es1371
.text.lock.es1371:
.endif
2: cmpb $0,248(%ebp)
rep;nop
jle 2b
jmp 1b
.subsection 0
objdump -S drivers/sound/es1371.o
1ce: f0 fe 8d f8 00 00 00 lock decb 0xf8(%ebp)
1d5: 0f 88 ac 61 00 00 js 6387 <.text.lock.es1371>
....
00006387 <.text.lock.es1371>:
6387: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
638e: f3 90 repz nop
6390: 0f 8e f1 ff ff ff jle 6387 <.text.lock.es1371>
6396: e9 33 9e ff ff jmp 1ce <set_adc_rate+0x8e>
jle 2b can be handled by an 8 bit operand but gas is using full 32
bits.
On Sun, 23 Dec 2001, Andrew Morton wrote:
> Kai Germaschewski wrote:
> >
> > asm volatile(LOCK "subl $1,(%0)\n\t" \
> > "js 2f\n" \
> > "1:\n" \
> > - ".section .text.lock,\"ax\"\n" \
> > + ".subsection 1\n" \
> > "2:\tcall " helper "\n\t" \
> > "jmp 1b\n" \
> > ".previous" \
>
> Don't we want `.subsection 0' here, rather than .previous?
Either should be fine.
> Apart from that, it looks like a winner.
;-)
--Kai
On Mon, 24 Dec 2001, Keith Owens wrote:
> >> .text.lock.es1371:
> >> 6387: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
> >> 638e: f3 90 repz nop
> >> 6390: 0f 8e f1 ff ff ff jle 6387 <.text.lock.es1371> <=== 6 bytes
> >> 6396: e9 33 9e ff ff jmp 1ce <set_adc_rate+0x8e>
> >>
> >> The inline code is unchanged, it is only the out of line code that is
> >> bigger. IMHO the subsection difference is a gcc/as bug which should
> >> not stop us using this fix.
> >
> >I don't see this. With egcs-1.1.2 and assembler 2.11.90.0.25,
> >your patch actually (and mysteriously) shrunk the kernel by 500
> >bytes - the new .text is a little smaller than the sum of the
> >old .text and .text.lock.
Another datapoint (RH7.2):
00006397 <.text.lock.es1371>:
6397: 80 bd f8 00 00 00 00 cmpb $0x0,0xf8(%ebp)
639e: f3 90 repz nop
63a0: 7e f5 jle 6397 <.text.lock.es1371>
63a2: e9 27 9e ff ff jmp 1ce <set_adc_rate+0x8e>
[kai@vaio kai]$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)
[kai@vaio kai]$ as -v
GNU assembler version 2.11.90.0.8 (i386-redhat-linux) using BFD version
2.11.90.0.8
[kai@vaio kai]$ rpm -qf `which as`
binutils-2.11.90.0.8-9
> NOTE: The patch had one .previous left in include/asm-i386/spinlock.h,
> which needs to be changed to .subsection 0\n. But that change
> makes no difference.
According to my reading of the docs and also to my testing, .subsection 0
and .previous should be equivalent here. But .subsection 0 may be cleaner.
--Kai
On Mon, 24 Dec 2001 11:05:00 +0100 (CET),
Kai Germaschewski <[email protected]> wrote:
>According to my reading of the docs and also to my testing, .subsection 0
>and .previous should be equivalent here. But .subsection 0 may be cleaner.
As I read the gas info text, .subsection changes the current entry on
top of stack, it does not stack a new section/subsection pair.
.previous swaps the top of stack with the previous top of stack. I
suspect that you would get different results if there was more than one
section on the stack. In any case, .subsection 0 is "obviously correct".