2008-01-30 11:09:51

by Abel Bernabeu

[permalink] [raw]
Subject: Elf loader crash while zero-filling .bss

----[ The context

I am relatively new into kernel hacking and I am trying to get Linux
running on a ARM based SBC.

I already did my work adding support for the board in my working copy
of Linux 2.6.22.10. The kernel already boots and runs small assembler
programs I've written for testing (with the boot parameter
init=myprogram).

Now I am trying to execute some bigger C applications: in instance
BusyBox. I've chosen the buildroot package in order to produce a small
"distro".

Then I've tried to boot the system using init=/bin/sh but I am getting
a crash while loading this non-toy binary.

And believe me, this is not a problem of EABI/OABI compatibility (I
hope I could cope with that without messing here), seems more serious.

----[ The details

>From the perspective of the user nothing is shown at the console
(literally nothing)... but using the debugger I can see a "data abort"
exception happening at binfmt_elf.c (load_elf_binary).

The offending code is the following:

930 /* Calling set_brk effectively mmaps the pages that we need
931 * for the bss and break sections. We must do this before
932 * mapping in the interpreter, to make sure it doesn't wind
933 * up getting placed where the bss needs to go.
934 */
- 935 retval = set_brk(elf_bss, elf_brk);
- 936 if (retval) {
- 937 send_sig(SIGKILL, current, 0);
938 goto out_free_dentry;
939 }
- 940 if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
- 941 send_sig(SIGSEGV, current, 0);
942 retval = -EFAULT; /* Nobody gets to see this, but.. */
943 goto out_free_dentry;
944 }


Stopping the debugger at the line 935 I can see the values passed to set_brk:
set_brk (0x000a4ec8, 0x000a4ec8).

This allocs exactly one new physical page for the virtual address 0xa4000.

Then, stopping the debugger at line 940 (just before the crash) I see
that elf_bss is 0xa3801 and PAGE_SIZE is 4096, so padzero is going to
to zero-fill the bytes from 0xa3801 to 0xa3fff (2047 bytes).

Tracing instruction by instruction into padzero and it subsequent call
to clear_user leads to a data exception when trying to write 0 the
address 0xa3801 (the first byte!).

I know I should provide more details in order to get some help.

The exact Linux version, gcc version and compiler are the following:

Linux version 2.6.22.10 (abel@debian) (gcc version 4.2.1) #13 Fri
Nov 30 09:49:41 CET 2007
CPU: XScale-PXA255 [69052d06] revision 6 (ARMv5TE)

The sections listing of the offending elf binary is relevant too (I think):

[Nr] Name Type Addr Off Size ES Flags Lk Inf Al
[ 0] NULL 00000000 000000 000000 0
0 0 0
[ 1] .interp PROGBITS 000080d4 0000d4 000014 0 A 0 0 1
[ 2] .hash HASH 000080e8 0000e8 0009e8 4 A
3 0 4
[ 3] .dynsym DYNSYM 00008ad0 000ad0 001710 16 A 4 1 4
[ 4] .dynstr STRTAB 0000a1e0 0021e0 000b69 0 A
0 0 1
[ 5] .rel.dyn REL 0000ad4c 002d4c 000070 8 A 3 0 4
[ 6] .rel.plt REL 0000adbc 002dbc 000ac0 8 A
3 8 4
[ 7] .init PROGBITS 0000b87c 00387c 000018 0 AX 0 0 4
[ 8] .plt PROGBITS 0000b894 003894 001034 4 AX
0 0 4
[ 9] .text PROGBITS 0000c8c8 0048c8 06f12c 0 AX 0 0 4
[10] .fini PROGBITS 0007b9f4 0739f4 000014 0 AX
0 0 4
[11] .rodata PROGBITS 0007ba08 073a08 01f40e 0 A 0 0 8
[12] .eh_frame PROGBITS 0009ae18 092e18 000004 0 A
0 0 4
[13] .ctors PROGBITS 000a3000 093000 000008 0 WA 0 0 4
[14] .dtors PROGBITS 000a3008 093008 000008 0 WA
0 0 4
[15] .jcr PROGBITS 000a3010 093010 000004 0 WA 0 0 4
[16] .dynamic DYNAMIC 000a3014 093014 0000c0 8 WA
4 0 4
[17] .got PROGBITS 000a30d4 0930d4 000574 4 WA 0 0 4
[18] .data PROGBITS 000a3648 093648 0001b9 0 WA
0 0 4
[19] .bss NOBITS 000a3808 093801 0016c0 0 WA 0 0 8
[20] .ARM.attributes SHT_LOPROC+3 00000000 093801 000010 0
0 0 1
[21] .shstrtab STRTAB 00000000 093811 00009b 0 0 0 1

----[ Some questions

I have the following questions:

1) I don't think the assumption in the code comment is valid in my case:

930 /* Calling set_brk effectively mmaps the pages that we need
931 * for the bss and break sections. We must do this before

My .bss section spans from 0xa3801 0xa4ec8 but the page set_brk allocs
goes from 0xa4000 to 0xa4fff.

Then padzero writes to an unallocated area.

Is the assumption in the comment wrong?

2) May this crash have some relation with this (already solved) one?

http://marc.info/?l=linux-kernel&m=109865760703851&w=2

3) I haven't taken a look into the code generated by gcc but someone
in ARM Linux list showed a piece of badly generated code in another
context. Should I try do downgrade the compiler used for my kernel?

Which is the official gcc version for the kernel?

Yours, Abel.


2008-01-31 15:38:57

by Abel Bernabeu

[permalink] [raw]
Subject: Re: Elf loader crash while zero-filling .bss

2008/1/30, Abel Bernabeu <[email protected]>:

> Now I am trying to execute some bigger C applications: in instance
> BusyBox. I've chosen the buildroot package in order to produce a small
> "distro".
>
> Then I've tried to boot the system using init=/bin/sh but I am getting
> a crash while loading this non-toy binary.
>
> And believe me, this is not a problem of EABI/OABI compatibility (I
> hope I could cope with that without messing here), seems more serious.

I would like to add some more info may be useful if you are curious
about the crash.

The ELF binary I try to load (and the whole file system) are available here:

http://www.arrakis.es/~abelbg/root.tar.bz2

The boot log of the kernel follows. This time I tried gcc 4.1.0 to
discard a bad code generation problem but I got the same crash as with
gcc 4.2.1, so is not a gcc bug).

Thanks in advance guys.



Linux version 2.6.22.10m2grp (root@abel) (gcc version 4.1.0) #1 Thu
Jan 31 13:43:35 CET 2008
CPU: XScale-PXA255 [69052d06] revision 6 (ARMv5TE), cr=0000397f
Machine: M2
Ignoring unrecognised tag 0x00000000
Memory policy: ECC disabled, Data cache writeback
Memory clock: 99.53MHz (*27)
Run Mode clock: 398.13MHz (*4)
Turbo Mode clock: 398.13MHz (*1.0, active)
CPU0: D VIVT undefined 5 cache
CPU0: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
CPU0: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
Built 1 zonelists. Total pages: 8128
Kernel command line: console=tty0 console=ttyS0,115200n8
memmap=32M@0xa0000000 rw
mtdparts=physmap-flash.0:0x80000(boot),0x780000(kernel),-(fs)
root=mtd2 rootfstype=jffs2 init=/bin/sh
PID hash table entries: 128 (order: 7, 512 bytes)
Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
Memory: 32MB = 32MB total
Memory: 32336KB available (1300K code, 113K data, 8K init)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Time: pxa_timer clocksource has been installed.
JFFS2 version 2.2. (c) 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
ipmi message handler version 39.1
pxa2xx-uart.0: ttyS0 at MMIO 0x40100000 (irq = 15) is a FFUART
pxa2xx-uart.1: ttyS1 at MMIO 0x40200000 (irq = 14) is a BTUART
pxa2xx-uart.2: ttyS2 at MMIO 0x40700000 (irq = 13) is a STUART
pxa2xx-uart.3: ttyS3 at MMIO 0x41600000 (irq = 0) is a HWUART
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: module loaded
physmap platform flash device: 02000000 at 00000000
physmap-flash.0: Found 2 x16 devices at 0x0 in 32-bit bank
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
cfi_cmdset_0001: Erase suspend on write enabled
3 cmdlinepart partitions found on MTD device physmap-flash.0
Creating 3 MTD partitions on "physmap-flash.0":
0x00000000-0x00080000 : "boot"
mtd: Giving out device 0 to boot
0x00080000-0x00800000 : "kernel"
mtd: Giving out device 1 to kernel
0x00800000-0x02000000 : "fs"
mtd: Giving out device 2 to fs
Generic platform RAM MTD, (c) 2004 Simtec Electronics
pxa2xx_udc: version 4-May-2005
gs_bind: Gadget Serial v2.2 bound
gs_module_init: Gadget Serial v2.2 loaded
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
XScale DSP coprocessor detected.
MTDSB: dev_name "mtd2"
MTDSB: mtd%d, mtdnr 2
MTDSB: New superblock for device 2 ("fs")
Empty flash at 0x0000fffc ends at 0x00010000
CLEANMARKER node found at 0x00010000, not first node in block (0x00000000)
Empty flash at 0x0001fffc ends at 0x00020000
CLEANMARKER node found at 0x00020000, not first node in block (0x00000000)
Empty flash at 0x0002fffc ends at 0x00030000
CLEANMARKER node found at 0x00030000, not first node in block (0x00000000)
Empty flash at 0x0004fffc ends at 0x00050000
CLEANMARKER node found at 0x00050000, not first node in block (0x00040000)
Empty flash at 0x0005ffd8 ends at 0x00060000
CLEANMARKER node found at 0x00060000, not first node in block (0x00040000)
Empty flash at 0x0006fffc ends at 0x00070000
CLEANMARKER node found at 0x00070000, not first node in block (0x00040000)
Empty flash at 0x0008fffc ends at 0x00090000
CLEANMARKER node found at 0x00090000, not first node in block (0x00080000)
Empty flash at 0x0009ffe8 ends at 0x000a0000
CLEANMARKER node found at 0x000a0000, not first node in block (0x00080000)
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2cac:
0x0008 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2d88:
0x0004 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2e2c:
0x0002 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2e88:
0x1000 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2e98:
0x0040 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2f08:
0x0010 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2f30:
0x0020 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2f78:
0x0001 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a2f90:
0x0001 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a30e8:
0x4000 instead
Further such events for this erase block will not be printed
Empty flash at 0x000a3800 ends at 0x000a381c
Empty flash at 0x000a3820 ends at 0x000a3838
Empty flash at 0x000a383c ends at 0x000a3840

---- 8< ----- JFFS2 is producing a lot noisy "Empty flash at " messages I ommit

Empty flash at 0x000bff78 ends at 0x000bff84
Empty flash at 0x000bff8c ends at 0x000bffbc
Empty flash at 0x000bffc0 ends at 0x000bfff0
VFS: Mounted root (jffs2 filesystem).
Freeing init memory: 8K
Unhandled fault: section translation fault (0x8f5) at 0x000a3801
Internal error: : 8f5 [#1]
CPU: 0 Not tainted (2.6.22.10m2grp #1)
PC is at __clear_user+0x1c/0x64
LR is at padzero+0x5c/0x70
pc : [<bf0b95a8>] lr : [<bf088f88>] psr: 80000013
sp : c0087db0 ip : 00000001 fp : c0087dc4
r10: 000a3801 r9 : 000a4ec8 r8 : c0087ef8
r7 : 00000000 r6 : c01d6000 r5 : c009e680 r4 : 000a4ec8
r3 : 00000000 r2 : 00000000 r1 : 000007ff r0 : 000a3801
Flags: Nzcv IRQs on FIQs on Mode SVC_32 Segment user
Control: 0000397f Table: a0208000 DAC: 00000015
Process sh (pid: 1, stack limit = 0xc0086260)
Stack: (0xc0087db0 to 0xc0088000)
7da0: 000007ff bf088f88 c0087e9c c0087dc8
7dc0: bf089aa0 bf088f38 00001812 00000002 c0086000 c009e5c0 00000003 00000000
7de0: c01e4720 00008000 00000000 00000001 c01ed260 00000002 000a3801 c01d607c
7e00: c0085cc0 00008000 0009ae1c 000a3000 000a3801 c01d6130 c0087e6c c0087e28
7e20: bf049878 bf048bc0 000200d2 c0011e08 00000010 c002a200 00008000 0009ae1c
7e40: 000a3000 000a3801 00000000 c01d6000 c01d6000 00000000 00000005 00000003
7e60: 00000008 00000007 00000007 c01d6000 00000000 c0012400 c01d6000 c0087ef8
7e80: fffffffe c0086000 c0021e00 00000000 c0087ecc c0087ea0 bf061ef8 bf089258
7ea0: c000a404 c006617b c000d5f8 c01d6000 00000000 c000d684 c000a404 c0087ef8
7ec0: c0087ef4 c0087ed0 bf0637dc bf061e98 c006617b c000d684 c000d5f8 c0087ef8
7ee0: c0086000 00000000 c0087f5c c0087ef8 bf01ae50 bf0636c4 00000000 00000000
7f00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f40: c0017124 00000000 bf016dd8 00000000 c0087f6c c0087f60 bf017288 bf01ae1c
7f60: c0087f84 c0087f70 bf017310 bf017274 00000000 c0017128 c0087ff4 c0087f88
7f80: bf008a1c bf0172a0 ea000000 e5d73034 e2844001 e31300ff 00000000 00000000
7fa0: 00000000 c0087fb0 bf017d04 bf0229ec 00000000 00000000 bf0088c4 bf0288f8
7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7fe0: 00000000 00000000 00000000 c0087ff8 bf0288f8 bf0088d0 81c730b0 e2800005
Backtrace:
[<bf088f2c>] (padzero+0x0/0x70) from [<bf089aa0>] (load_elf_binary+0x854/0x192c)
[<bf08924c>] (load_elf_binary+0x0/0x192c) from [<bf061ef8>]
(search_binary_handler+0x6c/0x12c)
[<bf061e8c>] (search_binary_handler+0x0/0x12c) from [<bf0637dc>]
(do_execve+0x124/0x1e4)
[<bf0636b8>] (do_execve+0x0/0x1e4) from [<bf01ae50>] (kernel_execve+0x40/0x88)
[<bf01ae10>] (kernel_execve+0x0/0x88) from [<bf017288>]
(run_init_process+0x20/0x2c)
r7:00000000 r6:bf016dd8 r5:00000000 r4:c0017124
[<bf017268>] (run_init_process+0x0/0x2c) from [<bf017310>] (init_post+0x7c/0xec)
[<bf017294>] (init_post+0x0/0xec) from [<bf008a1c>] (kernel_init+0x158/0x2a0)
r4:c0017128
[<bf0088c4>] (kernel_init+0x0/0x2a0) from [<bf0288f8>] (do_exit+0x0/0x768)
Code: ba00000d e210c003 0a000005 e35c0002 (e4e02001)
Kernel panic - not syncing: Attempted to kill init!

2008-02-08 16:15:10

by Abel Bernabeu

[permalink] [raw]
Subject: Re: Elf loader crash while zero-filling .bss

2008/1/31, Abel Bernabeu <[email protected]>:
> 2008/1/30, Abel Bernabeu <[email protected]>:
>
> > Now I am trying to execute some bigger C applications: in instance
> > BusyBox. I've chosen the buildroot package in order to produce a small
> > "distro".
> >
> > Then I've tried to boot the system using init=/bin/sh but I am getting
> > a crash while loading this non-toy binary.

Hi guys,

Excuse me for posting even another message about the same crash (I am
responding myself anyway :). It is my crash and I will find why
happens! :D

I've found THE COMMON DENOMINATOR present in all the binaries that get
my kernel crash.

The offset of some sections fall in the middle of the .bss section. In
instance, look at the sections 12 (.comment) and 13 (.ARM.atributes).
Both sections are overlapping with 11 (.bss):

[11] .bss NOBITS 0001143c 00143c 000048 0 WA 0 0 4
[12] .comment PROGBITS 00000000 00143c 000094 0 0 0 1
[13] .ARM.attributes SHT_LOPROC+3 00000000 0014d0 000010 0 0 0 1
[14] .shstrtab STRTAB 00000000 0014e0 00007b 0 0 0 1
[15] .symtab SYMTAB 00000000 001804 000c70 16 16 92 4
[16] .strtab STRTAB 00000000 002474 000840 0 0 0 1

Is this fact meaningful??

I don't think this .elf file is correct and you can blame my
cross-compiler building skills... but the kernel should be friendly
enough to tell me about the problem instead of crashing :)

Any ideas?

Yours, Abel.

2008-02-08 16:39:46

by Andreas Schwab

[permalink] [raw]
Subject: Re: Elf loader crash while zero-filling .bss

"Abel Bernabeu" <[email protected]> writes:

> The offset of some sections fall in the middle of the .bss section. In
> instance, look at the sections 12 (.comment) and 13 (.ARM.atributes).
> Both sections are overlapping with 11 (.bss):
>
> [11] .bss NOBITS 0001143c 00143c 000048 0 WA 0 0 4
> [12] .comment PROGBITS 00000000 00143c 000094 0 0 0 1
> [13] .ARM.attributes SHT_LOPROC+3 00000000 0014d0 000010 0 0 0 1
> [14] .shstrtab STRTAB 00000000 0014e0 00007b 0 0 0 1
> [15] .symtab SYMTAB 00000000 001804 000c70 16 16 92 4
> [16] .strtab STRTAB 00000000 002474 000840 0 0 0 1
>
> Is this fact meaningful??

The fifth column is the file offset. Since the .bss section does not
occupy any file space the other sections can be put directly behind it
in the file.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."