2006-12-18 01:55:25

by Andrew J. Barr

Subject: BUG on 2.6.20-rc1 when using gdb

When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
trace as the program was running (i.e. I had typed 'run' in gdb):

WARNING at kernel/softirq.c:137 local_bh_enable()
[<c0103cd6>] dump_trace+0x68/0x1d9
[<c0103e5f>] show_trace_log_lvl+0x18/0x2c
[<c01044d3>] show_trace+0xf/0x11
[<c010455e>] dump_stack+0x12/0x14
[<c011cc7d>] local_bh_enable+0x44/0x94
[<c02871b9>] unix_release_sock+0x6e/0x1fe
[<c02887eb>] unix_stream_connect+0x3b4/0x3cf
[<c0232dee>] sys_connect+0x82/0xad
[<c0233641>] sys_socketcall+0xac/0x261
[<c0102d38>] syscall_call+0x7/0xb
[<b7f70822>] 0xb7f70822
=======================
------------[ cut here ]------------
kernel BUG at fs/buffer.c:1235!
invalid opcode: 0000 [#1]
PREEMPT
Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
cpufreq_ondemand cpufreq_performance cpufreq_powersave
speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
ide_core ehci_hcd uhci_hcd usbcore
CPU: 0
EIP: 0060:[<c0179266>] Not tainted VLI
EFLAGS: 00010046 (2.6.20-rc1 #1)
EIP is at __find_get_block+0x1c/0x16f
eax: 00000086 ebx: 00000000 ecx: 00000000 edx: 0088a800
esi: 0088a800 edi: 00000000 ebp: dfffd040 esp: cad2dd30
ds: 007b es: 007b ss: 0068
Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0 task.ti=cad2c000)
Stack: cad2dd58 c02caa0b 00000002 0000000e 0000000b 00000001 e8836580 0088a800
       00000000 00000000 e8836610 00000000 c01793dc 00001000 c03ab3e0 f3cadd80
       00000086 c90d41b0 0088a800 00000000 dfffd040 00008000 00000000 00000002
Call Trace:
[<c01793dc>] __getblk+0x23/0x268
[<f040d4c6>] ext3_getblk+0x10b/0x244 [ext3]
[<f040e364>] ext3_bread+0x19/0x70 [ext3]
[<f04106f3>] dx_probe+0x43/0x2c9 [ext3]
[<f04119b3>] ext3_htree_fill_tree+0x99/0x1ba [ext3]
[<f040ab77>] ext3_readdir+0x1d4/0x5ed [ext3]
[<c0167b29>] vfs_readdir+0x63/0x8d
[<c0167bb6>] sys_getdents64+0x63/0xa5
[<c0102d38>] syscall_call+0x7/0xb
[<b7f70822>] 0xb7f70822
=======================
Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
EIP: [<c0179266>] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30

This happens on 2.6.20-rc1 but not 2.6.19.

Andrew Barr
[email protected]


2006-12-20 00:42:28

by Andrew Morton

Subject: Re: BUG on 2.6.20-rc1 when using gdb

On Sun, 17 Dec 2006 20:55:18 -0500
"Andrew J. Barr" <[email protected]> wrote:

> When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
> trace as the program was running (i.e. I had typed 'run' in gdb):
>
> WARNING at kernel/softirq.c:137 local_bh_enable()
> [<c0103cd6>] dump_trace+0x68/0x1d9
> [<c0103e5f>] show_trace_log_lvl+0x18/0x2c
> [<c01044d3>] show_trace+0xf/0x11
> [<c010455e>] dump_stack+0x12/0x14
> [<c011cc7d>] local_bh_enable+0x44/0x94
> [<c02871b9>] unix_release_sock+0x6e/0x1fe
> [<c02887eb>] unix_stream_connect+0x3b4/0x3cf
> [<c0232dee>] sys_connect+0x82/0xad
> [<c0233641>] sys_socketcall+0xac/0x261
> [<c0102d38>] syscall_call+0x7/0xb
> [<b7f70822>] 0xb7f70822
> =======================
> ------------[ cut here ]------------
> kernel BUG at fs/buffer.c:1235!
> invalid opcode: 0000 [#1]
> PREEMPT
> Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
> exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
> battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
> cpufreq_ondemand cpufreq_performance cpufreq_powersave
> speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
> snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
> sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
> crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
> serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
> pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
> ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
> ide_core ehci_hcd uhci_hcd usbcore
> CPU: 0
> EIP: 0060:[<c0179266>] Not tainted VLI
> EFLAGS: 00010046 (2.6.20-rc1 #1)
> EIP is at __find_get_block+0x1c/0x16f
> eax: 00000086 ebx: 00000000 ecx: 00000000 edx: 0088a800
> esi: 0088a800 edi: 00000000 ebp: dfffd040 esp: cad2dd30
> ds: 007b es: 007b ss: 0068
> Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0
> task.ti=cad2c000)
> Stack: cad2dd58 c02caa0b 00000002 0000000e 0000000b 00000001 e8836580
> 0088a800
> 00000000 00000000 e8836610 00000000 c01793dc 00001000 c03ab3e0
> f3cadd80
> 00000086 c90d41b0 0088a800 00000000 dfffd040 00008000 00000000
> 00000002
> Call Trace:
> [<c01793dc>] __getblk+0x23/0x268
> [<f040d4c6>] ext3_getblk+0x10b/0x244 [ext3]
> [<f040e364>] ext3_bread+0x19/0x70 [ext3]
> [<f04106f3>] dx_probe+0x43/0x2c9 [ext3]
> [<f04119b3>] ext3_htree_fill_tree+0x99/0x1ba [ext3]
> [<f040ab77>] ext3_readdir+0x1d4/0x5ed [ext3]
> [<c0167b29>] vfs_readdir+0x63/0x8d
> [<c0167bb6>] sys_getdents64+0x63/0xa5
> [<c0102d38>] syscall_call+0x7/0xb
> [<b7f70822>] 0xb7f70822
> =======================
> Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
> 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
> eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
> EIP: [<c0179266>] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30
>
> This happens on 2.6.20-rc1 but not 2.6.19.
>

And it's repeatable, yes?

And you're sure that use of gdb triggers it?

Something is forgetting to reenable local interrupts.
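
The disabled-interrupts state can be read off the report itself. The oops prints EFLAGS: 00010046, and the bytes just before the faulting <0f> 0b in the Code: line (9c 58 ... f6 c4 02 75 04) appear to decode as pushf; pop %eax; test $0x2,%ah; jne, which is a test of bit 9 of EFLAGS, the x86 interrupt-enable flag (IF). The following is a small user-space sketch of that bit test, not kernel code; the helper name eflags_irqs_enabled is made up for this illustration.

```c
#include <stdint.h>

/* Bit 9 of the x86 EFLAGS register is IF, the interrupt-enable flag. */
#define EFLAGS_IF 0x00000200u

/* Hypothetical helper: returns 1 if the given EFLAGS value has IF set. */
static int eflags_irqs_enabled(uint32_t eflags)
{
	return (eflags & EFLAGS_IF) != 0;
}
```

Both values from the report, 0x00010046 (EFLAGS) and 0x00000086 (eax, which holds the popped flags), have bit 9 clear, which is consistent with interrupts being off when __find_get_block ran its check.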

2006-12-20 00:53:48

by Dave Airlie

Subject: Re: BUG on 2.6.20-rc1 when using gdb

On 12/20/06, Andrew Morton <[email protected]> wrote:
> > When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
> > trace as the program was running (i.e. I had typed 'run' in gdb):
> >
> > WARNING at kernel/softirq.c:137 local_bh_enable()
> > [<c0103cd6>] dump_trace+0x68/0x1d9
> > [<c0103e5f>] show_trace_log_lvl+0x18/0x2c
> > [<c01044d3>] show_trace+0xf/0x11
> > [<c010455e>] dump_stack+0x12/0x14
> > [<c011cc7d>] local_bh_enable+0x44/0x94
> > [<c02871b9>] unix_release_sock+0x6e/0x1fe
> > [<c02887eb>] unix_stream_connect+0x3b4/0x3cf
> > [<c0232dee>] sys_connect+0x82/0xad
> > [<c0233641>] sys_socketcall+0xac/0x261
> > [<c0102d38>] syscall_call+0x7/0xb
> > [<b7f70822>] 0xb7f70822
> > =======================
> > ------------[ cut here ]------------
> > kernel BUG at fs/buffer.c:1235!
> > invalid opcode: 0000 [#1]
> > PREEMPT
> > Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
> > exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
> > battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
> > cpufreq_ondemand cpufreq_performance cpufreq_powersave
> > speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
> > snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
> > sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
> > crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
> > serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
> > pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
> > ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
> > ide_core ehci_hcd uhci_hcd usbcore
> > CPU: 0
> > EIP: 0060:[<c0179266>] Not tainted VLI
> > EFLAGS: 00010046 (2.6.20-rc1 #1)
> > EIP is at __find_get_block+0x1c/0x16f
> > eax: 00000086 ebx: 00000000 ecx: 00000000 edx: 0088a800
> > esi: 0088a800 edi: 00000000 ebp: dfffd040 esp: cad2dd30
> > ds: 007b es: 007b ss: 0068
> > Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0
> > task.ti=cad2c000)
> > Stack: cad2dd58 c02caa0b 00000002 0000000e 0000000b 00000001 e8836580
> > 0088a800
> > 00000000 00000000 e8836610 00000000 c01793dc 00001000 c03ab3e0
> > f3cadd80
> > 00000086 c90d41b0 0088a800 00000000 dfffd040 00008000 00000000
> > 00000002
> > Call Trace:
> > [<c01793dc>] __getblk+0x23/0x268
> > [<f040d4c6>] ext3_getblk+0x10b/0x244 [ext3]
> > [<f040e364>] ext3_bread+0x19/0x70 [ext3]
> > [<f04106f3>] dx_probe+0x43/0x2c9 [ext3]
> > [<f04119b3>] ext3_htree_fill_tree+0x99/0x1ba [ext3]
> > [<f040ab77>] ext3_readdir+0x1d4/0x5ed [ext3]
> > [<c0167b29>] vfs_readdir+0x63/0x8d
> > [<c0167bb6>] sys_getdents64+0x63/0xa5
> > [<c0102d38>] syscall_call+0x7/0xb
> > [<b7f70822>] 0xb7f70822
> > =======================
> > Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
> > 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
> > eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
> > EIP: [<c0179266>] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30
> >
> > This happens on 2.6.20-rc1 but not 2.6.19.
> >
>
> And it's repeatable, yes?
>
> And you're sure that use of gdb triggers it?
>
> Something is forgetting to reenable local interrupts.

I've managed to get nearly the same thing on a test system I built
yesterday, my app when running under gdb would also blow up in
__find_get_block.

I was using close to Linus's git head...

Dave.

2006-12-20 00:54:39

by Dave Airlie

Subject: Re: BUG on 2.6.20-rc1 when using gdb

On 12/20/06, Dave Airlie <[email protected]> wrote:
> On 12/20/06, Andrew Morton <[email protected]> wrote:
> > > When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
> > > trace as the program was running (i.e. I had typed 'run' in gdb):
> > >
> > > WARNING at kernel/softirq.c:137 local_bh_enable()
> > > [<c0103cd6>] dump_trace+0x68/0x1d9
> > > [<c0103e5f>] show_trace_log_lvl+0x18/0x2c
> > > [<c01044d3>] show_trace+0xf/0x11
> > > [<c010455e>] dump_stack+0x12/0x14
> > > [<c011cc7d>] local_bh_enable+0x44/0x94
> > > [<c02871b9>] unix_release_sock+0x6e/0x1fe
> > > [<c02887eb>] unix_stream_connect+0x3b4/0x3cf
> > > [<c0232dee>] sys_connect+0x82/0xad
> > > [<c0233641>] sys_socketcall+0xac/0x261
> > > [<c0102d38>] syscall_call+0x7/0xb
> > > [<b7f70822>] 0xb7f70822
> > > =======================
> > > ------------[ cut here ]------------
> > > kernel BUG at fs/buffer.c:1235!
> > > invalid opcode: 0000 [#1]
> > > PREEMPT
> > > Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
> > > exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
> > > battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
> > > cpufreq_ondemand cpufreq_performance cpufreq_powersave
> > > speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
> > > snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
> > > sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
> > > crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
> > > serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
> > > pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
> > > ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
> > > ide_core ehci_hcd uhci_hcd usbcore
> > > CPU: 0
> > > EIP: 0060:[<c0179266>] Not tainted VLI
> > > EFLAGS: 00010046 (2.6.20-rc1 #1)
> > > EIP is at __find_get_block+0x1c/0x16f
> > > eax: 00000086 ebx: 00000000 ecx: 00000000 edx: 0088a800
> > > esi: 0088a800 edi: 00000000 ebp: dfffd040 esp: cad2dd30
> > > ds: 007b es: 007b ss: 0068
> > > Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0
> > > task.ti=cad2c000)
> > > Stack: cad2dd58 c02caa0b 00000002 0000000e 0000000b 00000001 e8836580
> > > 0088a800
> > > 00000000 00000000 e8836610 00000000 c01793dc 00001000 c03ab3e0
> > > f3cadd80
> > > 00000086 c90d41b0 0088a800 00000000 dfffd040 00008000 00000000
> > > 00000002
> > > Call Trace:
> > > [<c01793dc>] __getblk+0x23/0x268
> > > [<f040d4c6>] ext3_getblk+0x10b/0x244 [ext3]
> > > [<f040e364>] ext3_bread+0x19/0x70 [ext3]
> > > [<f04106f3>] dx_probe+0x43/0x2c9 [ext3]
> > > [<f04119b3>] ext3_htree_fill_tree+0x99/0x1ba [ext3]
> > > [<f040ab77>] ext3_readdir+0x1d4/0x5ed [ext3]
> > > [<c0167b29>] vfs_readdir+0x63/0x8d
> > > [<c0167bb6>] sys_getdents64+0x63/0xa5
> > > [<c0102d38>] syscall_call+0x7/0xb
> > > [<b7f70822>] 0xb7f70822
> > > =======================
> > > Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
> > > 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
> > > eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
> > > EIP: [<c0179266>] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30
> > >
> > > This happens on 2.6.20-rc1 but not 2.6.19.
> > >
> >
> > And it's repeatable, yes?
> >
> > And you're sure that use of gdb triggers it?
> >
> > Something is forgetting to reenable local interrupts.
>
> I've managed to get nearly the same thing on a test system I built
> yesterday, my app when running under gdb would also blow up in
> __find_get_block.
>
> I was using close to Linus's git head...

And of course it was on a fresh 32-bit x86 with FC6 on it.

Dave.

2006-12-20 11:21:57

by Jeremy Fitzhardinge

Subject: Re: BUG on 2.6.20-rc1 when using gdb

Andrew Morton wrote:
> On Sun, 17 Dec 2006 20:55:18 -0500
> "Andrew J. Barr" <[email protected]> wrote:
>
>
>> When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
>> trace as the program was running (i.e. I had typed 'run' in gdb):
>>
>> WARNING at kernel/softirq.c:137 local_bh_enable()
>> [<c0103cd6>] dump_trace+0x68/0x1d9
>> [<c0103e5f>] show_trace_log_lvl+0x18/0x2c
>> [<c01044d3>] show_trace+0xf/0x11
>> [<c010455e>] dump_stack+0x12/0x14
>> [<c011cc7d>] local_bh_enable+0x44/0x94
>> [<c02871b9>] unix_release_sock+0x6e/0x1fe
>> [<c02887eb>] unix_stream_connect+0x3b4/0x3cf
>> [<c0232dee>] sys_connect+0x82/0xad
>> [<c0233641>] sys_socketcall+0xac/0x261
>> [<c0102d38>] syscall_call+0x7/0xb
>> [<b7f70822>] 0xb7f70822
>> =======================
>> ------------[ cut here ]------------
>> kernel BUG at fs/buffer.c:1235!
>> invalid opcode: 0000 [#1]
>> PREEMPT
>> Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
>> exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
>> battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
>> cpufreq_ondemand cpufreq_performance cpufreq_powersave
>> speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
>> snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
>> sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
>> crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
>> serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
>> pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
>> ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
>> ide_core ehci_hcd uhci_hcd usbcore
>> CPU: 0
>> EIP: 0060:[<c0179266>] Not tainted VLI
>> EFLAGS: 00010046 (2.6.20-rc1 #1)
>> EIP is at __find_get_block+0x1c/0x16f
>> eax: 00000086 ebx: 00000000 ecx: 00000000 edx: 0088a800
>> esi: 0088a800 edi: 00000000 ebp: dfffd040 esp: cad2dd30
>> ds: 007b es: 007b ss: 0068
>> Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0
>> task.ti=cad2c000)
>> Stack: cad2dd58 c02caa0b 00000002 0000000e 0000000b 00000001 e8836580
>> 0088a800
>> 00000000 00000000 e8836610 00000000 c01793dc 00001000 c03ab3e0
>> f3cadd80
>> 00000086 c90d41b0 0088a800 00000000 dfffd040 00008000 00000000
>> 00000002
>> Call Trace:
>> [<c01793dc>] __getblk+0x23/0x268
>> [<f040d4c6>] ext3_getblk+0x10b/0x244 [ext3]
>> [<f040e364>] ext3_bread+0x19/0x70 [ext3]
>> [<f04106f3>] dx_probe+0x43/0x2c9 [ext3]
>> [<f04119b3>] ext3_htree_fill_tree+0x99/0x1ba [ext3]
>> [<f040ab77>] ext3_readdir+0x1d4/0x5ed [ext3]
>> [<c0167b29>] vfs_readdir+0x63/0x8d
>> [<c0167bb6>] sys_getdents64+0x63/0xa5
>> [<c0102d38>] syscall_call+0x7/0xb
>> [<b7f70822>] 0xb7f70822
>> =======================
>> Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
>> 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
>> eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
>> EIP: [<c0179266>] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30
>>
>> This happens on 2.6.20-rc1 but not 2.6.19.
>>
>>
>
> And it's repeatable, yes?
>
> And you're sure that use of gdb triggers it?
>
> Something is forgetting to reenable local interrupts.

"walt" <[email protected]> reported a similar problem which he bisected
down to the PDA changeset which touches ptrace
(66e10a44d724f1464b5e8b5a3eae1e2cbbc2cca6). I haven't managed to reproduce
the problem, but I guess there's something nasty going on in ptrace -
maybe it's screwing up eflags on the stack or something. Need to
double-check all the conversions from kernel<->usermode registers. Hm,
I wonder if it's fixed with the %gs->%fs conversion patch applied?

J

2006-12-20 18:37:07

by Frederik Deweerdt

Subject: [-mm patch] ptrace: Fix EFL_OFFSET value according to i386 pda changes (was Re: BUG on 2.6.20-rc1 when using gdb)

On Wed, Dec 20, 2006 at 03:21:53AM -0800, Jeremy Fitzhardinge wrote:
> "walt" <[email protected]> reported a similar problem which he bisected
> down to the PDA changeset which touches ptrace
> (66e10a44d724f1464b5e8b5a3eae1e2cbbc2cca6). I haven't managed to repro
> the problem, but I guess there's something nasty going on in ptrace -
> maybe it's screwing up eflags on the stack or something. Need to
> double-check all the conversions from kernel<->usermode registers. Hm,
> wonder if it's fixed with the %gs->%fs conversion patch applied?
>
Hi Jeremy,

Same problems here with 2.6.20-rc1-mm1 (i.e. with the %gs->%fs patch).
It seems to me that the problem comes from EFL_OFFSET no longer
being accurate.
The following patch fixes the problem for me.

Regards,
Frederik

Signed-off-by: Frederik Deweerdt <[email protected]>

diff --git a/arch/i386/kernel/ptrace.c b/arch/i386/kernel/ptrace.c
index 7f7d830..00d8a5a 100644
--- a/arch/i386/kernel/ptrace.c
+++ b/arch/i386/kernel/ptrace.c
@@ -45,7 +45,7 @@
 /*
  * Offset of eflags on child stack..
  */
-#define EFL_OFFSET ((EFL-2)*4-sizeof(struct pt_regs))
+#define EFL_OFFSET ((EFL-1)*4-sizeof(struct pt_regs))
 
 static inline struct pt_regs *get_child_regs(struct task_struct *task)
 {
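
To see why a one-slot change in the macro is plausible, it helps to compare where eflags sits in struct pt_regs before and after the PDA changeset. The sketch below is a user-space illustration under stated assumptions: the two struct layouts and the register index enum are reconstructed from the i386 headers of that era, not copied from the thread, and the names pt_regs_before and pt_regs_after are made up for this example. All fields are declared as int32_t so the offsets match a 32-bit kernel even when compiled on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

/* User-visible ptrace register indices (assumed, after i386 <asm/ptrace.h>). */
enum { EBX, ECX, EDX, ESI, EDI, EBP, EAX, DS, ES, FS, GS,
       ORIG_EAX, EIP, CS, EFL, UESP, SS };

/* Assumed pt_regs layout before the PDA changeset: no %fs or %gs slot. */
struct pt_regs_before {
	int32_t ebx, ecx, edx, esi, edi, ebp, eax;
	int32_t xds, xes;
	int32_t orig_eax, eip, xcs, eflags, esp, xss;
};

/* Assumed layout after it: one extra segment slot (xgs) follows xes. */
struct pt_regs_after {
	int32_t ebx, ecx, edx, esi, edi, ebp, eax;
	int32_t xds, xes, xgs;
	int32_t orig_eax, eip, xcs, eflags, esp, xss;
};

/* Byte offset of eflags inside each layout. */
static size_t efl_before(void) { return offsetof(struct pt_regs_before, eflags); }
static size_t efl_after(void)  { return offsetof(struct pt_regs_after, eflags); }
```

Under these assumptions efl_before() equals (EFL-2)*4 and efl_after() equals (EFL-1)*4, so a macro still using EFL-2 would address the slot just below eflags (xcs in this layout). The macro in the patch additionally subtracts sizeof(struct pt_regs) because the offset is applied relative to thread.esp0, which points past the end of the saved registers.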

2006-12-20 19:02:22

by Andrew J. Barr

Subject: Re: [-mm patch] ptrace: Fix EFL_OFFSET value according to i386 pda changes (was Re: BUG on 2.6.20-rc1 when using gdb)

On Wed, 2006-12-20 at 18:35 +0000, Frederik Deweerdt wrote:
> On Wed, Dec 20, 2006 at 03:21:53AM -0800, Jeremy Fitzhardinge wrote:
> > "walt" <[email protected]> reported a similar problem which he bisected
> > down to the PDA changeset which touches ptrace
> > (66e10a44d724f1464b5e8b5a3eae1e2cbbc2cca6). I haven't managed to repro
> > the problem, but I guess there's something nasty going on in ptrace -
> > maybe it's screwing up eflags on the stack or something. Need to
> > double-check all the conversions from kernel<->usermode registers. Hm,
> > wonder if it's fixed with the %gs->%fs conversion patch applied?
> >
> Hi Jeremy,
>
> Same problems here with 2.6.20-rc1-mm1 (i.e. with the %gs->%fs patch).
> It seems to me that the problem comes from the EFL_OFFSET no longer
> being accurate.
> The following patch fixes the problem for me.

Me too. Thanks.

Andrew

> Regards,
> Frederik
>
> Signed-off-by: Frederik Deweerdt <[email protected]>
>
> diff --git a/arch/i386/kernel/ptrace.c b/arch/i386/kernel/ptrace.c
> index 7f7d830..00d8a5a 100644
> --- a/arch/i386/kernel/ptrace.c
> +++ b/arch/i386/kernel/ptrace.c
> @@ -45,7 +45,7 @@
> /*
> * Offset of eflags on child stack..
> */
> -#define EFL_OFFSET ((EFL-2)*4-sizeof(struct pt_regs))
> +#define EFL_OFFSET ((EFL-1)*4-sizeof(struct pt_regs))
>
> static inline struct pt_regs *get_child_regs(struct task_struct *task)
> {

2006-12-20 19:22:28

by Jeremy Fitzhardinge

Subject: Re: [-mm patch] ptrace: Fix EFL_OFFSET value according to i386 pda changes (was Re: BUG on 2.6.20-rc1 when using gdb)

Frederik Deweerdt wrote:
> Same problems here with 2.6.20-rc1-mm1 (i.e. with the %gs->%fs patch).
> It seems to me that the problem comes from the EFL_OFFSET no longer
> being accurate.
> The following patch fixes the problem for me.
>

Thanks Frederik; that's exactly the kind of thing I thought it might
be. I wonder if there's some way we can make this more robust
though... Does this work for you? I did a slightly larger cleanup
which should make it less fragile and more comprehensible.

J

diff -r e775f6e42258 arch/i386/kernel/ptrace.c
--- a/arch/i386/kernel/ptrace.c Tue Dec 19 10:32:40 2006 -0800
+++ b/arch/i386/kernel/ptrace.c Wed Dec 20 11:18:56 2006 -0800
@@ -45,7 +45,7 @@
 /*
  * Offset of eflags on child stack..
  */
-#define EFL_OFFSET ((EFL-2)*4-sizeof(struct pt_regs))
+#define EFL_OFFSET offsetof(struct pt_regs, eflags)
 
 static inline struct pt_regs *get_child_regs(struct task_struct *task)
 {
@@ -54,24 +54,24 @@ static inline struct pt_regs *get_child_
 }
 
 /*
- * this routine will get a word off of the processes privileged stack.
- * the offset is how far from the base addr as stored in the TSS.
- * this routine assumes that all the privileged stacks are in our
+ * This routine will get a word off of the processes privileged stack.
+ * the offset is bytes into the pt_regs structure on the stack.
+ * This routine assumes that all the privileged stacks are in our
  * data space.
  */
 static inline int get_stack_long(struct task_struct *task, int offset)
 {
 	unsigned char *stack;
 
-	stack = (unsigned char *)task->thread.esp0;
+	stack = (unsigned char *)task->thread.esp0 - sizeof(struct pt_regs);
 	stack += offset;
 	return (*((int *)stack));
 }
 
 /*
- * this routine will put a word on the processes privileged stack.
- * the offset is how far from the base addr as stored in the TSS.
- * this routine assumes that all the privileged stacks are in our
+ * This routine will put a word on the processes privileged stack.
+ * the offset is bytes into the pt_regs structure on the stack.
+ * This routine assumes that all the privileged stacks are in our
  * data space.
  */
 static inline int put_stack_long(struct task_struct *task, int offset,
@@ -79,7 +79,7 @@ static inline int put_stack_long(struct
 {
 	unsigned char * stack;
 
-	stack = (unsigned char *) task->thread.esp0;
+	stack = (unsigned char *)task->thread.esp0 - sizeof(struct pt_regs);
 	stack += offset;
 	*(unsigned long *) stack = data;
 	return 0;
@@ -114,7 +114,7 @@ static int putreg(struct task_struct *ch
 	}
 	if (regno > ES*4)
 		regno -= 1*4;
-	put_stack_long(child, regno - sizeof(struct pt_regs), value);
+	put_stack_long(child, regno, value);
 	return 0;
 }
 
@@ -137,7 +137,6 @@ static unsigned long getreg(struct task_
 		default:
 			if (regno > ES*4)
 				regno -= 1*4;
-			regno = regno - sizeof(struct pt_regs);
 			retval &= get_stack_long(child, regno);
 	}
 	return retval;
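
The addressing scheme in this cleanup (base = esp0 - sizeof(struct pt_regs), offset = byte position of a field inside pt_regs) can be tried out in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: struct fake_task, its 256-byte buffer and fake_task_init are hypothetical stand-ins for task_struct and the kernel stack, the pt_regs layout is assumed, and memcpy replaces the pointer casts to keep the example free of alignment and aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the saved user registers; only the field order matters here. */
struct pt_regs {
	int32_t ebx, ecx, edx, esi, edi, ebp, eax;
	int32_t xds, xes, xgs;
	int32_t orig_eax, eip, xcs, eflags, esp, xss;
};

/* Hypothetical task: esp0 points just past the pt_regs saved on its stack. */
struct fake_task {
	unsigned char kernel_stack[256];
	unsigned char *esp0;
};

static void fake_task_init(struct fake_task *t)
{
	memset(t->kernel_stack, 0, sizeof(t->kernel_stack));
	t->esp0 = t->kernel_stack + sizeof(t->kernel_stack);
}

/* Read a 32-bit word at `offset` bytes into the saved pt_regs. */
static int32_t get_stack_long(const struct fake_task *t, size_t offset)
{
	int32_t v;

	memcpy(&v, t->esp0 - sizeof(struct pt_regs) + offset, sizeof(v));
	return v;
}

/* Write a 32-bit word at `offset` bytes into the saved pt_regs. */
static void put_stack_long(struct fake_task *t, size_t offset, int32_t data)
{
	memcpy(t->esp0 - sizeof(struct pt_regs) + offset, &data, sizeof(data));
}
```

With this shape a caller passes offsetof(struct pt_regs, eflags) directly, so the offset follows the structure definition if fields are added or reordered, rather than relying on a hand-maintained index such as EFL-2.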

2006-12-20 20:36:01

by walt

Subject: Re: [-mm patch] ptrace: Fix EFL_OFFSET value according to i386 pda changes (was Re: BUG on 2.6.20-rc1 when using gdb)

Jeremy Fitzhardinge wrote:
> Frederik Deweerdt wrote:
>> Same problems here with 2.6.20-rc1-mm1 (i.e. with the %gs->%fs patch).
>> It seems to me that the problem comes from the EFL_OFFSET no longer
>> being accurate.
>> The following patch fixes the problem for me.
>>
>
> Thanks Frederik; that's exactly the kind of thing I thought it might
> be. I wonder if there's some way we can make this more robust
> though... Does this work for you? I did a slightly larger cleanup
> which should make it less fragile and more comprehensible.
<patch snipped>

Hi Jeremy,

Your patch works fine for me. (I didn't try the first patch, but I
will if anyone wants.) Thanks!

2006-12-20 20:44:06

by Frederik Deweerdt

Subject: Re: [-mm patch] ptrace: Fix EFL_OFFSET value according to i386 pda changes (was Re: BUG on 2.6.20-rc1 when using gdb)

On Wed, Dec 20, 2006 at 11:21:50AM -0800, Jeremy Fitzhardinge wrote:
> Frederik Deweerdt wrote:
> > Same problems here with 2.6.20-rc1-mm1 (i.e. with the %gs->%fs patch).
> > It seems to me that the problem comes from the EFL_OFFSET no longer
> > being accurate.
> > The following patch fixes the problem for me.
> >
>
> Thanks Frederik; that's exactly the kind of thing I thought it might
> be. I wonder if there's some way we can make this more robust
> though... Does this work for you? I did a slightly larger cleanup
> which should make it less fragile and more comprehensible.
>
It works too, thanks. BTW, I wondered if the "case GS:" in getreg() made
sense now?

Frederik
> J
>
> diff -r e775f6e42258 arch/i386/kernel/ptrace.c
> --- a/arch/i386/kernel/ptrace.c Tue Dec 19 10:32:40 2006 -0800
> +++ b/arch/i386/kernel/ptrace.c Wed Dec 20 11:18:56 2006 -0800
> @@ -45,7 +45,7 @@
> /*
> * Offset of eflags on child stack..
> */
> -#define EFL_OFFSET ((EFL-2)*4-sizeof(struct pt_regs))
> +#define EFL_OFFSET offsetof(struct pt_regs, eflags)
>
> static inline struct pt_regs *get_child_regs(struct task_struct *task)
> {
> @@ -54,24 +54,24 @@ static inline struct pt_regs *get_child_
> }
>
> /*
> - * this routine will get a word off of the processes privileged stack.
> - * the offset is how far from the base addr as stored in the TSS.
> - * this routine assumes that all the privileged stacks are in our
> + * This routine will get a word off of the processes privileged stack.
> + * the offset is bytes into the pt_regs structure on the stack.
> + * This routine assumes that all the privileged stacks are in our
> * data space.
> */
> static inline int get_stack_long(struct task_struct *task, int offset)
> {
> unsigned char *stack;
>
> - stack = (unsigned char *)task->thread.esp0;
> + stack = (unsigned char *)task->thread.esp0 - sizeof(struct pt_regs);
> stack += offset;
> return (*((int *)stack));
> }
>
> /*
> - * this routine will put a word on the processes privileged stack.
> - * the offset is how far from the base addr as stored in the TSS.
> - * this routine assumes that all the privileged stacks are in our
> + * This routine will put a word on the processes privileged stack.
> + * the offset is bytes into the pt_regs structure on the stack.
> + * This routine assumes that all the privileged stacks are in our
> * data space.
> */
> static inline int put_stack_long(struct task_struct *task, int offset,
> @@ -79,7 +79,7 @@ static inline int put_stack_long(struct
> {
> unsigned char * stack;
>
> - stack = (unsigned char *) task->thread.esp0;
> + stack = (unsigned char *)task->thread.esp0 - sizeof(struct pt_regs);
> stack += offset;
> *(unsigned long *) stack = data;
> return 0;
> @@ -114,7 +114,7 @@ static int putreg(struct task_struct *ch
> }
> if (regno > ES*4)
> regno -= 1*4;
> - put_stack_long(child, regno - sizeof(struct pt_regs), value);
> + put_stack_long(child, regno, value);
> return 0;
> }
>
> @@ -137,7 +137,6 @@ static unsigned long getreg(struct task_
> default:
> if (regno > ES*4)
> regno -= 1*4;
> - regno = regno - sizeof(struct pt_regs);
> retval &= get_stack_long(child, regno);
> }
> return retval;
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2006-12-20 20:53:38

by Jeremy Fitzhardinge

Subject: Re: [-mm patch] ptrace: Fix EFL_OFFSET value according to i386 pda changes (was Re: BUG on 2.6.20-rc1 when using gdb)

Frederik Deweerdt wrote:
> It works too, thanks. BTW, I wondered if the "case GS:" in getreg() made
> sense now?

Sorry, what do you mean? It looks OK to me, but I'm not sure what
you're referring to.

J

2006-12-20 21:09:23

by Frederik Deweerdt

Subject: Re: [-mm patch] ptrace: Fix EFL_OFFSET value according to i386 pda changes (was Re: BUG on 2.6.20-rc1 when using gdb)

On Wed, Dec 20, 2006 at 12:53:33PM -0800, Jeremy Fitzhardinge wrote:
> Frederik Deweerdt wrote:
> > It works too, thanks. BTW, I wondered if the "case GS:" in getreg() made
> > sense now?
>
> Sorry, what do you mean? It looks OK to me, but I'm not sure what
> you're referring to.
My bad, this is the code I'm referring to:

static unsigned long getreg(struct task_struct *child,
	unsigned long regno)
[...]
	switch (regno >> 2) {
		case GS:
			retval = child->thread.gs;
			break;

What seems weird to me is that putreg(GS) will end up putting 'value' in:
	child->thread.esp0 - sizeof(struct pt_regs) + (GS - 1)*4
whereas getreg(GS) will return the value of child->thread.gs.
I must be missing something, but the asymmetry seemed odd to me.

Regards,
Frederik
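
The (GS - 1)*4 figure in the question above can be checked in isolation. The sketch below models only the two lines of putreg() visible in the patch context (register numbers above ES*4 are shifted down by one slot); any special cases handled earlier in the real function are not modelled, the enum values are assumed from the i386 ptrace register numbering, and the function name putreg_slot_offset is made up for this illustration.

```c
/* User-visible ptrace register indices (assumed, after i386 <asm/ptrace.h>). */
enum { EBX, ECX, EDX, ESI, EDI, EBP, EAX, DS, ES, FS, GS,
       ORIG_EAX, EIP, CS, EFL, UESP, SS };

/*
 * Models only the generic tail of putreg() shown in the patch context:
 * byte offsets above ES*4 are moved down by one 4-byte slot.
 */
static unsigned long putreg_slot_offset(unsigned long regno)
{
	if (regno > ES * 4)
		regno -= 1 * 4;
	return regno;
}
```

For regno = GS*4 this yields (GS - 1)*4, the pt_regs-relative offset described in the message, while registers up to and including ES keep their offset unchanged.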
>
> J