2009-09-16 10:29:44

by Bastian Blank

Subject: BUG: oops in gss_validate on 2.6.31

Hi

Since 2.6.31 my gssapi authenticated nfs oopses.

BUG: unable to handle kernel NULL pointer dereference at 00000010
IP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss]
*pdpt = 0000000001473001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-13/range
Modules linked in: kvm_intel kvm ext4 jbd2 crc16 usb_storage usbhid hid i915 drm i2c_algo_bit sco bridge stp bnep rfcomm l2cap xt_mac ipt_REJECT xt_tcpudp xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables tun nfsd exportfs nfs lockd fscache nfs_acl deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic xcbc rmd160 sha1_generic hmac crypto_null af_key fuse rpcsec_gss_krb5 auth_rpcgss sunrpc loop acpi_cpufreq arc4 snd_hda_codec_analog ecb snd_hda_intel snd_hda_codec iwl3945 snd_hwdep iwlcore snd_pcm snd_seq snd_timer thinkpad_acpi snd_seq_device nsc_ircc i2c_i801 btusb mac80211 i2c_core serio_raw snd soundcore battery button psmouse processor rng_core snd_page_alloc evdev nvram ac cfg80211 bluetooth irda rfkill crc_ccitt ext3 jbd mbcache sha256_generic aes_i586 aes_generic cbc dm_crypt dm_mod sd_mod crc_t10dif ata_generic ide_pci_generic ahci libata scsi_mod sdhci_pci piix sdhci firewire_ohci firewire_core crc_itu_t ide_core mmc_core led_class uhci_hcd ehci_hcd usbcore nls_base e1000e intel_agp agpgart video output thermal fan thermal_sys [last unloaded: kvm]

Pid: 2025, comm: rpciod/0 Not tainted (2.6.31-trunk-686-bigmem #1) 170255G
EIP: 0060:[<f8dd594a>] EFLAGS: 00010246 CPU: 0
EIP is at gss_validate+0xad/0x175 [auth_rpcgss]
EAX: d5d7e830 EBX: f60f5ef8 ECX: f60f5ee4 EDX: f60f5ef8
ESI: 00000025 EDI: 00000000 EBP: cdc30bc0 ESP: f60f5edc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process rpciod/0 (pid: 2025, ti=f60f4000 task=f6685ae0 task.ti=f60f4000)
Stack:
f5c512c0 d5d7e830 00000025 d5d7e830 f60f5ef4 00000004 9cd00000 f60f5ef4
<0> 00000004 00000000 00000000 00000000 00000001 00000000 f847888c 00000004
<0> 00000004 be91f5c4 cdc30bc0 f5c512c0 d5d7e828 f3c807f8 f8d9de34 be91f5c4
Call Trace:
[<f8d9de34>] ? rpcauth_checkverf+0x4a/0x60 [sunrpc]
[<f8d972a0>] ? call_decode+0x30f/0x5de [sunrpc]
[<f8d96199>] ? rpcproc_decode_null+0x0/0x21 [sunrpc]
[<f8d9d246>] ? __rpc_execute+0x76/0x21e [sunrpc]
[<c10528b6>] ? worker_thread+0x146/0x1d9
[<f8d9d473>] ? rpc_async_schedule+0x0/0x29 [sunrpc]
[<c105710f>] ? autoremove_wake_function+0x0/0x4f
[<c1052770>] ? worker_thread+0x0/0x1d9
[<c1056d7f>] ? kthread+0x7a/0x7f
[<c1056d05>] ? kthread+0x0/0x7f
[<c1009d07>] ? kernel_thread_helper+0x7/0x10
Code: 24 18 89 da 89 44 24 10 8d 44 24 10 c7 44 24 14 04 00 00 00 e8 a4 f0 fc ff 89 da 8b 44 24 04 89 74 24 08 8d 4c 24 08 89 44 24 0c <8b> 47 10 e8 3c 16 00 00 3d 00 00 0c 00 89 c2 75 0a 8d 45 28 f0
EIP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss] SS:ESP 0068:f60f5edc
CR2: 0000000000000010
---[ end trace 92895856d62132dd ]---

I have seen this twice in the last few days, always under load. I've
never seen it with 2.6.30. The server is a 2.6.30 machine.

Bastian

--
Without followers, evil cannot spread.
-- Spock, "And The Children Shall Lead", stardate 5029.5


2009-09-16 12:30:57

by Trond Myklebust

Subject: Re: BUG: oops in gss_validate on 2.6.31

On Wed, 2009-09-16 at 12:29 +0200, Bastian Blank wrote:
> Hi
>
> Since 2.6.31 my gssapi authenticated nfs oopses.
>
> BUG: unable to handle kernel NULL pointer dereference at 00000010
> IP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss]
> *pdpt = 0000000001473001 *pde = 0000000000000000
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/virtual/block/dm-13/range
> Modules linked in: kvm_intel kvm ext4 jbd2 crc16 usb_storage usbhid hid i915 drm i2c_algo_bit sco bridge stp bnep rfcomm l2cap xt_mac ipt_REJECT xt_tcpudp xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables tun nfsd exportfs nfs lockd fscache nfs_acl deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic xcbc rmd160 sha1_generic hmac crypto_null af_key fuse rpcsec_gss_krb5 auth_rpcgss sunrpc loop acpi_cpufreq arc4 snd_hda_codec_analog ecb snd_hda_intel snd_hda_codec iwl3945 snd_hwdep iwlcore snd_pcm snd_seq snd_timer thinkpad_acpi snd_seq_device nsc_ircc i2c_i801 btusb mac80211 i2c_core serio_raw snd soundcore battery button psmouse processor rng_core snd_page_alloc evdev nvram ac cfg80211 bluetooth irda rfkill crc_ccitt ext3 jbd mbcache sha256_generic aes_i586 aes_generic cbc dm_crypt dm_mod sd_mod crc_t10dif ata_generic ide_pci_generic ahci libata scsi_mod sdhci_pci piix sdhci firewire_ohci firewire_core crc_itu_t ide_core mmc_core led_class uhci_hcd ehci_hcd usbcore nls_base e1000e intel_agp agpgart video output thermal fan thermal_sys [last unloaded: kvm]
>
> Pid: 2025, comm: rpciod/0 Not tainted (2.6.31-trunk-686-bigmem #1) 170255G
> EIP: 0060:[<f8dd594a>] EFLAGS: 00010246 CPU: 0
> EIP is at gss_validate+0xad/0x175 [auth_rpcgss]
> EAX: d5d7e830 EBX: f60f5ef8 ECX: f60f5ee4 EDX: f60f5ef8
> ESI: 00000025 EDI: 00000000 EBP: cdc30bc0 ESP: f60f5edc
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Process rpciod/0 (pid: 2025, ti=f60f4000 task=f6685ae0 task.ti=f60f4000)
> Stack:
> f5c512c0 d5d7e830 00000025 d5d7e830 f60f5ef4 00000004 9cd00000 f60f5ef4
> <0> 00000004 00000000 00000000 00000000 00000001 00000000 f847888c 00000004
> <0> 00000004 be91f5c4 cdc30bc0 f5c512c0 d5d7e828 f3c807f8 f8d9de34 be91f5c4
> Call Trace:
> [<f8d9de34>] ? rpcauth_checkverf+0x4a/0x60 [sunrpc]
> [<f8d972a0>] ? call_decode+0x30f/0x5de [sunrpc]
> [<f8d96199>] ? rpcproc_decode_null+0x0/0x21 [sunrpc]
> [<f8d9d246>] ? __rpc_execute+0x76/0x21e [sunrpc]
> [<c10528b6>] ? worker_thread+0x146/0x1d9
> [<f8d9d473>] ? rpc_async_schedule+0x0/0x29 [sunrpc]
> [<c105710f>] ? autoremove_wake_function+0x0/0x4f
> [<c1052770>] ? worker_thread+0x0/0x1d9
> [<c1056d7f>] ? kthread+0x7a/0x7f
> [<c1056d05>] ? kthread+0x0/0x7f
> [<c1009d07>] ? kernel_thread_helper+0x7/0x10
> Code: 24 18 89 da 89 44 24 10 8d 44 24 10 c7 44 24 14 04 00 00 00 e8 a4 f0 fc ff 89 da 8b 44 24 04 89 74 24 08 8d 4c 24 08 89 44 24 0c <8b> 47 10 e8 3c 16 00 00 3d 00 00 0c 00 89 c2 75 0a 8d 45 28 f0
> EIP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss] SS:ESP 0068:f60f5edc
> CR2: 0000000000000010
> ---[ end trace 92895856d62132dd ]---
>
> I saw this two times in the last days. Always under load. I've never
> seen this with 2.6.30. The server is a 2.6.30 machine.

Hmm... I don't see any obvious candidates in the changelog. My only
guess is that something is amiss after the merge of the nfsv4.1
backchannel code.

Would you be able to do a git bisect in order to finger the culprit?

Cheers
Trond

2009-09-16 12:47:01

by Trond Myklebust

Subject: Re: BUG: oops in gss_validate on 2.6.31

On Thu, 2009-09-17 at 15:39 +0300, Aioanei Rares wrote:
> Trond Myklebust wrote:
> > On Wed, 2009-09-16 at 12:29 +0200, Bastian Blank wrote:
> >
> >> Hi
> >>
> >> Since 2.6.31 my gssapi authenticated nfs oopses.
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 00000010
> >> IP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss]
> >> *pdpt = 0000000001473001 *pde = 0000000000000000
> >> Oops: 0000 [#1] SMP
> >> last sysfs file: /sys/devices/virtual/block/dm-13/range
> >> Modules linked in: kvm_intel kvm ext4 jbd2 crc16 usb_storage usbhid hid i915 drm i2c_algo_bit sco bridge stp bnep rfcomm l2cap xt_mac ipt_REJECT xt_tcpudp xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables tun nfsd exportfs nfs lockd fscache nfs_acl deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic xcbc rmd160 sha1_generic hmac crypto_null af_key fuse rpcsec_gss_krb5 auth_rpcgss sunrpc loop acpi_cpufreq arc4 snd_hda_codec_analog ecb snd_hda_intel snd_hda_codec iwl3945 snd_hwdep iwlcore snd_pcm snd_seq snd_timer thinkpad_acpi snd_seq_device nsc_ircc i2c_i801 btusb mac80211 i2c_core serio_raw snd soundcore battery button psmouse processor rng_core snd_page_alloc evdev nvram ac cfg80211 bluetooth irda rfkill crc_ccitt ext3 jbd mbcache sha256_generic aes_i586 aes_generic cbc dm_crypt dm_mod sd_mod crc_t10dif ata_generic ide_pci_generic ahci libata scsi_mod sdhci_pci piix sdhci firewire_ohci firewire_core crc_itu_t ide_core mmc_core led_class uhci_hcd ehci_hcd usbcore nls_base e1000e intel_agp agpgart video output thermal fan thermal_sys [last unloaded: kvm]
> >>
> >> Pid: 2025, comm: rpciod/0 Not tainted (2.6.31-trunk-686-bigmem #1) 170255G
> >> EIP: 0060:[<f8dd594a>] EFLAGS: 00010246 CPU: 0
> >> EIP is at gss_validate+0xad/0x175 [auth_rpcgss]
> >> EAX: d5d7e830 EBX: f60f5ef8 ECX: f60f5ee4 EDX: f60f5ef8
> >> ESI: 00000025 EDI: 00000000 EBP: cdc30bc0 ESP: f60f5edc
> >> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> >> Process rpciod/0 (pid: 2025, ti=f60f4000 task=f6685ae0 task.ti=f60f4000)
> >> Stack:
> >> f5c512c0 d5d7e830 00000025 d5d7e830 f60f5ef4 00000004 9cd00000 f60f5ef4
> >> <0> 00000004 00000000 00000000 00000000 00000001 00000000 f847888c 00000004
> >> <0> 00000004 be91f5c4 cdc30bc0 f5c512c0 d5d7e828 f3c807f8 f8d9de34 be91f5c4
> >> Call Trace:
> >> [<f8d9de34>] ? rpcauth_checkverf+0x4a/0x60 [sunrpc]
> >> [<f8d972a0>] ? call_decode+0x30f/0x5de [sunrpc]
> >> [<f8d96199>] ? rpcproc_decode_null+0x0/0x21 [sunrpc]
> >> [<f8d9d246>] ? __rpc_execute+0x76/0x21e [sunrpc]
> >> [<c10528b6>] ? worker_thread+0x146/0x1d9
> >> [<f8d9d473>] ? rpc_async_schedule+0x0/0x29 [sunrpc]
> >> [<c105710f>] ? autoremove_wake_function+0x0/0x4f
> >> [<c1052770>] ? worker_thread+0x0/0x1d9
> >> [<c1056d7f>] ? kthread+0x7a/0x7f
> >> [<c1056d05>] ? kthread+0x0/0x7f
> >> [<c1009d07>] ? kernel_thread_helper+0x7/0x10
> >> Code: 24 18 89 da 89 44 24 10 8d 44 24 10 c7 44 24 14 04 00 00 00 e8 a4 f0 fc ff 89 da 8b 44 24 04 89 74 24 08 8d 4c 24 08 89 44 24 0c <8b> 47 10 e8 3c 16 00 00 3d 00 00 0c 00 89 c2 75 0a 8d 45 28 f0
> >> EIP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss] SS:ESP 0068:f60f5edc
> >> CR2: 0000000000000010
> >> ---[ end trace 92895856d62132dd ]---
> >>
> >> I saw this two times in the last days. Always under load. I've never
> >> seen this with 2.6.30. The server is a 2.6.30 machine.
> >>
> >
> > Hmm... I don't see any obvious candidates in the changelog. My only
> > guess is that something is amiss after the merge of the nfsv4.1
> > backchannel code.
> >
> > Would you be able to do a git bisect in order to finger the culprit?
> >
> > Cheers
> > Trond
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> >
> Guess I'll have to look into some manuals, since my git-fu is
> weak, and I'll get back to you. Meanwhile I'll test my .config with a
> release kernel.

I believe that starting with something along the lines of

git bisect start v2.6.31 v2.6.30 -- net/sunrpc include/linux/sunrpc

should be the most efficient thing to do. Then use 'git bisect bad' and
'git bisect good' to label the resulting kernels as bad or good.

Cheers
Trond

2009-09-16 12:15:28

by Aioanei Rares

Subject: BUG : drivers/staging/comedi/drivers/cb_pcidio.o linking error

Compiling the latest git kernel, pulled today (16.09), failed with this
error:

BUILD arch/x86/boot/bzImage
Root device is (8, 2)
Setup is 13996 bytes (padded to 14336 bytes).
System is 2381 kB
CRC bfd3396b
Kernel: arch/x86/boot/bzImage is ready (#7)
Building modules, stage 2.
MODPOST 1250 modules
WARNING: drivers/staging/comedi/drivers/cb_pcidio.o(.text+0xe5): Section
mismatch in reference from the function pcidio_attach() to the variable
.devinit.rodata:pcidio_pci_table
The function pcidio_attach() references
the variable __devinitconst pcidio_pci_table.
This is often because pcidio_attach lacks a __devinitconst
annotation or the annotation of pcidio_pci_table is wrong.

ERROR: "per_cpu__cpu_llc_id" [drivers/edac/edac_core.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

I was compiling with "make CONFIG_DEBUG_SECTION_MISMATCH=y all"
on a Debian testing/unstable system with gcc:

arares@debian:~/linux-2.6$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.4-2'
--with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--enable-shared --enable-multiarch --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3
--enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
--enable-mpfr --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.3.4 (Debian 4.3.4-2)

and make:

arares@debian:~/linux-2.6$ make -v
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for x86_64-pc-linux-gnu

binutils:

arares@debian:~/linux-2.6$ as -v
GNU assembler version 2.19.91 (x86_64-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.19.91.20090910

arares@debian:~/linux-2.6$ ld -v
GNU ld (GNU Binutils for Debian) 2.19.91.20090910

...and a fully updated system.

2009-09-16 12:38:58

by Aioanei Rares

Subject: Re: BUG: oops in gss_validate on 2.6.31

Trond Myklebust wrote:
> On Wed, 2009-09-16 at 12:29 +0200, Bastian Blank wrote:
>
>> Hi
>>
>> Since 2.6.31 my gssapi authenticated nfs oopses.
>>
>> BUG: unable to handle kernel NULL pointer dereference at 00000010
>> IP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss]
>> *pdpt = 0000000001473001 *pde = 0000000000000000
>> Oops: 0000 [#1] SMP
>> last sysfs file: /sys/devices/virtual/block/dm-13/range
>> Modules linked in: kvm_intel kvm ext4 jbd2 crc16 usb_storage usbhid hid i915 drm i2c_algo_bit sco bridge stp bnep rfcomm l2cap xt_mac ipt_REJECT xt_tcpudp xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables tun nfsd exportfs nfs lockd fscache nfs_acl deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic xcbc rmd160 sha1_generic hmac crypto_null af_key fuse rpcsec_gss_krb5 auth_rpcgss sunrpc loop acpi_cpufreq arc4 snd_hda_codec_analog ecb snd_hda_intel snd_hda_codec iwl3945 snd_hwdep iwlcore snd_pcm snd_seq snd_timer thinkpad_acpi snd_seq_device nsc_ircc i2c_i801 btusb mac80211 i2c_core serio_raw snd soundcore battery button psmouse processor rng_core snd_page_alloc evdev nvram ac cfg80211 bluetooth irda rfkill crc_ccitt ext3 jbd mbcache sha256_generic aes_i586 aes_generic cbc dm_crypt dm_mod sd_mod crc_t10dif ata_generic ide_pci_generic ahci libata scsi_mod sdhci_pci piix sdhci firewire_ohci firewire_core crc_itu_t ide_core mmc_core led_class uhci_hcd ehci_hcd usbcore nls_base e1000e intel_agp agpgart video output thermal fan thermal_sys [last unloaded: kvm]
>>
>> Pid: 2025, comm: rpciod/0 Not tainted (2.6.31-trunk-686-bigmem #1) 170255G
>> EIP: 0060:[<f8dd594a>] EFLAGS: 00010246 CPU: 0
>> EIP is at gss_validate+0xad/0x175 [auth_rpcgss]
>> EAX: d5d7e830 EBX: f60f5ef8 ECX: f60f5ee4 EDX: f60f5ef8
>> ESI: 00000025 EDI: 00000000 EBP: cdc30bc0 ESP: f60f5edc
>> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>> Process rpciod/0 (pid: 2025, ti=f60f4000 task=f6685ae0 task.ti=f60f4000)
>> Stack:
>> f5c512c0 d5d7e830 00000025 d5d7e830 f60f5ef4 00000004 9cd00000 f60f5ef4
>> <0> 00000004 00000000 00000000 00000000 00000001 00000000 f847888c 00000004
>> <0> 00000004 be91f5c4 cdc30bc0 f5c512c0 d5d7e828 f3c807f8 f8d9de34 be91f5c4
>> Call Trace:
>> [<f8d9de34>] ? rpcauth_checkverf+0x4a/0x60 [sunrpc]
>> [<f8d972a0>] ? call_decode+0x30f/0x5de [sunrpc]
>> [<f8d96199>] ? rpcproc_decode_null+0x0/0x21 [sunrpc]
>> [<f8d9d246>] ? __rpc_execute+0x76/0x21e [sunrpc]
>> [<c10528b6>] ? worker_thread+0x146/0x1d9
>> [<f8d9d473>] ? rpc_async_schedule+0x0/0x29 [sunrpc]
>> [<c105710f>] ? autoremove_wake_function+0x0/0x4f
>> [<c1052770>] ? worker_thread+0x0/0x1d9
>> [<c1056d7f>] ? kthread+0x7a/0x7f
>> [<c1056d05>] ? kthread+0x0/0x7f
>> [<c1009d07>] ? kernel_thread_helper+0x7/0x10
>> Code: 24 18 89 da 89 44 24 10 8d 44 24 10 c7 44 24 14 04 00 00 00 e8 a4 f0 fc ff 89 da 8b 44 24 04 89 74 24 08 8d 4c 24 08 89 44 24 0c <8b> 47 10 e8 3c 16 00 00 3d 00 00 0c 00 89 c2 75 0a 8d 45 28 f0
>> EIP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss] SS:ESP 0068:f60f5edc
>> CR2: 0000000000000010
>> ---[ end trace 92895856d62132dd ]---
>>
>> I saw this two times in the last days. Always under load. I've never
>> seen this with 2.6.30. The server is a 2.6.30 machine.
>>
>
> Hmm... I don't see any obvious candidates in the changelog. My only
> guess is that something is amiss after the merge of the nfsv4.1
> backchannel code.
>
> Would you be able to do a git bisect in order to finger the culprit?
>
> Cheers
> Trond
>
>
>
>
Guess I'll have to look into some manuals, since my git-fu is
weak, and I'll get back to you. Meanwhile I'll test my .config with a
release kernel.

Thanks,

2009-09-16 12:48:32

by Aioanei Rares

Subject: Re: BUG: oops in gss_validate on 2.6.31

Trond Myklebust wrote:
> On Thu, 2009-09-17 at 15:39 +0300, Aioanei Rares wrote:
>
>> Trond Myklebust wrote:
>>
>>> On Wed, 2009-09-16 at 12:29 +0200, Bastian Blank wrote:
>>>
>>>
>>>> Hi
>>>>
>>>> Since 2.6.31 my gssapi authenticated nfs oopses.
>>>>
>>>> BUG: unable to handle kernel NULL pointer dereference at 00000010
>>>> IP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss]
>>>> *pdpt = 0000000001473001 *pde = 0000000000000000
>>>> Oops: 0000 [#1] SMP
>>>> last sysfs file: /sys/devices/virtual/block/dm-13/range
>>>> Modules linked in: kvm_intel kvm ext4 jbd2 crc16 usb_storage usbhid hid i915 drm i2c_algo_bit sco bridge stp bnep rfcomm l2cap xt_mac ipt_REJECT xt_tcpudp xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables tun nfsd exportfs nfs lockd fscache nfs_acl deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic xcbc rmd160 sha1_generic hmac crypto_null af_key fuse rpcsec_gss_krb5 auth_rpcgss sunrpc loop acpi_cpufreq arc4 snd_hda_codec_analog ecb snd_hda_intel snd_hda_codec iwl3945 snd_hwdep iwlcore snd_pcm snd_seq snd_timer thinkpad_acpi snd_seq_device nsc_ircc i2c_i801 btusb mac80211 i2c_core serio_raw snd soundcore battery button psmouse processor rng_core snd_page_alloc evdev nvram ac cfg80211 bluetooth irda rfkill crc_ccitt ext3 jbd mbcache sha256_generic aes_i586 aes_generic cbc dm_crypt dm_mod sd_mod crc_t10dif ata_generic ide_pci_generic ahci libata scsi_mod sdhci_pci piix sdhci firewire_ohci firewire_core crc_itu_t ide_core mmc_core led_class uhci_hcd ehci_hcd usbcore nls_base e1000e intel_agp agpgart video output thermal fan thermal_sys [last unloaded: kvm]
>>>>
>>>> Pid: 2025, comm: rpciod/0 Not tainted (2.6.31-trunk-686-bigmem #1) 170255G
>>>> EIP: 0060:[<f8dd594a>] EFLAGS: 00010246 CPU: 0
>>>> EIP is at gss_validate+0xad/0x175 [auth_rpcgss]
>>>> EAX: d5d7e830 EBX: f60f5ef8 ECX: f60f5ee4 EDX: f60f5ef8
>>>> ESI: 00000025 EDI: 00000000 EBP: cdc30bc0 ESP: f60f5edc
>>>> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>>>> Process rpciod/0 (pid: 2025, ti=f60f4000 task=f6685ae0 task.ti=f60f4000)
>>>> Stack:
>>>> f5c512c0 d5d7e830 00000025 d5d7e830 f60f5ef4 00000004 9cd00000 f60f5ef4
>>>> <0> 00000004 00000000 00000000 00000000 00000001 00000000 f847888c 00000004
>>>> <0> 00000004 be91f5c4 cdc30bc0 f5c512c0 d5d7e828 f3c807f8 f8d9de34 be91f5c4
>>>> Call Trace:
>>>> [<f8d9de34>] ? rpcauth_checkverf+0x4a/0x60 [sunrpc]
>>>> [<f8d972a0>] ? call_decode+0x30f/0x5de [sunrpc]
>>>> [<f8d96199>] ? rpcproc_decode_null+0x0/0x21 [sunrpc]
>>>> [<f8d9d246>] ? __rpc_execute+0x76/0x21e [sunrpc]
>>>> [<c10528b6>] ? worker_thread+0x146/0x1d9
>>>> [<f8d9d473>] ? rpc_async_schedule+0x0/0x29 [sunrpc]
>>>> [<c105710f>] ? autoremove_wake_function+0x0/0x4f
>>>> [<c1052770>] ? worker_thread+0x0/0x1d9
>>>> [<c1056d7f>] ? kthread+0x7a/0x7f
>>>> [<c1056d05>] ? kthread+0x0/0x7f
>>>> [<c1009d07>] ? kernel_thread_helper+0x7/0x10
>>>> Code: 24 18 89 da 89 44 24 10 8d 44 24 10 c7 44 24 14 04 00 00 00 e8 a4 f0 fc ff 89 da 8b 44 24 04 89 74 24 08 8d 4c 24 08 89 44 24 0c <8b> 47 10 e8 3c 16 00 00 3d 00 00 0c 00 89 c2 75 0a 8d 45 28 f0
>>>> EIP: [<f8dd594a>] gss_validate+0xad/0x175 [auth_rpcgss] SS:ESP 0068:f60f5edc
>>>> CR2: 0000000000000010
>>>> ---[ end trace 92895856d62132dd ]---
>>>>
>>>> I saw this two times in the last days. Always under load. I've never
>>>> seen this with 2.6.30. The server is a 2.6.30 machine.
>>>>
>>>>
>>> Hmm... I don't see any obvious candidates in the changelog. My only
>>> guess is that something is amiss after the merge of the nfsv4.1
>>> backchannel code.
>>>
>>> Would you be able to do a git bisect in order to finger the culprit?
>>>
>>> Cheers
>>> Trond
>>>
>>>
>>>
>>>
>>>
>> Guess I'll have to look into some manuals, since my git-fu is
>> weak, and I'll get back to you. Meanwhile I'll test my .config with a
>> release kernel.
>>
>
> I believe that starting with something along the lines of
>
> git bisect start v2.6.31 v2.6.30 -- net/sunrpc include/linux/sunrpc
>
> should be the most efficient thing to do. Then use 'git bisect bad' and
> 'git bisect good' to label the resulting kernels as bad or good.
>
> Cheers
> Trond
>
>
>
Thanks a whole lot :-) Will keep you posted.