2007-11-30 13:03:44

by Vincent Fortier

Subject: 2.6.22.14 oops msg with commvault galaxy ?

Hi all,

I'm using a 2.6.22.14 kernel + CFS v24 and I got these errors when starting up
my commvault galaxy client... Does anybody know what this could mean?

Message from syslogd@printemps at Fri Nov 30 12:54:57 2007 ...
printemps kernel: [750078.538268] Oops: 0000 [#1]
printemps kernel: [750078.538284] SMP
printemps kernel: [750078.538528] CPU: 2
printemps kernel: [750078.538529] EIP: 0060:[<c01d915a>] Not tainted VLI
printemps kernel: [750078.538530] EFLAGS: 00010297 (2.6.22.14-cfs-etch-686-envcan #1)
printemps kernel: [750078.538580] EIP is at vsnprintf+0x2af/0x48c
printemps kernel: [750078.538597] eax: 80000000 ebx: ffffffff ecx: 80000000 edx: fffffffe
printemps kernel: [750078.538618] esi: e4a85017 edi: cf07feac ebp: ffffffff esp: cf07fe4c
printemps kernel: [750078.538637] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
printemps kernel: [750078.538656] Process clBackup (pid: 29277, ti=cf07e000 task=f6d9f8c0 task.ti=cf07e000)
printemps kernel: [750078.538676] Stack: e4834000 00001000 c033b638 f89e056c c02360f1 e4834000 1b57afe8 e4a85017
printemps kernel: [750078.538721] 00ef2608 00000000 ffffffff 00000000 ffffffff c0337eab 00000003 00000017
printemps kernel: [750078.538767] c037a340 e4834000 c01d93b8 cf07feac cf07feac c023566c e4a85017 c0337eaa
printemps kernel: [750078.538810] Call Trace:
printemps kernel: [750078.538839] [<c02360f1>] dev_uevent+0x189/0x1e0
printemps kernel: [750078.538864] [<c01d93b8>] sprintf+0x20/0x23
printemps kernel: [750078.538885] [<c023566c>] show_uevent+0xad/0xd5
printemps kernel: [750078.538907] [<c01571d1>] get_page_from_freelist+0x273/0x30a
printemps kernel: [750078.538933] [<c01323fc>] group_send_sig_info+0x12/0x56
printemps kernel: [750078.538956] [<c01572ba>] __alloc_pages+0x52/0x286
printemps kernel: [750078.538984] [<c02355bf>] show_uevent+0x0/0xd5
printemps kernel: [750078.539006] [<c023517e>] dev_attr_show+0x15/0x18
printemps kernel: [750078.539027] [<c01a8e9f>] sysfs_read_file+0x87/0xd8
printemps kernel: [750078.539048] [<c01880bc>] sys_getxattr+0x46/0x4e
printemps kernel: [750078.539071] [<c01a8e18>] sysfs_read_file+0x0/0xd8
printemps kernel: [750078.539092] [<c0171fb7>] vfs_read+0xa6/0x128
printemps kernel: [750078.539115] [<c01723b3>] sys_read+0x41/0x67
printemps kernel: [750078.539137] [<c0103d8a>] syscall_call+0x7/0xb
printemps kernel: [750078.539162] =======================
printemps kernel: [750078.539177] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9 00 00 00 8b 0f b8 39 0a 33 c0 8b 54 24 30 81 f9 ff 0f 00 00 0f 46 c8 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 2c 10 89 c3
printemps kernel: [750078.539346] EIP: [<c01d915a>] vsnprintf+0x2af/0x48c SS:ESP 0068:cf07fe4c
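
For readers less used to this notation: in the "EIP is at" line and in each call-trace entry, the kernel prints the enclosing symbol, the hex offset of the instruction into that symbol, and the symbol's total size in hex, as symbol+0xOFFSET/0xSIZE. The following is a minimal userspace sketch of splitting such a token; `parse_sym_offset` and `struct sym_loc` are hypothetical names made up for illustration and are not kernel interfaces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one "symbol+0xOFFSET/0xSIZE" token from an oops. */
struct sym_loc {
    char name[64];        /* enclosing symbol, e.g. "vsnprintf" */
    unsigned int offset;  /* byte offset of the instruction into the symbol */
    unsigned int size;    /* total size of the symbol in bytes */
};

/* Returns 0 on success, -1 if the token does not have the expected shape. */
int parse_sym_offset(const char *token, struct sym_loc *out)
{
    memset(out, 0, sizeof(*out));
    /* %63[^+] stops at the '+' separator; %x accepts an optional 0x prefix. */
    if (sscanf(token, "%63[^+]+%x/%x",
               out->name, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}
```

Fed the token "vsnprintf+0x2af/0x48c" from the trace, the sketch yields the symbol name vsnprintf, offset 0x2af and size 0x48c, i.e. the fault sits a little past the middle of that function.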


thnx very much!

- vin


2007-11-30 17:13:53

by Randy Dunlap

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Fri, 30 Nov 2007 13:02:54 +0000 Vincent Fortier wrote:

> Hi all,
>
> I'm using a 2.6.22.14 + CFS v24 and I got theses errors when starting up
> my commvault galaxy client... Do anybody know what this could mean?

Can you provide a few lines of syslog before the Oops: line,
which should contain some info about what happened, e.g.:

Unable to handle kernel paging request at virtual address e4a85017
printing eip:
c01d915a
*pde = 37d0d067
*pte = 00000000

> Message from syslogd@printemps at Fri Nov 30 12:54:57 2007 ...
> printemps kernel: [750078.538268] Oops: 0000 [#1]
> printemps kernel: [750078.538284] SMP
> printemps kernel: [750078.538528] CPU: 2
> printemps kernel: [750078.538529] EIP: 0060:[<c01d915a>] Not
> tainted VLI
> printemps kernel: [750078.538530] EFLAGS: 00010297
> (2.6.22.14-cfs-etch-686-envcan #1)
> printemps kernel: [750078.538580] EIP is at vsnprintf+0x2af/0x48c
> printemps kernel: [750078.538597] eax: 80000000 ebx: ffffffff ecx:
> 80000000 edx: fffffffe
> printemps kernel: [750078.538618] esi: e4a85017 edi: cf07feac ebp:
> ffffffff esp: cf07fe4c
> printemps kernel: [750078.538637] ds: 007b es: 007b fs: 00d8 gs:
> 0033 ss: 0068
> printemps kernel: [750078.538656] Process clBackup (pid: 29277,
> ti=cf07e000 task=f6d9f8c0 task.ti=cf07e000)
> printemps kernel: [750078.538676] Stack: e4834000 00001000 c033b638
> f89e056c c02360f1 e4834000 1b57afe8 e4a85017
> printemps kernel: [750078.538721] 00ef2608 00000000 ffffffff
> 00000000 ffffffff c0337eab 00000003 00000017
> printemps kernel: [750078.538767] c037a340 e4834000 c01d93b8
> cf07feac cf07feac c023566c e4a85017 c0337eaa
> printemps kernel: [750078.538810] Call Trace:
> printemps kernel: [750078.538839] [<c02360f1>] dev_uevent+0x189/0x1e0
> printemps kernel: [750078.538864] [<c01d93b8>] sprintf+0x20/0x23
> printemps kernel: [750078.538885] [<c023566c>] show_uevent+0xad/0xd5
> printemps kernel: [750078.538907] [<c01571d1>] get_page_from_freelist
> +0x273/0x30a
> printemps kernel: [750078.538933] [<c01323fc>] group_send_sig_info
> +0x12/0x56
> printemps kernel: [750078.538956] [<c01572ba>] __alloc_pages+0x52/0x286
> printemps kernel: [750078.538984] [<c02355bf>] show_uevent+0x0/0xd5
> printemps kernel: [750078.539006] [<c023517e>] dev_attr_show+0x15/0x18
> printemps kernel: [750078.539027] [<c01a8e9f>] sysfs_read_file
> +0x87/0xd8
> printemps kernel: [750078.539048] [<c01880bc>] sys_getxattr+0x46/0x4e
> printemps kernel: [750078.539071] [<c01a8e18>] sysfs_read_file+0x0/0xd8
> printemps kernel: [750078.539092] [<c0171fb7>] vfs_read+0xa6/0x128
> printemps kernel: [750078.539115] [<c01723b3>] sys_read+0x41/0x67
> printemps kernel: [750078.539137] [<c0103d8a>] syscall_call+0x7/0xb
> printemps kernel: [750078.539162] =======================
> printemps kernel: [750078.539177] Code: 74 24 28 73 03 c6 06 20 4d 46 85
> ed 7f f1 e9 b9 00 00 00 8b 0f b8 39 0a 33 c0 8b 54 24 30 81 f9 ff 0f 00
> 00 0f 46 c8 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6
> 44 24 2c 10 89 c3
> printemps kernel: [750078.539346] EIP: [<c01d915a>] vsnprintf
> +0x2af/0x48c SS:ESP 0068:cf07fe4c
>
>
> thnx very much!

---
~Randy

2007-11-30 17:35:59

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: Randy Dunlap [mailto:[email protected]]
> Sent: 30 November 2007 12:13
>
> On Fri, 30 Nov 2007 13:02:54 +0000 Vincent Fortier wrote:
>
> > Hi all,
> >
> > I'm using a 2.6.22.14 + CFS v24 and I got theses errors
> when starting
> > up my commvault galaxy client... Do anybody know what this could mean?
>
> Can you provide a few lines of syslog before the Oops: line,
> which should contain some info about what happened, e.g.:
>
> Unable to handle kernel paging request at virtual address
> e4a85017 printing eip:
> c01d915a
> *pde = 37d0d067
> *pte = 00000000

Would this be better?
[766535.379600] BUG: unable to handle kernel NULL pointer dereference at virtual address 000000c8
[766535.379636] printing eip:
[766535.379652] c01a920c
[766535.379665] *pdpt = 000000001cc2c001
[766535.379681] *pde = 0000000000000000
[766535.379698] Oops: 0000 [#1]
[766535.379713] SMP
[766535.379729] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse ide_cd ide_generic usbkbd usbmouse tsdev sg iTCO_wdt iTCO_vendor_support psmouse e752x_edac shpchp serio_raw edac_mc pcspkr evdev sr_mod pci_hotplug floppy cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic piix ide_core ehci_hcd uhci_hcd ata_piix tg3 usbcore thermal processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss aacraid
[766535.379956] CPU: 0
[766535.379957] EIP: 0060:[<c01a920c>] Not tainted VLI
[766535.379959] EFLAGS: 00010202 (2.6.22.14-cfs-etch-686-envcan #1)
[766535.380011] EIP is at sysfs_open_file+0x78/0x1e4
[766535.380028] eax: 00000000 ebx: f7f02e58 ecx: 0000000d edx: 000000c8
[766535.380049] esi: f7e7ec8c edi: defadf30 ebp: c01a9194 esp: defadedc
[766535.380070] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[766535.380091] Process clBackup (pid: 22085, ti=defac000 task=de01ac60 task.ti=defac000)
[766535.380110] Stack: de093300 dd2ce408 f7e7ec48 de093300 dd2ce408 defadf30 c01a9194 c017048c
[766535.380155] dfe8a180 dd2cbd48 de093300 00008000 defadf30 0000000d c01705bd de093300
[766535.380202] 00000000 00000000 c01705fe 00000000 defadf30 dd2cbd48 dfe8a180 e0bd6f00
[766535.380246] Call Trace:
[766535.380276] [<c01a9194>] sysfs_open_file+0x0/0x1e4
[766535.380296] [<c017048c>] __dentry_open+0xc1/0x178
[766535.380321] [<c01705bd>] nameidata_to_filp+0x24/0x33
[766535.380343] [<c01705fe>] do_filp_open+0x32/0x39
[766535.380367] [<c017036b>] get_unused_fd+0x4a/0xaa
[766535.380390] [<c0170647>] do_sys_open+0x42/0xc3
[766535.380413] [<c0170701>] sys_open+0x1c/0x1e
[766535.380434] [<c0103d8a>] syscall_call+0x7/0xb
[766535.380460] =======================
[766535.380476] Code: 14 24 83 7c 24 08 00 8b 42 0c 8b 40 54 8b 70 14 0f 84 70 01 00 00 85 f6 0f 84 68 01 00 00 8b 56 04 85 d2 74 19 64 a1 08 50 3d c0 <83> 3a 02 0f 84 42 01 00 00 c1 e0 05 ff 84 10 20 01 00 00 8b 54
[766535.380644] EIP: [<c01a920c>] sysfs_open_file+0x78/0x1e4 SS:ESP 0068:defadedc
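
A note on the "Oops: 0000" line in traces like the one above: the number is the x86 page-fault error code in hex. Bit 0 distinguishes a not-present page (0) from a protection violation (1), bit 1 a read (0) from a write (1), and bit 2 kernel mode (0) from user mode (1). The sketch below decodes those three bits; `describe_oops_code` and the `PF_*` macro names are hypothetical, chosen for this example only.

```c
#include <stdio.h>

/* Low bits of the x86 page-fault error code shown after "Oops:" (hex). */
#define PF_PROT  0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access */
#define PF_USER  0x4u  /* 0: kernel mode,      1: user mode */

/* Hypothetical helper: writes a short description into buf and returns it. */
const char *describe_oops_code(unsigned int code, char *buf, size_t len)
{
    snprintf(buf, len, "%s %s in %s mode",
             (code & PF_PROT)  ? "protection-fault" : "not-present",
             (code & PF_WRITE) ? "write" : "read",
             (code & PF_USER)  ? "user" : "kernel");
    return buf;
}
```

For code 0000 this gives "not-present read in kernel mode", which fits a kernel read through a bad pointer.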

Again,

> >
> > thnx very much!
>

- vin

2007-12-04 13:47:40

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

On Friday, 30 November 2007 at 12:35 -0500, Fortier,Vincent [Montreal]
wrote:
> > -----Original Message-----
> > From: Randy Dunlap [mailto:[email protected]]
> > Sent: 30 November 2007 12:13
> >
> > On Fri, 30 Nov 2007 13:02:54 +0000 Vincent Fortier wrote:
> >
> > > Hi all,
> > >
> > > I'm using a 2.6.22.14 + CFS v24 and I got theses errors
> > when starting
> > > up my commvault galaxy client... Do anybody know what this could mean?
> >
> > Can you provide a few lines of syslog before the Oops: line,
> > which should contain some info about what happened, e.g.:
> >
> > Unable to handle kernel paging request at virtual address
> > e4a85017 printing eip:
> > c01d915a
> > *pde = 37d0d067
> > *pte = 00000000
>

I've unmounted the XFS/DRBD filesystem/container (thought it might have
been related?) but it did not help... still getting the same kernel
oops.

[1097523.808915] BUG: unable to handle kernel paging request at virtual address 80000000
[1097523.808950] printing eip:
[1097523.808963] c01d915a
[1097523.808977] *pdpt = 00000000220ea001
[1097523.808992] *pde = 0000000000000000
[1097523.809009] Oops: 0000 [#27]
[1097523.809023] SMP
[1097523.809040] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse ide_cd ide_generic usbkbd usbmouse tsdev serio_raw sg psmouse iTCO_wdt iTCO_vendor_support floppy e752x_edac sr_mod pcspkr evdev edac_mc shpchp pci_hotplug cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic piix ide_core ehci_hcd uhci_hcd tg3 ata_piix usbcore thermal processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss aacraid
[1097523.809266] CPU: 0
[1097523.809268] EIP: 0060:[<c01d915a>] Not tainted VLI
[1097523.809269] EFLAGS: 00010297 (2.6.22.14-cfs-etch-686-envcan #1)
[1097523.809323] EIP is at vsnprintf+0x2af/0x48c
[1097523.809341] eax: 80000000 ebx: ffffffff ecx: 80000000 edx: fffffffe
[1097523.809361] esi: d89c6017 edi: dd1ffeac ebp: ffffffff esp: dd1ffe4c
[1097523.809382] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[1097523.809403] Process clBackup (pid: 30311, ti=dd1fe000 task=f7043290 task.ti=dd1fe000)
[1097523.809423] Stack: dbc2a000 00001000 c033b638 f89e056c c02360f1 dbc2a000 27639fe8 d89c6017
[1097523.809469] 008e2408 00000000 ffffffff 00000000 ffffffff c0337eab 00000003 00000017
[1097523.809512] c037a340 dbc2a000 c01d93b8 dd1ffeac dd1ffeac c023566c d89c6017 c0337eaa
[1097523.809559] Call Trace:
[1097523.809588] [<c02360f1>] dev_uevent+0x189/0x1e0
[1097523.809614] [<c01d93b8>] sprintf+0x20/0x23
[1097523.809635] [<c023566c>] show_uevent+0xad/0xd5
[1097523.809658] [<c01571d1>] get_page_from_freelist+0x273/0x30a
[1097523.809686] [<c01323fc>] group_send_sig_info+0x12/0x56
[1097523.809711] [<c01572ba>] __alloc_pages+0x52/0x286
[1097523.809734] [<c02355bf>] show_uevent+0x0/0xd5
[1097523.809754] [<c023517e>] dev_attr_show+0x15/0x18
[1097523.809775] [<c01a8e9f>] sysfs_read_file+0x87/0xd8
[1097523.809796] [<c01880bc>] sys_getxattr+0x46/0x4e
[1097523.809818] [<c01a8e18>] sysfs_read_file+0x0/0xd8
[1097523.809839] [<c0171fb7>] vfs_read+0xa6/0x128
[1097523.809861] [<c01723b3>] sys_read+0x41/0x67
[1097523.809881] [<c0103d8a>] syscall_call+0x7/0xb
[1097523.809906] =======================
[1097523.809921] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9 00 00 00 8b 0f b8 39 0a 33 c0 8b 54 24 30 81 f9 ff 0f 00 00 0f 46 c8 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 2c 10 89 c3
[1097523.810088] EIP: [<c01d915a>] vsnprintf+0x2af/0x48c SS:ESP 0068:dd1ffe4c

Help would really be appreciated.

- vin

2007-12-07 22:17:17

by Randy Dunlap

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, 04 Dec 2007 13:47:01 +0000 Vincent Fortier wrote:

> On Friday, 30 November 2007 at 12:35 -0500, Fortier,Vincent [Montreal]
> wrote:
> > > -----Original Message-----
> > > From: Randy Dunlap [mailto:[email protected]]
> > > Sent: 30 November 2007 12:13
> > >
> > > On Fri, 30 Nov 2007 13:02:54 +0000 Vincent Fortier wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm using a 2.6.22.14 + CFS v24 and I got theses errors
> > > when starting
> > > > up my commvault galaxy client... Do anybody know what this could mean?
> > >
> > > Can you provide a few lines of syslog before the Oops: line,
> > > which should contain some info about what happened, e.g.:
> > >
> > > Unable to handle kernel paging request at virtual address
> > > e4a85017 printing eip:
> > > c01d915a
> > > *pde = 37d0d067
> > > *pte = 00000000
> >
>
> I've umounted the XFS/DRBD filesystem/container (tought it might have
> been related?) but it did not helped... still getting the same kernel
> oops.
>
> [1097523.808915] BUG: unable to handle kernel paging request at virtual
> address 80000000
> [1097523.808950] printing eip:
> [1097523.808963] c01d915a
> [1097523.808977] *pdpt = 00000000220ea001
> [1097523.808992] *pde = 0000000000000000
> [1097523.809009] Oops: 0000 [#27]
> [1097523.809023] SMP
> [1097523.809040] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd
> nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse
> ide_cd ide_generic usbkbd usbmouse tsdev serio_raw sg psmouse iTCO_wdt
> iTCO_vendor_support floppy e752x_edac sr_mod pcspkr evdev edac_mc shpchp
> pci_hotplug cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic
> piix ide_core ehci_hcd uhci_hcd tg3 ata_piix usbcore thermal processor
> fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss
> aacraid
> [1097523.809266] CPU: 0
> [1097523.809268] EIP: 0060:[<c01d915a>] Not tainted VLI
> [1097523.809269] EFLAGS: 00010297 (2.6.22.14-cfs-etch-686-envcan #1)
> [1097523.809323] EIP is at vsnprintf+0x2af/0x48c
> [1097523.809341] eax: 80000000 ebx: ffffffff ecx: 80000000 edx:
> fffffffe
> [1097523.809361] esi: d89c6017 edi: dd1ffeac ebp: ffffffff esp:
> dd1ffe4c
> [1097523.809382] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [1097523.809403] Process clBackup (pid: 30311, ti=dd1fe000 task=f7043290
> task.ti=dd1fe000)
> [1097523.809423] Stack: dbc2a000 00001000 c033b638 f89e056c c02360f1
> dbc2a000 27639fe8 d89c6017
> [1097523.809469] 008e2408 00000000 ffffffff 00000000 ffffffff
> c0337eab 00000003 00000017
> [1097523.809512] c037a340 dbc2a000 c01d93b8 dd1ffeac dd1ffeac
> c023566c d89c6017 c0337eaa
> [1097523.809559] Call Trace:
> [1097523.809588] [<c02360f1>] dev_uevent+0x189/0x1e0
> [1097523.809614] [<c01d93b8>] sprintf+0x20/0x23
> [1097523.809635] [<c023566c>] show_uevent+0xad/0xd5
> [1097523.809658] [<c01571d1>] get_page_from_freelist+0x273/0x30a
> [1097523.809686] [<c01323fc>] group_send_sig_info+0x12/0x56
> [1097523.809711] [<c01572ba>] __alloc_pages+0x52/0x286
> [1097523.809734] [<c02355bf>] show_uevent+0x0/0xd5
> [1097523.809754] [<c023517e>] dev_attr_show+0x15/0x18
> [1097523.809775] [<c01a8e9f>] sysfs_read_file+0x87/0xd8
> [1097523.809796] [<c01880bc>] sys_getxattr+0x46/0x4e
> [1097523.809818] [<c01a8e18>] sysfs_read_file+0x0/0xd8
> [1097523.809839] [<c0171fb7>] vfs_read+0xa6/0x128
> [1097523.809861] [<c01723b3>] sys_read+0x41/0x67
> [1097523.809881] [<c0103d8a>] syscall_call+0x7/0xb
> [1097523.809906] =======================
> [1097523.809921] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9
> 00 00 00 8b 0f b8 39 0a 33 c0 8b 54 24 30 81 f9 ff 0f 00 00 0f 46 c8 89
> c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 2c 10 89
> c3
> [1097523.810088] EIP: [<c01d915a>] vsnprintf+0x2af/0x48c SS:ESP
> 0068:dd1ffe4c
>
> Help would really be appreciated.

Let's try the last_sysfs_file (name) patch.
I've attempted to update it for 2.6.22.14.
Andrew, does this change in fs/sysfs/file.c look OK?

---
~Randy



From: Randy Dunlap <[email protected]>

Record last_sysfs_file name to print during oopsen so that we can
have a clue.

Signed-off-by: Randy Dunlap <[email protected]>
---
arch/i386/kernel/traps.c | 6 ++++++
fs/sysfs/file.c | 6 ++++++
2 files changed, 12 insertions(+)

--- linux-2.6.22.14.orig/arch/i386/kernel/traps.c
+++ linux-2.6.22.14/arch/i386/kernel/traps.c
@@ -411,6 +411,12 @@ void die(const char * str, struct pt_reg
#endif
if (nl)
printk("\n");
+ {
+ extern char last_sysfs_file[];
+
+ printk(KERN_ALERT "last sysfs file: %s\n",
+ last_sysfs_file);
+ }
if (notify_die(DIE_OOPS, str, regs, err,
current->thread.trap_no, SIGSEGV) !=
NOTIFY_STOP) {
--- linux-2.6.22.14.orig/fs/sysfs/file.c
+++ linux-2.6.22.14/fs/sysfs/file.c
@@ -8,6 +8,7 @@
#include <linux/namei.h>
#include <linux/poll.h>
#include <linux/list.h>
+#include <linux/limits.h>
#include <asm/uaccess.h>
#include <asm/semaphore.h>

@@ -245,6 +246,8 @@ out:
return len;
}

+char last_sysfs_file[PATH_MAX];
+
static int sysfs_open_file(struct inode *inode, struct file *file)
{
struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent);
@@ -279,6 +282,9 @@ static int sysfs_open_file(struct inode
if (!ops)
goto Eaccess;

+ d_path(file->f_path.dentry, sysfs_mount, last_sysfs_file,
+ sizeof(last_sysfs_file));
+
/* make sure we have a collection to add our buffers to */
mutex_lock(&inode->i_mutex);
if (!(set = inode->i_private)) {

2007-12-07 23:12:37

by Andrew Morton

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Fri, 7 Dec 2007 14:15:36 -0800
Randy Dunlap <[email protected]> wrote:

> > Help would really be appreciated.
>
> Let's try the last_sysfs_file (name) patch.
> I've attempted to update it for 2.6.22.14.
> Andrew, does this change in fs/sysfs/file.c look OK?

umm, yup.

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/gregkh-driver-sysfs-crash-debugging.patch

should work.

2007-12-08 01:16:53

by Randy Dunlap

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Fri, 7 Dec 2007 15:11:13 -0800 Andrew Morton wrote:

> On Fri, 7 Dec 2007 14:15:36 -0800
> Randy Dunlap <[email protected]> wrote:
>
> > > Help would really be appreciated.
> >
> > Let's try the last_sysfs_file (name) patch.
> > I've attempted to update it for 2.6.22.14.
> > Andrew, does this change in fs/sysfs/file.c look OK?
>
> umm, yup.
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/gregkh-driver-sysfs-crash-debugging.patch
>
> should work.

Thanks.
I produced a cleanly applying version of it for 2.6.22.14.

Vincent, please apply this patch so we can know which file in sysfs
these oopses are happening with.

---


From: Andrew Morton <[email protected]>

Display the most-recently-opened sysfs file's name when oopsing.

From: Adrian Bunk <[email protected]>

Build fix

From: Greg Kroah-Hartman <[email protected]>

Modified to make the api call cleaner, and available to all arches if
need be. Also added it to x86-64's crash dump message.


Signed-off-by: Adrian Bunk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/i386/kernel/traps.c | 1 +
arch/x86_64/kernel/traps.c | 1 +
fs/sysfs/file.c | 14 ++++++++++++++
include/linux/sysfs.h | 6 ++++++
4 files changed, 22 insertions(+)

--- linux-2.6.22.14.orig/arch/i386/kernel/traps.c
+++ linux-2.6.22.14/arch/i386/kernel/traps.c
@@ -411,6 +411,7 @@ void die(const char * str, struct pt_reg
#endif
if (nl)
printk("\n");
+ sysfs_printk_last_file();
if (notify_die(DIE_OOPS, str, regs, err,
current->thread.trap_no, SIGSEGV) !=
NOTIFY_STOP) {
--- linux-2.6.22.14.orig/arch/x86_64/kernel/traps.c
+++ linux-2.6.22.14/arch/x86_64/kernel/traps.c
@@ -516,6 +516,7 @@ void __kprobes __die(const char * str, s
printk("DEBUG_PAGEALLOC");
#endif
printk("\n");
+ sysfs_printk_last_file();
notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV);
show_registers(regs);
/* Executive summary in case the oops scrolled away */
--- linux-2.6.22.14.orig/fs/sysfs/file.c
+++ linux-2.6.22.14/fs/sysfs/file.c
@@ -8,6 +8,7 @@
#include <linux/namei.h>
#include <linux/poll.h>
#include <linux/list.h>
+#include <linux/limits.h>
#include <asm/uaccess.h>
#include <asm/semaphore.h>

@@ -15,6 +16,13 @@

#define to_sattr(a) container_of(a,struct subsys_attribute, attr)

+/* used in crash dumps to help with debugging */
+static char last_sysfs_file[PATH_MAX];
+void sysfs_printk_last_file(void)
+{
+ printk(KERN_EMERG "last sysfs file: %s\n", last_sysfs_file);
+}
+
/*
* Subsystem file operations.
* These operations allow subsystems to have files that can be
@@ -253,6 +261,12 @@ static int sysfs_open_file(struct inode
struct sysfs_buffer * buffer;
struct sysfs_ops * ops = NULL;
int error = 0;
+ char *p;
+
+ p = d_path(file->f_dentry, sysfs_mount, last_sysfs_file,
+ sizeof(last_sysfs_file));
+ if (p)
+ memmove(last_sysfs_file, p, strlen(p) + 1);

if (!kobj || !attr)
goto Einval;
--- linux-2.6.22.14.orig/include/linux/sysfs.h
+++ linux-2.6.22.14/include/linux/sysfs.h
@@ -125,6 +125,7 @@ void sysfs_remove_file_from_group(struct
const struct attribute *attr, const char *group);

void sysfs_notify(struct kobject * k, char *dir, char *attr);
+void sysfs_printk_last_file(void);


extern int sysfs_make_shadowed_dir(struct kobject *kobj,
@@ -240,6 +241,11 @@ static inline int __must_check sysfs_ini
return 0;
}

+static inline void sysfs_printk_last_file(void)
+{
+ ;
+}
+
#endif /* CONFIG_SYSFS */

#endif /* _SYSFS_H_ */
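
A note on the memmove() in the fs/sysfs/file.c hunk: d_path() assembles the name backwards from the end of the buffer it is given and returns a pointer to where the string starts, which is generally not the start of the buffer. Printing the buffer itself therefore needs the string moved to the front first. The following is a minimal userspace sketch of that pattern; `build_path_at_end` and `record_path` are hypothetical stand-ins written for illustration (returning NULL on failure is a simplification), not the kernel functions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for d_path(): joins the components with '/' while
 * filling buf from its END, and returns a pointer to the first character
 * of the result (somewhere inside buf), or NULL if buf is too small.
 */
char *build_path_at_end(const char *const *parts, size_t nparts,
                        char *buf, size_t buflen)
{
    char *p;

    if (buflen == 0)
        return NULL;
    p = buf + buflen - 1;
    *p = '\0';

    /* Walk from the last component to the first, prepending each one. */
    for (size_t i = nparts; i-- > 0; ) {
        size_t n = strlen(parts[i]);

        if ((size_t)(p - buf) < n + 1)
            return NULL;            /* no room for "/component" */
        p -= n;
        memcpy(p, parts[i], n);
        *--p = '/';
    }
    return p;
}

/* Mirrors the patch: move the string to the front so buf is printable. */
int record_path(const char *const *parts, size_t nparts,
                char *buf, size_t buflen)
{
    char *p = build_path_at_end(parts, nparts, buf, buflen);

    if (!p)
        return -1;
    memmove(buf, p, strlen(p) + 1);   /* regions may overlap, hence memmove */
    return 0;
}
```

memmove() rather than memcpy() is the right call here because source and destination lie in the same buffer and can overlap when the path is long.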

2007-12-10 13:21:19

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: Randy Dunlap [mailto:[email protected]]
> Sent: 7 December 2007 20:15
>
> On Fri, 7 Dec 2007 15:11:13 -0800 Andrew Morton wrote:
>
> > On Fri, 7 Dec 2007 14:15:36 -0800
> > Randy Dunlap <[email protected]> wrote:
> >
> > > > Help would really be appreciated.
> > >
> > > Let's try the last_sysfs_file (name) patch.
> > > I've attempted to update it for 2.6.22.14.
> > > Andrew, does this change in fs/sysfs/file.c look OK?
> >
> > umm, yup.
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/gregkh-driver-sysfs-crash-debugging.patch
> >
> > should work.
>
> Thanks.
> I produced a cleanly applying version of it for 2.6.22.14.
>
> Vincent, please apply this patch so we can know which file in
> sysfs these oopses are happening with.
>

It did not apply cleanly on 2.6.22.14... copy/paste might be the issue here... Anyhow, I fixed the hunks that failed to apply and here is my version of it... Hoping I got this right (attached patch).

Compiling at the moment... will try this out with commvault 5.9, probably in the morning, and get back with the results.

Let me know if I got the patch wrong.

- vin


Attachments:
display_most-recently-opened_sysfs_file_name_when_oopsing.patch (2.86 kB)

2007-12-10 14:04:07

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?



> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of
> Fortier,Vincent [Montreal]
> Sent: 10 December 2007 08:21
> To: Randy Dunlap; Andrew Morton
> Cc: [email protected]
> Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?
>
> > -----Original Message-----
> > From: Randy Dunlap [mailto:[email protected]]
> > Sent: 7 December 2007 20:15
> >
> > On Fri, 7 Dec 2007 15:11:13 -0800 Andrew Morton wrote:
> >
> > > On Fri, 7 Dec 2007 14:15:36 -0800
> > > Randy Dunlap <[email protected]> wrote:
> > >
> > > > > Help would really be appreciated.
> > > >
> > > > Let's try the last_sysfs_file (name) patch.
> > > > I've attempted to update it for 2.6.22.14.
> > > > Andrew, does this change in fs/sysfs/file.c look OK?
> > >
> > > umm, yup.
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/gregkh-driver-sysfs-crash-debugging.patch
> > >
> > > should work.
> >
> > Thanks.
> > I produced a cleanly applying version of it for 2.6.22.14.
> >
> > Vincent, please apply this patch so we can know which file in sysfs
> > these oopses are happening with.
> >
>
> It did not applied cleanly on a 2.6.22.14... copy/paste might
> be the issue here... Anyhow, I corrected the patch failure to
> apply and here is my version of it... Hoping I got this
> (attached patch).
>
> Compiling at the moment... will try this out with commvault
> 5.9 probably in the morning and get back with the results.
>
> Let me know I got the patch wrong.

Here is the resulting trace... hoping this helps...:

[ 942.107304] BUG: unable to handle kernel NULL pointer dereference at virtual address 000000c8
[ 942.107339] printing eip:
[ 942.107354] c01a924c
[ 942.107368] *pdpt = 000000002d6b4001
[ 942.107383] *pde = 0000000000000000
[ 942.107401] Oops: 0000 [#1]
[ 942.107414] SMP
[ 942.107431] last sysfs file: /kernel/uids/104/cpu_share
[ 942.107449] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse ide_cd ide_generic usbkbd usbmouse tsdev sg iTCO_wdt iTCO_vendor_support e752x_edac edac_mc psmouse floppy shpchp pci_hotplug serio_raw sr_mod pcspkr evdev cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic piix ide_core ehci_hcd uhci_hcd usbcore ata_piix tg3 thermal processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss aacraid
[ 942.107675] CPU: 0
[ 942.107676] EIP: 0060:[<c01a924c>] Not tainted VLI
[ 942.107678] EFLAGS: 00010202 (2.6.22.14-cfs-etch-686-envcan #1)
[ 942.107730] EIP is at sysfs_open_file+0xae/0x21e
[ 942.107749] eax: 00000000 ebx: f77783b8 ecx: dfb0b280 edx: 000000c8
[ 942.107769] esi: f7e0ce8c edi: c03fd5c0 ebp: c01a919e esp: f1257ed8
[ 942.107789] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 942.107810] Process clBackup (pid: 5191, ti=f1256000 task=f71b3290 task.ti=f1256000)
[ 942.107831] Stack: 00001000 f295d240 ed4f4ac0 f7e0ce48 f295d240 ed4f4ac0 f1257f30 c01a919e
[ 942.107878] c0170464 dfe76100 ed4f3880 f295d240 00008000 f1257f30 00000010 c0170595
[ 942.107921] f295d240 00000000 00000000 c01705d6 00000000 f1257f30 ed4f3880 dfe76100
[ 942.107968] Call Trace:
[ 942.107998] [<c01a919e>] sysfs_open_file+0x0/0x21e
[ 942.108017] [<c0170464>] __dentry_open+0xc1/0x178
[ 942.108039] [<c0170595>] nameidata_to_filp+0x24/0x33
[ 942.108063] [<c01705d6>] do_filp_open+0x32/0x39
[ 942.108088] [<c0170343>] get_unused_fd+0x4a/0xaa
[ 942.108112] [<c017061f>] do_sys_open+0x42/0xc3
[ 942.108134] [<c01706d9>] sys_open+0x1c/0x1e
[ 942.108155] [<c0103d8a>] syscall_call+0x7/0xb
[ 942.108179] =======================
[ 942.108194] Code: b8 c0 c5 3f c0 41 e8 e8 06 03 00 83 7c 24 0c 00 0f 84 72 01 00 00 85 f6 0f 84 6a 01 00 00 8b 56 04 85 d2 74 19 64 a1 08 50 3d c0 <83> 3a 02 0f 84 44 01 00 00 c1 e0 05 ff 84 10 20 01 00 00 8b 54
[ 942.108364] EIP: [<c01a924c>] sysfs_open_file+0xae/0x21e SS:ESP 0068:f1257ed8

- vin

2007-12-10 17:16:20

by Randy Dunlap

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent [Montreal] wrote:

Ingo, can you look at this, please?
Vincent is getting oopses on 2.6.22.14-cfs-etch.

Vincent, did you apply the cfs patch or did Debian etch provide that?
If you applied it, did you use
http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22.13-v24.patch
or a different patch?


> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On behalf of
> > Fortier,Vincent [Montreal]
> > Sent: 10 December 2007 08:21
> > To: Randy Dunlap; Andrew Morton
> > Cc: [email protected]
> > Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?
> >
> > > -----Original Message-----
> > > From: Randy Dunlap [mailto:[email protected]]
> > > Sent: 7 December 2007 20:15
> > >
> > > On Fri, 7 Dec 2007 15:11:13 -0800 Andrew Morton wrote:
> > >
> > > > On Fri, 7 Dec 2007 14:15:36 -0800
> > > > Randy Dunlap <[email protected]> wrote:
> > > >
> > > > > > Help would really be appreciated.
> > > > >
> > > > > Let's try the last_sysfs_file (name) patch.
> > > > > I've attempted to update it for 2.6.22.14.
> > > > > Andrew, does this change in fs/sysfs/file.c look OK?
> > > >
> > > > umm, yup.
> > > >
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/gregkh-driver-sysfs-crash-debugging.patch
> > > >
> > > > should work.
> > >
> > > Thanks.
> > > I produced a cleanly applying version of it for 2.6.22.14.
> > >
> > > Vincent, please apply this patch so we can know which file in sysfs
> > > these oopses are happening with.
> > >
> >
> > It did not applied cleanly on a 2.6.22.14... copy/paste might
> > be the issue here... Anyhow, I corrected the patch failure to
> > apply and here is my version of it... Hoping I got this
> > (attached patch).
> >
> > Compiling at the moment... will try this out with commvault
> > 5.9 probably in the morning and get back with the results.
> >
> > Let me know I got the patch wrong.
>
> Here is the resulting trace... hoping this helps...:
>
> [ 942.107304] BUG: unable to handle kernel NULL pointer dereference at virtual address 000000c8
> [ 942.107339] printing eip:
> [ 942.107354] c01a924c
> [ 942.107368] *pdpt = 000000002d6b4001
> [ 942.107383] *pde = 0000000000000000
> [ 942.107401] Oops: 0000 [#1]
> [ 942.107414] SMP
> [ 942.107431] last sysfs file: /kernel/uids/104/cpu_share
> [ 942.107449] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse ide_cd ide_generic usbkbd usbmouse tsdev sg iTCO_wdt iTCO_vendor_support e752x_edac edac_mc psmouse floppy shpchp pci_hotplug serio_raw sr_mod pcspkr evdev cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic piix ide_core ehci_hcd uhci_hcd usbcore ata_piix tg3 thermal processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss aacraid
> [ 942.107675] CPU: 0
> [ 942.107676] EIP: 0060:[<c01a924c>] Not tainted VLI
> [ 942.107678] EFLAGS: 00010202 (2.6.22.14-cfs-etch-686-envcan #1)
> [ 942.107730] EIP is at sysfs_open_file+0xae/0x21e
> [ 942.107749] eax: 00000000 ebx: f77783b8 ecx: dfb0b280 edx: 000000c8
> [ 942.107769] esi: f7e0ce8c edi: c03fd5c0 ebp: c01a919e esp: f1257ed8
> [ 942.107789] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 942.107810] Process clBackup (pid: 5191, ti=f1256000 task=f71b3290 task.ti=f1256000)
> [ 942.107831] Stack: 00001000 f295d240 ed4f4ac0 f7e0ce48 f295d240 ed4f4ac0 f1257f30 c01a919e
> [ 942.107878] c0170464 dfe76100 ed4f3880 f295d240 00008000 f1257f30 00000010 c0170595
> [ 942.107921] f295d240 00000000 00000000 c01705d6 00000000 f1257f30 ed4f3880 dfe76100
> [ 942.107968] Call Trace:
> [ 942.107998] [<c01a919e>] sysfs_open_file+0x0/0x21e
> [ 942.108017] [<c0170464>] __dentry_open+0xc1/0x178
> [ 942.108039] [<c0170595>] nameidata_to_filp+0x24/0x33
> [ 942.108063] [<c01705d6>] do_filp_open+0x32/0x39
> [ 942.108088] [<c0170343>] get_unused_fd+0x4a/0xaa
> [ 942.108112] [<c017061f>] do_sys_open+0x42/0xc3
> [ 942.108134] [<c01706d9>] sys_open+0x1c/0x1e
> [ 942.108155] [<c0103d8a>] syscall_call+0x7/0xb
> [ 942.108179] =======================
> [ 942.108194] Code: b8 c0 c5 3f c0 41 e8 e8 06 03 00 83 7c 24 0c 00 0f 84 72 01 00 00 85 f6 0f 84 6a 01 00 00 8b 56 04 85 d2 74 19 64 a1 08 50 3d c0 <83> 3a 02 0f 84 44 01 00 00 c1 e0 05 ff 84 10 20 01 00 00 8b 54
> [ 942.108364] EIP: [<c01a924c>] sysfs_open_file+0xae/0x21e SS:ESP 0068:f1257ed8


This oops in sysfs_open_file() is on

if (!try_module_get(attr->owner)) {
error = -ENODEV;
goto Done;
}

but attr->owner == 0xc8.
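
A bogus owner value of this kind can be illustrated with a small userspace sketch. It rests on an assumption the thread does not confirm: that the attribute lived in memory that was never zeroed, and that only some of its fields were filled in (the user_attr_init() helper removed by the patch in this thread sets name, mode, show and store, but not owner). All type and function names below are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
struct module_stub { int state; };

struct attribute_stub {
	const char *name;
	struct module_stub *owner;	/* looked at by the open path when non-NULL */
	unsigned int mode;
};

/* Like the removed user_attr_init(): fills in name and mode, leaves owner alone. */
static void attr_init_partial(struct attribute_stub *a, const char *name,
			      unsigned int mode)
{
	assert(a != NULL);
	a->name = name;
	a->mode = mode;
}

/* Zeroes the whole structure first, so owner ends up NULL. */
static void attr_init_zeroed(struct attribute_stub *a, const char *name,
			     unsigned int mode)
{
	assert(a != NULL);
	memset(a, 0, sizeof(*a));
	a->name = name;
	a->mode = mode;
}

/* Returns 1 when a try_module_get()-style check would dereference owner. */
static int owner_would_be_dereferenced(const struct attribute_stub *a)
{
	return a->owner != NULL;
}
```

Filling a struct attribute_stub with stale bytes and then calling attr_init_partial() leaves owner non-NULL, so the check would follow a garbage pointer; attr_init_zeroed() avoids that. A statically allocated attribute gets the same zeroed owner for free.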

---
~Randy
Features and documentation: http://lwn.net/Articles/260136/

2007-12-10 17:55:58

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: Randy Dunlap [mailto:[email protected]]
> Sent: December 10, 2007 12:15
>
> On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent [Montreal] wrote:
>
> Ingo, can you look at this, please?
> Vincent is getting oopses on 2.6.22.14-cfs-etch.
>
> Vincent, did you apply the cfs patch or did Debian etch provide that?

I did. http://linux-dev.qc.ec.gc.ca/

> If you applied it, did you use
> http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22.13-v24.patch
> or a different patch?
>

I applied exactly that one, and had already sent that information to Ingo this morning, since the more detailed output made me presume that the CFS patchset could be involved.

Also note that CFS v24 on 2.6.21 does not produce the oops and I can run galaxy backups on the system without any problems.

- vin

2007-12-11 14:55:36

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Mon, Dec 10, 2007 at 09:15:01AM -0800, Randy Dunlap wrote:
> On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent [Montreal] wrote:
>
> Ingo, can you look at this, please?
> Vincent is getting oopses on 2.6.22.14-cfs-etch.
>

Hi,

We are looking into this bug now. I believe that the patch at
http://marc.info/?l=linux-kernel&m=119404922603293 should help.

I am working with Kay to get this ported.

Thanks,
--
regards,
Dhaval

2007-12-11 16:44:16

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, Dec 11, 2007 at 08:24:37PM +0530, Dhaval Giani wrote:
> On Mon, Dec 10, 2007 at 09:15:01AM -0800, Randy Dunlap wrote:
> > On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent [Montreal] wrote:
> >
> > Ingo, can you look at this, please?
> > Vincent is getting oopses on 2.6.22.14-cfs-etch.
> >
>
> Hi,
>
> We are looking into this bug now. I believe that the patch at
> http://marc.info/?l=linux-kernel&m=119404922603293 should help.
>
> I am working with Kay to get this ported.
>

Hi Vincent,

Does the following patch help?

Kay/Greg, could you please review and add your Signed-off-by(s) as
required?

This is basically a port of the patch at
http://marc.info/?l=linux-kernel&m=119404922603293

Thanks,
--

The sysfs interface for the Fair User Interface hits upon the bug
reported at http://lkml.org/lkml/2007/12/10/113.

Kay Sievers and Greg K H had posted some sysfs cleanup patches sometime
back at http://marc.info/?l=linux-kernel&m=119404922603293 .

This patch has been ported to 2.6.22.14 + CFS v24 backport.

Cc: Ingo Molnar <[email protected]>
Not-yet-Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Not-yet-Signed-off-by: Dhaval Giani <[email protected]>

---
include/linux/sched.h | 9 ---
kernel/ksysfs.c | 7 --
kernel/user.c | 129 +++++++++++++++++++++++++++++---------------------
3 files changed, 80 insertions(+), 65 deletions(-)

Index: current/include/linux/sched.h
===================================================================
--- current.orig/include/linux/sched.h
+++ current/include/linux/sched.h
@@ -586,18 +586,13 @@ struct user_struct {
#ifdef CONFIG_FAIR_USER_SCHED
struct task_group *tg;
#ifdef CONFIG_SYSFS
- struct kset kset;
- struct subsys_attribute user_attr;
+ struct kobject kobj;
struct work_struct work;
#endif
#endif
};

-#ifdef CONFIG_FAIR_USER_SCHED
-extern int uids_kobject_init(void);
-#else
-static inline int uids_kobject_init(void) { return 0; }
-#endif
+extern int uids_sysfs_init(void);

extern struct user_struct *find_user(uid_t);

Index: current/kernel/ksysfs.c
===================================================================
--- current.orig/kernel/ksysfs.c
+++ current/kernel/ksysfs.c
@@ -89,12 +89,9 @@ static int __init ksysfs_init(void)
error = sysfs_create_group(&kernel_subsys.kobj,
&kernel_attr_group);

- /*
- * Create "/sys/kernel/uids" directory and corresponding root user's
- * directory under it.
- */
+ /* create the /sys/kernel/uids/ directory */
if (!error)
- error = uids_kobject_init();
+ error = uids_sysfs_init();

return error;
}
Index: current/kernel/user.c
===================================================================
--- current.orig/kernel/user.c
+++ current/kernel/user.c
@@ -118,7 +118,6 @@ static void sched_switch_user(struct tas

#if defined(CONFIG_FAIR_USER_SCHED) && defined(CONFIG_SYSFS)

-static struct kobject uids_kobject; /* represents /sys/kernel/uids directory */
static DEFINE_MUTEX(uids_mutex);

static inline void uids_mutex_lock(void)
@@ -131,83 +130,104 @@ static inline void uids_mutex_unlock(voi
mutex_unlock(&uids_mutex);
}

-/* return cpu shares held by the user */
-ssize_t cpu_shares_show(struct kset *kset, char *buffer)
+/* uid directory attributes */
+static ssize_t cpu_shares_show(struct kobject *kobj,
+ struct attribute *attr,
+ char *buf)
{
- struct user_struct *up = container_of(kset, struct user_struct, kset);
+ struct user_struct *up = container_of(kobj, struct user_struct, kobj);

- return sprintf(buffer, "%lu\n", sched_group_shares(up->tg));
+ return sprintf(buf, "%lu\n", sched_group_shares(up->tg));
}

-/* modify cpu shares held by the user */
-ssize_t cpu_shares_store(struct kset *kset, const char *buffer, size_t size)
+static ssize_t cpu_shares_store(struct kobject *kobj,
+ struct attribute *attr,
+ const char *buf, size_t size)
{
- struct user_struct *up = container_of(kset, struct user_struct, kset);
+ struct user_struct *up = container_of(kobj, struct user_struct, kobj);
unsigned long shares;
int rc;

- sscanf(buffer, "%lu", &shares);
+ sscanf(buf, "%lu", &shares);

rc = sched_group_set_shares(up->tg, shares);

return (rc ? rc : size);
}

-static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
-{
- sa->attr.name = name;
- sa->attr.mode = mode;
- sa->show = cpu_shares_show;
- sa->store = cpu_shares_store;
-}
-
-/* Create "/sys/kernel/uids/<uid>" directory and
- * "/sys/kernel/uids/<uid>/cpu_share" file for this user.
- */
-static int user_kobject_create(struct user_struct *up)
-{
- struct kset *kset = &up->kset;
- struct kobject *kobj = &kset->kobj;
- int error;
-
- memset(kset, 0, sizeof(struct kset));
- kobj->parent = &uids_kobject; /* create under /sys/kernel/uids dir */
- kobject_set_name(kobj, "%d", up->uid);
- kset_init(kset);
- user_attr_init(&up->user_attr, "cpu_share", 0644);
-
+static struct attribute cpu_share_attr = {
+ .name = "cpu_share",
+ .mode = 0644
+ };
+
+/* default attributes per uid directory */
+static struct attribute *uids_attributes[] = {
+ &cpu_share_attr,
+ NULL
+};
+
+/* the lifetime of user_struct is not managed by the core (now) */
+static void uids_release(struct kobject *kobj)
+{
+ return;
+}
+
+static struct sysfs_ops uids_attributes_ops = {
+ .show = cpu_shares_show,
+ .store = cpu_shares_store,
+};
+
+static struct kobj_type uids_ktype = {
+ .sysfs_ops = &uids_attributes_ops,
+ .release = uids_release,
+};
+
+/* represents the /sys/kernel/uids/ directory */
+static struct kset uids_kset = {
+ .kobj = {.ktype = &uids_ktype},
+};
+
+/* create /sys/kernel/uids/<uid>/cpu_share file for this user */
+static int uids_user_create(struct user_struct *up)
+{
+ struct kobject *kobj = &up->kobj;
+ int error, i = 0;
+
+ memset(kobj, 0, sizeof(struct kobject));
+ kobject_init(kobj);
+ kobj->ktype = &uids_ktype;
+ kobj->kset = &uids_kset;
+ kobject_set_name(&up->kobj, "%d", up->uid);
error = kobject_add(kobj);
if (error)
goto done;

- error = sysfs_create_file(kobj, &up->user_attr.attr);
- if (error)
- kobject_del(kobj);
+ while (uids_attributes[i]) {
+ error = sysfs_create_file(kobj, uids_attributes[i++]);
+ if (error)
+ goto done;
+ }

kobject_uevent(kobj, KOBJ_ADD);
-
done:
return error;
}

-/* create these in sysfs filesystem:
+/* create these entries in sysfs:
* "/sys/kernel/uids" directory
* "/sys/kernel/uids/0" directory (for root user)
* "/sys/kernel/uids/0/cpu_share" file (for root user)
*/
-int __init uids_kobject_init(void)
+int __init uids_sysfs_init(void)
{
int error;

- /* create under /sys/kernel dir */
- uids_kobject.parent = &kernel_subsys.kobj;
- uids_kobject.kset = &kernel_subsys;
- kobject_set_name(&uids_kobject, "uids");
- kobject_init(&uids_kobject);
+ kobject_set_name(&uids_kset.kobj, "uids");
+ kobj_set_kset_s(&uids_kset, kernel_subsys);

- error = kobject_add(&uids_kobject);
+ error = kset_register(&uids_kset);
if (!error)
- error = user_kobject_create(&root_user);
+ error = uids_user_create(&root_user);

return error;
}
@@ -218,9 +238,8 @@ int __init uids_kobject_init(void)
static void remove_user_sysfs_dir(struct work_struct *w)
{
struct user_struct *up = container_of(w, struct user_struct, work);
- struct kobject *kobj = &up->kset.kobj;
unsigned long flags;
- int remove_user = 0;
+ int remove_user = 0, i = 0;

/* Make uid_hash_remove() + sysfs_remove_file() + kobject_del()
* atomic.
@@ -240,9 +259,12 @@ static void remove_user_sysfs_dir(struct
if (!remove_user)
goto done;

- sysfs_remove_file(kobj, &up->user_attr.attr);
- kobject_uevent(kobj, KOBJ_REMOVE);
- kobject_del(kobj);
+ while (uids_attributes[i])
+ sysfs_remove_file(&up->kobj, uids_attributes[i++]);
+
+ kobject_uevent(&up->kobj, KOBJ_REMOVE);
+ kobject_del(&up->kobj);
+ kobject_put(&up->kobj);

sched_destroy_user(up);
key_put(up->uid_keyring);
@@ -269,7 +291,8 @@ static inline void free_user(struct user

#else /* CONFIG_FAIR_USER_SCHED && CONFIG_SYSFS */

-static inline int user_kobject_create(struct user_struct *up) { return 0; }
+int uids_sysfs_init(void) { return 0; }
+static inline int uids_user_create(struct user_struct *up) { return 0; }
static inline void uids_mutex_lock(void) { }
static inline void uids_mutex_unlock(void) { }

@@ -326,7 +349,7 @@ struct user_struct * alloc_uid(struct us
struct hlist_head *hashent = uidhashentry(ns, uid);
struct user_struct *up;

- /* Make uid_hash_find() + user_kobject_create() + uid_hash_insert()
+ /* Make uid_hash_find() + uids_user_create() + uid_hash_insert()
* atomic.
*/
uids_mutex_lock();
@@ -366,7 +389,7 @@ struct user_struct * alloc_uid(struct us
return NULL;
}

- if (user_kobject_create(new)) {
+ if (uids_user_create(new)) {
sched_destroy_user(new);
key_put(new->uid_keyring);
key_put(new->session_keyring);
--
regards,
Dhaval
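
For readers unfamiliar with the pattern the patch moves to, here is a minimal userspace sketch of how a show/store callback gets from an embedded kobject back to its containing structure with container_of(). The stub types and function names are hypothetical and much simpler than the kernel's. One deliberate difference from the patch: the store stub checks the sscanf() return value, which the posted cpu_shares_store() does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace equivalent of the kernel macro: member pointer -> containing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct kobject_stub { const char *name; };

struct user_stub {
	unsigned int uid;
	unsigned long shares;
	struct kobject_stub kobj;	/* embedded, as in the patched user_struct */
};

/* In the style of cpu_shares_show(): format the value into buf. */
static int cpu_shares_show_stub(struct kobject_stub *kobj, char *buf, size_t len)
{
	struct user_stub *up;

	assert(kobj != NULL && buf != NULL);
	up = container_of(kobj, struct user_stub, kobj);
	return snprintf(buf, len, "%lu\n", up->shares);
}

/* In the style of cpu_shares_store(), but rejecting unparsable input. */
static long cpu_shares_store_stub(struct kobject_stub *kobj, const char *buf,
				  size_t size)
{
	struct user_stub *up;
	unsigned long shares;

	assert(kobj != NULL && buf != NULL);
	up = container_of(kobj, struct user_stub, kobj);
	if (sscanf(buf, "%lu", &shares) != 1)
		return -1;	/* the kernel would return a negative errno here */
	up->shares = shares;
	return (long)size;
}
```

With a struct user_stub whose shares field is 1024, cpu_shares_show_stub(&u.kobj, buf, sizeof(buf)) writes "1024\n" into buf. Because both callbacks receive only the kobject, one shared attribute definition can serve every per-uid directory.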

2007-12-11 17:05:25

by Greg KH

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, Dec 11, 2007 at 10:13:19PM +0530, Dhaval Giani wrote:
> On Tue, Dec 11, 2007 at 08:24:37PM +0530, Dhaval Giani wrote:
> > On Mon, Dec 10, 2007 at 09:15:01AM -0800, Randy Dunlap wrote:
> > > On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent [Montreal] wrote:
> > >
> > > Ingo, can you look at this, please?
> > > Vincent is getting oopses on 2.6.22.14-cfs-etch.
> > >
> >
> > Hi,
> >
> > We are looking into this bug now. I believe that the patch at
> > http://marc.info/?l=linux-kernel&m=119404922603293 should help.
> >
> > I am working with Kay to get this ported.
> >
>
> Hi Vincent,
>
> Does the following patch help?
>
> Kay/Greg, could you please review and add your Signed-off-by(s) as
> required?

Um, why? What is this patch for? Where is it to be sent, to Linus for
2.6.24-final? Or to the -stable tree?

> This is basically a port of the patch at
> http://marc.info/?l=linux-kernel&m=119404922603293

Yeah, but that patch needs some other core kobject changes, right?

What exactly are you trying to fix here, the fact that this code never
even worked?

And, please, we need some documentation in Documentation/ABI/ on
exactly what these sysfs files and this tree are for. Please add that now for
Linus's tree.

confused,

greg k-h

2007-12-11 17:24:38

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, Dec 11, 2007 at 09:04:00AM -0800, Greg KH wrote:
> On Tue, Dec 11, 2007 at 10:13:19PM +0530, Dhaval Giani wrote:
> > On Tue, Dec 11, 2007 at 08:24:37PM +0530, Dhaval Giani wrote:
> > > On Mon, Dec 10, 2007 at 09:15:01AM -0800, Randy Dunlap wrote:
> > > > On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent [Montreal] wrote:
> > > >
> > > > Ingo, can you look at this, please?
> > > > Vincent is getting oopses on 2.6.22.14-cfs-etch.
> > > >
> > >
> > > Hi,
> > >
> > > We are looking into this bug now. I believe that the patch at
> > > http://marc.info/?l=linux-kernel&m=119404922603293 should help.
> > >
> > > I am working with Kay to get this ported.
> > >
> >
> > Hi Vincent,
> >
> > Does the following patch help?
> >
> > Kay/Greg, could you please review and add your Signed-off-by(s) as
> > required?
>
> Um, why? What is this patch for? Where is it to be sent, to Linus for
> 2.6.24-final? Or to the -stable tree?
>

Hi Greg,

This is for 2.6.24-final, since Fair User scheduling is not yet there
in stable.

> > This is basically a port of the patch at
> > http://marc.info/?l=linux-kernel&m=119404922603293
>
> Yeah, but that patch needs some other core kobject changes, right?
>

Yep, there are some other changes that patch needed. We have worked
around them by using the existing functions in the current Linus tree.

> What exactly are you trying to fix here, the fact that this code never
> even worked?
>

The code was not using the kobject API. It's been cleaned up now (I
hope!)

> And, please, we need some documentation in Documentation/ABI/ on
> exactly what these sysfs files and this tree are for. Please add that now for
> Linus's tree.
>

On to it, will send the patch asap.

> confused,
>

hope i helped (in clearing it :) )

Thanks,
--
regards,
Dhaval

2007-12-11 17:48:18

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Dhaval Giani
>
> On Tue, Dec 11, 2007 at 09:04:00AM -0800, Greg KH wrote:
> > On Tue, Dec 11, 2007 at 10:13:19PM +0530, Dhaval Giani wrote:
> > > On Tue, Dec 11, 2007 at 08:24:37PM +0530, Dhaval Giani wrote:
> > > > On Mon, Dec 10, 2007 at 09:15:01AM -0800, Randy Dunlap wrote:
> > > > > On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent wrote:
> > > > >
> > > > > Ingo, can you look at this, please?
> > > > > Vincent is getting oopses on 2.6.22.14-cfs-etch.
> > > > >
> > > >
> > > > Hi,
> > > >
> > > > We are looking into this bug now. I believe that the patch at
> > > > http://marc.info/?l=linux-kernel&m=119404922603293 should help.
> > > >
> > > > I am working with Kay to get this ported.
> > > >
> > >
> > > Hi Vincent,
> > >
> > > Does the following patch help?
> > >
> > > > Kay/Greg, could you please review and add your Signed-off-by(s) as
> > > > required?
> >
> > Um, why? What is this patch for? Where is it to be sent, to Linus
> > for 2.6.24-final? Or to the -stable tree?
> >
>
> Hi Greg,
>
> This is for 2.6.24-final, since Fair User scheduling is not
> yet there in stable.
>
> > > This is basically a port of the patch at
> > > http://marc.info/?l=linux-kernel&m=119404922603293
> >
> > Yeah, but that patch needs some other core kobject changes, right?
> >
>
> Yep, there are some other changes that patch needed. We have
> worked around them by using the existing functions in the
> current Linus tree.
>
> > What exactly are you trying to fix here, the fact that this code
> > never even worked?
> >
>
> The code was not using the kobject API. It's been cleaned up now (I
> hope!)

It did not apply cleanly on 2.6.22.14 + CFS v24; there was a single
failure, so I resolved it manually and attached the resulting diff.

My tests with Galaxy 5.9 show that it still does not work, although
the error seems to have changed a bit (see attached dmesg).

> > And, please, we need some documentation in Documentation/ABI/ on
> > exactly what these sysfs files and this tree are for. Please add that now
> > for Linus's tree.
>
> On to it, will send the patch asap.
>
> > confused,
>
> hope i helped (in clearing it :) )

Should this patch eventually be included in?
2.6.25 ?
2.6.24 ?
(-stable 2.6.23 & 2.6.22) || backport CFS v24 -> v25 ?

Thnx,

- vin


Attachments:
dmesg.2.6.22.14-CFSv24-FairUserInterfaceBUGfix (30.55 kB)
FairUserInterface-BugFix.patch (7.72 kB)

2007-12-11 18:21:39

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, Dec 11, 2007 at 12:47:29PM -0500, Fortier,Vincent [Montreal] wrote:
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On behalf of Dhaval Giani
> >
> > On Tue, Dec 11, 2007 at 09:04:00AM -0800, Greg KH wrote:
> > > On Tue, Dec 11, 2007 at 10:13:19PM +0530, Dhaval Giani wrote:
> > > > On Tue, Dec 11, 2007 at 08:24:37PM +0530, Dhaval Giani wrote:
> > > > > On Mon, Dec 10, 2007 at 09:15:01AM -0800, Randy Dunlap wrote:
> > > > > > On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent wrote:
> > > > > >
> > > > > > Ingo, can you look at this, please?
> > > > > > Vincent is getting oopses on 2.6.22.14-cfs-etch.
> > > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > We are looking into this bug now. I believe that the patch at
> > > > > http://marc.info/?l=linux-kernel&m=119404922603293 should help.
> > > > >
> > > > > I am working with Kay to get this ported.
> > > > >
> > > >
> > > > Hi Vincent,
> > > >
> > > > Does the following patch help?
> > > >
> > > > > Kay/Greg, could you please review and add your Signed-off-by(s) as
> > > > > required?
> > >
> > > Um, why? What is this patch for? Where is it to be sent, to Linus
> > > for 2.6.24-final? Or to the -stable tree?
> > >
> >
> > Hi Greg,
> >
> > This is for 2.6.24-final, since Fair User scheduling is not
> > yet there in stable.
> >
> > > > This is basically a port of the patch at
> > > > http://marc.info/?l=linux-kernel&m=119404922603293
> > >
> > > Yeah, but that patch needs some other core kobject changes, right?
> > >
> >
> > Yep, there are some other changes that patch needed. We have
> > worked around them by using the existing functions in the
> > current Linus tree.
> >
> > > What exactly are you trying to fix here, the fact that this code
> > > never even worked?
> > >
> >
> > The code was not using the kobject API. It's been cleaned up now (I
> > hope!)
>
> It did not apply cleanly on 2.6.22.14 + CFS v24; there was a single
> failure, so I resolved it manually and attached the resulting diff.
>
> My tests with Galaxy 5.9 show that it still does not work, although
> the error seems to have changed a bit (see attached dmesg).
>

Hmmm, that makes me suspect the bug is somewhere else. What puzzles me
is that I was able to recreate your trace on my systems, so there is a
bug somewhere there.

Could you send your config please?

> > > And, please, we need some documentation in Documentation/ABI/ on
> > > exactly what these sysfs files and this tree are for. Please add that now
> > > for Linus's tree.
> >
> > On to it, will send the patch asap.
> >
> > > confused,
> >
> > hope i helped (in clearing it :) )
>
> Should this patch eventually be included in?
> 2.6.25 ?
> 2.6.24 ?
> (-stable 2.6.23 & 2.6.22) || backport CFS v24 -> v25 ?
>

2.6.24, I believe, unless of course the bug lies elsewhere.

--
regards,
Dhaval

[ 638.466375] BUG: unable to handle kernel paging request at virtual address 80000000
[ 638.466479] printing eip:
[ 638.466527] c01d9182
[ 638.466574] *pdpt = 000000002d022001
[ 638.466622] *pde = 0000000000000000
[ 638.466672] Oops: 0000 [#1]
[ 638.466719] SMP
[ 638.466838] last sysfs file: /devices/platform/floppy.0/uevent
[ 638.466890] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse ide_cd ide_generic usbkbd usbmouse tsdev iTCO_wdt iTCO_vendor_support sg psmouse e752x_edac shpchp sr_mod pci_hotplug serio_raw edac_mc evdev pcspkr cdrom floppy ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic piix ide_core ata_piix ehci_hcd uhci_hcd tg3 usbcore thermal processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss aacraid
[ 638.469273] CPU: 3
[ 638.469274] EIP: 0060:[<c01d9182>] Not tainted VLI
[ 638.469275] EFLAGS: 00010297 (2.6.22.14-cfs-etch-686-envcan #1)
[ 638.469444] EIP is at vsnprintf+0x2af/0x48c
[ 638.469504] eax: 80000000 ebx: ffffffff ecx: 80000000 edx: fffffffe
[ 638.469567] esi: ebc29017 edi: ed019eac ebp: ffffffff esp: ed019e4c
[ 638.469631] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 638.469693] Process clBackup (pid: 4849, ti=ed018000 task=f6674c60 task.ti=ed018000)
[ 638.469757] Stack: ec2a9000 00001000 c033b658 f899b56c c0236131 ec2a9000 143d6fe8 ebc29017
[ 638.470212] 00f9b608 00000000 ffffffff ffffffff 00000000 c0337ecb 00000003 00000017
[ 638.470665] c037a3a0 ec2a9000 c01d93e0 ed019eac ed019eac c02356ac ebc29017 c0337eca
[ 638.471129] Call Trace:
[ 638.471258] [<c0236131>] dev_uevent+0x189/0x1e0
[ 638.471377] [<c01d93e0>] sprintf+0x20/0x23
[ 638.471486] [<c02356ac>] show_uevent+0xad/0xd5
[ 638.471594] [<c0157189>] get_page_from_freelist+0x273/0x30a
[ 638.471713] [<c01323b4>] group_send_sig_info+0x12/0x56
[ 638.471822] [<c0157272>] __alloc_pages+0x52/0x286
[ 638.471930] [<c02355ff>] show_uevent+0x0/0xd5
[ 638.472034] [<c02351be>] dev_attr_show+0x15/0x18
[ 638.472138] [<c01a8e91>] sysfs_read_file+0x87/0xd8
[ 638.472240] [<c018807c>] sys_getxattr+0x46/0x4e
[ 638.472341] [<c01a8e0a>] sysfs_read_file+0x0/0xd8
[ 638.472445] [<c0171f77>] vfs_read+0xa6/0x128
[ 638.472551] [<c0172373>] sys_read+0x41/0x67
[ 638.472656] [<c0103d8a>] syscall_call+0x7/0xb
[ 638.472765] =======================
[ 638.472822] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9 00 00 00 8b 0f b8 59 0a 33 c0 8b 54 24 2c 81 f9 ff 0f 00 00 0f 46 c8 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 30 10 89 c3
[ 638.475660] EIP: [<c01d9182>] vsnprintf+0x2af/0x48c SS:ESP 0068:ed019e4c

2007-12-11 18:26:08

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

> >
> > My tests with Galaxy 5.9 show that it still does not work, although
> > the error seems to have changed a bit (see attached dmesg).
> >
>
> Hmmm, makes me suspect the bug is somewhere else. What I am not able to
> figure out is that I was able to recreate the trace you had on my
> systems. So there is a bug somewhere there.
>

To make it clearer why I think so,

> [ 638.466375] BUG: unable to handle kernel paging request at virtual address 80000000
> [ 638.466479] printing eip:
> [ 638.466527] c01d9182
> [ 638.466574] *pdpt = 000000002d022001
> [ 638.466622] *pde = 0000000000000000
> [ 638.466672] Oops: 0000 [#1]
> [ 638.466719] SMP
> [ 638.466838] last sysfs file: /devices/platform/floppy.0/uevent
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
That has changed from /sys/kernel/uids/<uid>/cpu_share

> [ 638.466890] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse ide_cd ide_generic usbkbd usbmouse tsdev iTCO_wdt iTCO_vendor_support sg psmouse e752x_edac shpchp sr_mod pci_hotplug serio_raw edac_mc evdev pcspkr cdrom floppy ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic piix ide_core ata_piix ehci_hcd uhci_hcd tg3 usbcore thermal processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss aacraid
> [ 638.469273] CPU: 3
> [ 638.469274] EIP: 0060:[<c01d9182>] Not tainted VLI
> [ 638.469275] EFLAGS: 00010297 (2.6.22.14-cfs-etch-686-envcan #1)
> [ 638.469444] EIP is at vsnprintf+0x2af/0x48c
> [ 638.469504] eax: 80000000 ebx: ffffffff ecx: 80000000 edx: fffffffe
> [ 638.469567] esi: ebc29017 edi: ed019eac ebp: ffffffff esp: ed019e4c
> [ 638.469631] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 638.469693] Process clBackup (pid: 4849, ti=ed018000 task=f6674c60 task.ti=ed018000)
> [ 638.469757] Stack: ec2a9000 00001000 c033b658 f899b56c c0236131 ec2a9000 143d6fe8 ebc29017
> [ 638.470212] 00f9b608 00000000 ffffffff ffffffff 00000000 c0337ecb 00000003 00000017
> [ 638.470665] c037a3a0 ec2a9000 c01d93e0 ed019eac ed019eac c02356ac ebc29017 c0337eca
> [ 638.471129] Call Trace:
> [ 638.471258] [<c0236131>] dev_uevent+0x189/0x1e0
> [ 638.471377] [<c01d93e0>] sprintf+0x20/0x23
> [ 638.471486] [<c02356ac>] show_uevent+0xad/0xd5
> [ 638.471594] [<c0157189>] get_page_from_freelist+0x273/0x30a
> [ 638.471713] [<c01323b4>] group_send_sig_info+0x12/0x56
> [ 638.471822] [<c0157272>] __alloc_pages+0x52/0x286
> [ 638.471930] [<c02355ff>] show_uevent+0x0/0xd5
> [ 638.472034] [<c02351be>] dev_attr_show+0x15/0x18
> [ 638.472138] [<c01a8e91>] sysfs_read_file+0x87/0xd8
> [ 638.472240] [<c018807c>] sys_getxattr+0x46/0x4e
> [ 638.472341] [<c01a8e0a>] sysfs_read_file+0x0/0xd8
> [ 638.472445] [<c0171f77>] vfs_read+0xa6/0x128
> [ 638.472551] [<c0172373>] sys_read+0x41/0x67
> [ 638.472656] [<c0103d8a>] syscall_call+0x7/0xb
> [ 638.472765] =======================
> [ 638.472822] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9 00 00 00 8b 0f b8 59 0a 33 c0 8b 54 24 2c 81 f9 ff 0f 00 00 0f 46 c8 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 30 10 89 c3
> [ 638.475660] EIP: [<c01d9182>] vsnprintf+0x2af/0x48c SS:ESP 0068:ed019e4c
>
>

--
regards,
Dhaval
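
A note on why the fault address is 80000000 rather than something near zero. As far as the Code: bytes in the trace can be read, the faulting instruction is a byte compare through eax (`<80> 38 00`) right after a compare of the string argument against 0xfff, which matches the "%s" handling in vsnprintf(): pointers below one page are replaced by "<NULL>", and anything else is walked by strnlen(). The sketch below models only that classification in userspace; the function names and the 4 KiB page size are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_STUB 4096u	/* assumed 4 KiB pages, as on i386 */

/* Returns 1 if a "%s" argument at this address would be swapped for "<NULL>". */
static int caught_by_null_guard(uintptr_t addr)
{
	return addr < PAGE_SIZE_STUB;
}

/* Describes what the "%s" path would do with a string argument at addr. */
static const char *percent_s_outcome(uintptr_t addr)
{
	return caught_by_null_guard(addr) ? "printed as <NULL>" : "dereferenced";
}
```

Under this model a string pointer of 0 or 0xc8 would be printed as "<NULL>", while 0x80000000 passes the guard and is dereferenced. That is consistent with dev_uevent() handing sprintf() a corrupt but non-small string pointer, though the thread does not establish where that pointer came from.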

2007-12-11 19:08:54

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Dhaval Giani
>
> > >
> > > My tests with Galaxy 5.9 show that it still does not work,
> > > although the error seems to have changed a bit (see
> > > attached dmesg)
> > >
> >
> > Hmmm, makes me suspect the bug is somewhere else. What I am
> > not able
> > to figure out is that I was able to recreate the trace you
> > had on my
> > systems. So there is a bug somewhere there.
> >
>
> To make it clearer why I think so,
>
> > [ 638.466838] last sysfs file: /devices/platform/floppy.0/uevent
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> That has changed from /sys/kernel/uids/<uid>/cpu_share
>

Here is my config.

Maybe I should give it a shot without CFS at all and see what happens?

- vin


Attachments:
CONFIG-i686-2.6.22-005 (79.06 kB)

2007-12-11 19:14:27

by Randy Dunlap

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, 11 Dec 2007 14:08:15 -0500 Fortier,Vincent [Montreal] wrote:

> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On behalf of Dhaval Giani
> >
> > > >
> > > > My tests with Galaxy 5.9 show that it still does not work,
> > > > although the error seems to have changed a bit (see
> > > > attached dmesg)
> > > >
> > >
> > > Hmmm, makes me suspect the bug is somewhere else. What I am
> > > not able
> > > to figure out is that I was able to recreate the trace you
> > > had on my
> > > systems. So there is a bug somewhere there.
> > >
> >
> > To make it clearer why I think so,
> >
> > > [ 638.466838] last sysfs file: /devices/platform/floppy.0/uevent
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > That has changed from /sys/kernel/uids/<uid>/cpu_share
> >
>
> Here is my config.
>
> Maybe I should give it a shot without CFS at all and see what happens

I agree.

---
~Randy
Features and documentation: http://lwn.net/Articles/260136/

2007-12-11 19:32:39

by Greg KH

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, Dec 11, 2007 at 10:53:38PM +0530, Dhaval Giani wrote:
> On Tue, Dec 11, 2007 at 09:04:00AM -0800, Greg KH wrote:
> > On Tue, Dec 11, 2007 at 10:13:19PM +0530, Dhaval Giani wrote:
> > > On Tue, Dec 11, 2007 at 08:24:37PM +0530, Dhaval Giani wrote:
> > > > On Mon, Dec 10, 2007 at 09:15:01AM -0800, Randy Dunlap wrote:
> > > > > On Mon, 10 Dec 2007 09:03:17 -0500 Fortier,Vincent [Montreal] wrote:
> > > > >
> > > > > Ingo, can you look at this, please?
> > > > > Vincent is getting oopses on 2.6.22.14-cfs-etch.
> > > > >
> > > >
> > > > Hi,
> > > >
> > > > We are looking into this bug now. I believe that the patch at
> > > > http://marc.info/?l=linux-kernel&m=119404922603293 should help.
> > > >
> > > > I am working with Kay to get this ported.
> > > >
> > >
> > > Hi Vincent,
> > >
> > > Does the following patch help?
> > >
> > > Kay/Greg, could you please review and add your Signed-off-by(s) as
> > > required?
> >
> > Um, why? What is this patch for? Where is it to be sent, to Linus for
> > 2.6.24-final? Or to the -stable tree?
> >
>
> Hi Greg,
>
> This is for 2.6.24-final, since Fair User scheduling is not yet there
> in stable.

Again, I think this patch is too big for that release, unless it really
is determined that this fix is needed. As this thread shows, I do not
think it is true...

thanks,

greg k-h

2007-12-11 21:07:40

by Ingo Molnar

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?


* Fortier,Vincent [Montreal] <[email protected]> wrote:

> > That has changed from /sys/kernel/uids/<uid>/cpu_share
> >
>
> Here is my config.
>
> Maybe I should give it a shot without CFS at all and see what
> happens?

and also with CFS but without CONFIG_FAIR_GROUP_SCHED.

Ingo

2007-12-12 07:09:28

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Tue, Dec 11, 2007 at 10:06:53PM +0100, Ingo Molnar wrote:
>
> * Fortier,Vincent [Montreal] <[email protected]> wrote:
>
> > > That has changed from /sys/kernel/uids/<uid>/cpu_share
> > >
> >
> > Here is my config.
> >
> > Maybe I should give it a shot without CFS at all and see what
> > happens?
>
> and also with CFS but without CONFIG_FAIR_GROUP_SCHED.
>

Hi Ingo,

I am able to reproduce the oops here on my system with 2.6.22.14 +
CFS backport. I am not able to reproduce it with 2.6.22.13 + CFS
backport. I believe the CFS backport is just exposing the bug. Can't
find an obvious culprit and am looking into this issue.

Vincent, could you please confirm if you are able to reproduce this with
2.6.22.13 + CFS?

Thanks,
--
regards,
Dhaval

2007-12-12 12:58:06

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: Dhaval Giani [mailto:[email protected]]
>
> On Tue, Dec 11, 2007 at 10:06:53PM +0100, Ingo Molnar wrote:
> >
> > * Fortier,Vincent [Montreal] <[email protected]> wrote:
> >
> > > > That has changed from /sys/kernel/uids/<uid>/cpu_share
> > >
> > > Here is my config.
> > >
> > > Maybe I should give it a shot without CFS at all and see what
> > > happens?

It was also triggered with a plain 2.6.22.14 (no CFS):
[57560.396000] BUG: unable to handle kernel paging request at virtual address 80000000
[57560.396000] printing eip:
[57560.396000] c01d6c56
[57560.396000] *pdpt = 0000000008d02001
[57560.396000] *pde = 0000000000000000
[57560.396000] Oops: 0000 [#34]
[57560.396000] SMP
[57560.396000] last sysfs file: /devices/platform/floppy.0/uevent
[57560.396000] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse ide_cd ide_generic usbkbd usbmouse tsdev iTCO_wdt iTCO_vendor_support psmouse e752x_edac edac_mc serio_raw evdev pcspkr sg floppy shpchp pci_hotplug sr_mod cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod generic piix ide_core tg3 ata_piix ehci_hcd uhci_hcd usbcore thermal processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm cciss aacraid
[57560.396000] CPU: 2
[57560.396000] EIP: 0060:[<c01d6c56>] Not tainted VLI
[57560.396000] EFLAGS: 00010297 (2.6.22.14-etch-686-envcan #1)
[57560.396000] EIP is at vsnprintf+0x2af/0x48c
[57560.396000] eax: 80000000 ebx: ffffffff ecx: 80000000 edx: fffffffe
[57560.396000] esi: edf37017 edi: edf09eac ebp: ffffffff esp: edf09e4c
[57560.396000] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[57560.396000] Process clBackup (pid: 31421, ti=edf08000 task=f7d36530 task.ti=edf08000)
[57560.396000] Stack: c852b000 00001000 c0338c78 f895b56c c0233bf5 c852b000 120c8fe8 edf37017
[57560.396000] 00c3bd08 00000000 ffffffff ffffffff 00000000 c03354eb 00000003 00000017
[57560.396000] c0376dc0 c852b000 c01d6eb4 edf09eac edf09eac c0233170 edf37017 c03354ea
[57560.396000] Call Trace:
[57560.396000] [<c0233bf5>] dev_uevent+0x189/0x1e0
[57560.396000] [<c01d6eb4>] sprintf+0x20/0x23
[57560.396000] [<c0233170>] show_uevent+0xad/0xd5
[57560.396000] [<c0154f48>] get_page_from_freelist+0x296/0x32d
[57560.396000] [<c012e6f0>] group_send_sig_info+0x12/0x56
[57560.396000] [<c0155031>] __alloc_pages+0x52/0x294
[57560.396000] [<c02330c3>] show_uevent+0x0/0xd5
[57560.396000] [<c0232c82>] dev_attr_show+0x15/0x18
[57560.396000] [<c01a6979>] sysfs_read_file+0x87/0xd8
[57560.396000] [<c0185f04>] sys_getxattr+0x46/0x4e
[57560.396000] [<c01a68f2>] sysfs_read_file+0x0/0xd8
[57560.396000] [<c016fe03>] vfs_read+0xa6/0x128
[57560.396000] [<c01701ff>] sys_read+0x41/0x67
[57560.396000] [<c0103d8a>] syscall_call+0x7/0xb
[57560.396000] =======================
[57560.396000] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9 00 00 00 8b 0f b8 79 e0 32 c0 8b 54 24 2c 81 f9 ff 0f 00 00 0f 46 c8 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 30 10 89 c3
[57560.396000] EIP: [<c01d6c56>] vsnprintf+0x2af/0x48c SS:ESP
0068:edf09e4c

> >
> > and also with CFS but without CONFIG_FAIR_GROUP_SCHED.
> >

Is that test still required, since this no longer seems to be CFS related?

>
> Hi Ingo,
>
> I am able to reproduce the oops here on my system with
> 2.6.22.14 + CFS backport. I am not able to reproduce it with
> 2.6.22.13 + CFS backport. I believe the CFS backport is just
> exposing the bug. Can't find an obvious culprit and am
> looking into this issue.
>
> Vincent, could you please confirm if you are able to
> reproduce this with
> 2.6.22.13 + CFS?

Using 2.6.22.13 + CFS v24 I was also able to reproduce the bug (I already
had one built in my depot without the
display_most-recently-opened_sysfs_file_name_when_oopsing.patch). So it
looks like the bug is present in at least >= 2.6.22.13 and is probably not
directly CFS related. Note that getting an oops on 2.6.22.13 seems to
need a full backup, since incremental backups usually work. The backup
started properly and then, in this case, oopsed at around 70%. With
2.6.22.14 it seems to oops right at startup. Here is the 2.6.22.13 CFS
v24 oops:

[ 170.152908] SGI XFS Quota Management subsystem
[ 170.168443] Filesystem "drbd0": Disabling barriers, not supported by
the underlying device
[ 170.174964] XFS mounting filesystem drbd0
[ 170.232455] Ending clean XFS mount for filesystem: drbd0
[ 170.318614] Filesystem "drbd1": Disabling barriers, not supported by
the underlying device
[ 170.327708] XFS mounting filesystem drbd1
[ 170.380481] Ending clean XFS mount for filesystem: drbd1
[ 947.493764] BUG: unable to handle kernel NULL pointer dereference at
virtual address 000000c8
[ 947.493797] printing eip:
[ 947.493810] c01a922c
[ 947.493823] *pdpt = 000000002a97a001
[ 947.493837] *pde = 0000000000000000
[ 947.493852] Oops: 0000 [#1]
[ 947.493865] SMP
[ 947.493881] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd
nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse
ide_cd ide_generic usbkbd usbmouse tsdev iTCO_wdt iTCO_vendor_support sg
e752x_edac psmouse edac_mc pcspkr evdev shpchp pci_hotplug serio_raw
sr_mod floppy cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod
generic piix ide_core ehci_hcd uhci_hcd ata_piix usbcore tg3 thermal
processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm
cciss aacraid
[ 947.494099] CPU: 0
[ 947.494100] EIP: 0060:[<c01a922c>] Not tainted VLI
[ 947.494102] EFLAGS: 00010202 (2.6.22.13-cfs-etch-686-envcan #1)
[ 947.494148] EIP is at sysfs_open_file+0x78/0x1e4
[ 947.494163] eax: 00000000 ebx: dff18440 ecx: 0000000d edx:
000000c8
[ 947.494181] esi: f7fc118c edi: eb385f30 ebp: c01a91b4 esp:
eb385edc
[ 947.494199] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 947.494217] Process clBackup (pid: 5273, ti=eb384000 task=f6558000
task.ti=eb384000)
[ 947.494235] Stack: ec339cc0 eafae158 f7fc1148 ec339cc0 eafae158
eb385f30 c01a91b4 c01704ac
[ 947.494278] dfaa8080 eafad220 ec339cc0 00008000 eb385f30
00000010 c01705dd ec339cc0
[ 947.494321] 00000000 00000000 c017061e 00000000 eb385f30
eafad220 dfaa8080 f711dd00
[ 947.494364] Call Trace:
[ 947.494389] [<c01a91b4>] sysfs_open_file+0x0/0x1e4
[ 947.494407] [<c01704ac>] __dentry_open+0xc1/0x178
[ 947.494429] [<c01705dd>] nameidata_to_filp+0x24/0x33
[ 947.494450] [<c017061e>] do_filp_open+0x32/0x39
[ 947.494475] [<c017038b>] get_unused_fd+0x4a/0xaa
[ 947.494496] [<c0170667>] do_sys_open+0x42/0xc3
[ 947.494518] [<c0170721>] sys_open+0x1c/0x1e
[ 947.494537] [<c0103d8a>] syscall_call+0x7/0xb
[ 947.494561] =======================
[ 947.494577] Code: 14 24 83 7c 24 08 00 8b 42 0c 8b 40 54 8b 70 14 0f
84 70 01 00 00 85 f6 0f 84 68 01 00 00 8b 56 04 85 d2 74 19 64 a1 08 50
3d c0 <83> 3a 02 0f 84 42 01 00 00 c1 e0 05 ff 84 10 20 01 00 00 8b 54
[ 947.494743] EIP: [<c01a922c>] sysfs_open_file+0x78/0x1e4 SS:ESP
0068:eb385edc

- vin

2007-12-12 13:06:20

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of
> Fortier,Vincent [Montreal]
>
> > -----Original Message-----
> > From: Dhaval Giani [mailto:[email protected]]
> >
> > On Tue, Dec 11, 2007 at 10:06:53PM +0100, Ingo Molnar wrote:
> > >
> > > * Fortier,Vincent [Montreal] <[email protected]> wrote:
> > >
> > > > > That has changed from /sys/kernel/uids/<uid>/cpu_share
> > > >
> > > > Here is my config.
> > > >
> > > > > Maybe I should give it a shot without CFS at all and see what
> > > > > happens?
>
> It was also triggered with 2.6.22.14:

Just to clarify... this is a non CFS kernel oops...

> [57560.396000] BUG: unable to handle kernel paging request at virtual
> address 80000000
> [57560.396000] printing eip:
> [57560.396000] c01d6c56
> [57560.396000] *pdpt = 0000000008d02001
> [57560.396000] *pde = 0000000000000000
> [57560.396000] Oops: 0000 [#34]
> [57560.396000] SMP
> [57560.396000] last sysfs file: /devices/platform/floppy.0/uevent
> [57560.396000] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd
> nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse
> ide_cd ide_generic usbkbd usbmouse tsdev iTCO_wdt iTCO_vendor_support
> psmouse e752x_edac edac_mc serio_raw evdev pcspkr sg floppy shpchp
> pci_hotplug sr_mod cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod
> generic piix ide_core tg3 ata_piix ehci_hcd uhci_hcd usbcore thermal
> processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm
> cciss aacraid
> [57560.396000] CPU: 2
> [57560.396000] EIP: 0060:[<c01d6c56>] Not tainted VLI
> [57560.396000] EFLAGS: 00010297 (2.6.22.14-etch-686-envcan #1)
> [57560.396000] EIP is at vsnprintf+0x2af/0x48c
> [57560.396000] eax: 80000000 ebx: ffffffff ecx: 80000000 edx:
> fffffffe
> [57560.396000] esi: edf37017 edi: edf09eac ebp: ffffffff esp:
> edf09e4c
> [57560.396000] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [57560.396000] Process clBackup (pid: 31421, ti=edf08000 task=f7d36530
> task.ti=edf08000)
> [57560.396000] Stack: c852b000 00001000 c0338c78 f895b56c c0233bf5
> c852b000 120c8fe8 edf37017
> [57560.396000] 00c3bd08 00000000 ffffffff ffffffff 00000000
> c03354eb 00000003 00000017
> [57560.396000] c0376dc0 c852b000 c01d6eb4 edf09eac edf09eac
> c0233170 edf37017 c03354ea
> [57560.396000] Call Trace:
> [57560.396000] [<c0233bf5>] dev_uevent+0x189/0x1e0
> [57560.396000] [<c01d6eb4>] sprintf+0x20/0x23
> [57560.396000] [<c0233170>] show_uevent+0xad/0xd5
> [57560.396000] [<c0154f48>] get_page_from_freelist+0x296/0x32d
> [57560.396000] [<c012e6f0>] group_send_sig_info+0x12/0x56
> [57560.396000] [<c0155031>] __alloc_pages+0x52/0x294
> [57560.396000] [<c02330c3>] show_uevent+0x0/0xd5
> [57560.396000] [<c0232c82>] dev_attr_show+0x15/0x18
> [57560.396000] [<c01a6979>] sysfs_read_file+0x87/0xd8
> [57560.396000] [<c0185f04>] sys_getxattr+0x46/0x4e
> [57560.396000] [<c01a68f2>] sysfs_read_file+0x0/0xd8
> [57560.396000] [<c016fe03>] vfs_read+0xa6/0x128
> [57560.396000] [<c01701ff>] sys_read+0x41/0x67
> [57560.396000] [<c0103d8a>] syscall_call+0x7/0xb
> [57560.396000] =======================
> [57560.396000] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9 00
> 00 00 8b 0f b8 79 e0 32 c0 8b 54 24 2c 81 f9 ff 0f 00 00 0f 46 c8 89 c8
> eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 30 10 89 c3
> [57560.396000] EIP: [<c01d6c56>] vsnprintf+0x2af/0x48c SS:ESP
> 0068:edf09e4c
>
> > >
> > > and also with CFS but without CONFIG_FAIR_GROUP_SCHED.
> > >
>
> Is that test still required, since this no longer seems to be CFS related?
>
> >
> > Hi Ingo,
> >
> > I am able to reproduce the oops here on my system with
> > 2.6.22.14 + CFS backport. I am not able to reproduce it with
> > 2.6.22.13 + CFS backport. I believe the CFS backport is just
> > exposing the bug. Can't find an obvious culprit and am looking
> > into this issue.
> >
> > Vincent, could you please confirm if you are able to reproduce this
> > with 2.6.22.13 + CFS?
>
> Using 2.6.22.13 + CFS v24 I was also able to reproduce the bug
> (I already had one built in my depot without the
> display_most-recently-opened_sysfs_file_name_when_oopsing.patch).
> So it looks like the bug is present in at least >= 2.6.22.13
> and is probably not directly CFS related. Note that getting an
> oops on 2.6.22.13 seems to need a full backup, since incremental
> backups usually work. The backup started properly and then, in
> this case, oopsed at around 70%. With 2.6.22.14 it seems to oops
> right at startup. Here is the 2.6.22.13 CFS v24 oops:

Again, just to clarify, I'm not even sure the backup worked at all with
2.6.22.13 CFS v24, since I already had a previous full backup pending
at 70%... so it may simply have tried to finish that one and crashed
right at startup?

> [ 170.152908] SGI XFS Quota Management subsystem
> [ 170.168443] Filesystem "drbd0": Disabling barriers, not supported by
> the underlying device
> [ 170.174964] XFS mounting filesystem drbd0
> [ 170.232455] Ending clean XFS mount for filesystem: drbd0
> [ 170.318614] Filesystem "drbd1": Disabling barriers, not supported by
> the underlying device
> [ 170.327708] XFS mounting filesystem drbd1
> [ 170.380481] Ending clean XFS mount for filesystem: drbd1
> [ 947.493764] BUG: unable to handle kernel NULL pointer dereference at
> virtual address 000000c8
> [ 947.493797] printing eip:
> [ 947.493810] c01a922c
> [ 947.493823] *pdpt = 000000002a97a001
> [ 947.493837] *pde = 0000000000000000
> [ 947.493852] Oops: 0000 [#1]
> [ 947.493865] SMP
> [ 947.493881] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd
> nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse
> ide_cd ide_generic usbkbd usbmouse tsdev iTCO_wdt iTCO_vendor_support sg
> e752x_edac psmouse edac_mc pcspkr evdev shpchp pci_hotplug serio_raw
> sr_mod floppy cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod
> generic piix ide_core ehci_hcd uhci_hcd ata_piix usbcore tg3 thermal
> processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm
> cciss aacraid
> [ 947.494099] CPU: 0
> [ 947.494100] EIP: 0060:[<c01a922c>] Not tainted VLI
> [ 947.494102] EFLAGS: 00010202 (2.6.22.13-cfs-etch-686-envcan #1)
> [ 947.494148] EIP is at sysfs_open_file+0x78/0x1e4
> [ 947.494163] eax: 00000000 ebx: dff18440 ecx: 0000000d edx:
> 000000c8
> [ 947.494181] esi: f7fc118c edi: eb385f30 ebp: c01a91b4 esp:
> eb385edc
> [ 947.494199] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 947.494217] Process clBackup (pid: 5273, ti=eb384000 task=f6558000
> task.ti=eb384000)
> [ 947.494235] Stack: ec339cc0 eafae158 f7fc1148 ec339cc0 eafae158
> eb385f30 c01a91b4 c01704ac
> [ 947.494278] dfaa8080 eafad220 ec339cc0 00008000 eb385f30
> 00000010 c01705dd ec339cc0
> [ 947.494321] 00000000 00000000 c017061e 00000000 eb385f30
> eafad220 dfaa8080 f711dd00
> [ 947.494364] Call Trace:
> [ 947.494389] [<c01a91b4>] sysfs_open_file+0x0/0x1e4
> [ 947.494407] [<c01704ac>] __dentry_open+0xc1/0x178
> [ 947.494429] [<c01705dd>] nameidata_to_filp+0x24/0x33
> [ 947.494450] [<c017061e>] do_filp_open+0x32/0x39
> [ 947.494475] [<c017038b>] get_unused_fd+0x4a/0xaa
> [ 947.494496] [<c0170667>] do_sys_open+0x42/0xc3
> [ 947.494518] [<c0170721>] sys_open+0x1c/0x1e
> [ 947.494537] [<c0103d8a>] syscall_call+0x7/0xb
> [ 947.494561] =======================
> [ 947.494577] Code: 14 24 83 7c 24 08 00 8b 42 0c 8b 40 54 8b 70 14 0f
> 84 70 01 00 00 85 f6 0f 84 68 01 00 00 8b 56 04 85 d2 74 19 64 a1 08 50
> 3d c0 <83> 3a 02 0f 84 42 01 00 00 c1 e0 05 ff 84 10 20 01 00 00 8b 54
> [ 947.494743] EIP: [<c01a922c>] sysfs_open_file+0x78/0x1e4 SS:ESP
> 0068:eb385edc
>
>

Regards,

- vin

2007-12-12 13:42:49

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Wed, Dec 12, 2007 at 07:57:33AM -0500, Fortier,Vincent [Montreal] wrote:
> > -----Original Message-----
> > From: Dhaval Giani [mailto:[email protected]]
> >
> > On Tue, Dec 11, 2007 at 10:06:53PM +0100, Ingo Molnar wrote:
> > >
> > > * Fortier,Vincent [Montreal] <[email protected]> wrote:
> > >
> > > > > That has changed from /sys/kernel/uids/<uid>/cpu_share
> > > >
> > > > Here is my config.
> > > >
> > > > > Maybe I should give it a shot without CFS at all and see what
> > > > > happens?
>
> It was also triggered with 2.6.22.14:
> [57560.396000] BUG: unable to handle kernel paging request at virtual
> address 80000000
> [57560.396000] printing eip:
> [57560.396000] c01d6c56
> [57560.396000] *pdpt = 0000000008d02001
> [57560.396000] *pde = 0000000000000000
> [57560.396000] Oops: 0000 [#34]
> [57560.396000] SMP
> [57560.396000] last sysfs file: /devices/platform/floppy.0/uevent
> [57560.396000] Modules linked in: xfs drbd cn nfs nfsd exportfs lockd
> nfs_acl sunrpc ppdev parport_pc lp parport button ac battery ipv6 fuse
> ide_cd ide_generic usbkbd usbmouse tsdev iTCO_wdt iTCO_vendor_support
> psmouse e752x_edac edac_mc serio_raw evdev pcspkr sg floppy shpchp
> pci_hotplug sr_mod cdrom ext3 jbd mbcache dm_mirror dm_snapshot dm_mod
> generic piix ide_core tg3 ata_piix ehci_hcd uhci_hcd usbcore thermal
> processor fan mptscsih mptbase megaraid_sas megaraid_mbox megaraid_mm
> cciss aacraid
> [57560.396000] CPU: 2
> [57560.396000] EIP: 0060:[<c01d6c56>] Not tainted VLI
> [57560.396000] EFLAGS: 00010297 (2.6.22.14-etch-686-envcan #1)
> [57560.396000] EIP is at vsnprintf+0x2af/0x48c
> [57560.396000] eax: 80000000 ebx: ffffffff ecx: 80000000 edx:
> fffffffe
> [57560.396000] esi: edf37017 edi: edf09eac ebp: ffffffff esp:
> edf09e4c
> [57560.396000] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [57560.396000] Process clBackup (pid: 31421, ti=edf08000 task=f7d36530
> task.ti=edf08000)
> [57560.396000] Stack: c852b000 00001000 c0338c78 f895b56c c0233bf5
> c852b000 120c8fe8 edf37017
> [57560.396000] 00c3bd08 00000000 ffffffff ffffffff 00000000
> c03354eb 00000003 00000017
> [57560.396000] c0376dc0 c852b000 c01d6eb4 edf09eac edf09eac
> c0233170 edf37017 c03354ea
> [57560.396000] Call Trace:
> [57560.396000] [<c0233bf5>] dev_uevent+0x189/0x1e0
> [57560.396000] [<c01d6eb4>] sprintf+0x20/0x23
> [57560.396000] [<c0233170>] show_uevent+0xad/0xd5
> [57560.396000] [<c0154f48>] get_page_from_freelist+0x296/0x32d
> [57560.396000] [<c012e6f0>] group_send_sig_info+0x12/0x56
> [57560.396000] [<c0155031>] __alloc_pages+0x52/0x294
> [57560.396000] [<c02330c3>] show_uevent+0x0/0xd5
> [57560.396000] [<c0232c82>] dev_attr_show+0x15/0x18
> [57560.396000] [<c01a6979>] sysfs_read_file+0x87/0xd8
> [57560.396000] [<c0185f04>] sys_getxattr+0x46/0x4e
> [57560.396000] [<c01a68f2>] sysfs_read_file+0x0/0xd8
> [57560.396000] [<c016fe03>] vfs_read+0xa6/0x128
> [57560.396000] [<c01701ff>] sys_read+0x41/0x67
> [57560.396000] [<c0103d8a>] syscall_call+0x7/0xb
> [57560.396000] =======================
> [57560.396000] Code: 74 24 28 73 03 c6 06 20 4d 46 85 ed 7f f1 e9 b9 00
> 00 00 8b 0f b8 79 e0 32 c0 8b 54 24 2c 81 f9 ff 0f 00 00 0f 46 c8 89 c8
> eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 f6 44 24 30 10 89 c3
> [57560.396000] EIP: [<c01d6c56>] vsnprintf+0x2af/0x48c SS:ESP
> 0068:edf09e4c
>
> > >
> > > and also with CFS but without CONFIG_FAIR_GROUP_SCHED.
> > >
>
> Is that test still required, since this no longer seems to be CFS related?
>

No, not any more. Would it be possible for you to do a git-bisect? I am not
too well versed in sysfs, so it is not apparent to me what is causing
this oops. It seems to be easily reproducible on your side, but I still
don't have a reliable method to reproduce it without the CFS patch. Could
sysfs experts please help with debugging?

Thanks,
--
regards,
Dhaval

2007-12-12 18:46:48

by Vincent Fortier

Subject: RE: 2.6.22.14 oops msg with commvault galaxy ?

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Dhaval Giani
>
> On Wed, Dec 12, 2007 at 07:57:33AM -0500, Fortier,Vincent
> [Montreal] wrote:
> > > -----Original Message-----
> > > From: Dhaval Giani [mailto:[email protected]]
> > >
> > > On Tue, Dec 11, 2007 at 10:06:53PM +0100, Ingo Molnar wrote:
> > > >
> > > > * Fortier,Vincent [Montreal] <[email protected]> wrote:
> > > >
> > > > > > That has changed from /sys/kernel/uids/<uid>/cpu_share
> > > > >
> > > > > Here is my config.
> > > > >
> > > > > > Maybe I should give it a shot without CFS at all and see what
> > > > > > happens?
> >
> > It was also triggered with 2.6.22.14:

Here are my preliminary test results:
2.6.21.7: OK
2.6.22.13/14: Failure
2.6.23.9: OK
2.6.24-rc5-git2: OK

It seems to fail only with 2.6.22 kernels.

>
> No, not any more. Would it be possible for you to do a
> git-bisect? I am not too well versed with sysfs, so it is not
> apparent to me what is causing this oops. It seems to be
> easily reproducible. I don't still have a reliable method to
> reproduce it without the CFS patch. Could sysfs experts
> please help debugging?
>

I seriously doubt I have the time to do a git-bisect at the moment....

- vin

2007-12-13 11:44:25

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

Hi Vincent,

Could you please see if the following patch removes the oops due to CFS
sysfs files? (There might still be the other oops due to the floppy
sysfs files)

Ingo, could you please add this patch in your CFS backport to 2.6.22 and
older kernels?

Thanks,
--

kdump showed that the owner field had some junk value which caused
the oops reported at http://lkml.org/lkml/2007/12/10/113 . This
patch sets the value of that field to NULL.

Signed-off-by: Dhaval Giani <[email protected]>
Signed-off-by: Maneesh Soni <[email protected]>

---
kernel/user.c | 1 +
1 files changed, 1 insertion(+)

Index: linux-2.6.22.13/kernel/user.c
===================================================================
--- linux-2.6.22.13.orig/kernel/user.c
+++ linux-2.6.22.13/kernel/user.c
@@ -145,6 +145,7 @@ ssize_t cpu_shares_store(struct kset *ks

 static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
 {
+	sa->attr.owner = NULL;
 	sa->attr.name = name;
 	sa->attr.mode = mode;
 	sa->show = cpu_shares_show;

--
regards,
Dhaval
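
For readers following along, here is a minimal user-space sketch of why the one-line fix above helps. It assumes only what the thread states (the owner field held junk because user_attr_init() never set it) plus the usual try_module_get() contract that a NULL module always succeeds. All sketch_* names are hypothetical stand-ins, not the kernel definitions, and the 0xc8 poison byte merely simulates leftover memory contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; NOT the real definitions. */
struct sketch_module {
	int state;			/* 2 stands for "module is going away" */
};

struct sketch_attribute {
	const char *name;
	struct sketch_module *owner;
	unsigned int mode;
};

/* Same contract as try_module_get(): a NULL owner always succeeds,
 * any non-NULL owner gets dereferenced to check that it is live. */
int sketch_try_module_get(struct sketch_module *mod)
{
	if (mod == NULL)
		return 1;
	return mod->state != 2;		/* would fault on a junk pointer */
}

/* Simulate an attribute living in memory that was never zeroed. */
void sketch_poison(struct sketch_attribute *a)
{
	memset(a, 0xc8, sizeof(*a));
}

/* Initialisation as it was before the fix: owner is left untouched. */
void sketch_init_unpatched(struct sketch_attribute *a, const char *name,
			   unsigned int mode)
{
	assert(name != NULL);
	a->name = name;
	a->mode = mode;
}

/* Initialisation with the fix: owner is explicitly set to NULL. */
void sketch_init_patched(struct sketch_attribute *a, const char *name,
			 unsigned int mode)
{
	assert(name != NULL);
	a->owner = NULL;
	a->name = name;
	a->mode = mode;
}
```

With the unpatched initialiser the owner field keeps whatever bytes were already in memory, so an open path that takes a module reference would dereference a junk pointer. With the patched initialiser the owner is NULL and the reference check succeeds without touching memory.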

2007-12-13 12:56:18

by Ingo Molnar

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?


* Dhaval Giani <[email protected]> wrote:

> Could you please see if the following patch removes the oops due to
> CFS sysfs files? (There might still be the other oops due to the
> floppy sysfs files)
>
> Ingo, could you please add this patch in your CFS backport to 2.6.22
> and older kernels?

sure - i've updated the backport patches with this fix.

> static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> {
> + sa->attr.owner = NULL;
> sa->attr.name = name;

i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
initialize the owner field to NULL automatically?

Ingo

2007-12-13 13:03:07

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, Dec 13, 2007 at 01:55:09PM +0100, Ingo Molnar wrote:
>
> * Dhaval Giani <[email protected]> wrote:
>
> > Could you please see if the following patch removes the oops due to
> > CFS sysfs files? (There might still be the other oops due to the
> > floppy sysfs files)
> >
> > Ingo, could you please add this patch in your CFS backport to 2.6.22
> > and older kernels?
>
> sure - i've updated the backport patches with this fix.
>

Thanks!

> > static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> > {
> > + sa->attr.owner = NULL;
> > sa->attr.name = name;
>
> i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> initialize the owner field to NULL automatically?
>

Going through git log, it seems that commit
7b595756ec1f49e0049a9e01a1298d53a7faaa15 deemed attribute->owner
unnecessary and removed it. I guess that answers the question.

--
regards,
Dhaval

2007-12-13 13:14:33

by Ingo Molnar

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?


* Dhaval Giani <[email protected]> wrote:

> > > static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> > > {
> > > + sa->attr.owner = NULL;
> > > sa->attr.name = name;
> >
> > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > initialize the owner field to NULL automatically?
> >
>
> Going through git log, it seems that commit
> 7b595756ec1f49e0049a9e01a1298d53a7faaa15 deemed attribute->owner as
> unnecessary. I guess that answers the question.

thx. The only open question seems to be: Vincent had sysfs crashes
without the CFS patchset as well.

Wouldn't it be prudent to backport the core bits of the above commit
(attached below), to make sure the owner field is never used?
(because it seems it's so easy and common to not maintain it properly)

Vincent, does the patch below resolve the non-CFS crashes?

Ingo

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 618b8ae..3c5574a 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -175,25 +175,20 @@ static int open(struct inode * inode, struct file * file)
 	if (!sysfs_get_active(attr_sd))
 		return -ENODEV;

-	/* Grab the module reference for this attribute */
-	error = -ENODEV;
-	if (!try_module_get(attr->attr.owner))
-		goto err_sput;
-
 	error = -EACCES;
 	if ((file->f_mode & FMODE_WRITE) && !(attr->write || attr->mmap))
-		goto err_mput;
+		goto err_out;
 	if ((file->f_mode & FMODE_READ) && !(attr->read || attr->mmap))
-		goto err_mput;
+		goto err_out;

 	error = -ENOMEM;
 	bb = kzalloc(sizeof(*bb), GFP_KERNEL);
 	if (!bb)
-		goto err_mput;
+		goto err_out;

 	bb->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!bb->buffer)
-		goto err_mput;
+		goto err_out;

 	mutex_init(&bb->mutex);
 	file->private_data = bb;
@@ -203,9 +198,7 @@ static int open(struct inode * inode, struct file * file)
 	sysfs_get(attr_sd);
 	return 0;

- err_mput:
-	module_put(attr->attr.owner);
- err_sput:
+ err_out:
 	sysfs_put_active(attr_sd);
 	kfree(bb);
 	return error;
@@ -214,13 +207,11 @@ static int open(struct inode * inode, struct file * file)
 static int release(struct inode * inode, struct file * file)
 {
 	struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
-	struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
 	struct bin_buffer *bb = file->private_data;

 	if (bb->mmapped)
 		sysfs_put_active_two(attr_sd);
 	sysfs_put(attr_sd);
-	module_put(attr->attr.owner);
 	kfree(bb->buffer);
 	kfree(bb);
 	return 0;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index d673d9b..a84b734 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -241,7 +241,6 @@ sysfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t
 static int sysfs_open_file(struct inode *inode, struct file *file)
 {
 	struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
-	struct attribute *attr = attr_sd->s_elem.attr.attr;
 	struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj;
 	struct sysfs_buffer * buffer;
 	struct sysfs_ops * ops = NULL;
@@ -251,11 +250,6 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	if (!sysfs_get_active_two(attr_sd))
 		return -ENODEV;

-	/* Grab the module reference for this attribute */
-	error = -ENODEV;
-	if (!try_module_get(attr->owner))
-		goto err_sput;
-
 	/* if the kobject has no ktype, then we assume that it is a subsystem
 	 * itself, and use ops for it.
 	 */
@@ -272,7 +266,7 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	 * or the subsystem have no operations.
 	 */
 	if (!ops)
-		goto err_mput;
+		goto err_out;

 	/* File needs write support.
 	 * The inode's perms must say it's ok,
@@ -280,7 +274,7 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	 */
 	if (file->f_mode & FMODE_WRITE) {
 		if (!(inode->i_mode & S_IWUGO) || !ops->store)
-			goto err_mput;
+			goto err_out;
 	}

 	/* File needs read support.
@@ -289,7 +283,7 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	 */
 	if (file->f_mode & FMODE_READ) {
 		if (!(inode->i_mode & S_IRUGO) || !ops->show)
-			goto err_mput;
+			goto err_out;
 	}

 	/* No error? Great, allocate a buffer for the file, and store it
@@ -298,7 +292,7 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	error = -ENOMEM;
 	buffer = kzalloc(sizeof(struct sysfs_buffer), GFP_KERNEL);
 	if (!buffer)
-		goto err_mput;
+		goto err_out;

 	init_MUTEX(&buffer->sem);
 	buffer->needs_read_fill = 1;
@@ -310,9 +304,7 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	sysfs_get(attr_sd);
 	return 0;

- err_mput:
-	module_put(attr->owner);
- err_sput:
+ err_out:
 	sysfs_put_active_two(attr_sd);
 	return error;
 }
@@ -320,12 +312,9 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 static int sysfs_release(struct inode * inode, struct file * filp)
 {
 	struct sysfs_dirent *attr_sd = filp->f_path.dentry->d_fsdata;
-	struct attribute *attr = attr_sd->s_elem.attr.attr;
 	struct sysfs_buffer *buffer = filp->private_data;

 	sysfs_put(attr_sd);
-	/* After this point, attr should not be accessed. */
-	module_put(attr->owner);

 	if (buffer) {
 		if (buffer->page)

2007-12-13 13:24:59

by Vincent Fortier

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, 2007-12-13 at 18:32 +0530, Dhaval Giani wrote:
> On Thu, Dec 13, 2007 at 01:55:09PM +0100, Ingo Molnar wrote:
> >
> > * Dhaval Giani <[email protected]> wrote:
> >
> > > Could you please see if the following patch removes the oops due to
> > > CFS sysfs files? (There might still be the other oops due to the
> > > floppy sysfs files)
> > >
> > > Ingo, could you please add this patch in your CFS backport to 2.6.22
> > > and older kernels?
> >
> > sure - i've updated the backport patches with this fix.
> >
>
> Thanks!

CFS v24 no longer applies cleanly on 2.6.22.15-rc1 here:
--- 31,43 ----
#include <linux/cn_proc.h>
#include <linux/getcpu.h>
#include <linux/task_io_accounting_ops.h>
+ #include <linux/seccomp.h>
#include <linux/cpu.h>

#include <linux/compat.h>
#include <linux/syscalls.h>
#include <linux/kprobes.h>
+ #include <linux/user_namespace.h>

#include <asm/uaccess.h>
#include <asm/io.h>


due to [patch 31/36] Revert "Fix SMP poweroff hangs",
which removes:
-#include <linux/cpu.h>


About to build/test this morning.

thnx.

- vin


>
> > > static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> > > {
> > > + sa->attr.owner = NULL;
> > > sa->attr.name = name;
> >
> > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > initialize the owner field to NULL automatically?
> >
>
> Going through git log, it seems that commit
> 7b595756ec1f49e0049a9e01a1298d53a7faaa15 deemed attribute->owner as
> unnecessary. I guess that answers the question.
>

2007-12-13 13:44:18

by Vincent Fortier

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, 2007-12-13 at 08:12 -0500, Ingo Molnar wrote:
>
> * Dhaval Giani <[email protected]> wrote:
>
> > > > static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> > > > {
> > > > + sa->attr.owner = NULL;
> > > > sa->attr.name = name;
> > >
> > > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > > initialize the owner field to NULL automatically?
> > >
> >
> > Going through git log, it seems that commit
> > 7b595756ec1f49e0049a9e01a1298d53a7faaa15 deemed attribute->owner as
> > unnecessary. I guess that answers the question.
>
> thx. The only open question seems to be: Vincent had sysfs crashes
> without the CFS patchset as well.
>
> Wouldnt it be prudent to backport the core bits of the above commit
> (attached below), to make sure the owner field is never utilized.
> (because it seems it's so easy and common to not maintain it properly)
>
> Vincent, does the patch below resolve the non-CFS crashes?

I was about to test it, but it does not apply on 2.6.22:
[root@printemps linux-2.6.22.15-rc1-patched]# patch -p1 < ../make_sure_owner_field_is_never_utilized.patch
patching file fs/sysfs/bin.c
Hunk #1 FAILED at 175.
Hunk #2 FAILED at 198.
Hunk #3 FAILED at 207.
3 out of 3 hunks FAILED -- saving rejects to file fs/sysfs/bin.c.rej
patching file fs/sysfs/file.c
Hunk #1 FAILED at 241.
Hunk #2 FAILED at 250.
Hunk #3 FAILED at 266.
Hunk #4 FAILED at 274.
Hunk #5 FAILED at 283.
Hunk #6 FAILED at 292.
Hunk #7 FAILED at 304.
Hunk #8 FAILED at 312.
8 out of 8 hunks FAILED -- saving rejects to file fs/sysfs/file.c.rej

I was going to backport it, but it turns out not to be that trivial... Help
would be appreciated.

- vin

2007-12-13 14:01:31

by Kay Sievers

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, 2007-12-13 at 13:55 +0100, Ingo Molnar wrote:
> * Dhaval Giani <[email protected]> wrote:
>
> > Could you please see if the following patch removes the oops due to
> > CFS sysfs files? (There might still be the other oops due to the
> > floppy sysfs files)
> >
> > Ingo, could you please add this patch in your CFS backport to 2.6.22
> > and older kernels?
>
> sure - i've updated the backport patches with this fix.
>
> > static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> > {
> > + sa->attr.owner = NULL;
> > sa->attr.name = name;
>
> i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> initialize the owner field to NULL automatically?

Attributes do not have an owner anymore:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b595756ec1f49e0049a9e01a1298d53a7faaa15

Kay

2007-12-13 14:41:59

by Dhaval Giani

Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, Dec 13, 2007 at 01:24:26PM +0000, Vincent Fortier wrote:
> On Thu, 2007-12-13 at 18:32 +0530, Dhaval Giani wrote:
> > On Thu, Dec 13, 2007 at 01:55:09PM +0100, Ingo Molnar wrote:
> > >
> > > * Dhaval Giani <[email protected]> wrote:
> > >
> > > > Could you please see if the following patch removes the oops due to
> > > > CFS sysfs files? (There might still be the other oops due to the
> > > > floppy sysfs files)
> > > >
> > > > Ingo, could you please add this patch in your CFS backport to 2.6.22
> > > > and older kernels?
> > >
> > > sure - i've updated the backport patches with this fix.
> > >
> >
> > Thanks!
>
> CFS v24 now does not apply correctly on a 2.6.22.15-rc1 here:

Could you try on 2.6.22.13/14, while we wait for Ingo ;).
--
regards,
Dhaval

2007-12-13 15:02:49

by Vincent Fortier

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, 2007-12-13 at 08:59 -0500, Kay Sievers wrote:
> On Thu, 2007-12-13 at 13:55 +0100, Ingo Molnar wrote:
> > * Dhaval Giani <[email protected]> wrote:
> >
> > > Could you please see if the following patch removes the oops due to
> > > CFS sysfs files? (There might still be the other oops due to the
> > > floppy sysfs files)
> > >
> > > Ingo, could you please add this patch in your CFS backport to 2.6.22
> > > and older kernels?
> >
> > sure - i've updated the backport patches with this fix.
> >
> > > static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> > > {
> > > + sa->attr.owner = NULL;
> > > sa->attr.name = name;
> >
> > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > initialize the owner field to NULL automatically?
>
> Attributes do not have an owner anymore:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b595756ec1f49e0049a9e01a1298d53a7faaa15

This one also fails to apply properly at the exact same place as Ingo's
previously posted patch. Would need to backport this one.

> Kay

- vin

2007-12-13 16:25:07

by Kay Sievers

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?


On Thu, 2007-12-13 at 15:02 +0000, Vincent Fortier wrote:
> On Thu, 2007-12-13 at 08:59 -0500, Kay Sievers wrote:
> > On Thu, 2007-12-13 at 13:55 +0100, Ingo Molnar wrote:
> > > * Dhaval Giani <[email protected]> wrote:
> > >
> > > > Could you please see if the following patch removes the oops due to
> > > > CFS sysfs files? (There might still be the other oops due to the
> > > > floppy sysfs files)
> > > >
> > > > Ingo, could you please add this patch in your CFS backport to 2.6.22
> > > > and older kernels?
> > >
> > > sure - i've updated the backport patches with this fix.
> > >
> > > > static void user_attr_init(struct subsys_attribute *sa, char *name, int mode)
> > > > {
> > > > + sa->attr.owner = NULL;
> > > > sa->attr.name = name;
> > >
> > > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > > initialize the owner field to NULL automatically?
> >
> > Attributes do not have an owner anymore:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b595756ec1f49e0049a9e01a1298d53a7faaa15
>
> This one also fails to apply properly at the exact same place as Ingo's
> previously posted patch. Would need to backport this one.

It depends on a completely reworked sysfs logic, I don't think it makes
any sense to backport that.

Kay

2007-12-13 16:51:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?


* Kay Sievers <[email protected]> wrote:

> > > > > + sa->attr.owner = NULL;
> > > > > sa->attr.name = name;
> > > >
> > > > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > > > initialize the owner field to NULL automatically?
> > >
> > > Attributes do not have an owner anymore:
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b595756ec1f49e0049a9e01a1298d53a7faaa15
> >
> > > This one also fails to apply properly at the exact same place as
> > > Ingo's previously posted patch. Would need to backport this one.
>
> It depends on a completely reworked sysfs logic, I don't think it
> makes any sense to backport that.

well, if it fixes a live bug in a still supported stable kernel
release...

Vincent, could you try to just get rid of all actual uses of
se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
(totally untested - might be fatally broken as well)

Ingo

---
fs/sysfs/bin.c | 7 -------
fs/sysfs/file.c | 13 -------------
2 files changed, 20 deletions(-)

Index: linux-cfs-2.6.22.13.q/fs/sysfs/bin.c
===================================================================
--- linux-cfs-2.6.22.13.q.orig/fs/sysfs/bin.c
+++ linux-cfs-2.6.22.13.q/fs/sysfs/bin.c
@@ -125,11 +125,6 @@ static int open(struct inode * inode, st
if (!kobj || !attr)
goto Done;

- /* Grab the module reference for this attribute if we have one */
- error = -ENODEV;
- if (!try_module_get(attr->attr.owner))
- goto Done;
-
error = -EACCES;
if ((file->f_mode & FMODE_WRITE) && !(attr->write || attr->mmap))
goto Error;
@@ -145,7 +140,6 @@ static int open(struct inode * inode, st
goto Done;

Error:
- module_put(attr->attr.owner);
Done:
if (error)
kobject_put(kobj);
@@ -159,7 +153,6 @@ static int release(struct inode * inode,
u8 * buffer = file->private_data;

kobject_put(kobj);
- module_put(attr->attr.owner);
kfree(buffer);
return 0;
}
Index: linux-cfs-2.6.22.13.q/fs/sysfs/file.c
===================================================================
--- linux-cfs-2.6.22.13.q.orig/fs/sysfs/file.c
+++ linux-cfs-2.6.22.13.q/fs/sysfs/file.c
@@ -257,12 +257,6 @@ static int sysfs_open_file(struct inode
if (!kobj || !attr)
goto Einval;

- /* Grab the module reference for this attribute if we have one */
- if (!try_module_get(attr->owner)) {
- error = -ENODEV;
- goto Done;
- }
-
/* if the kobject has no ktype, then we assume that it is a subsystem
* itself, and use ops for it.
*/
@@ -332,7 +326,6 @@ static int sysfs_open_file(struct inode
goto Done;
Eaccess:
error = -EACCES;
- module_put(attr->owner);
Done:
if (error)
kobject_put(kobj);
@@ -343,14 +336,12 @@ static int sysfs_release(struct inode *
{
struct kobject * kobj = to_kobj(filp->f_path.dentry->d_parent);
struct attribute * attr = to_attr(filp->f_path.dentry);
- struct module * owner = attr->owner;
struct sysfs_buffer * buffer = filp->private_data;

if (buffer)
remove_from_collection(buffer, inode);
kobject_put(kobj);
/* After this point, attr should not be accessed. */
- module_put(owner);

if (buffer) {
if (buffer->page)
@@ -615,7 +606,6 @@ static void sysfs_schedule_callback_work

(ss->func)(ss->data);
kobject_put(ss->kobj);
- module_put(ss->owner);
kfree(ss);
}

@@ -644,11 +634,8 @@ int sysfs_schedule_callback(struct kobje
{
struct sysfs_schedule_callback_struct *ss;

- if (!try_module_get(owner))
- return -ENODEV;
ss = kmalloc(sizeof(*ss), GFP_KERNEL);
if (!ss) {
- module_put(owner);
return -ENOMEM;
}
kobject_get(kobj);

2007-12-13 17:11:11

by Kay Sievers

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, 2007-12-13 at 17:50 +0100, Ingo Molnar wrote:
> * Kay Sievers <[email protected]> wrote:
>
> > > > > > + sa->attr.owner = NULL;
> > > > > > sa->attr.name = name;
> > > > >
> > > > > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > > > > initialize the owner field to NULL automatically?
> > > >
> > > > Attributes do not have an owner anymore:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b595756ec1f49e0049a9e01a1298d53a7faaa15
> > >
> > > > This one also fails to apply properly at the exact same place as
> > > > Ingo's previously posted patch. Would need to backport this one.
> >
> > It depends on a completely reworked sysfs logic, I don't think it
> > makes any sense to backport that.
>
> well, if it fixes a live bug in a still supported stable kernel
> release...
>
> Vincent, could you try to just get rid of all actual uses of
> se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> (totally untested - might be fatally broken as well)

How can you think that this is not needed? You can not remove it with
sysfs you are patching. Hope this explains it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced

Kay
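
The objection above can be illustrated with a userspace model of the pairing that the proposed patch deletes. This is a hedged sketch, not kernel code: `owner_model`, `pin_owner`, `unpin_owner`, `try_unload`, `model_open` and `model_release` are hypothetical names. It assumes the purpose of the reference is to keep the owning module loaded while one of its attribute files is open, which is how the removed `try_module_get()`/`module_put()` calls are paired in the hunks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loadable module with a use count. */
struct owner_model { int refcount; bool live; };

/* Pin the owner; a NULL owner (built-in code) always succeeds. */
static bool pin_owner(struct owner_model *m)
{
    if (m == NULL)
        return true;
    if (!m->live)
        return false;       /* owner already unloaded */
    m->refcount++;
    return true;
}

static void unpin_owner(struct owner_model *m)
{
    if (m == NULL)
        return;
    assert(m->refcount > 0);
    m->refcount--;
}

/* Unload is refused while any open file still pins the owner. */
static bool try_unload(struct owner_model *m)
{
    if (m->refcount > 0)
        return false;
    m->live = false;
    return true;
}

/* open() pins the owner, release() drops the pin again. */
static int model_open(struct owner_model *owner)
{
    return pin_owner(owner) ? 0 : -ENODEV;
}

static void model_release(struct owner_model *owner)
{
    unpin_owner(owner);
}
```

In this model, dropping the pin from `model_open` would let `try_unload` succeed while a file is still open, leaving the attribute's callbacks pointing at unloaded code.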

2007-12-13 17:21:41

by Dhaval Giani

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, Dec 13, 2007 at 05:50:59PM +0100, Ingo Molnar wrote:
>
> * Kay Sievers <[email protected]> wrote:
>
> > > > > > + sa->attr.owner = NULL;
> > > > > > sa->attr.name = name;
> > > > >
> > > > > i'm wondering why doesnt this affect 2.6.23 and later? Does sysfs
> > > > > initialize the owner field to NULL automatically?
> > > >
> > > > Attributes do not have an owner anymore:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b595756ec1f49e0049a9e01a1298d53a7faaa15
> > >
> > > > This one also fails to apply properly at the exact same place as
> > > > Ingo's previously posted patch. Would need to backport this one.
> >
> > It depends on a completely reworked sysfs logic, I don't think it
> > makes any sense to backport that.
>
> well, if it fixes a live bug in a still supported stable kernel
> release...
>
> Vincent, could you try to just get rid of all actual uses of
> se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> (totally untested - might be fatally broken as well)
>

hmm. I am not too sure if it is a good idea. I think it will break a lot
of drivers. But I will just wait for the sysfs experts to speak up there.

--
regards,
Dhaval

2007-12-13 20:22:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?


* Kay Sievers <[email protected]> wrote:

> > > > This one also fails to apply properly at the exact same place
> > > > as Ingo's previously posted patch. Would need to backport this
> > > > one.
> > >
> > > It depends on a completely reworked sysfs logic, I don't think it
> > > makes any sense to backport that.
> >
> > well, if it fixes a live bug in a still supported stable kernel
> > release...
> >
> > Vincent, could you try to just get rid of all actual uses of
> > se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> > (totally untested - might be fatally broken as well)
>
> How can you think that this is not needed? You can not remove it with
> sysfs you are patching. Hope this explains it:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced

yeah - as i said it might be fatally broken (in fact it is). Do we
understand why Vincent got the crashes with vanilla 2.6.22.14 ?

Ingo

2007-12-14 02:15:07

by Dhaval Giani

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, Dec 13, 2007 at 09:21:26PM +0100, Ingo Molnar wrote:
>
> * Kay Sievers <[email protected]> wrote:
>
> > > > > This one also fails to apply properly at the exact same place
> > > > > as Ingo's previously posted patch. Would need to backport this
> > > > > one.
> > > >
> > > > It depends on a completely reworked sysfs logic, I don't think it
> > > > makes any sense to backport that.
> > >
> > > well, if it fixes a live bug in a still supported stable kernel
> > > release...
> > >
> > > Vincent, could you try to just get rid of all actual uses of
> > > se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> > > (totally untested - might be fatally broken as well)
> >
> > How can you think that this is not needed? You can not remove it with
> > sysfs you are patching. Hope this explains it:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced
>
> yeah - as i said it might be fatally broken (in fact it is). Do we
> understand why Vincent got the crashes with vanilla 2.6.22.14 ?
>

My guess is some variables have probably been left uninitialized. I am a
bit too scared to look into sysfs parts of the code now.

--
regards,
Dhaval

2007-12-14 16:53:47

by Greg KH

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Thu, Dec 13, 2007 at 09:21:26PM +0100, Ingo Molnar wrote:
>
> * Kay Sievers <[email protected]> wrote:
>
> > > > > This one also fails to apply properly at the exact same place
> > > > > as Ingo's previously posted patch. Would need to backport this
> > > > > one.
> > > >
> > > > It depends on a completely reworked sysfs logic, I don't think it
> > > > makes any sense to backport that.
> > >
> > > well, if it fixes a live bug in a still supported stable kernel
> > > release...
> > >
> > > Vincent, could you try to just get rid of all actual uses of
> > > se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> > > (totally untested - might be fatally broken as well)
> >
> > How can you think that this is not needed? You can not remove it with
> > sysfs you are patching. Hope this explains it:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced
>
> yeah - as i said it might be fatally broken (in fact it is). Do we
> understand why Vincent got the crashes with vanilla 2.6.22.14 ?

No, and I can't seem to duplicate them here at all.

Does anyone have a test case for this that I can work on trying to
duplicate?

thanks,

greg k-h

2007-12-14 17:08:19

by Dhaval Giani

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Fri, Dec 14, 2007 at 08:26:42AM -0800, Greg KH wrote:
> On Thu, Dec 13, 2007 at 09:21:26PM +0100, Ingo Molnar wrote:
> >
> > * Kay Sievers <[email protected]> wrote:
> >
> > > > > > This one also fails to apply properly at the exact same place
> > > > > > as Ingo's previously posted patch. Would need to backport this
> > > > > > one.
> > > > >
> > > > > It depends on a completely reworked sysfs logic, I don't think it
> > > > > makes any sense to backport that.
> > > >
> > > > well, if it fixes a live bug in a still supported stable kernel
> > > > release...
> > > >
> > > > Vincent, could you try to just get rid of all actual uses of
> > > > se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> > > > (totally untested - might be fatally broken as well)
> > >
> > > How can you think that this is not needed? You can not remove it with
> > > sysfs you are patching. Hope this explains it:
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced
> >
> > yeah - as i said it might be fatally broken (in fact it is). Do we
> > understand why Vincent got the crashes with vanilla 2.6.22.14 ?
>
> No, and I can't seem to duplicate them here at all.
>
> Does anyone have a test case for this that I can work on trying to
> duplicate?
>

If you apply CFS without my fix, and try to constantly check cpu_shares
for a user who is logging in and logging out, you should hit it. (That's
what I was doing).

--
regards,
Dhaval
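
The reproduction described above amounts to reading one sysfs file in a tight loop while the user's sessions come and go. A minimal sketch of the reading side follows, assuming only standard C; `poll_file` is a hypothetical helper, and the path of the per-user cpu_shares file is not given in the thread, so the caller has to supply it.

```c
#include <assert.h>
#include <stdio.h>

/* Open, read and close `path` `iterations` times.
 * Returns how many reads produced data. A failed open is skipped,
 * since the file may vanish while the user is logged out. */
static long poll_file(const char *path, long iterations)
{
    char buf[256];
    long ok = 0;

    assert(path != NULL);
    for (long i = 0; i < iterations; i++) {
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(buf, sizeof buf, f) != NULL)
            ok++;
        fclose(f);
    }
    return ok;
}
```

Run against the cpu_shares file of a user who is repeatedly logging in and out in another session, each iteration exercises the sysfs open, show and release path once.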

2007-12-14 17:29:25

by Greg KH

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Fri, Dec 14, 2007 at 10:37:39PM +0530, Dhaval Giani wrote:
> On Fri, Dec 14, 2007 at 08:26:42AM -0800, Greg KH wrote:
> > On Thu, Dec 13, 2007 at 09:21:26PM +0100, Ingo Molnar wrote:
> > >
> > > * Kay Sievers <[email protected]> wrote:
> > >
> > > > > > > This one also fails to apply properly at the exact same place
> > > > > > > as Ingo's previously posted patch. Would need to backport this
> > > > > > > one.
> > > > > >
> > > > > > It depends on a completely reworked sysfs logic, I don't think it
> > > > > > makes any sense to backport that.
> > > > >
> > > > > well, if it fixes a live bug in a still supported stable kernel
> > > > > release...
> > > > >
> > > > > Vincent, could you try to just get rid of all actual uses of
> > > > > se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> > > > > (totally untested - might be fatally broken as well)
> > > >
> > > > How can you think that this is not needed? You can not remove it with
> > > > sysfs you are patching. Hope this explains it:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced
> > >
> > > yeah - as i said it might be fatally broken (in fact it is). Do we
> > > understand why Vincent got the crashes with vanilla 2.6.22.14 ?
> >
> > No, and I can't seem to duplicate them here at all.
> >
> > Does anyone have a test case for this that I can work on trying to
> > duplicate?
> >
>
> If you apply CFS without my fix, and try to constantly check cpu_shares
> for a user who is logging in and logging out, you should hit it. (That's
> what I was doing).

Hm, how about a "vanilla 2.6.22.14 kernel _without_ any patches".
That's what I am most worried about :)

thanks,

greg k-h

2007-12-20 13:49:37

by Vincent Fortier

[permalink] [raw]
Subject: Re: 2.6.22.14 oops msg with commvault galaxy ?

On Friday, 14 December 2007 at 09:28 -0800, Greg KH wrote:
> On Fri, Dec 14, 2007 at 10:37:39PM +0530, Dhaval Giani wrote:
> > On Fri, Dec 14, 2007 at 08:26:42AM -0800, Greg KH wrote:
> > > On Thu, Dec 13, 2007 at 09:21:26PM +0100, Ingo Molnar wrote:
> > > >
> > > > * Kay Sievers <[email protected]> wrote:
> > > >
> > > > > > > > This one also fails to apply properly at the exact same place
> > > > > > > > as Ingo's previously posted patch. Would need to backport this
> > > > > > > > one.
> > > > > > >
> > > > > > > It depends on a completely reworked sysfs logic, I don't think it
> > > > > > > makes any sense to backport that.
> > > > > >
> > > > > > well, if it fixes a live bug in a still supported stable kernel
> > > > > > release...
> > > > > >
> > > > > > Vincent, could you try to just get rid of all actual uses of
> > > > > > se->attr.owner, within fs/sysfs/*.c? Something like the patch below.
> > > > > > (totally untested - might be fatally broken as well)
> > > > >
> > > > > How can you think that this is not needed? You can not remove it with
> > > > > sysfs you are patching. Hope this explains it:
> > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced
> > > >
> > > > yeah - as i said it might be fatally broken (in fact it is). Do we
> > > > understand why Vincent got the crashes with vanilla 2.6.22.14 ?
> > >
> > > No, and I can't seem to duplicate them here at all.
> > >
> > > Does anyone have a test case for this that I can work on trying to
> > > duplicate?
> > >
> >
> > If you apply CFS without my fix, and try to constantly check cpu_shares
> > > for a user who is logging in and logging out, you should hit it. (That's
> > what I was doing).
>
> Hm, how about a "vanilla 2.6.22.14 kernel _without_ any patches".
> That's what I am most worried about :)

Since I was getting the problem with both vanilla & CFS patched kernels
and, sadly, I don't have the time to do a git bisect at the moment, I
decided to go ahead and prepare a full migration to 2.6.23 (I was hoping
to skip directly to 2.6.24 but...).

I can confirm at the moment that 2.6.23 works properly with Galaxy (just
as 2.6.20 & 2.6.21 used to...).

Thanks very much everyone for the help but sadly this bug will have to
remain unresolved.

> thanks,

- vin