2003-08-05 08:00:46

by Stephan von Krawczynski

Subject: decoded problem in 2.4.22-pre10

Hello all,

the testbox crashed again this night, unfortunately I made a mistake yesterday
and started vmware once. Although only the usual modules were loaded at crash
time and not the application, the kernel was tainted of course.
Nevertheless I present the data:

Everything started with this, it seems:

journal-2332: Trying to log block 4316, which is a log block
(device sd(8,17))

Then:


ksymoops 2.4.8 on i686 2.4.22-pre10. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.22-pre10/ (default)
-m /boot/System.map-2.4.22-pre10 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

kernel BUG at prints.c:341!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c018bca5>] Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000053 ebx: f7117000 ecx: f59ae000 edx: f5f2ff7c
esi: f6e63740 edi: f8acc310 ebp: f8ac727c esp: f59afd28
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 1726, stackpage=f59af000)
Stack: c02b48dc c037dba0 c037c720 f7117000 c019d12c f7117000 c02bbdc0 000010dc
00000006 f7117000 00000001 f8aa63e4 00000004 00000002 00000000 00000001
00000002 3f2f143f f598e000 f2aebb60 f2aeb560 f2abf000 f2ac4000 f8acc310
Call Trace: [<c019d12c>] [<c019bafc>] [<c019c3b2>] [<c014519f>] [<c019c47f>]
[<c0183eff>] [<f8c84fc8>] [<f8c854f4>] [<c028e61f>] [<f8c814fe>] [<f8c91c60>]
[<f8c80699>] [<f8c65938>] [<f8c91c60>] [<f8c91a28>] [<f8c91a58>] [<f8c80411>]
[<f8c91a20>] [<c010592e>] [<f8c80210>]
Code: 0f 0b 55 01 ef 48 2b c0 85 db b8 f8 48 2b c0 74 0c 0f b7 43


>>EIP; c018bca5 <reiserfs_panic+45/80> <=====

>>ebx; f7117000 <_end+36d6bde0/3852ee40>
>>ecx; f59ae000 <_end+35602de0/3852ee40>
>>edx; f5f2ff7c <_end+35b84d5c/3852ee40>
>>esi; f6e63740 <_end+36ab8520/3852ee40>
>>edi; f8acc310 <[3w-xxxx]tw_device_extension_list+1e9050/271da0>
>>ebp; f8ac727c <[3w-xxxx]tw_device_extension_list+1e3fbc/271da0>
>>esp; f59afd28 <_end+35604b08/3852ee40>

Trace; c019d12c <do_journal_end+bac/bb0>
Trace; c019bafc <journal_end_sync+3c/a0>
Trace; c019c3b2 <__commit_trans_index+72/a0>
Trace; c014519f <fsync_buffers_list+18f/1b0>
Trace; c019c47f <reiserfs_commit_for_inode+3f/80>
Trace; c0183eff <reiserfs_sync_file+6f/d0>
Trace; f8c84fc8 <[lockd]nlmclt_decode_testres+28/160>
Trace; f8c854f4 <[lockd]nlm_procname+4/20>
Trace; c028e61f <inet_sendmsg+3f/50>
Trace; f8c814fe <[lockd]nlmsvc_notify_blocked+4e/f0>
Trace; f8c91c60 <[nfsd]nfsd_access+a0/100>
Trace; f8c80699 <[lockd]lockd_up+49/140>
Trace; f8c65938 <[vmnet]__constant_c_and_count_memset+4a/75>
Trace; f8c91c60 <[nfsd]nfsd_access+a0/100>
Trace; f8c91a28 <[nfsd]nfsd_setattr+3f8/590>
Trace; f8c91a58 <[nfsd]nfsd_setattr+428/590>
Trace; f8c80411 <[lockd]lockd+e1/320>
Trace; f8c91a20 <[nfsd]nfsd_setattr+3f0/590>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; f8c80210 <[lockd]__constant_c_and_count_memset+50/c9>

Code; c018bca5 <reiserfs_panic+45/80>
00000000 <_EIP>:
Code; c018bca5 <reiserfs_panic+45/80> <=====
0: 0f 0b ud2a <=====
Code; c018bca7 <reiserfs_panic+47/80>
2: 55 push %ebp
Code; c018bca8 <reiserfs_panic+48/80>
3: 01 ef add %ebp,%edi
Code; c018bcaa <reiserfs_panic+4a/80>
5: 48 dec %eax
Code; c018bcab <reiserfs_panic+4b/80>
6: 2b c0 sub %eax,%eax
Code; c018bcad <reiserfs_panic+4d/80>
8: 85 db test %ebx,%ebx
Code; c018bcaf <reiserfs_panic+4f/80>
a: b8 f8 48 2b c0 mov $0xc02b48f8,%eax
Code; c018bcb4 <reiserfs_panic+54/80>
f: 74 0c je 1d <_EIP+0x1d>
Code; c018bcb6 <reiserfs_panic+56/80>
11: 0f b7 43 00 movzwl 0x0(%ebx),%eax


1 warning issued. Results may not be reliable.

Regards,
Stephan
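
A note on the "Code:" line of the oops above. On i386 kernels of this era,
BUG() is, as far as I know, compiled to a ud2 instruction followed by a 16-bit
line number and a 32-bit pointer to the file name. If so, the bytes after
0f 0b are data, and the instructions ksymoops prints after ud2a are not real
code. That layout is an assumption, not something stated in this thread, and
decode_bug is a hypothetical helper name. A minimal sketch in Python that
decodes the first eight bytes under this assumption:

```python
import struct

# First eight bytes copied from the "Code:" line of the oops above.
CODE = bytes.fromhex("0f0b5501ef482bc0")


def decode_bug(code):
    """Decode ud2 + 16-bit line + 32-bit file pointer (assumed BUG() layout)."""
    if code[:2] != b"\x0f\x0b":
        raise ValueError("does not start with ud2")
    # Little-endian, no padding: unsigned short, then unsigned int.
    line, file_ptr = struct.unpack_from("<HI", code, 2)
    return line, file_ptr


line, file_ptr = decode_bug(CODE)
print(line, hex(file_ptr))
```

With the bytes from this oops the decoded line number is 341, which agrees
with the "kernel BUG at prints.c:341!" message and supports the assumed layout.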


2003-08-05 10:21:00

by Stephan von Krawczynski

Subject: Re: decoded problem in 2.4.22-pre10

On Tue, 5 Aug 2003 10:00:40 +0200
Stephan von Krawczynski <[email protected]> wrote:

> Hello all,
>
> the testbox crashed again this night, unfortunately I made a mistake
> yesterday and started vmware once. Although only the usual modules were
> loaded at crash time and not the application, the kernel was tainted of
> course. Nevertheless I present the data:

I re-checked the setup with vmware and found out I can shoot it down in no
time. So you probably should just forget about this bug report, because loading
vmware modules does obviously do harm.

Sorry for bothering you.
Regards,
Stephan

2003-08-05 12:03:06

by Petr Vandrovec

Subject: Re: decoded problem in 2.4.22-pre10

On 5 Aug 03 at 12:20, Stephan von Krawczynski wrote:
> On Tue, 5 Aug 2003 10:00:40 +0200
> Stephan von Krawczynski <[email protected]> wrote:
>
> > Hello all,
> >
> > the testbox crashed again this night, unfortunately I made a mistake
> > yesterday and started vmware once. Although only the usual modules were
> > loaded at crash time and not the application, the kernel was tainted of
> > course. Nevertheless I present the data:
>
> I re-checked the setup with vmware and found out I can shoot it down in no
> time. So you probably should just forget about this bug report, because loading
> vmware modules does obviously do harm.

Any details? Were there any warnings while vmmon was built?
Petr


2003-08-05 12:24:01

by Stephan von Krawczynski

Subject: Re: decoded problem in 2.4.22-pre10

On Tue, 5 Aug 2003 14:02:38 +0200
"Petr Vandrovec" <[email protected]> wrote:

> On 5 Aug 03 at 12:20, Stephan von Krawczynski wrote:
> > On Tue, 5 Aug 2003 10:00:40 +0200
> > Stephan von Krawczynski <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > the testbox crashed again this night, unfortunately I made a mistake
> > > yesterday and started vmware once. Although only the usual modules were
> > > loaded at crash time and not the application, the kernel was tainted of
> > > course. Nevertheless I present the data:
> >
> > I re-checked the setup with vmware and found out I can shoot it down in no
> > time. So you probably should just forget about this bug report, because
> > loading vmware modules does obviously do harm.
>
> Any details? Were there any warnings while vmmon was built?
> Petr

Hello Petr,

at this time I can't provide you with details or exact reporting as the box has
to be used for finding the 2.4.22-pre stability problem I see. And since the
crashes take quite some time to occur I cannot reboot and check out what's the
deal with the vmware modules.
And frankly: I find the application quite ok but tainting the kernel with the
closed source modules is really something to think about, especially since
there should be easy ways to avoid that completely.
Btw I already stopped using nvidia equipment completely because I was not able
to produce useful debugging output while running an nvidia-tainted kernel.

I might come back to your request for details once 2.4.22 is stable.

Regards,
Stephan



2003-08-05 12:47:31

by Petr Vandrovec

Subject: Re: decoded problem in 2.4.22-pre10

On 5 Aug 03 at 14:23, Stephan von Krawczynski wrote:
>
> Hello Petr,
>
> at this time I can't provide you with details or exact reporting as the box has
> to be used for finding the 2.4.22-pre stability problem I see. And since the
> crashes take quite some time to occur I cannot reboot and check out what's the
> deal with the vmware modules.
> And frankly: I find the application quite ok but tainting the kernel with the
> closed source modules is really something to think about, especially since
> there should be easy ways to avoid that completely.

This is not true. VMware modules are open source, they are just non-GPL.

And no, it is impossible to avoid them. At least nobody I know knows how
to avoid them.

There is only one known problem (fixed in
ftp://platan.vc.cvut.cz/pub/vmware/vmware-any-any-update38.tar.gz): SuSE
backported the epoll patches from 2.5.x to both 2.4.19 and 2.4.20. Although
this seriously changes the semantics of poll_initwait, it caused only a
warning at compile time, while at runtime it corrupted the kernel
stack. But I do not see the epoll patches in 2.4.22pre10, so it must
be something else.
Best regards,
Petr Vandrovec

2003-08-05 12:40:20

by Marcelo Tosatti

Subject: Re: decoded problem in 2.4.22-pre10



On Tue, 5 Aug 2003, Stephan von Krawczynski wrote:

> Hello all,
>
> the testbox crashed again this night, unfortunately I made a mistake yesterday
> and started vmware once. Although only the usual modules were loaded at crash
> time and not the application, the kernel was tainted of course.
> Nevertheless I present the data:
>
> Everything started with this, it seems:
>
> journal-2332: Trying to log block 4316, which is a log block
> (device sd(8,17))
>
> Then:

Hello Stephan,

Mind trying to reproduce the problem without the vmware modules?

I think they might be the problem here.

2003-08-05 12:58:52

by Petr Vandrovec

Subject: Re: decoded problem in 2.4.22-pre10

> On 5 Aug 03 at 14:23, Stephan von Krawczynski wrote:
> >
> > Hello Petr,
> >
> > at this time I can't provide you with details or exact reporting as the box has
> > to be used for finding the 2.4.22-pre stability problem I see. And since the
> > crashes take quite some time to occur I cannot reboot and check out what's the
> > deal with the vmware modules.

One more thing. VMware creates a file in /tmp, unlinks it, and then the file
gradually expands as the VM is initialized, so it grows from 0 to your
guest memory size + videoram size + ~5MB, while at the same time all portions
of that file are MAP_SHARED into several processes.

And at exit VMware does ftruncate(xxx,0) to throw away unneeded data,
preventing it from hitting the disk on a subsequent (f)sync(). This
ftruncate() happens while other processes which have the file mmapped are
doing munmap or exit, so during cleanup they find dirty pages which no
longer have any underlying storage...

Both of these operations were observed to cause problems in the past:
on startup, long ago, reiserfs had problems with growing unlinked files, and
on shutdown the kernel's mm raced with ftruncate. Both of these problems are
currently fixed, but maybe some other race has appeared somewhere?
Best regards,
Petr Vandrovec
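
The file access pattern described above can be sketched in a few lines of
Python. This is only an illustration of the sequence of system calls
(create, unlink, grow, map MAP_SHARED, dirty, ftruncate to 0, unmap). It is
not VMware's code, it runs in a single process, and it makes no attempt to
reproduce the race between several processes. The function name and the
sizes are made up for the example.

```python
import mmap
import os
import tempfile


def unlinked_shared_mapping_demo(final_size=1 << 20, step=1 << 16):
    """Grow an unlinked temp file, dirty it via MAP_SHARED, then truncate it."""
    fd, path = tempfile.mkstemp(dir=tempfile.gettempdir())
    os.unlink(path)                      # the file now lives only through fd
    size = 0
    while size < final_size:             # the file "gradually expands"
        size += step
        os.ftruncate(fd, size)
    mapping = mmap.mmap(fd, size, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE)
    mapping[0:4] = b"data"               # dirty a page of the shared mapping
    os.ftruncate(fd, 0)                  # throw the data away before any sync
    truncated_size = os.fstat(fd).st_size
    mapping.close()                      # munmap after the storage is gone
    os.close(fd)
    return size, truncated_size


print(unlinked_shared_mapping_demo())
```

Note that the mapping is not touched again after the ftruncate(); accessing
pages beyond the new end of file would normally raise SIGBUS. The interesting
case in the thread is what the kernel does with the already dirty pages when
other processes unmap at the same moment.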


2003-08-05 13:24:27

by Stephan von Krawczynski

Subject: Re: decoded problem in 2.4.22-pre10

On Tue, 5 Aug 2003 14:57:43 +0200
"Petr Vandrovec" <[email protected]> wrote:

> [Petr on ftruncate]

I can tell you this for sure: I don't need to start the app; just loading the
modules with their usual script is enough to shoot the box down during heavy
network load.

At the moment I only have a partly decoded oops at hand, in which all the
vmware module symbols are missing (the network is gigabit, btw):


ksymoops 2.4.8 on i686 2.4.22-pre10. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.22-pre10/ (default)
-m /boot/System.map-2.4.22-pre10 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Unable to handle kernel paging request at virtual address 80f00064
c01d428b
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01d428b>] Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00000642 ebx: 00010000 ecx: 275da012 edx: 00000000
esi: 80f00000 edi: 00000090 ebp: f5dd5200 esp: f5603cb8
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 2049, stackpage=f5603000)
Stack: c34e3d60 00010000 00000090 007c3758 000003be c34e3da0 000003ba 00000001
00000291 007c3690 00010000 00000040 c34e3d60 00000292 c34e3c00 c01d457e
c34e3d60 0000003f 00000001 c34e3cc0 c34e3c00 c034ffdc c034ffc0 c02539d2
Call Trace: [<c01d457e>] [<c02539d2>] [<c01222d6>] [<c0109508>] [<c010c048>]
[<c0130018>] [<c0138d70>] [<c0130c86>] [<c0133d4a>] [<c013431a>] [<f8c92419>]
[<f8c98a3a>] [<f8c9f1fc>] [<f8c8d699>] [<f8c72938>] [<f8c9f1fc>] [<f8c9ea38>]
[<f8c9ea58>] [<f8c8d411>] [<c010592e>] [<f8c8d210>]
Code: 8b 5e 64 85 db 0f 85 48 02 00 00 8b 44 24 18 8b 8e 88 00 00


>>EIP; c01d428b <tg3_rx+14b/3b0> <=====

>>ebp; f5dd5200 <_end+35a29fe0/3852ee40>
>>esp; f5603cb8 <_end+35258a98/3852ee40>

Trace; c01d457e <tg3_poll+8e/150>
Trace; c02539d2 <net_rx_action+e2/160>
Trace; c01222d6 <do_softirq+76/e0>
Trace; c0109508 <do_IRQ+d8/f0>
Trace; c010c048 <call_do_IRQ+5/d>
Trace; c0130018 <.text.lock.mmap+b7/cf>
Trace; c0138d70 <lru_cache_add+10/70>
Trace; c0130c86 <add_to_page_cache_unique+56/90>
Trace; c0133d4a <do_generic_file_write+1ba/4b0>
Trace; c013431a <generic_file_write+8a/150>
Trace; f8c92419 <[nfsd]nfsd_procedures3+319/320>
Trace; f8c98a3a <.data.end+65db/????>
Trace; f8c9f1fc <END_OF_CODE+cd9d/????>
Trace; f8c8d699 <[nfsd]nfs3svc_decode_readargs+a9/100>
Trace; f8c72938 <[lockd]nlmclnt_lookup_host+18/30>
Trace; f8c9f1fc <END_OF_CODE+cd9d/????>
Trace; f8c9ea38 <END_OF_CODE+c5d9/????>
Trace; f8c9ea58 <END_OF_CODE+c5f9/????>
Trace; f8c8d411 <[nfsd]nfs3svc_decode_sattrargs+c1/f0>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; f8c8d210 <[nfsd]encode_wcc_data+50/f0>

Code; c01d428b <tg3_rx+14b/3b0>
00000000 <_EIP>:
Code; c01d428b <tg3_rx+14b/3b0> <=====
0: 8b 5e 64 mov 0x64(%esi),%ebx <=====
Code; c01d428e <tg3_rx+14e/3b0>
3: 85 db test %ebx,%ebx
Code; c01d4290 <tg3_rx+150/3b0>
5: 0f 85 48 02 00 00 jne 253 <_EIP+0x253>
Code; c01d4296 <tg3_rx+156/3b0>
b: 8b 44 24 18 mov 0x18(%esp,1),%eax
Code; c01d429a <tg3_rx+15a/3b0>
f: 8b 8e 88 00 00 00 mov 0x88(%esi),%ecx

<0>Kernel panic: Aiee, killing interrupt handler!

1 warning issued. Results may not be reliable.


Regards,
Stephan
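
The faulting address in the oops above can be cross-checked against the
register dump: the instruction at EIP is "mov 0x64(%esi),%ebx" and esi holds
80f00000, so the access lands at 80f00064, the address the kernel reports.
A small check in Python, with the values copied from the oops. The
0xc0000000 kernel base is an assumption about the usual i386 3G/1G split,
not something stated in this thread.

```python
# Values copied from the oops above.
ESI = 0x80F00000             # esi from the register dump
DISPLACEMENT = 0x64          # from "mov 0x64(%esi),%ebx"
REPORTED_FAULT = 0x80F00064  # "virtual address 80f00064"

# Assumption: default i386 split, kernel addresses start at 0xc0000000.
PAGE_OFFSET = 0xC0000000

fault = ESI + DISPLACEMENT
print(hex(fault), fault == REPORTED_FAULT, fault < PAGE_OFFSET)
```

Under that assumption esi does not point into kernel address space at all,
which would be consistent with tg3_rx having picked up a corrupted pointer,
though the oops alone does not show where the corruption came from.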

2003-08-06 10:27:06

by Stephan von Krawczynski

Subject: Re: decoded problem in 2.4.22-pre10

On Tue, 5 Aug 2003 14:46:52 +0200
"Petr Vandrovec" <[email protected]> wrote:

> On 5 Aug 03 at 14:23, Stephan von Krawczynski wrote:
> >
> > Hello Petr,
> > [...]
> > And frankly: I find the application quite ok but tainting the kernel with
> > the closed source modules is really something to think about, especially
> > since there should be easy ways to avoid that completely.
>
> This is not true. VMware modules are open source, they are just non-GPL.

Sorry for my incorrect description of what I really meant. In fact I wanted to
express that the _tainting_ (obviously meaning lack of GPL licensing) should be
thought about.

Regards,
Stephan