2003-01-12 18:46:16

by ghugh Song

Subject: Nervous with 2.4.21-pre3 and -pre3-ac*


Many people, including me, have recently been seeing unusual kernel
trouble with 2.4.21-pre3-ac*. In my case, with 2.4.21-pre3-ac2, I got
a segmentation fault from a command (tar) that I never suspected. Yet
no one seems to know which part of the kernel update caused all this
trouble.

Does anyone have any guess?

Regards,

G. H. S.


2003-01-12 19:22:41

by Alan

Subject: Re: Nervous with 2.4.21-pre3 and -pre3-ac*

On Sun, 2003-01-12 at 18:55, ghugh Song wrote:
> Many people, including me, have recently been seeing unusual kernel
> trouble with 2.4.21-pre3-ac*. In my case, with 2.4.21-pre3-ac2, I got
> a segmentation fault from a command (tar) that I never suspected. Yet
> no one seems to know which part of the kernel update caused all this
> trouble.
>
> Does anyone have any guess?

At the moment I am not sure. It's stable on my boxes using gcc 3.1 and
built after a make distclean. At least one reporter found that patching
and building over an old, already-built tree failed, but a clean tree
did not.

The obvious candidates, assuming 2.4.21-pre3 is stable, are the
mm/shmem.c changes (you can back out just the diff to that file and
retest, which would be interesting) or the buffer cache changes, which
I plan to drop out to test soon.

Neither of these two changes is due for Marcelo.

Are you using highmem (> 900MB RAM in the box)?

Alan
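
Backing out the diff to a single file, as suggested above, means isolating that file's section of the -ac patch and reverse-applying it. A minimal sketch in Python, assuming the patch is a unified diff in which each file's section begins with a line starting with "diff " (extract_file_diff is a hypothetical helper, and the sample text is made up for illustration, not taken from the real patch):

```python
def extract_file_diff(patch_text, path_suffix):
    """Return only the sections of a unified diff that touch path_suffix.

    Assumption: every per-file section starts with a line beginning
    with "diff ", as in output from `diff -u --recursive`.
    """
    sections = []      # one list of lines per file section
    current = None
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff "):
            current = [line]
            sections.append(current)
        elif current is not None:
            current.append(line)

    wanted = []
    for section in sections:
        # The "+++ <path>" header names the file as it exists after patching.
        headers = [l for l in section if l.startswith("+++ ")]
        if headers and headers[0].split()[1].endswith(path_suffix):
            wanted.extend(section)
    return "".join(wanted)


# Made-up two-file patch, for illustration only.
SAMPLE = (
    "diff -u old/mm/shmem.c new/mm/shmem.c\n"
    "--- old/mm/shmem.c\n"
    "+++ new/mm/shmem.c\n"
    "@@ -1,1 +1,1 @@\n"
    "-a\n"
    "+b\n"
    "diff -u old/fs/buffer.c new/fs/buffer.c\n"
    "--- old/fs/buffer.c\n"
    "+++ new/fs/buffer.c\n"
    "@@ -1,1 +1,1 @@\n"
    "-c\n"
    "+d\n"
)

shmem_only = extract_file_diff(SAMPLE, "mm/shmem.c")
```

Written to a file, the extracted section could then be reverse-applied from the top of the source tree with patch -p1 -R (assuming the paths in the patch carry one leading directory component), followed by a rebuild from a clean tree.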

2003-01-13 05:13:57

by ghugh Song

Subject: Re: Nervous with 2.4.21-pre3 and -pre3-ac*


> Are you using highmem (> 900MB RAM in the box)?
>
> Alan

Yes, it has 1 GB of RAM. The box is a P4 with i845G
on an ASUS P4PE motherboard.

BTW, after several segmentation faults from a few repeated,
unsuccessful tries of the tar command, the machine froze.

Regards,

G. H. S.

2003-01-13 05:20:59

by khromy

Subject: Re: Nervous with 2.4.21-pre3 and -pre3-ac*

On Mon, Jan 13, 2003 at 12:21:51AM -0500, ghugh Song wrote:
> BTW, after several segmentation faults from a few repeated,
> unsuccessful tries of the tar command, the machine froze.

I had the same thing happen here using 2.4.21-pre3-ac2. cpio segfaulted
and then the machine hung. I went back to 2.4.21-pre3 and all is well,
so far.

PIII866, 256MB RAM

Bus 0, device 7, function 1:
IDE interface: VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE (rev 6).
Master Capable. Latency=32.
I/O at 0xd000 [0xd00f].

--
L1: khromy ;khromy(at)lnuxlab.ath.cx

2003-01-13 08:16:31

by Dee

Subject: Re: Nervous with 2.4.21-pre3 and -pre3-ac*


> On Sun, 2003-01-12 at 18:55, ghugh Song wrote:
> > Does anyone have any guess?


I have the same problem with segfaults and lockups.
I did get this in the log once, though, before a hang.

Jan 12 15:51:37 ghost kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000004
Jan 12 15:51:37 ghost kernel: *pde = 00000000
Jan 12 15:51:37 ghost last message repeated 2 times
Jan 12 15:51:38 ghost gpm[142]: oops() invoked from gpm.c(147)
Jan 12 15:51:38 ghost gpm[142]: /dev/vc/0: Input/output error

It also hung once when doing an ls on the dev directory.
I am using devfs with debug and automount enabled.
ld seems to trigger it too: it hung twice in a row while
making piggy.o during a make bzImage on the pre3-ac3
kernel. The third time it went through without a problem. Thought this
might help.


Dee

2003-01-13 13:56:35

by Alan

Subject: Re: Nervous with 2.4.21-pre3 and -pre3-ac*

On Mon, 2003-01-13 at 05:21, ghugh Song wrote:
> > Are you using highmem (> 900MB RAM in the box)?
> >
> > Alan
>
> Yes, it has 1 GB of RAM. The box is a P4 with i845G
> on an ASUS P4PE motherboard.
>
> BTW, after several segmentation faults from a few repeated,
> unsuccessful tries of the tar command, the machine froze.

If you build a kernel with highmem disabled, does it become stable?
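
For reference, the highmem question comes down to simple arithmetic. A rough sketch, assuming the default i386 3G/1G address-space split, under which the kernel maps about 896 MB of RAM directly and treats anything above that as highmem (highmem_mb is a hypothetical helper, and real usable RAM is somewhat less than the nominal size):

```python
# Assumption: default i386 3G/1G split, where roughly 896 MB of RAM is
# directly mapped by the kernel and the remainder is highmem.
LOWMEM_LIMIT_MB = 896


def highmem_mb(total_ram_mb):
    """Approximate amount of RAM (in MB) that only highmem can reach."""
    return max(0, total_ram_mb - LOWMEM_LIMIT_MB)


# Nominal RAM sizes of the machines mentioned in this thread.
one_gb_box = highmem_mb(1024)    # the 1 GB P4
small_box = highmem_mb(256)      # the 256MB PIII
```

On these assumptions the 1 GB machine has only about 128 MB above the direct-mapped limit, and the 256MB PIII that showed the same symptoms has none, so a kernel built without highmem support loses little on the former and changes nothing on the latter.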

2003-01-16 07:09:52

by ghugh Song

Subject: Re: Nervous with 2.4.21-pre3 and -pre3-ac*


Alan Cox wrote:
> If you build a kernel with highmem disabled, does it become stable?

No. It froze with highmem disabled as well.
Highmem does not seem to matter.

Regards,

G. H. S.

2003-01-16 18:07:10

by Zed Pobre

Subject: Re: Nervous with 2.4.21-pre3 and -pre3-ac*

On Sun, Jan 12, 2003 at 08:18:38PM +0000, Alan Cox wrote:
> On Sun, 2003-01-12 at 18:55, ghugh Song wrote:
> > Many people, including me, have recently been seeing unusual kernel
> > trouble with 2.4.21-pre3-ac*. In my case, with 2.4.21-pre3-ac2, I got
> > a segmentation fault from a command (tar) that I never suspected. Yet
> > no one seems to know which part of the kernel update caused all this
> > trouble.
> >
> > Does anyone have any guess?
>
> At the moment I am not sure. It's stable on my boxes using gcc 3.1 and
> built after a make distclean. At least one reporter found that patching
> and building over an old, already-built tree failed, but a clean tree
> did not.
>
> The obvious candidates, assuming 2.4.21-pre3 is stable, are the
> mm/shmem.c changes (you can back out just the diff to that file and
> retest, which would be interesting) or the buffer cache changes, which
> I plan to drop out to test soon.

I am not the original reporter, but I have a similar situation: I
can consistently cause an oops in 2.4.21-pre3-ac2 by attempting to
rsync a few gigabytes of data to that machine, or by copying that data
from one partition to another. It only happens if highmem support is
enabled (the machine in question has 4GB) and does not happen in
2.4.21-pre3. I just tried removing the changes to mm/shmem.c in
the pre3-ac2 diff and recompiled the kernel, and still got the oops.

ksymoops output follows:

ksymoops 2.4.5 on i686 2.4.21-pre3-ac2. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-pre3-ac2/ (default)
-m /boot/System.map-2.4.21-pre3-ac2 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Unable to handle kernel NULL pointer dereference at virtual address 00000004
c01342bd
*pde = 00104001
oops: 0002
CPU: 4
EIP: 0010:[<c01342bd>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: c10bc960 ecx: c3a96000 edx: 00000200
esi: 00000002 edi: f704ca00 ebp: 00000000 esp: c3a97d6c
ds: 0018 es: 0018 ss: 0018
Process kupdated (pid: 21, stackpage=c3a97000)
Stack: c3edc000 c3a2e268 f704ca00 00000000 f700a280 00000001 00000000 00000001
00000000 00000004 f700a280 c3f2b020 c03f5080 c01348c8 c01348e9 c01315bf
c3a2e278 c3a2e174 00000001 c013291f c3a2e268 f704ca00 00000020 00000070
Call Trace: [<c01348c8>] [<c01348e9>] [<c01315bf>] [<c013291f>] [<c0133a44>]
[<c0133aec>] [<c013459e>] [<c0134822>] [<c013453e>] [<c0139b82>] [<c0139cce>]
[<c01f4a07>] [<c01f506c>] [<c01f50cc>] [<c013cb74>] [<c013cc08>] [<c013fd9c>]
[<c014006a>] [<c01070c4>]
Code: 89 58 04 89 03 8d 51 5c 89 53 04 89 59 5c 89 73 0c ff 41 68


>>EIP; c01342bd <__free_pages_ok+28d/2ac> <=====

>>ebx; c10bc960 <_end+c3cadc/38a6c1dc>
>>ecx; c3a96000 <_end+361617c/38a6c1dc>
>>edi; f704ca00 <_end+36bccb7c/38a6c1dc>
>>esp; c3a97d6c <_end+3617ee8/38a6c1dc>

Trace; c01348c8 <__free_pages+1c/20>
Trace; c01348e9 <free_pages+1d/20>
Trace; c01315bf <kmem_slab_destroy+7f/98>
Trace; c013291f <kmem_cache_reap+2c7/338>
Trace; c0133a44 <shrink_caches+1c/88>
Trace; c0133aec <try_to_free_pages_zone+3c/5c>
Trace; c013459e <balance_classzone+5e/1d0>
Trace; c0134822 <__alloc_pages+112/160>
Trace; c013453e <_alloc_pages+16/18>
Trace; c0139b82 <alloc_bounce_page+e/94>
Trace; c0139cce <create_bounce+26/166>
Trace; c01f4a07 <__make_request+af/5f8>
Trace; c01f506c <generic_make_request+11c/12c>
Trace; c01f50cc <submit_bh+50/70>
Trace; c013cb74 <write_locked_buffers+20/2c>
Trace; c013cc08 <write_some_buffers+88/d0>
Trace; c013fd9c <sync_old_buffers+68/a0>
Trace; c014006a <kupdate+102/124>
Trace; c01070c4 <kernel_thread+28/38>

Code; c01342bd <__free_pages_ok+28d/2ac>
00000000 <_EIP>:
Code; c01342bd <__free_pages_ok+28d/2ac> <=====
0: 89 58 04 mov %ebx,0x4(%eax) <=====
Code; c01342c0 <__free_pages_ok+290/2ac>
3: 89 03 mov %eax,(%ebx)
Code; c01342c2 <__free_pages_ok+292/2ac>
5: 8d 51 5c lea 0x5c(%ecx),%edx
Code; c01342c5 <__free_pages_ok+295/2ac>
8: 89 53 04 mov %edx,0x4(%ebx)
Code; c01342c8 <__free_pages_ok+298/2ac>
b: 89 59 5c mov %ebx,0x5c(%ecx)
Code; c01342cb <__free_pages_ok+29b/2ac>
e: 89 73 0c mov %esi,0xc(%ebx)
Code; c01342ce <__free_pages_ok+29e/2ac>
11: ff 41 68 incl 0x68(%ecx)


1 warning issued. Results may not be reliable.
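
The register dump and the disassembly above are enough to check the reported fault address by hand. A small sketch, using only values copied from the oops (the helper names are hypothetical, and the error-code decoding assumes the standard i386 page-fault error code layout): the faulting instruction stores %ebx at offset 4 from %eax, and %eax is zero, so the store lands on address 0x4.

```python
import re

# Values copied from the oops above.
REGISTER_DUMP = """\
eax: 00000000 ebx: c10bc960 ecx: c3a96000 edx: 00000200
esi: 00000002 edi: f704ca00 ebp: 00000000 esp: c3a97d6c
"""
FAULTING_INSN = "mov %ebx,0x4(%eax)"
REPORTED_FAULT_ADDRESS = 0x00000004
OOPS_ERROR_CODE = 0x0002


def parse_registers(dump):
    """Turn 'eax: 00000000 ebx: ...' text into a name -> integer dict."""
    pairs = re.findall(r"(\w+): ([0-9a-f]{8})", dump)
    return {name: int(value, 16) for name, value in pairs}


def store_target(insn, regs):
    """Effective address of the memory operand in 'mov %src,disp(%base)'."""
    m = re.fullmatch(r"mov\s+%\w+,(0x[0-9a-f]+)?\(%(\w+)\)", insn)
    if m is None:
        raise ValueError("unsupported instruction form: " + insn)
    disp = int(m.group(1), 16) if m.group(1) else 0
    return (regs[m.group(2)] + disp) & 0xFFFFFFFF


def decode_error_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "protection_fault": bool(code & 1),  # False: page was not present
        "write": bool(code & 2),             # True: the access was a write
        "user_mode": bool(code & 4),         # False: fault taken in kernel mode
    }


regs = parse_registers(REGISTER_DUMP)
target = store_target(FAULTING_INSN, regs)
fault_kind = decode_error_code(OOPS_ERROR_CODE)
```

Here target comes out equal to the reported fault address, and the error code decodes as a kernel-mode write to a page that was not present. That is consistent with a write through a NULL pointer inside __free_pages_ok; it does not by itself say what left that pointer NULL.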


--
Zed Pobre <[email protected]> a.k.a. Zed Pobre <[email protected]>
PGP key and fingerprint available on finger; encrypted mail welcomed.

