2008-01-07 23:15:21

by Stoyan Gaydarov

Subject: Kernel Oops?

Today I upgraded my kernel from 2.6.23.9 to 2.6.23.12 and in the past
30 minutes I have had to restart my computer twice.
I believe it's a kernel oops or a kernel panic because when the
computer freezes it blinks the caps and scroll lock LEDs.
I don't know what is causing the problem but I am willing to help, I
can provide you with any information you need.
The only problem is that I don't know how to debug the system myself.
If anyone can tell me what to do, I can do it and give back the
information.

The system I am running is Slackware 12.0 with the new kernel:
root@SlaxDesk:~# uname -a
Linux SlaxDesk 2.6.23.12 #1 SMP Sat Jan 5 06:58:19 CST 2008 i686
Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz GenuineIntel GNU/Linux

Thank you ahead of time for any help.


2008-01-07 23:32:20

by Alan

Subject: Re: Kernel Oops?

On Mon, 7 Jan 2008 17:15:01 -0600
"Stoyan Gaydarov" <[email protected]> wrote:

> Today I upgraded my kernel from 2.6.23.9 to 2.6.23.12 and in the past
> 30 minutes I have had to restart my computer twice.
> I believe it's a kernel oops or a kernel panic because when the
> computer freezes it blinks the caps and scroll lock LEDs.
> I don't know what is causing the problem but I am willing to help, I
> can provide you with any information you need.
> The only problem is that I don't know how to debug the system myself.
> If anyone can tell me what to do, I can do it and give back the
> information.

When the machine hangs in graphical mode it's quite hard to get the data
out - one of the long term todo items is to fix that.

Boot the machine and leave it in text mode (or if it boots to graphical
mode then switch to a text console/text mode) and wait. With "luck" it
will show the same problem in text mode and give you a meaningful screen
dump you can then write down (or grab with a digital camera).

Alan

2008-01-07 23:35:21

by Jesper Juhl

Subject: Re: Kernel Oops?

On 08/01/2008, Stoyan Gaydarov <[email protected]> wrote:
> Today I upgraded my kernel from 2.6.23.9 to 2.6.23.12 and in the past
> 30 minutes I have had to restart my computer twice.
> I believe it's a kernel oops or a kernel panic because when the
> computer freezes it blinks the caps and scroll lock LEDs.
> I don't know what is causing the problem but I am willing to help, I
> can provide you with any information you need.
> The only problem is that I don't know how to debug the system myself.
> If anyone can tell me what to do, I can do it and give back the
> information.
>

Here are some things for you to try:


Try looking in your log files. Look for things like "Oops", "BUG()"
and similar. If you find anything that looks relevant, post it here.

If you can trigger the problem without X running - try that. Sometimes
an Oops makes it to the local console but doesn't make it to the logs.
Being logged into a plain console without X running when the problem
triggers can sometimes enable you to capture it with a digital camera.

Try building your kernel with magic sysrq support (if you haven't
already). Then you can sometimes manage to get a backtrace to the
console after the hang. See Documentation/sysrq.txt for details.

Try building your kernel with some (or all) of the debug options found
under the 'Kernel hacking' menuconfig menu to get more debug info.

Make sure you have no proprietary modules that taint your kernel
loaded (like the NVidia driver for example). The presence of any such
modules makes the kernel pretty much undebuggable.

If nothing makes it to your logs nor to your local console, then try
attaching a second PC via serial console or netconsole and see if you
can manage to log the Oops that way. See
Documentation/serial-console.txt and
Documentation/networking/netconsole.txt for details.
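
As a rough illustration of the log-file suggestion above, the following Python sketch scans log text for common fault markers. The marker list and the helper names (`FAULT_PATTERN`, `find_fault_lines`, `scan_file`) are assumptions made for this example, and which log files to scan (for instance /var/log/messages or /var/log/syslog) depends on the distribution's syslog setup.

```python
import re

# Markers that often accompany kernel faults in log output.
# This list is an illustrative assumption, not an exhaustive one.
# The match is case-sensitive so that "debug:" does not count as "BUG:".
FAULT_PATTERN = re.compile(
    r"Oops|kernel BUG at|BUG:|Kernel panic|Call Trace:"
)


def find_fault_lines(lines, context=2):
    """Return (line_number, text) pairs for every line that matches a
    fault marker, plus up to `context` lines following each match."""
    hits = []
    remaining = 0
    for number, line in enumerate(lines, start=1):
        if FAULT_PATTERN.search(line):
            remaining = context + 1
        if remaining > 0:
            hits.append((number, line.rstrip("\n")))
            remaining -= 1
    return hits


def scan_file(path, context=2):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_fault_lines(handle, context)
```

Calling `scan_file("/var/log/messages")` as root and printing the result would list any matching lines with their line numbers. If the machine froze before the log was flushed to disk, nothing will show up here, which is why the console and netconsole suggestions still matter.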


That should do it for a few starting points. :-)


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2008-01-09 02:25:30

by Stoyan Gaydarov

Subject: Re: Kernel Oops?

On Jan 7, 2008 5:30 PM, Alan Cox <[email protected]> wrote:
> On Mon, 7 Jan 2008 17:15:01 -0600
> "Stoyan Gaydarov" <[email protected]> wrote:
>
> > Today I upgraded my kernel from 2.6.23.9 to 2.6.23.12 and in the past
> > 30 minutes I have had to restart my computer twice.
> > I believe it's a kernel oops or a kernel panic because when the
> > computer freezes it blinks the caps and scroll lock LEDs.
> > I don't know what is causing the problem but I am willing to help, I
> > can provide you with any information you need.
> > The only problem is that I don't know how to debug the system myself.
> > If anyone can tell me what to do, I can do it and give back the
> > information.
>
> When the machine hangs in graphical mode it's quite hard to get the data
> out - one of the long term todo items is to fix that.
>
> Boot the machine and leave it in text mode (or if it boots to graphical
> mode then switch to a text console/text mode) and wait.. with "luck" it
> will show the same problem in text mode and give you a meaningful screen
> dump you can then write down (or grab with a digital camera)
>
> Alan
>

I reverted to a clean install of Slackware 12.0 after trying to
get it to fail again without luck, then I installed the 2.6.23.9
kernel and continued to use it regularly. Then a few minutes ago I
restarted the computer because it had frozen again, the same way.
Except this time when rebooting the machine I got a kernel oops
message and it didn't boot completely. I could not copy it but I did
take a picture and now I have re-written the screen here (sorry about
the formatting):

Stack: 00000010 000000d0 00000001 000000d0 c20fb980 c2104000 c2103e00 00000246
       c0a32fc0 47807ae8 00000000 c23eeaa0 000000d0 00000282 c20fb980 c026661b
       c23eeaa0 00000000 f586df04 c23eeaa0 c02227f2 00000246 00000000 c225c480
Call Trace:
[<c026661b>] kmem_cache_alloc+0x6b/0x90
[<c02227f2>] dup_fd+0x22/0x2c0
[<c023cc26>] getnstimeofday+0x36/0xc0
[<c0222ad1>] copy_files+0x41/0x60
[<c0223218>] copy_process+0x488/0x11a0
[<c0235a62>] alloc_pid+0x152/0x280
[<c0224196>] do_fork+0x76/0x230
[<c022e7fb>] recalc_sigpending+0x5d/0xe0
[<c022e86d>] sigprocmask+0x5d/0xe0
[<c0202252>] sys_clone+0x32/0x40
[<c02041e6>] syscall_call+0x7/0xb
[<c07f0000>] __mutex_lock_interruptible_slowpath+0xb0/0xc0
=======================
Code: 5b 5e 5f 5d c3 8b 7a 10 89 d0 c7 42 34 01 00 00 00 83 c0 10 39 c7 74 b6 8b
4c 24 10 8b 77 10 3b b1 98 00 00 00 0f 82 1d ff ff ff <0f> 0b eb fe 8b 4c 24 18
8b 54 24 18 8b 41 08 83 c2 08 89 78 04
EIP: [<c02667fd>] cache_alloc_refill+0x1bd/0x540 SS:ESP 0068:f586de7c
INIT: Entering runlevel: 4
Going multiuser...
Updating shared library links: /sbin/ldconfig &


I hope that someone can find the problem and fix it.
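
For readers unfamiliar with the trace format above: each frame has the form `[<address>] symbol+0xoffset/0xsize`, meaning the address lies `offset` bytes into a function of `size` bytes. The bracketed `<0f> 0b` bytes in the Code: line mark the faulting instruction; `0f 0b` is the x86 `ud2` opcode, which kernels of this era typically used to implement BUG(), so this is likely a failed sanity check in cache_alloc_refill rather than a stray pointer dereference. The sketch below, with hypothetical helper names `parse_frames` and `function_start`, shows one way to pull the frames out of a retyped oops so they can be compared or searched.

```python
import re

# Matches frames such as "[<c026661b>] kmem_cache_alloc+0x6b/0x90".
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per frame found in `text`, in order of appearance."""
    frames = []
    for match in FRAME.finditer(text):
        frames.append({
            "address": int(match["addr"], 16),
            "symbol": match["symbol"],
            "offset": int(match["offset"], 16),
            "size": int(match["size"], 16),
        })
    return frames


def function_start(frame):
    """Address at which the frame's function begins."""
    return frame["address"] - frame["offset"]
```

Feeding the whole retyped screen to `parse_frames` also picks up the EIP line, since it uses the same `[<address>] symbol+0xoffset/0xsize` notation.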

2008-01-09 03:04:28

by Alan

Subject: Re: Kernel Oops?

> Except this time when rebooting the machine I got a kernel oops
> message and it didn't boot completely. I could not copy it but I did
> take a picture and now I have re-written the screen here (sorry about

That is interesting - that sort of error usually points at memory
corruption and early on tends to point at hardware (but not always). What
hardware is in this system and does it have over 4GB of RAM?

2008-01-09 03:24:56

by Stoyan Gaydarov

Subject: Re: Kernel Oops?

On Jan 8, 2008 9:02 PM, Alan Cox <[email protected]> wrote:
> > Except this time when rebooting the machine I got a kernel oops
> > message and it didn't boot completely. I could not copy it but I did
> > take a picture and now I have re-written the screen here (sorry about
>
> That is interesting - that sort of error usually points at memory
> corruption and early on tends to point at hardware (but not always). What
> hardware is in this system and does it have over 4GB of RAM?
>
>

There are 2GB of RAM and the motherboard is DFI and it has a dual core
Intel CPU. If you need the specifics I could look them up.

2008-01-10 23:05:52

by Jesper Juhl

Subject: Re: Kernel Oops?

On 09/01/2008, Stoyan Gaydarov <[email protected]> wrote:
> On Jan 8, 2008 9:02 PM, Alan Cox <[email protected]> wrote:
> > > Except this time when rebooting the machine I got a kernel oops
> > > message and it didn't boot completely. I could not copy it but I did
> > > take a picture and now I have re-written the screen here (sorry about
> >
> > That is interesting - that sort of error usually points at memory
> > corruption and early on tends to point at hardware (but not always). What
> > hardware is in this system and does it have over 4GB of RAM?
> >
> >
>
> There are 2GB of RAM and the motherboard is DFI and it has a dual core
> Intel CPU. If you need the specifics I could look them up.

cat /proc/cpuinfo
cat /proc/scsi/scsi
cat /proc/interrupts
lspci -vvx

should give you most of the details :)
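
To make those details easy to paste into a reply, the /proc files listed above can be gathered into one text report. This is only a sketch: `build_report` and `DEFAULT_SOURCES` are names invented for this example, and the `lspci -vvx` output is not a file, so it would have to be captured separately (for instance by redirecting it to a file and adding that file's path to the list).

```python
from pathlib import Path

# The /proc files named in the message above. On systems without SCSI
# support /proc/scsi/scsi may be absent; that case is reported, not fatal.
DEFAULT_SOURCES = ["/proc/cpuinfo", "/proc/scsi/scsi", "/proc/interrupts"]


def build_report(paths=DEFAULT_SOURCES):
    """Concatenate the given files into one string, each under a header line."""
    sections = []
    for path in paths:
        try:
            body = Path(path).read_text(errors="replace")
        except OSError as err:
            body = f"(could not read: {err})\n"
        sections.append(f"==== {path} ====\n{body}")
    return "\n".join(sections)
```

Printing `build_report()` and saving the output gives a single attachment that can be sent to the list along with the lspci output.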

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html