2002-07-21 12:27:01

by Axel H. Siebenwirth

Subject: 2.5.27: Software Suspend failure / JFS errors

Hi,

I invoked software suspend with kernel 2.5.27 and get the following messages
from kernel:

Stopping tasks: ========================
stopping tasks failed (3 tasks remaining)
Suspend failed: Not all processes stopped!
Restarting tasks...<6> Strange, jfsIO not stopped
Strange, jfsCommit not stopped
Strange, jfsSync not stopped
done

Afterwards, I have full cpu utilization of the JFS kernel threads:

CPU states: 1.0% user, 99.0% system, 0.0% nice, 0.0% idle
Mem: 126284K total, 97112K used, 29172K free, 0K buffers
Swap: 289160K total, 0K used, 289160K free, 53224K cached

  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM   TIME COMMAND
    7 root      25   0     0     0     0 RW   36.6  0.0   3:56 jfsIO
    8 root      25   0     0     0     0 RW   30.6  0.0   3:56 jfsCommit
    9 root      25   0     0     0     0 RW   29.6  0.0   3:56 jfsSync
  361 root      15   0   972   972   768 R     0.9  0.7   0:00 top
  235 axel      15   0  3064  3064  1928 R     0.0  2.4   0:00 sawfish
  244 axel      15   0  5004  5004  3724 R     0.0  3.9   0:00 panel
  248 axel      15   0  4508  4508  3324 R     0.0  3.5   0:03 gkrellm

And constant activity of VM:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  3      0  29284      0  53224   0   0    61     6 1051   106   2  89  10
 0  0  2      0  29284      0  53224   0   0     0     0 1006    94   0 100   0
 0  0  2      0  29284      0  53224   0   0     0     0 1006    75   1  99   0
 0  0  2      0  29284      0  53224   0   0     0     0 1006    79   1  99   0
 0  0  2      0  29284      0  53224   0   0     0     0 1006    88   1  99   0

I used to have problems with JFS anyway when unpacking big tar archives. The
system gives an oops and locks up a short while after. The process it is
stuck in is jfsCommit.
I tried the latest 2.4 and 2.5 and always had the same problems. Strangely, JFS
causes no problems at all when I use the kernel my partitions were
formatted with. That is Slackware kernel 2.4.18, jfs 1.0.18. Any later
kernel with a higher jfs version causes these jfsCommit freezes.

I will send an oops report of JFS later.

Regards,
Axel Siebenwirth


2002-07-21 14:39:44

by Axel H. Siebenwirth

Subject: Re: 2.5.27: Software Suspend failure / JFS errors

Hi!

This oops occurred during build of gcc..
Kernel 2.4.19-rc2-ac2.
About the same happens with 2.5.27. I will post an oops of jfsCommit of
2.5.27 as soon as I get one.

ksymoops 2.4.5 on i686 2.4.19-rc2-ac2. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.19-rc2-ac2/ (default)
-m /boot/System.map-2.4.19-rc2-ac2 (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000018
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c018b565>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: c7e5c000 ebx: c8802490 ecx: 00000000 edx: 00000000
esi: c8802490 edi: c880cf58 ebp: c7dbd980 esp: c7e5df58
ds: 0018 es: 0018 ss: 0018
Process jfsCommit (pid: 8, stackpage=c7e5d000)
Stack: 000000b1 c0190800 00000000 00000000 00000000 00000286 00000000 00000040
       c7e5e000 c0118486 c7e5dfa8 c7e5c000 c8802490 c8802490 c7e5c000 00000001
       c0190fb3 c8802490 c7e5c000 00000246 c8802490 c01911db c8802490 c7e5c000
Call Trace: [<c0190800>] [<c0118486>] [<c0190fb3>] [<c01911db>] [<c0105000>]
[<c010739e>] [<c0191080>]
Code: ff 41 18 85 d2 74 34 31 c0 0f ab 41 14 19 c0 85 c0 74 09 b8


>>EIP; c018b565 <hold_metapage+15/70> <=====

>>eax; c7e5c000 <_end+7b3e314/85ff314>
>>ebx; c8802490 <_end+84e47a4/85ff314>
>>esi; c8802490 <_end+84e47a4/85ff314>
>>edi; c880cf58 <_end+84ef26c/85ff314>
>>ebp; c7dbd980 <_end+7a9fc94/85ff314>
>>esp; c7e5df58 <_end+7b4026c/85ff314>

Trace; c0190800 <txUpdateMap+2c0/2d0>
Trace; c0118486 <schedule+1a6/310>
Trace; c0190fb3 <txLazyCommit+23/f0>
Trace; c01911db <jfs_lazycommit+15b/250>
Trace; c0105000 <_stext+0/0>
Trace; c010739e <kernel_thread+2e/40>
Trace; c0191080 <jfs_lazycommit+0/250>

Code; c018b565 <hold_metapage+15/70>
00000000 <_EIP>:
Code; c018b565 <hold_metapage+15/70> <=====
0: ff 41 18 incl 0x18(%ecx) <=====
Code; c018b568 <hold_metapage+18/70>
3: 85 d2 test %edx,%edx
Code; c018b56a <hold_metapage+1a/70>
5: 74 34 je 3b <_EIP+0x3b> c018b5a0 <hold_metapage+50/70>
Code; c018b56c <hold_metapage+1c/70>
7: 31 c0 xor %eax,%eax
Code; c018b56e <hold_metapage+1e/70>
9: 0f ab 41 14 bts %eax,0x14(%ecx)
Code; c018b572 <hold_metapage+22/70>
d: 19 c0 sbb %eax,%eax
Code; c018b574 <hold_metapage+24/70>
f: 85 c0 test %eax,%eax
Code; c018b576 <hold_metapage+26/70>
11: 74 09 je 1c <_EIP+0x1c> c018b581 <hold_metapage+31/70>
Code; c018b578 <hold_metapage+28/70>
13: b8 00 00 00 00 mov $0x0,%eax


Regards,
Axel Siebenwirth

2002-07-23 14:51:45

by Dave Kleikamp

Subject: Re: 2.5.27: Software Suspend failure / JFS errors

On Sunday 21 July 2002 09:42, [email protected] wrote:
> This oops occurred during build of gcc..
> Kernel 2.4.19-rc2-ac2.
> About the same happens with 2.5.27. I will post an oops of jfsCommit
> of 2.5.27 as soon as I get one.

I just built gcc on 2.4.19-rc3 + latest JFS and didn't have a problem.
I'll repeat it on 2.4.19-rc2-ac2, but there shouldn't be more than a
cosmetic difference in the JFS code. I haven't tried 2.5.27 yet.

> ksymoops 2.4.5 on i686 2.4.19-rc2-ac2. Options used
--- ksymoops output deleted ---
>
> Trace; c0190800 <txUpdateMap+2c0/2d0>
> Trace; c0118486 <schedule+1a6/310>
> Trace; c0190fb3 <txLazyCommit+23/f0>
> Trace; c01911db <jfs_lazycommit+15b/250>
> Trace; c0105000 <_stext+0/0>
> Trace; c010739e <kernel_thread+2e/40>
> Trace; c0191080 <jfs_lazycommit+0/250>
>
> Code; c018b565 <hold_metapage+15/70>
> 00000000 <_EIP>:
> Code; c018b565 <hold_metapage+15/70> <=====
> 0: ff 41 18 incl 0x18(%ecx) <=====
> Code; c018b568 <hold_metapage+18/70>
> 3: 85 d2 test %edx,%edx

It looks like tlck->mp was null in txUpdateMap, and hold_metapage was
called with the null pointer. I haven't seen this before, but I am
looking at the code to see if I can figure out how it may have
happened. I'm guessing that you have built the kernel without
CONFIG_JFS_DEBUG set. If I'm right, can you set this before you try to
stress JFS again. It may help find the problem earlier.
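
As a rough illustration of that reading: the faulting instruction is
"incl 0x18(%ecx)" with ecx: 00000000, and the reported fault address is
00000018, which is what a member access through a null structure pointer
looks like. The sketch below is plain userspace C. The struct layout and all
names in it are hypothetical, chosen only so that the two members sit at the
offsets seen in the disassembly; it is not the real JFS metapage definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout, NOT the real JFS structure: the padding
 * only exists to place the two members at the offsets seen in the oops. */
struct metapage_sketch {
	uint32_t other[5];	/* 0x00..0x13: stand-in for earlier members */
	uint32_t flag;		/* 0x14: target of "bts %eax,0x14(%ecx)" */
	uint32_t count;		/* 0x18: target of "incl 0x18(%ecx)" */
};

/* Compile-time check that the sketch matches the offsets in the oops. */
static_assert(offsetof(struct metapage_sketch, flag) == 0x14, "flag offset");
static_assert(offsetof(struct metapage_sketch, count) == 0x18, "count offset");

/* Address the CPU touches when it accesses a member through base. */
uintptr_t member_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}

/* With base == 0 (a null pointer in ecx) the access lands on 0x18. */
uintptr_t null_count_address(void)
{
	return member_address(0, offsetof(struct metapage_sketch, count));
}
```

A fault address that is a small constant rather than exactly zero is usually
a hint of this pattern: a null base pointer plus a member offset.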

> Regards,
> Axel Siebenwirth

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2002-07-23 15:05:22

by Christoph Hellwig

Subject: Re: [Jfs-discussion] Re: 2.5.27: Software Suspend failure / JFS errors

On Tue, Jul 23, 2002 at 09:54:35AM -0500, Dave Kleikamp wrote:
> On Sunday 21 July 2002 09:42, [email protected] wrote:
> > This oops occurred during build of gcc..
> > Kernel 2.4.19-rc2-ac2.
> > About the same happens with 2.5.27. I will post an oops of jfsCommit
> > of 2.5.27 as soon as I get one.
>
> I just built gcc on 2.4.19-rc3 + latest JFS and didn't have a problem.
> I'll repeat it on 2.4.19-rc2-ac2, but there shouldn't be more than a
> cosmetic difference in the JFS code. I haven't tried 2.5.27 yet.

As I read 'Software Suspend' in the subject I guess it's swsusp fault.
Swsusp needs magic flags for kernel threads which no one has added to
JFS yet.

2002-07-23 16:12:23

by Dave Kleikamp

Subject: Re: [Jfs-discussion] Re: 2.5.27: Software Suspend failure / JFS errors

On Tuesday 23 July 2002 10:06, Christoph Hellwig wrote:
> As I read 'Software Suspend' in the subject I guess it's swsusp
> fault. Swsusp needs magic flags for kernel threads which no one has
> added to JFS yet.

I understood the swsusp to be an unrelated issue. Is swsusp even
available in a 2.4 kernel?

I believe to fix the swsusp problem, the kernel threads need to test
(current->flags & PF_FREEZE), and if set call
refrigerator(PF_IOTHREAD).
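
A minimal userspace mock of that check, to show where it would sit in a
thread's main loop. This is only a sketch: the flag values, struct
task_sketch, refrigerator_sketch() and thread_loop_once() are stand-ins
invented for illustration, and the real PF_FREEZE, PF_IOTHREAD and
refrigerator() come from the kernel headers, which are not reproduced here.

```c
#include <assert.h>

/* Stand-in values and types; the real definitions live in the kernel. */
#define PF_FREEZE   0x1UL	/* hypothetical bit value */
#define PF_IOTHREAD 0x2UL	/* hypothetical bit value */

struct task_sketch {
	unsigned long flags;
	int times_frozen;	/* bookkeeping for this mock only */
};

/* Mock of refrigerator(): the real one parks the thread until resume.
 * Here it only records the call and clears the request so that the
 * mock loop can carry on. */
void refrigerator_sketch(struct task_sketch *current, unsigned long flag)
{
	assert(current->flags & PF_FREEZE);
	(void)flag;		/* the mock does not use the argument */
	current->flags &= ~PF_FREEZE;
	current->times_frozen++;
}

/* One pass of a kernel thread's main loop, with the check added. */
int thread_loop_once(struct task_sketch *current)
{
	if (current->flags & PF_FREEZE) {
		refrigerator_sketch(current, PF_IOTHREAD);
		return 1;	/* was asked to freeze on this pass */
	}
	/* ... normal work (commit, I/O, sync) would go here ... */
	return 0;
}
```

The point of the pattern is that each of the three threads named in the
"not stopped" messages (jfsIO, jfsCommit, jfsSync) would need such a check
somewhere in its own loop.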

--
David Kleikamp
IBM Linux Technology Center

2002-07-23 16:17:17

by Christoph Hellwig

Subject: Re: [Jfs-discussion] Re: 2.5.27: Software Suspend failure / JFS errors

On Tue, Jul 23, 2002 at 11:15:14AM -0500, Dave Kleikamp wrote:
> On Tuesday 23 July 2002 10:06, Christoph Hellwig wrote:
> > As I read 'Software Suspend' in the subject I guess it's swsusp
> > fault. Swsusp needs magic flags for kernel threads which no one has
> > added to JFS yet.
>
> I understood the swsusp to be an unrelated issue. Is swsusp even
> available in a 2.4 kernel?

There is a 2.4 patch and it was merged in -ac for some period.

> I believe to fix the swsusp problem, the kernel threads need to test
> (current->flags & PF_FREEZE), and if set call
> refrigerator(PF_IOTHREAD).

I think so. (although I have to admit that I don't care for it)

2002-07-25 22:25:14

by Axel H. Siebenwirth

Subject: Re: JFS errors

Hi Dave!

On Tue, 23 Jul 2002, Dave Kleikamp wrote:

> happened. I'm guessing that you have built the kernel without
> CONFIG_JFS_DEBUG set. If I'm right, can you set this before you try to
> stress JFS again. It may help find the problem earlier.

No, it's built with JFS_DEBUG. That was the first thing I compiled into a
new kernel when I first encountered this.
How can it help you? Shall I provide info from /proc/fs/jfs after oops
occurred?
Oops itself I have to handcopy each time. Hard work! ;) But I guess I can
access /proc tree.

Axel Siebenwirth

2002-07-26 14:02:36

by Dave Kleikamp

Subject: Re: JFS errors

> No, it's built with JFS_DEBUG. That was the first thing I compiled into a
> new kernel when I first encountered this.

I'll take another look at the oops. My initial thought was that if I was
right in my assumptions, a dereference in an ASSERT statement would have
caused a trap slightly earlier than the one you hit. Without debug, the
ASSERT is compiled out.
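
For readers unfamiliar with the idiom, here is a small sketch of an assertion
macro that disappears when debugging is off. SKETCH_DEBUG, SKETCH_ASSERT,
probe() and run_check() are hypothetical names used only for illustration;
the actual JFS ASSERT definition is not shown in this thread and may differ.

```c
#include <assert.h>

/* Hypothetical macro and flag names, for illustration only.
 * Build with -DSKETCH_DEBUG to enable the check. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(cond) assert(cond)	/* traps when cond is false */
#else
#define SKETCH_ASSERT(cond) do { } while (0)	/* cond is never evaluated */
#endif

int evaluations;	/* counts how often the condition was evaluated */

int probe(int value)
{
	evaluations++;
	return value;
}

/* Returns how many times the assertion condition was evaluated. */
int run_check(void)
{
	evaluations = 0;
	SKETCH_ASSERT(probe(1));
	return evaluations;
}
```

Without the debug flag the condition is not even evaluated, so a pointer
dereference inside it cannot trap; with the flag it runs before the code
that follows it.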

> How can it help you?

If it's already on, it won't provide any more help. There was just a
chance that if it wasn't on, it might have caught something earlier.

> Shall I provide info from /proc/fs/jfs after oops
> occurred?

I doubt anything there would be useful.

> Oops itself I have to handcopy each time. Hard work! ;) But I guess I can
> access /proc tree.

The oops was helpful, and I'll need to take a closer look at the code. I'll
let you know if I want you to try anything else.

Thanks for the feedback.

Shaggy
--
David Kleikamp
IBM Linux Technology Center

2002-07-29 17:45:51

by Pavel Machek

Subject: Re: [Jfs-discussion] Re: 2.5.27: Software Suspend failure / JFS errors

Hi!

> > > This oops occurred during build of gcc..
> > > Kernel 2.4.19-rc2-ac2.
> > > About the same happens with 2.5.27. I will post an oops of jfsCommit
> > > of 2.5.27 as soon as I get one.
> >
> > I just built gcc on 2.4.19-rc3 + latest JFS and didn't have a problem.
> > I'll repeat it on 2.4.19-rc2-ac2, but there shouldn't be more than a
> > cosmetic difference in the JFS code. I haven't tried 2.5.27 yet.
>
> As I read 'Software Suspend' in the subject I guess it's swsusp fault.
> Swsusp needs magic flags for kernel threads which no one has added to
> JFS yet.

Hehe. Really someone should add if (current->flags & PF_FREEZE) refrigerator();
at the right place of JFS threads. I don't have JFS installed so it is hard
for me to do that, sorry.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.