2010-07-29 01:30:41

by Pedro Ribeiro

Subject: kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

Hi all,

I hit a bug when resuming with TuxOnIce. In the middle of a resume, it
says "Compress Read -22" and locks up. I caught the stack trace with kdb
and took photos of it.
I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
swap and home partitions inside.

It seems that kcryptd caused the trouble. I've had other lockups with
TuxOnIce that relate to kcryptd too, but I never caught them with kdb.

After printing the stack trace I decided to look at the output of the ps
command. As I was scrolling through the processes shown, kdb oopsed and
called itself. I also took photos of kdb's own stack trace. I
then tried the ps command again, but this time the stack trace was
looping every few seconds (I took another photo of that). After a
while it just panicked and kept calling itself in a loop. I rebooted
and was able to successfully resume the TuxOnIce image.

The stack trace means little to me, but might be helpful to you.

The photos are:
kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
kdb_blows_up - final stack trace being shown in a cycle before PANIC:
recursive entry into debugger and locking up completely

The files are in kcryptd_kdb_oopses.tar.gz (about 4.7 MB), located here:
http://www.mediafire.com/file/uum6y1hwfk90124/kcryptd_kdb_oopses.tar.gz
They should stay there for at least 30 days.

Sorry for the file size, but they are good-quality pictures.

Regards,
Pedro


Subject: Re: kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
> says Compress Read -22 and locks up. I caught the stack trace with kdb
> and took photos of that.

Maybe this?
http://lkml.org/lkml/2010/7/28/398

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2010-07-29 03:14:35

by Nigel Cunningham

Subject: Re: kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

Hi Henrique.

On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
> On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>> and took photos of that.
>
> Maybe this?
> http://lkml.org/lkml/2010/7/28/398

I don't think so. This issue has been around for a fair while. It's just
impossible to reliably reproduce, and I haven't yet found the time to
put some serious effort into tracking down the cause and fixing it.

Regards,

Nigel

2010-07-29 10:31:46

by Milan Broz

Subject: Re: [dm-crypt] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

On 07/29/2010 05:08 AM, Nigel Cunningham wrote:
> On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
>> On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
>>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>>> and took photos of that.
>>
>> Maybe this?
>> http://lkml.org/lkml/2010/7/28/398
>
> I don't think so. This issue has been around for a fair while. It's just
> impossible to reliably reproduce, and I haven't yet found the time to
> put some serious effort into tracking down the cause and fixing it.

Is it a TuxOnIce-only problem?
Or is there a similar report with an unpatched kernel?

Milan

2010-07-29 11:38:39

by Nigel Cunningham

Subject: Re: [dm-crypt] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

Hi.

On 29/07/10 20:31, Milan Broz wrote:
> On 07/29/2010 05:08 AM, Nigel Cunningham wrote:
>> On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
>>> On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
>>>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>>>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>>>> and took photos of that.
>>>
>>> Maybe this?
>>> http://lkml.org/lkml/2010/7/28/398
>>
>> I don't think so. This issue has been around for a fair while. It's just
>> impossible to reliably reproduce, and I haven't yet found the time to
>> put some serious effort into tracking down the cause and fixing it.
>
> Is it TuxOnIce only problem?
> Or there is similar report with unpatched kernel?

It's TuxOnIce specific.

Regards,

Nigel

2010-07-29 11:49:57

by Martin Steigerwald

Subject: Re: [TuxOnIce-devel] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

On Thursday 29 July 2010, Nigel Cunningham wrote:
> Hi Henrique.

Hi Nigel,

> On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
> > On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
> >> I hit a bug when resuming with TuxOnIce. At the middle of a resume,
> >> it says Compress Read -22 and locks up. I caught the stack trace
> >> with kdb and took photos of that.
> >
> > Maybe this?
> > http://lkml.org/lkml/2010/7/28/398
>
> I don't think so. This issue has been around for a fair while. It's
> just impossible to reliably reproduce, and I haven't yet found the
> time to put some serious effort into tracking down the cause and
> fixing it.

I reported this one; as you said, it's a TuxOnIce bug:
https://bugzilla.kernel.org/show_bug.cgi?id=15873

On my ThinkPad T23, where it happened every 2-4 days or so with 2.6.34,
I switched compression from LZO to LZF. Since then I haven't seen the
error again, but that is with only 5 attempts so far, so I am not sure
whether switching to LZF "fixed" it:

deepdance:~> cat /sys/power/tuxonice/debug_info
TuxOnIce debugging info:
- TuxOnIce core : 3.1.1.1
- Kernel Version : 2.6.34.1-tp23-toi-3.1.1.1-04990-g3a7d1f4
- Compiler vers. : 4.4
- Attempt number : 5
- Parameters : 0 667656 0 1 0 0
- Overall expected compression percentage: 0.
- Checksum method is 'md4'.
0 pages resaved in atomic copy.
- Compressor is 'lzf'.
Compressed 776593408 bytes into 359897499 (53 percent compression).
- Block I/O active.
- Max outstanding reads 714. Max writes 5.
Memory_needed: 1024 x (4096 + 200 + 76) = 4476928 bytes.
Free mem throttle point reached 983.
- Swap Allocator enabled.
Swap available for image: 229016 pages.
- File Allocator active.
Storage available for image: 0 pages.
- I/O speed: Write 28 MB/s, Read 33 MB/s.
- Extra pages : 26 used/500.
- Result : Succeeded.
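
For reference, the figures in this output can be checked with plain shell arithmetic. This is only a sketch: how TuxOnIce itself derives the percentage is an assumption here, but counting the bytes saved and truncating the result reproduces the reported 53, which means the image shrank to a little under half its original size.

```shell
#!/bin/sh
# Figures copied from the debug_info output above.
original=776593408
compressed=359897499

# Share of bytes saved, truncated by integer division.
# Assumption: this is how the "53 percent compression" figure is meant.
saved_percent=$(( (original - compressed) * 100 / original ))
echo "saved: ${saved_percent} percent"

# The Memory_needed line: 1024 x (4096 + 200 + 76)
memory_needed=$(( 1024 * (4096 + 200 + 76) ))
echo "memory needed: ${memory_needed} bytes"
```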

Maybe it's a good idea to collect information in that bug report, even if
it really is a TuxOnIce bug.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2010-07-30 21:10:26

by Jason Wessel

Subject: Re: [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

On 07/28/2010 08:30 PM, Pedro Ribeiro wrote:
> Hi all,
>
> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
> says Compress Read -22 and locks up. I caught the stack trace with kdb
> and took photos of that.
> I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
> partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
> swap and home partitions inside.
>
> It seems that kcryptd caused the trouble. I've had other lockups with
> TuxOnIce that relate to kcryptd too, but I never caught them with kdb,
>
> After printing the stack trace I decided to see the output of the ps
> command. As I was scrolling the processes shown, kdb oops'ed and
> called itself. I also took photos of that kdb's own stack trace. I
> then tried the ps command again, but this time the stack trace was
> looping every few seconds (I took another photo of that). After a
> while it just panicked and kept calling itself on a loop. I rebooted
> and was able to successfully resume the TuxOnIce image.
>
> The stack trace means little to me, but might be helpful to you.
>
> The photos are:
> kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
> kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
>

You don't happen to have the vmlinux file around which corresponded to
that crashed kernel do you?

If so, can you run:

addr2line -f -e vmlinux 0xffffffff81030512
addr2line -f -e vmlinux 0xffffffff810ad1d0
addr2line -f -e vmlinux 0xffffffff810add3c

And send me the output?
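
The three lookups can also be run in one go. A minimal sketch, assuming vmlinux is the uncompressed image from the same build as the crashed kernel; resolve_addrs is a hypothetical helper name, and only the -f and -e options already used above are relied on.

```shell
#!/bin/sh
# resolve_addrs is a hypothetical helper, not part of binutils.
# Usage: resolve_addrs <vmlinux> <address>...
resolve_addrs() {
    vmlinux=$1
    shift
    if [ ! -r "$vmlinux" ]; then
        echo "cannot read $vmlinux" >&2
        return 1
    fi
    for addr in "$@"; do
        printf '%s: ' "$addr"
        # -f prints the function name, -e selects the image to inspect.
        # Join addr2line's two output lines onto a single line.
        addr2line -f -e "$vmlinux" "$addr" | tr '\n' ' '
        echo
    done
}

# Example:
# resolve_addrs vmlinux 0xffffffff81030512 0xffffffff810ad1d0 0xffffffff810add3c
```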

I have a pretty good idea about what the problem is, but it would be
interesting to know the exact failure point if the vmlinux file will
tell us. In a nutshell, the "ps" command in kdb does not use
probe_kernel_address() to safely read memory in all instances.
Presently the ps function assumes that if the task struct was ok, the
rest of the memory accesses in this region will be ok as well.




> kdb_blows_up - final stack trace being shown in a cycle before PANIC:
>
Once kdb oopses the system is pretty much toast. There are some limited
things you can do at that point like at least get a stack trace so the
original problem can be found.

Jason.

2010-07-30 21:33:38

by Pedro Ribeiro

Subject: Re: [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

On 30 July 2010 22:10, Jason Wessel wrote:
> On 07/28/2010 08:30 PM, Pedro Ribeiro wrote:
>> Hi all,
>>
>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>> and took photos of that.
>> I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
>> partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
>> swap and home partitions inside.
>>
>> It seems that kcryptd caused the trouble. I've had other lockups with
>> TuxOnIce that relate to kcryptd too, but I never caught them with kdb,
>>
>> After printing the stack trace I decided to see the output of the ps
>> command. As I was scrolling the processes shown, kdb oops'ed and
>> called itself. I also took photos of that kdb's own stack trace. I
>> then tried the ps command again, but this time the stack trace was
>> looping every few seconds (I took another photo of that). After a
>> while it just panicked and kept calling itself on a loop. I rebooted
>> and was able to successfully resume the TuxOnIce image.
>>
>> The stack trace means little to me, but might be helpful to you.
>>
>> The photos are:
>> kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
>> kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
>>
>
> You don't happen to have the vmlinux file around which corresponded to
> that crashed kernel do you?
>
> If so, can you run:
>
> addr2line -f -e vmlinux 0xffffffff81030512
> addr2line -f -e vmlinux 0xffffffff810ad1d0
> addr2line -f -e vmlinux 0xffffffff810add3c
>
> And send me the output?
>
> I have a pretty good idea about what the problem is but it would be
> interesting to know the exact failure point if the vmlinux file will
> tell us. In a nut shell, the "ps" command in kdb does not use
> probe_kernel_address() to safely read memory in all instances.
> Presently the ps function assumes that if the task struct was ok the
> rest of memory accesses in this region would be ok as well.
>

Not sure if this is what you want...

addr2line -f -e vmlinux 0xffffffff81030512:
task_curr
??:0

addr2line -f -e vmlinux 0xffffffff810ad1d0
kdb_ps1
??:0

addr2line -f -e vmlinux 0xffffffff810add3c
kdb_task_state_char
??:0


>
>
>> kdb_blows_up - final stack trace being shown in a cycle before PANIC:
>>
> Once kdb oopses the system is pretty much toast. There are some limited
> things you can do at that point like at least get a stack trace so the
> original problem can be found.
>
> Jason.
>

Can you tell me how to do that? So that when it happens next time I
have the chance to take a photo...

Regards,
Pedro

2010-07-30 22:53:38

by Jason Wessel

Subject: Re: [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

On 07/30/2010 04:33 PM, Pedro Ribeiro wrote:
> On 30 July 2010 22:10, Jason Wessel wrote:
>
>> On 07/28/2010 08:30 PM, Pedro Ribeiro wrote:
>>
>>> Hi all,
>>>
>>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>>> and took photos of that.
>>> I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
>>> partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
>>> swap and home partitions inside.
>>>
>>> It seems that kcryptd caused the trouble. I've had other lockups with
>>> TuxOnIce that relate to kcryptd too, but I never caught them with kdb,
>>>
>>> After printing the stack trace I decided to see the output of the ps
>>> command. As I was scrolling the processes shown, kdb oops'ed and
>>> called itself. I also took photos of that kdb's own stack trace. I
>>> then tried the ps command again, but this time the stack trace was
>>> looping every few seconds (I took another photo of that). After a
>>> while it just panicked and kept calling itself on a loop. I rebooted
>>> and was able to successfully resume the TuxOnIce image.
>>>
>>> The stack trace means little to me, but might be helpful to you.
>>>
>>> The photos are:
>>> kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
>>> kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
>>>
>>>
>> You don't happen to have the vmlinux file around which corresponded to
>> that crashed kernel do you?
>>
>> If so, can you run:
>>
>> addr2line -f -e vmlinux 0xffffffff81030512
>> addr2line -f -e vmlinux 0xffffffff810ad1d0
>> addr2line -f -e vmlinux 0xffffffff810add3c
>>
>> And send me the output?
>>
>> I have a pretty good idea about what the problem is but it would be
>> interesting to know the exact failure point if the vmlinux file will
>> tell us. In a nut shell, the "ps" command in kdb does not use
>> probe_kernel_address() to safely read memory in all instances.
>> Presently the ps function assumes that if the task struct was ok the
>> rest of memory accesses in this region would be ok as well.
>>
>>
>
> Not sure if this is what you want...
>
> addr2line -f -e vmlinux 0xffffffff81030512:
> task_curr
> ??:0
>
> addr2line -f -e vmlinux 0xffffffff810ad1d0
> kdb_ps1
> ??:0
>
> addr2line -f -e vmlinux 0xffffffff810add3c
> kdb_task_state_char
> ??:0
>
>

I guess there was no debuginfo in your vmlinux file then, because
normally that would return the source line information. At least I
know where to look to fix the problem from the back trace.
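
One way to check this before running addr2line is to look for a DWARF section in the image. A rough sketch, assuming binutils' readelf is installed; has_debuginfo is a hypothetical helper name. Kernels built with CONFIG_DEBUG_INFO enabled normally carry a .debug_info section in vmlinux.

```shell
#!/bin/sh
# has_debuginfo is a hypothetical helper, not an existing tool.
has_debuginfo() {
    # readelf -S lists the ELF section headers; DWARF data lives in
    # .debug_* sections, of which .debug_info is the main one.
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

if has_debuginfo vmlinux; then
    echo "vmlinux has DWARF debug info; addr2line can report file:line"
else
    echo "no .debug_info section found (or file unreadable); expect ??:0"
fi
```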

Thanks,
Jason.