2009-09-30 21:11:56

by Shirley Ma

Subject: INFO: task journal:337 blocked for more than 120 seconds

Hello all,

Has anybody seen this problem before? I keep hitting this issue with a 2.6.31
guest kernel, even with a simple network test.

INFO: task kjournald:337 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

kjournald D 00000041 0 337 2 0x00000000

My test is completely blocked.

Thanks
Shirley


2009-10-01 13:20:37

by Marcelo Tosatti

Subject: Re: INFO: task journal:337 blocked for more than 120 seconds

On Wed, Sep 30, 2009 at 02:11:52PM -0700, Shirley Ma wrote:
> Hello all,
>
> Anybody found this problem before? I kept hitting this issue for 2.6.31
> guest kernel even with a simple network test.
>
> INFO: task kjournald:337 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> kjournald D 00000041 0 337 2 0x00000000
>
> My test is totally being blocked.

I've hit this in the past with ext3, mounting with data=writeback made it
disappear.
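
For reference, one way to confirm which journaling mode an ext3 filesystem is actually mounted with is to read its options from /proc/mounts. This is a minimal sketch; the helper name `ext3_mount_opts` is hypothetical and does not come from the thread.

```shell
# Print the mount options of the ext3 filesystem mounted at a given mountpoint.
# $1 = mountpoint, $2 = mounts table (defaults to /proc/mounts)
ext3_mount_opts() {
    awk -v mp="$1" '$2 == mp && $3 == "ext3" { print $4 }' "${2:-/proc/mounts}"
}

# Example: ext3_mount_opts /
# Note: ext3 refuses to change the data mode on a remount, so for a
# non-root filesystem put data=writeback in /etc/fstab (or pass
# -o data=writeback to mount), and for the root filesystem boot with
# rootflags=data=writeback.
```

Keep in mind that data=writeback relaxes the ordering between data and metadata writes, so files touched shortly before a crash may contain stale data afterwards.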

2009-10-01 16:58:57

by Shirley Ma

Subject: Re: INFO: task journal:337 blocked for more than 120 seconds

On Thu, 2009-10-01 at 10:20 -0300, Marcelo Tosatti wrote:
> I've hit this in the past with ext3, mounting with data=writeback made it
> disappear.

Thanks, I will give it a try. Someone should fix this.

Shirley

2009-10-01 21:00:40

by Shirley Ma

Subject: Re: INFO: task kjournal:337 blocked for more than 120 seconds

I talked to Mingming, and she suggested using a different I/O scheduler. The
default scheduler is cfq; after I switched to noop, the problem went away.

So there seems to be a bug in the cfq scheduler. It is easy to reproduce when
running the guest kernel; so far I haven't hit this problem on the host
side.

If I need to file a bug for someone to look at, please let me know.

Thanks
Shirley
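
For reference, the active I/O scheduler of a block device can be read from and changed through sysfs at runtime. This is a minimal sketch, assuming the virtio disk shows up as vda in the guest (the thread does not name the device); the helper name `active_scheduler` is hypothetical.

```shell
# Print the active I/O scheduler from a sysfs "scheduler" file.
# The kernel marks the active one with brackets, for example:
#   noop anticipatory deadline [cfq]
# $1 = path to the scheduler file
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# On the guest (vda is an assumed device name):
#   active_scheduler /sys/block/vda/queue/scheduler
# Switch at runtime, as root:
#   echo deadline > /sys/block/vda/queue/scheduler
# Or select the default for all devices at boot with the kernel
# parameter elevator=deadline.
```

Switching through sysfs takes effect immediately and does not survive a reboot, which makes it convenient for trying each scheduler in turn against the same test.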

2009-10-01 21:03:17

by Javier Guerra

Subject: Re: INFO: task kjournal:337 blocked for more than 120 seconds

On Thu, Oct 1, 2009 at 4:00 PM, Shirley Ma wrote:
> I talked to Mingming, she suggested to use different IO scheduler. The
> default scheduler is cfq, after I switch to noop, the problem is gone.

deadline is the one most often recommended for virtualization hosts. Some
distros set it as the default if you select Xen or KVM at installation
time (and noop for the guests).


--
Javier

2009-10-01 21:17:22

by Shirley Ma

Subject: Re: INFO: task kjournal:337 blocked for more than 120 seconds

On Thu, 2009-10-01 at 16:03 -0500, Javier Guerra wrote:
> deadline is the most recommended one for virtualization hosts. some
> distros set it as default if you select Xen or KVM at installation
> time. (and noop for the guests)

I spoke too soon: after a while the noop scheduler hit the same issue. I
am switching to deadline to test again.

Thanks
Shirley

2009-10-01 22:09:40

by Shirley Ma

Subject: Re: INFO: task kjournal:337 blocked for more than 120 seconds

Switching to a different scheduler doesn't make the problem go away.

Shirley

2009-10-02 18:30:46

by Jeremy Fitzhardinge

Subject: Re: INFO: task journal:337 blocked for more than 120 seconds

On 09/30/09 14:11, Shirley Ma wrote:
> Anybody found this problem before? I kept hitting this issue for 2.6.31
> guest kernel even with a simple network test.
>
> INFO: task kjournald:337 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> kjournald D 00000041 0 337 2 0x00000000
>
> My test is totally being blocked.

I'm assuming from the lists you've posted to that this is under KVM?
What disk drivers are you using (virtio or emulated)?

Can you get a full stack backtrace of kjournald?

Kevin Bowling submitted a RH bug against Xen with apparently the same
symptoms (https://bugzilla.redhat.com/show_bug.cgi?id=526627). I'm
wondering if there's a core kernel bug here, which is perhaps more
easily triggered by the changed timing in a virtual machine.

Thanks,
J

2009-10-02 19:06:54

by Shirley Ma

Subject: Re: INFO: task journal:337 blocked for more than 120 seconds

On Fri, 2009-10-02 at 11:30 -0700, Jeremy Fitzhardinge wrote:
> I'm assuming from the lists you've posted to that this is under KVM?
> What disk drivers are you using (virtio or emulated)?
>
> Can you get a full stack backtrace of kjournald?

Yes, it's under KVM, and the disk driver is virtio. Since I/O is stuck, the
stack trace can't be saved to disk, so I have attached it here as image files.

Thanks
Shirley


Attachments:
$696984E9BF50266.jpg (75.85 kB)
ext3.jpg (58.66 kB)

2009-10-02 19:13:27

by Jeremy Fitzhardinge

Subject: Re: INFO: task journal:337 blocked for more than 120 seconds

On 10/02/09 12:06, Shirley Ma wrote:
> On Fri, 2009-10-02 at 11:30 -0700, Jeremy Fitzhardinge wrote:
>
>> I'm assuming from the lists you've posted to that this is under KVM?
>> What disk drivers are you using (virtio or emulated)?
>>
>> Can you get a full stack backtrace of kjournald?
>>
> Yes, it's under KVM, disk driver is virtio. Since the io has issue, the
> stack can't be saved on the disk. I have the image file attached here.
>

Ah, thank you. The backtrace does indeed look very similar.

(BTW, you could get a serial console with "qemu-kvm -nographic -append
console=ttyS0 ...")

J
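
For reference, once a serial console is available, the blocked tasks and their backtraces can be captured as text without writing to the stuck disk. This is a minimal sketch; the helper name `blocked_tasks` is hypothetical, and the sysrq step assumes the guest kernel was built with magic SysRq support.

```shell
# List tasks in uninterruptible sleep (state D) from "ps -eo pid,stat,comm"
# style input on stdin. Prints "PID COMMAND" for each blocked task.
blocked_tasks() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# On the guest:
#   ps -eo pid,stat,comm | blocked_tasks
# Ask the kernel to dump backtraces of all blocked tasks to the kernel
# log (and therefore to the serial console), as root:
#   echo w > /proc/sysrq-trigger
# On kernels that provide it, a single task's kernel stack can also be
# read directly, for example for PID 337:
#   cat /proc/337/stack
```

Because the output goes to the console rather than to a file, it can be logged on the host side even while guest I/O is hung.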

2009-10-04 04:24:09

by Kevin Bowling

Subject: Re: INFO: task journal:337 blocked for more than 120 seconds

On 10/2/2009 2:30 PM, Jeremy Fitzhardinge wrote:
> On 09/30/09 14:11, Shirley Ma wrote:
>
>> Anybody found this problem before? I kept hitting this issue for 2.6.31
>> guest kernel even with a simple network test.
>>
>> INFO: task kjournald:337 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>
>> kjournald D 00000041 0 337 2 0x00000000
>>
>> My test is totally being blocked.
>>
> I'm assuming from the lists you've posted to that this is under KVM?
> What disk drivers are you using (virtio or emulated)?
>
> Can you get a full stack backtrace of kjournald?
>
> Kevin Bowling submitted a RH bug against Xen with apparently the same
> symptoms (https://bugzilla.redhat.com/show_bug.cgi?id=526627). I'm
> wondering if there's a core kernel bug here, which is perhaps more
> easily triggered by the changed timing in a virtual machine.
>
> Thanks,
> J
>

I've had a stable system thus far by appending "clocksource=jiffies" to
the kernel boot line. The default clocksource is otherwise "xen".

The dmesg boot warnings in my bugzilla report still occur.

Regards,
Kevin Bowling
http://www.analograils.com/
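
For reference, whether a clocksource override was passed at boot, and which clocksource is in use, can be checked from inside the guest. This is a minimal sketch; the helper name `cmdline_clocksource` is hypothetical.

```shell
# Print the value of the clocksource= parameter from a kernel command
# line file, or nothing if the parameter is absent.
# $1 = command line file (defaults to /proc/cmdline)
cmdline_clocksource() {
    sed -n 's/.*clocksource=\([^ ]*\).*/\1/p' "${1:-/proc/cmdline}"
}

# On the guest:
#   cmdline_clocksource
# Clocksource currently in use and the ones the kernel offers:
#   cat /sys/devices/system/clocksource/clocksource0/current_clocksource
#   cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```

Comparing the command-line value with current_clocksource shows whether the override actually took effect.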