Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
> bzip2 -d linux-2.6.23.tar.bz2
>
> while true; do
>     echo `date`
>     tar xf linux-2.6.23.tar
>     tar cf linux-2.6.23.tar linux-2.6.23
> done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.
I've been running this all night without seeing any problem. I'm using
current x86.git#testing with a few local patches, but nothing especially
relevant-looking.
Could you try the attached patch to see if it makes any difference?
J
>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, and xenstore reports that
> the device status is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?
>
> Thanks,
> -Chris
Jeremy Fitzhardinge wrote:
> I've been running this all night without seeing any problem. I'm using
> current x86.git#testing with a few local patches, but nothing especially
> relevant-looking.
Meh .. what backend are you using? We're using LVM volumes exported
directly into the domUs like so:
disk = [ 'phy:vg1/xencaker-56392,xvda,w', ... ]
> Could you try the attached patch to see if it makes any difference?
Unfortunately we're still in the same place... pv_ops kernels are still
hanging after heavy disk IO:
works - 2.6.18.x (from xen-unstable)
hangs - 2.6.25-rc3-git3
hangs - 2.6.25-rc3-git3 + your patch
Any other suggestions or debugging I can provide that would be useful to
squash this?
-Chris
Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> I've been running this all night without seeing any problem. I'm
>> using current x86.git#testing with a few local patches, but nothing
>> especially relevant-looking.
>
> Meh .. what backend are you using? We're using LVM volumes exported
> directly into the domUs like so:
>
> disk = [ 'phy:vg1/xencaker-56392,xvda,w', ... ]
>
>> Could you try the attached patch to see if it makes any difference?
>
> Unfortunately we're still in the same place... pv_ops kernels are
> still hanging after heavy disk IO:
>
> works - 2.6.18.x (from xen-unstable)
> hangs - 2.6.25-rc3-git3
> hangs - 2.6.25-rc3-git3 + your patch
>
> Any other suggestions or debugging I can provide that would be useful
> to squash this?
Are you running an SMP or UP domain? I found I could get hangs very
easily with UP (but I need to confirm it isn't a result of some other
very experimental patches).
J
Jeremy Fitzhardinge wrote:
> Are you running an SMP or UP domain? I found I could get hangs very
> easily with UP (but I need to confirm it isn't a result of some other
> very experimental patches).
The hang occurs with both SMP- and UP-compiled pv_ops kernels. SMP
kernels are still slightly responsive after the hang occurs, which makes
me think only one proc gets stuck at a time, not the entire kernel.
-Chris
Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> Are you running an SMP or UP domain? I found I could get hangs very
>> easily with UP (but I need to confirm it isn't a result of some other
>> very experimental patches).
>
> The hang occurs with both SMP- and UP-compiled pv_ops kernels. SMP
> kernels are still slightly responsive after the hang occurs, which
> makes me think only one proc gets stuck at a time, not the entire kernel.
The patch I posted yesterday - "xen: fix RMW when unmasking events" -
should definitively fix the hanging-under-load bugs (I hope). The
problem came from returning to userspace with pending events, which
would leave them hanging around on the vcpu unprocessed, and eventually
everything would deadlock. This was caused by using an unlocked
read-modify-write operation on the event pending flag - which can be set
by another (real) cpu - meaning that the pending event wasn't noticed
until too late. It would only be a problem on an SMP host.
The patch should back-apply to 2.6.24.
J
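
(For illustration, a minimal sketch of the race described above; it is not
the actual patch, and the struct, field, and function names here are made
up. It contrasts a plain read-modify-write of a pending flag, which can
wipe out a bit set concurrently by another physical CPU, with a single
locked RMW that always observes it.)

#include <stdatomic.h>

#define EVT_PENDING 0x1u

struct fake_vcpu {
	_Atomic unsigned int flags;	/* EVT_PENDING may be set by another cpu */
};

void handle_pending_event(struct fake_vcpu *v) { (void)v; }

/* Buggy pattern: separate load and store.  If another cpu sets
 * EVT_PENDING between the two, the store wipes it out, the event is
 * never handled, and anything waiting on it blocks forever. */
void unmask_events_racy(struct fake_vcpu *v)
{
	unsigned int f = atomic_load_explicit(&v->flags, memory_order_relaxed);

	atomic_store_explicit(&v->flags, f & ~EVT_PENDING, memory_order_relaxed);
	if (f & EVT_PENDING)
		handle_pending_event(v);
}

/* Fixed pattern: one locked RMW clears and reports the bit atomically,
 * so a pending event set by another cpu can't be missed. */
void unmask_events_atomic(struct fake_vcpu *v)
{
	unsigned int old = atomic_fetch_and(&v->flags, ~EVT_PENDING);

	if (old & EVT_PENDING)
		handle_pending_event(v);
}

The point is only that clearing and testing the flag has to be one atomic
operation once another physical CPU can set it; as noted above, a UP host
never opens that window.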
Jeremy Fitzhardinge wrote:
> Christopher S. Aker wrote:
>> Jeremy Fitzhardinge wrote:
>>> Are you running an SMP or UP domain? I found I could get hangs very
>>> easily with UP (but I need to confirm it isn't a result of some other
>>> very experimental patches).
>>
>> The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
>> kernels are still slightly responsive after the hang occurs, which
>> makes me think only one proc gets stuck at a time, not the entire kernel.
>
> The patch I posted yesterday - "xen: fix RMW when unmasking events" -
> should definitively fix the hanging-under-load bugs (I hope).
Confirmed-by: [email protected]
Nice work!
-Chris