2019-01-11 16:22:30

by Jürgen Groß

Subject: [PATCH v2] xen: Fix x86 sched_clock() interface for xen

Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
sched_clock() interface") broke Xen guest time handling across
migration:

[ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 187.251137] OOM killer disabled.
[ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 187.252299] suspending xenstore...
[ 187.266987] xen:grant_table: Grant tables using version 1 layout
[18446743811.706476] OOM killer enabled.
[18446743811.706478] Restarting tasks ... done.
[18446743811.720505] Setting capacity to 16777216

Fix that by setting xen_sched_clock_offset at resume time to ensure a
monotonic clock value.

Fixes: f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")
Cc: <[email protected]> # 4.11
Reported-by: Hans van Kranenburg <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/xen/time.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 72bf446c3fee..3c1dfdf26284 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -379,12 +379,15 @@ static const struct pv_time_ops xen_time_ops __initconst = {
 };
 
 static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
+static u64 xen_clock_value_saved;
 
 void xen_save_time_memory_area(void)
 {
 	struct vcpu_register_time_memory_area t;
 	int ret;
 
+	xen_clock_value_saved = xen_clocksource_read() - xen_sched_clock_offset;
+
 	if (!xen_clock)
 		return;
 
@@ -404,7 +407,7 @@ void xen_restore_time_memory_area(void)
 	int ret;
 
 	if (!xen_clock)
-		return;
+		goto out;
 
 	t.addr.v = &xen_clock->pvti;
 
@@ -421,6 +424,11 @@ void xen_restore_time_memory_area(void)
 	if (ret != 0)
 		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
 			  ret);
+
+out:
+	/* Need pvclock_resume() before using xen_clocksource_read(). */
+	pvclock_resume();
+	xen_sched_clock_offset = xen_clocksource_read() - xen_clock_value_saved;
 }
 
 static void xen_setup_vsyscall_time_info(void)
--
2.16.4
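
Those huge [18446743811.xxxxxx] timestamps are sched_clock() gone negative: in the Xen PV case sched_clock() is essentially xen_clocksource_read() - xen_sched_clock_offset, computed in u64 nanoseconds, and after migration the clocksource reading can lie behind the offset that was captured at boot. A minimal standalone sketch of just that wraparound, with made-up values (hypothetical, not taken from the trace above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Hypothetical values, chosen only to show the u64 wrap. */
	uint64_t xen_sched_clock_offset = 300000000000ULL;  /* ns, captured at boot */
	uint64_t clocksource_after_resume = 38000000000ULL; /* ns, destination host */

	/* The sched_clock() arithmetic: a negative delta wraps around. */
	uint64_t ns = clocksource_after_resume - xen_sched_clock_offset;

	printf("sched_clock() = %llu.%06llu\n",
	       (unsigned long long)(ns / 1000000000ULL),
	       (unsigned long long)(ns % 1000000000ULL / 1000ULL));
	/* Prints 18446743811.709551: close to 2^64 ns, the same magnitude
	   as the broken timestamps in the log above. */
	return 0;
}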



2019-01-11 14:20:16

by Hans van Kranenburg

Subject: Re: [PATCH v2] xen: Fix x86 sched_clock() interface for xen

Hi,

On 1/11/19 1:08 PM, Juergen Gross wrote:
> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
> sched_clock() interface") broke Xen guest time handling across
> migration:
>
> [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [ 187.251137] OOM killer disabled.
> [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [ 187.252299] suspending xenstore...
> [ 187.266987] xen:grant_table: Grant tables using version 1 layout
> [18446743811.706476] OOM killer enabled.
> [18446743811.706478] Restarting tasks ... done.
> [18446743811.720505] Setting capacity to 16777216
>
> Fix that by setting xen_sched_clock_offset at resume time to ensure a
> monotonic clock value.
>
> [...]

I'm throwing around a PV domU over a bunch of test servers with live
migrate now, and in between the kernel logging, I'm seeing this:

[Fri Jan 11 13:58:42 2019] Freezing user space processes ... (elapsed 0.002 seconds) done.
[Fri Jan 11 13:58:42 2019] OOM killer disabled.
[Fri Jan 11 13:58:42 2019] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[Fri Jan 11 13:58:42 2019] suspending xenstore...
[Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
[Fri Jan 11 13:58:42 2019] Current state: 1
[Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
[Fri Jan 11 13:58:42 2019] Modules linked in:
[Fri Jan 11 13:58:42 2019] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.14+ #1
[Fri Jan 11 13:58:42 2019] RIP: e030:clockevents_switch_state+0x48/0xe0
[Fri Jan 11 13:58:42 2019] Code: 8b 0c cd 40 ee 00 82 e9 d6 5b d1 00 80 3d 8e 8d 43 01 00 75 17 89 c6 48 c7 c7 92 62 1f 82 c6 05 7c 8d 43 01 01 e8 f8 22 f8 ff <0f> 0b 5b 5d f3 c3 83 e2 01 74 f7 48 8b 47 48 48 85 c0 74 69 48 89
[Fri Jan 11 13:58:42 2019] RSP: e02b:ffffc90000787e30 EFLAGS: 00010082
[Fri Jan 11 13:58:42 2019] RAX: 0000000000000000 RBX: ffff88805df94d80 RCX: 0000000000000006
[Fri Jan 11 13:58:42 2019] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff88805df963f0
[Fri Jan 11 13:58:42 2019] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000119
[Fri Jan 11 13:58:42 2019] R10: 0000000000000020 R11: ffffffff82af4e2d R12: ffff88805df9ca40
[Fri Jan 11 13:58:42 2019] R13: 0000000dd28d6ca6 R14: 0000000000000000 R15: 0000000000000000
[Fri Jan 11 13:58:42 2019] FS: 00007f34193ce040(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 13:58:42 2019] CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[Fri Jan 11 13:58:42 2019] CR2: 00007f6220be50e1 CR3: 000000005ce5c000 CR4: 0000000000002660
[Fri Jan 11 13:58:42 2019] Call Trace:
[Fri Jan 11 13:58:42 2019]  tick_program_event+0x4b/0x70
[Fri Jan 11 13:58:42 2019]  hrtimer_try_to_cancel+0xa8/0x100
[Fri Jan 11 13:58:42 2019]  hrtimer_cancel+0x10/0x20
[Fri Jan 11 13:58:42 2019]  __tick_nohz_idle_restart_tick+0x45/0xd0
[Fri Jan 11 13:58:42 2019]  tick_nohz_idle_exit+0x93/0xa0
[Fri Jan 11 13:58:42 2019]  do_idle+0x149/0x260
[Fri Jan 11 13:58:42 2019]  cpu_startup_entry+0x6a/0x70
[Fri Jan 11 13:58:42 2019] ---[ end trace 519c07d1032908f8 ]---
[Fri Jan 11 13:58:42 2019] xen:grant_table: Grant tables using version 1 layout
[Fri Jan 11 13:58:42 2019] OOM killer enabled.
[Fri Jan 11 13:58:42 2019] Restarting tasks ... done.
[Fri Jan 11 13:58:42 2019] Setting capacity to 6291456
[Fri Jan 11 13:58:42 2019] Setting capacity to 10485760

This always happens on every *first* live migrate that I do after
starting the domU.

OTOH, with v1 I had indeed also observed the 'leftover' time jump on the
first line when resuming. It didn't always happen, but it seems not to
happen at all anymore now, so that's good.

Hans

2019-01-11 14:23:23

by Jürgen Groß

Subject: Re: [Xen-devel] [PATCH v2] xen: Fix x86 sched_clock() interface for xen

On 11/01/2019 14:12, Hans van Kranenburg wrote:
> Hi,
>
> On 1/11/19 1:08 PM, Juergen Gross wrote:
>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>> sched_clock() interface") broke Xen guest time handling across
>> migration:
>>
>> [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
>> [ 187.251137] OOM killer disabled.
>> [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [ 187.252299] suspending xenstore...
>> [ 187.266987] xen:grant_table: Grant tables using version 1 layout
>> [18446743811.706476] OOM killer enabled.
>> [18446743811.706478] Restarting tasks ... done.
>> [18446743811.720505] Setting capacity to 16777216
>>
>> Fix that by setting xen_sched_clock_offset at resume time to ensure a
>> monotonic clock value.
>>
>> [...]
>
> I'm throwing around a PV domU over a bunch of test servers with live
> migrate now, and in between the kernel logging, I'm seeing this:
>
> [...]
> [Fri Jan 11 13:58:42 2019] suspending xenstore...
> [Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
> [Fri Jan 11 13:58:42 2019] Current state: 1
> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
> [...]
>
> This always happens on every *first* live migrate that I do after
> starting the domU.

Yeah, it's a WARN_ONCE().
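
For reference, that message comes from the ONESHOT_STOPPED sanity check
in __clockevents_switch_state() in kernel/time/clockevents.c; roughly
this excerpt (quoted from memory, so check your tree). "Current state: 1"
is CLOCK_EVT_STATE_SHUTDOWN, i.e. something tried to switch a device to
ONESHOT_STOPPED while it wasn't in ONESHOT state:

	case CLOCK_EVT_STATE_ONESHOT_STOPPED:
		/* Core internal bug */
		if (WARN_ONCE(!clockevent_state_oneshot(dev),
			      "Current state: %d\n",
			      clockevent_get_state(dev)))
			return -EINVAL;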

And you didn't see it with v1 of the patch?

At first glance this might be another bug that is just being exposed by
my patch.

I'm investigating further, but this might take some time. Could you
meanwhile verify the same happens with kernel 5.0-rc1? That was the
one I tested with and I didn't spot that WARN.


Juergen

2019-01-11 16:00:17

by Hans van Kranenburg

Subject: Re: [Xen-devel] [PATCH v2] xen: Fix x86 sched_clock() interface for xen

On 1/11/19 3:01 PM, Juergen Gross wrote:
> On 11/01/2019 14:12, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 1/11/19 1:08 PM, Juergen Gross wrote:
>>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>>> sched_clock() interface") broke Xen guest time handling across
>>> migration:
>>>
>>> [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>> [ 187.251137] OOM killer disabled.
>>> [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>> [ 187.252299] suspending xenstore...
>>> [ 187.266987] xen:grant_table: Grant tables using version 1 layout
>>> [18446743811.706476] OOM killer enabled.
>>> [18446743811.706478] Restarting tasks ... done.
>>> [18446743811.720505] Setting capacity to 16777216
>>>
>>> Fix that by setting xen_sched_clock_offset at resume time to ensure a
>>> monotonic clock value.
>>>
>>> [...]
>>
>> I'm throwing around a PV domU over a bunch of test servers with live
>> migrate now, and in between the kernel logging, I'm seeing this:
>>
>> [...]
>> [Fri Jan 11 13:58:42 2019] suspending xenstore...
>> [Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
>> [Fri Jan 11 13:58:42 2019] Current state: 1
>> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
>> [...]
>>
>> This always happens on every *first* live migrate that I do after
>> starting the domU.
>
> Yeah, its a WARN_ONCE().
>
> And you didn't see it with v1 of the patch?

No.

> At first glance this might be another bug that is just being exposed by
> my patch.
>
> I'm investigating further, but this might take some time. Could you
> meanwhile verify the same happens with kernel 5.0-rc1? That was the
> one I tested with and I didn't spot that WARN.

I have Linux 5.0-rc1 with v2 on top now, which gives me this on live
migrate:

[ 51.845967] xen:grant_table: Grant tables using version 1 layout
[ 51.871076] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 51.871091] #PF error: [normal kernel read fault]
[ 51.871100] PGD 0 P4D 0
[ 51.871109] Oops: 0000 [#1] SMP NOPTI
[ 51.871117] CPU: 0 PID: 36 Comm: xenwatch Not tainted 5.0.0-rc1 #1
[ 51.871132] RIP: e030:blk_mq_map_swqueue+0x103/0x270
[ 51.871141] Code: 41 39 45 30 76 97 8b 0a 85 c9 74 ed 89 c1 48 c1 e1 04 49 03 8c 24 c0 05 00 00 48 8b 09 42 8b 3c 39 49 8b 4c 24 58 48 8b 0c f9 <4c> 0f a3 b1 98 00 00 00 72 c5 f0 4c 0f ab b1 98 00 00 00 44 0f b7
[ 51.871161] RSP: e02b:ffffc900008afca8 EFLAGS: 00010282
[ 51.871173] RAX: 0000000000000000 RBX: ffffffff82541728 RCX: 0000000000000000
[ 51.871184] RDX: ffff88805d0fae70 RSI: ffff88805deaa940 RDI: 0000000000000001
[ 51.871196] RBP: ffff88805be8b720 R08: 0000000000000001 R09: ffffea0001699900
[ 51.871206] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88805be8b218
[ 51.871217] R13: ffff88805d0fae68 R14: 0000000000000001 R15: 0000000000000004
[ 51.871237] FS: 00007faa50fac040(0000) GS:ffff88805de00000(0000) knlGS:0000000000000000
[ 51.871252] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.871261] CR2: 0000000000000098 CR3: 000000005c6e6000 CR4: 0000000000002660
[ 51.871275] Call Trace:
[ 51.871285]  blk_mq_update_nr_hw_queues+0x2fd/0x380
[ 51.871297]  blkfront_resume+0x200/0x3f0
[ 51.871307]  xenbus_dev_resume+0x48/0xf0
[ 51.871317]  ? xenbus_dev_probe+0x120/0x120
[ 51.871326]  dpm_run_callback+0x3c/0x160
[ 51.871336]  device_resume+0xce/0x1d0
[ 51.871344]  dpm_resume+0x115/0x2f0
[ 51.871352]  ? find_watch+0x40/0x40
[ 51.871360]  dpm_resume_end+0x8/0x10
[ 51.871370]  do_suspend+0xef/0x1b0
[ 51.871378]  shutdown_handler+0x123/0x150
[ 51.871387]  xenwatch_thread+0xbb/0x160
[ 51.871397]  ? wait_woken+0x80/0x80
[ 51.871406]  kthread+0xf3/0x130
[ 51.871416]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 51.871427]  ret_from_fork+0x35/0x40
[ 51.871435] Modules linked in:
[ 51.871443] CR2: 0000000000000098
[ 51.871452] ---[ end trace 84a3a6932d70aa71 ]---
[ 51.871461] RIP: e030:blk_mq_map_swqueue+0x103/0x270
[ 51.871471] Code: 41 39 45 30 76 97 8b 0a 85 c9 74 ed 89 c1 48 c1 e1 04 49 03 8c 24 c0 05 00 00 48 8b 09 42 8b 3c 39 49 8b 4c 24 58 48 8b 0c f9 <4c> 0f a3 b1 98 00 00 00 72 c5 f0 4c 0f ab b1 98 00 00 00 44 0f b7
[ 51.871491] RSP: e02b:ffffc900008afca8 EFLAGS: 00010282
[ 51.871501] RAX: 0000000000000000 RBX: ffffffff82541728 RCX: 0000000000000000
[ 51.871512] RDX: ffff88805d0fae70 RSI: ffff88805deaa940 RDI: 0000000000000001
[ 51.871523] RBP: ffff88805be8b720 R08: 0000000000000001 R09: ffffea0001699900
[ 51.871533] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88805be8b218
[ 51.871545] R13: ffff88805d0fae68 R14: 0000000000000001 R15: 0000000000000004
[ 51.871562] FS: 00007faa50fac040(0000) GS:ffff88805de00000(0000) knlGS:0000000000000000
[ 51.871573] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.871582] CR2: 0000000000000098 CR3: 000000005c6e6000 CR4: 0000000000002660

When starting it on another test dom0 to see if the direction of
movement matters, it mostly fails to boot with:

[Fri Jan 11 16:16:34 2019] BUG: unable to handle kernel paging request at ffff88805c61e9f0
[Fri Jan 11 16:16:34 2019] #PF error: [PROT] [WRITE]
[Fri Jan 11 16:16:34 2019] PGD 2410067 P4D 2410067 PUD 2c00067 PMD 5ff26067 PTE 801000005c61e065
[Fri Jan 11 16:16:34 2019] Oops: 0003 [#1] SMP NOPTI
[Fri Jan 11 16:16:34 2019] CPU: 3 PID: 1943 Comm: apt-get Not tainted 5.0.0-rc1 #1
[Fri Jan 11 16:16:34 2019] RIP: e030:move_page_tables+0x669/0x970
[Fri Jan 11 16:16:34 2019] Code: 8a 00 48 8b 03 31 ff 48 89 44 24 18 e8 c6 ab e7 ff 66 90 48 89 c6 48 89 df e8 c3 cc e7 ff 66 90 48 8b 44 24 18 b9 0c 00 00 00 <48> 89 45 00 48 8b 44 24 08 f6 40 52 40 0f 85 69 02 00 00 48 8b 44
[Fri Jan 11 16:16:34 2019] RSP: e02b:ffffc900008c7d70 EFLAGS: 00010282
[Fri Jan 11 16:16:34 2019] RAX: 0000000cb064b067 RBX: ffff88805c61ea58 RCX: 000000000000000c
[Fri Jan 11 16:16:34 2019] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000201
[Fri Jan 11 16:16:34 2019] RBP: ffff88805c61e9f0 R08: 0000000000000000 R09: 00000000000260a0
[Fri Jan 11 16:16:34 2019] R10: 0000000000007ff0 R11: ffff88805fd23000 R12: ffffea00017187a8
[Fri Jan 11 16:16:34 2019] R13: ffffea00017187a8 R14: 00007f04e9800000 R15: 00007f04e9600000
[Fri Jan 11 16:16:34 2019] FS: 00007f04ef355100(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 16:16:34 2019] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0 CR3: 000000005c5fc000 CR4: 0000000000002660
[Fri Jan 11 16:16:34 2019] Call Trace:
[Fri Jan 11 16:16:34 2019]  move_vma.isra.34+0xd1/0x2d0
[Fri Jan 11 16:16:34 2019]  __x64_sys_mremap+0x1b3/0x370
[Fri Jan 11 16:16:34 2019]  do_syscall_64+0x49/0x100
[Fri Jan 11 16:16:34 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Fri Jan 11 16:16:34 2019] RIP: 0033:0x7f04ee2e227a
[Fri Jan 11 16:16:34 2019] Code: 73 01 c3 48 8b 0d 1e fc 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee fb 2a 00 f7 d8 64 89 01 48
[Fri Jan 11 16:16:34 2019] RSP: 002b:00007fffb3da3e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
[Fri Jan 11 16:16:34 2019] RAX: ffffffffffffffda RBX: 000056533fa1bf50 RCX: 00007f04ee2e227a
[Fri Jan 11 16:16:34 2019] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f04e95ac000
[Fri Jan 11 16:16:34 2019] RBP: 0000000001a00000 R08: 2e8ba2e8ba2e8ba3 R09: 0000000000000040
[Fri Jan 11 16:16:34 2019] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f04e95ac060
[Fri Jan 11 16:16:34 2019] R13: 00007f04e95ac000 R14: 000056533fa45d73 R15: 000056534024bd10
[Fri Jan 11 16:16:34 2019] Modules linked in:
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0
[Fri Jan 11 16:16:34 2019] ---[ end trace 443702bd9ba5d6b2 ]---
[Fri Jan 11 16:16:34 2019] RIP: e030:move_page_tables+0x669/0x970
[Fri Jan 11 16:16:34 2019] Code: 8a 00 48 8b 03 31 ff 48 89 44 24 18 e8 c6 ab e7 ff 66 90 48 89 c6 48 89 df e8 c3 cc e7 ff 66 90 48 8b 44 24 18 b9 0c 00 00 00 <48> 89 45 00 48 8b 44 24 08 f6 40 52 40 0f 85 69 02 00 00 48 8b 44
[Fri Jan 11 16:16:34 2019] RSP: e02b:ffffc900008c7d70 EFLAGS: 00010282
[Fri Jan 11 16:16:34 2019] RAX: 0000000cb064b067 RBX: ffff88805c61ea58 RCX: 000000000000000c
[Fri Jan 11 16:16:34 2019] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000201
[Fri Jan 11 16:16:34 2019] RBP: ffff88805c61e9f0 R08: 0000000000000000 R09: 00000000000260a0
[Fri Jan 11 16:16:34 2019] R10: 0000000000007ff0 R11: ffff88805fd23000 R12: ffffea00017187a8
[Fri Jan 11 16:16:34 2019] R13: ffffea00017187a8 R14: 00007f04e9800000 R15: 00007f04e9600000
[Fri Jan 11 16:16:34 2019] FS: 00007f04ef355100(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 16:16:34 2019] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0 CR3: 000000005c5fc000 CR4: 0000000000002660

I can log in over ssh, but a command like ps afxu hangs. Oh, it seems
that 5.0-rc1 is doing this all the time; the next time it happened after
500 seconds of uptime.

After an xl destroy and trying again, it boots. The first live migrate
is successful (and there are no clockevents_switch_state complaints);
the second one explodes in blk_mq_update_nr_hw_queues again.

Hm, OK: as long as I live migrate the 5.0-rc1 domU between dom0s with
Xen 4.11.1-pre from commit 5acdd26fdc (the one we had in Debian until
yesterday) and Linux 4.19.9 in the dom0, it works. As soon as I live
migrate to the one box running the new Xen 4.11.1 package from Debian
unstable and Linux 4.19.12, I get the blk_mq_update_nr_hw_queues crash.

If I do the same with 4.19 in the domU, I don't get the
blk_mq_update_nr_hw_queues crash.

Now, back on 4.19.14 + guard_hole + v2, I can't seem to reproduce the
clockevents_switch_state warning any more. I'll take a break and then
try to find out whether I'm doing anything different from earlier today,
when I could reproduce it 100% consistently.

O_o :)

Hans

2019-01-11 21:17:42

by Boris Ostrovsky

Subject: Re: [PATCH v2] xen: Fix x86 sched_clock() interface for xen

On 1/11/19 7:08 AM, Juergen Gross wrote:
> @@ -421,6 +424,11 @@ void xen_restore_time_memory_area(void)
>  	if (ret != 0)
>  		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
>  			  ret);
> +
> +out:
> +	/* Need pvclock_resume() before using xen_clocksource_read(). */
> +	pvclock_resume();

Do we want to drop it from xen_timer_resume then?

BTW, I ran this code a few times on top of for-linus-4.21 and didn't see
any of the problems that Hans described. I didn't do LM, only
save/restore, but I'd think it shouldn't make any difference.

-boris

2019-01-11 21:49:29

by Hans van Kranenburg

Subject: Re: [Xen-devel] [PATCH v2] xen: Fix x86 sched_clock() interface for xen

On 1/11/19 4:57 PM, Hans van Kranenburg wrote:
> On 1/11/19 3:01 PM, Juergen Gross wrote:
>> On 11/01/2019 14:12, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> On 1/11/19 1:08 PM, Juergen Gross wrote:
>>>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>>>> sched_clock() interface") broke Xen guest time handling across
>>>> migration:
>>>>
>>>> [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>> [ 187.251137] OOM killer disabled.
>>>> [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>> [ 187.252299] suspending xenstore...
>>>> [ 187.266987] xen:grant_table: Grant tables using version 1 layout
>>>> [18446743811.706476] OOM killer enabled.
>>>> [18446743811.706478] Restarting tasks ... done.
>>>> [18446743811.720505] Setting capacity to 16777216
>>>>
>>>> Fix that by setting xen_sched_clock_offset at resume time to ensure a
>>>> monotonic clock value.
>>>>
>>>> [...]
>>>
>>> I'm throwing around a PV domU over a bunch of test servers with live
>>> migrate now, and in between the kernel logging, I'm seeing this:
>>>
>>> [Fri Jan 11 13:58:42 2019] Freezing user space processes ... (elapsed 0.002 seconds) done.
>>> [Fri Jan 11 13:58:42 2019] OOM killer disabled.
>>> [Fri Jan 11 13:58:42 2019] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
>>> [Fri Jan 11 13:58:42 2019] suspending xenstore...
>>> [Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
>>> [Fri Jan 11 13:58:42 2019] Current state: 1
>>> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
>>> [...]
>>>
>>> This always happens on every *first* live migrate that I do after
>>> starting the domU.
>>
>> Yeah, it's a WARN_ONCE().

Ok, false alarm. It's there, but not caused by this patch.

I changed the WARN_ONCE to WARN for fun, and now I already get it a lot
more often (v2):

https://paste.debian.net/plainh/d535a379

>> And you didn't see it with v1 of the patch?
>
> No.

I was wrong. I tried a bit more, and I can also reproduce it without the
v1 or v2 patch at all, and I can reproduce it with v4.19.9. It just
sometimes needs a dozen more live migrations before it shows up.

I cannot make it show up with the Debian 4.19.9 distro kernel; that's
why it was new to me.

So, let's ignore it in this thread now.

>> At first glance this might be another bug that is just being exposed
>> by my patch.
>>
>> I'm investigating further, but this might take some time. Could you
>> meanwhile verify the same happens with kernel 5.0-rc1? That was the
>> one I tested with and I didn't spot that WARN.
>
> I have Linux 5.0-rc1 with v2 on top now, which gives me this on live
> migrate:
>
> [...]
> [ 51.871076] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> [ 51.871091] #PF error: [normal kernel read fault]
> [ 51.871100] PGD 0 P4D 0
> [ 51.871109] Oops: 0000 [#1] SMP NOPTI
> [ 51.871117] CPU: 0 PID: 36 Comm: xenwatch Not tainted 5.0.0-rc1 #1
> [ 51.871132] RIP: e030:blk_mq_map_swqueue+0x103/0x270

Dunno about all the 5.0-rc1 crashes yet. I can provide more feedback on
those if you want, but not in this thread, I presume.

Hans

2019-01-14 12:44:55

by Jürgen Groß

Subject: Re: [PATCH v2] xen: Fix x86 sched_clock() interface for xen

On 11/01/2019 21:35, Boris Ostrovsky wrote:
> On 1/11/19 7:08 AM, Juergen Gross wrote:
>> @@ -421,6 +424,11 @@ void xen_restore_time_memory_area(void)
>>  	if (ret != 0)
>>  		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
>>  			  ret);
>> +
>> +out:
>> +	/* Need pvclock_resume() before using xen_clocksource_read(). */
>> +	pvclock_resume();
>
> Do we want to drop it from xen_timer_resume then?

Yes. V3 on its way...


Juergen
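
The v3 delta on top of v2 is then presumably just removing the
now-redundant call from xen_timer_resume(), since
xen_restore_time_memory_area() already calls pvclock_resume(). A sketch
of that change (an assumption about v3, not the actual posted patch):

--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ ... @@ void xen_timer_resume(void)
 {
 	int cpu;
 
-	pvclock_resume();
-
 	if (xen_clockevent != &xen_vcpuop_clockevent)
 		return;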