From: Dongli Zhang <dongli.zhang@oracle.com>
To: xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Cc: boris.ostrovsky@oracle.com, jgross@suse.com, joao.m.martins@oracle.com
Subject: [PATCH v4 1/1] xen/time: do not decrease steal time after live migration on xen
Date: Mon, 30 Oct 2017 13:26:19 +0800
Message-Id: <1509341179-8802-1-git-send-email-dongli.zhang@oracle.com>

After guest live migration on Xen, the steal time in /proc/stat
(cpustat[CPUTIME_STEAL]) might decrease, because the value returned by
xen_steal_clock() can be smaller than this_rq()->prev_steal_time, which is
derived from the previous return value of xen_steal_clock().

For instance, the steal time of each vcpu is 335 before live migration.

cpu  198 0 368 200064 1962 0 0 1340 0 0
cpu0 38 0 81 50063 492 0 0 335 0 0
cpu1 65 0 97 49763 634 0 0 335 0 0
cpu2 38 0 81 50098 462 0 0 335 0 0
cpu3 56 0 107 50138 374 0 0 335 0 0

After live migration, the steal time is reduced to 312.

cpu  200 0 370 200330 1971 0 0 1248 0 0
cpu0 38 0 82 50123 500 0 0 312 0 0
cpu1 65 0 97 49832 634 0 0 312 0 0
cpu2 39 0 82 50167 462 0 0 312 0 0
cpu3 56 0 107 50207 374 0 0 312 0 0

The runstate times are cumulative, but they are cleared by the Xen
hypervisor during live migration. The idea of this patch is therefore to
accumulate the runstate times into global percpu variables before the
live-migration suspend. Once the guest VM is resumed,
xen_get_runstate_snapshot_cpu() always returns the sum of the new runstate
times and the previously accumulated times stored in those global percpu
variables.

A similar but more severe issue affects Linux 4.8 to 4.10, as discussed by
Michael Las at
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest:
there the steal time overflows and leads to 100% st usage in top. A
backport of this patch would fix that issue as well.
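To illustrate the scheme outside of a kernel context, here is a minimal
userspace sketch; the names used here (raw_time, saved_time,
save_runstate_before_suspend, read_runstate) are illustrative only and are
not the identifiers used by the patch below.

#include <stdio.h>

#define NR_STATES 4	/* RUNSTATE_running/runnable/blocked/offline */
#define NR_CPUS   4

/* Counters as the hypervisor reports them; cleared across live migration. */
static unsigned long long raw_time[NR_CPUS][NR_STATES];

/* Guest-side accumulator that survives the migration. */
static unsigned long long saved_time[NR_CPUS][NR_STATES];

/* Called just before suspend: fold the soon-to-be-cleared counters in. */
static void save_runstate_before_suspend(int cpu)
{
	int i;

	for (i = 0; i < NR_STATES; i++)
		saved_time[cpu][i] += raw_time[cpu][i];
}

/* Every later read returns the raw counters plus everything accumulated so
 * far, so the reported times (and hence steal time) never go backwards. */
static void read_runstate(int cpu, unsigned long long t[NR_STATES])
{
	int i;

	for (i = 0; i < NR_STATES; i++)
		t[i] = raw_time[cpu][i] + saved_time[cpu][i];
}

int main(void)
{
	unsigned long long t[NR_STATES];

	raw_time[0][1] = 335;	/* e.g. runnable time before migration */
	save_runstate_before_suspend(0);
	raw_time[0][1] = 12;	/* counters restart from a lower value */
	read_runstate(0, t);
	printf("reported runnable time: %llu\n", t[1]);	/* 347, not 12 */
	return 0;
}

The patch itself is slightly more careful than this sketch: the pre-suspend
snapshot is first kept in a temporary runstate_time_delta array and only
folded into old_runstate_time after resume, and only when the suspend was
not cancelled, so that a mere checkpoint (where Xen does not clear the
counters) is not counted twice.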
References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
  * relocate modification to xen_get_runstate_snapshot_cpu

Changed since v2:
  * accumulate runstate times before live migration

Changed since v3:
  * do not accumulate times in the case of guest checkpointing

---
 drivers/xen/manage.c         |  2 ++
 drivers/xen/time.c           | 83 ++++++++++++++++++++++++++++++++++++++++++--
 include/xen/interface/vcpu.h |  2 ++
 include/xen/xen-ops.h        |  1 +
 4 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index c425d03..3dc085d 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -72,6 +72,7 @@ static int xen_suspend(void *data)
 	}
 
 	gnttab_suspend();
+	xen_accumulate_runstate_time(-1);
 	xen_arch_pre_suspend();
 
 	/*
@@ -84,6 +85,7 @@ static int xen_suspend(void *data)
 				   : 0);
 
 	xen_arch_post_suspend(si->cancelled);
+	xen_accumulate_runstate_time(si->cancelled);
 	gnttab_resume();
 
 	if (!si->cancelled) {
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index ac5f23f..18e2b76 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -19,6 +19,9 @@
 /* runstate info updated by Xen */
 static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
 
+static DEFINE_PER_CPU(u64[RUNSTATE_max], old_runstate_time);
+static u64 **runstate_time_delta;
+
 /* return an consistent snapshot of 64-bit time/counter value */
 static u64 get64(const u64 *p)
 {
@@ -47,8 +50,8 @@ static u64 get64(const u64 *p)
 	return ret;
 }
 
-static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
-					  unsigned int cpu)
+static void xen_get_runstate_snapshot_cpu_delta(
+			struct vcpu_runstate_info *res, unsigned int cpu)
 {
 	u64 state_time;
 	struct vcpu_runstate_info *state;
@@ -66,6 +69,82 @@ static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
 		 (state_time & XEN_RUNSTATE_UPDATE));
 }
 
+static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
+					  unsigned int cpu)
+{
+	int i;
+
+	xen_get_runstate_snapshot_cpu_delta(res, cpu);
+
+	for (i = 0; i < RUNSTATE_max; i++)
+		res->time[i] += per_cpu(old_runstate_time, cpu)[i];
+}
+
+void xen_accumulate_runstate_time(int action)
+{
+	struct vcpu_runstate_info state;
+	int cpu, i;
+
+	switch (action) {
+	case -1: /* backup runstate time before suspend */
+		WARN_ON_ONCE(unlikely(runstate_time_delta));
+
+		runstate_time_delta = kcalloc(num_possible_cpus(),
+					      sizeof(*runstate_time_delta),
+					      GFP_KERNEL);
+		if (unlikely(!runstate_time_delta)) {
+			pr_alert("%s: failed to allocate runstate_time_delta\n",
+				 __func__);
+			return;
+		}
+
+		for_each_possible_cpu(cpu) {
+			runstate_time_delta[cpu] = kmalloc_array(RUNSTATE_max,
+						sizeof(**runstate_time_delta),
+						GFP_KERNEL);
+			if (unlikely(!runstate_time_delta[cpu])) {
+				pr_alert("%s: failed to allocate runstate_time_delta[%d]\n",
+					 __func__, cpu);
+				action = 0;
+				goto reclaim_mem;
+			}
+
+			xen_get_runstate_snapshot_cpu_delta(&state, cpu);
+			memcpy(runstate_time_delta[cpu],
+			       state.time,
+			       RUNSTATE_max * sizeof(**runstate_time_delta));
+		}
+		break;
+
+	case 0: /* backup runstate time after resume */
+		if (unlikely(!runstate_time_delta)) {
+			pr_alert("%s: cannot accumulate runstate time as runstate_time_delta is NULL\n",
+				 __func__);
+			return;
+		}
+
+		for_each_possible_cpu(cpu) {
+			for (i = 0; i < RUNSTATE_max; i++)
+				per_cpu(old_runstate_time, cpu)[i] +=
+					runstate_time_delta[cpu][i];
+		}
+		break;
+
+	default: /* do not accumulate runstate time for checkpointing */
+		break;
+	}
+
+reclaim_mem:
+	if (action != -1 && runstate_time_delta) {
+		for_each_possible_cpu(cpu) {
+			if (likely(runstate_time_delta[cpu]))
+				kfree(runstate_time_delta[cpu]);
+		}
+		kfree(runstate_time_delta);
+		runstate_time_delta = NULL;
+	}
+}
+
 /*
  * Runstate accounting
  */
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index 98188c8..85e81ce 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -110,6 +110,8 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_runstate_info);
  */
 #define RUNSTATE_offline 3
 
+#define RUNSTATE_max 4
+
 /*
  * Register a shared memory area from which the guest may obtain its own
  * runstate information without needing to execute a hypercall.
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 218e6aa..b1f9ae9 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -32,6 +32,7 @@ void xen_resume_notifier_unregister(struct notifier_block *nb);
 bool xen_vcpu_stolen(int vcpu);
 void xen_setup_runstate_info(int cpu);
 void xen_time_setup_guest(void);
+void xen_accumulate_runstate_time(int action);
 void xen_get_runstate_snapshot(struct vcpu_runstate_info *res);
 u64 xen_steal_clock(int cpu);
 
-- 
2.7.4