Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp136277ybl; Tue, 7 Jan 2020 15:44:34 -0800 (PST) X-Google-Smtp-Source: APXvYqwted/QLQwd6qppNfqdwCKUkPn5Bs6G9BGa9pXC5wh5ggWSIEMdy4jCHzr2IosQIf+BlHXV X-Received: by 2002:a54:4501:: with SMTP id l1mr761098oil.101.1578440674476; Tue, 07 Jan 2020 15:44:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578440674; cv=none; d=google.com; s=arc-20160816; b=M3zbS7eCXU/u7/8mktFyBRESGo3/RM6fa4iw71I3QaPTizjssxTJ8pW7PmUDoIOagw XcQwWJLISuoggDr//s7JMs6PFSi5yT/e1sPFami8xTinhlbLHBHpk68YsK1g9WSfTPVY ucczftHbZw6sV0lQJMfZSH0PixeIMtc/Fh1gBHP96q8XH5OSd8hwRiS+25PCY0VtoC8p y4OtHXqYMosos/aEWmj4E7GaoAEEcl3d/it+wz563JVqbsERwuxI2r0oYXhES92XD4yZ 1BtHQiAT47N6FBijk46OBAd6COU5/alOBmQyWXHvGkemL37Y6oxQ/4KdcPzVVAyLmhUq 1YjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:content-disposition :mime-version:message-id:subject:cc:to:from:date:ironport-sdr :dkim-signature; bh=rdcg1lWycoz2N6RBe/TY1zMNmoMIsPWFhPaqjiy0cqo=; b=THVbnsoKIUeIxvBSmRiCp9PAOS1QKDcdS16jng1EV/IpOorqOrU8BTzqPc9m8fQC18 4QJAU2K4fpAGwdHYYsFrXZrfk9NFWQ+wRsk1IXM9qiEcQmsNCEqxpNnWPZQm8GvuOxfS Ttoa6anZqHYwvqHyHHL88zB6k3tc2pcMOPrtyLLh6/QJ7qMijoO6Qnxdn2r8ONKFz743 XH0shBpe3y5RvP5EnmXNhZfU4QUMtDYYTlwVjoeZDGU34s747A2Y8rf1q8iegq2yaUHk FFJFEdIzlGPM8B0IHjkIS/Evk+hi44wS7eepWrwkMF632EI1PRSGW/DlAyE8DS+hCOGN c47w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=WWfVw+9p; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y22si726141oti.269.2020.01.07.15.44.23; Tue, 07 Jan 2020 15:44:34 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=WWfVw+9p; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727510AbgAGXng (ORCPT + 99 others); Tue, 7 Jan 2020 18:43:36 -0500 Received: from smtp-fw-9102.amazon.com ([207.171.184.29]:11255 "EHLO smtp-fw-9102.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726932AbgAGXnf (ORCPT ); Tue, 7 Jan 2020 18:43:35 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1578440614; x=1609976614; h=date:from:to:cc:subject:message-id:mime-version; bh=rdcg1lWycoz2N6RBe/TY1zMNmoMIsPWFhPaqjiy0cqo=; b=WWfVw+9p/is/nhE0J1Pu4xxEMlotTsl0ERg4FgAutT8zjCt3V8+xMpn6 TgLeL+JlUXI65sK8wv8FYciwlsBCHUSaZRaocnm6uPr65FTS1pcfejTeu V7r2/qJTFelQ8vvhGBwPaPIvqJehLXItX6mVkn8PA5cyrw8BDtUbLB9Rn A=; IronPort-SDR: axEFrB4PTHlFKe/HXSD29MdTq89JZXh7B1fKqv3wb6QDgYRWQBHq3hPDRCvX3S368jrGZCfCgc QU7WOYAKbDQA== X-IronPort-AV: E=Sophos;i="5.69,407,1571702400"; d="scan'208";a="17326029" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1e-a70de69e.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9102.sea19.amazon.com with ESMTP; 07 Jan 2020 23:43:32 +0000 Received: from EX13MTAUEE002.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1e-a70de69e.us-east-1.amazon.com (Postfix) with ESMTPS id E2104A01E2; Tue, 7 Jan 2020 23:43:24 +0000 (UTC) Received: from EX13D08UEE002.ant.amazon.com (10.43.62.92) by EX13MTAUEE002.ant.amazon.com (10.43.62.24) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Tue, 7 Jan 2020 23:43:06 +0000 Received: from EX13MTAUEA002.ant.amazon.com (10.43.61.77) by EX13D08UEE002.ant.amazon.com (10.43.62.92) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Tue, 7 Jan 2020 23:43:06 +0000 Received: from dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com (172.22.96.68) by mail-relay.amazon.com (10.43.61.169) with Microsoft SMTP Server id 15.0.1236.3 via Frontend Transport; Tue, 7 Jan 2020 23:43:06 +0000 Received: by dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com (Postfix, from userid 4335130) id 0A80440E65; Tue, 7 Jan 2020 23:43:06 +0000 (UTC) Date: Tue, 7 Jan 2020 23:43:06 +0000 From: Anchal Agarwal To: , , , , , , , , , , , , , , , , , , , , , , , , , , , , CC: Subject: [RFC PATCH V2 07/11] x86/xen: save and restore steal clock during hibernation Message-ID: <20200107234306.GA18610@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Munehisa Kamata Currently, steal time accounting code in scheduler expects steal clock callback to provide monotonically increasing value. If the accounting code receives a smaller value than previous one, it uses a negative value to calculate steal time and results in incorrectly updated idle and steal time accounting. This breaks userspace tools which read /proc/stat. top - 08:05:35 up 2:12, 3 users, load average: 0.00, 0.07, 0.23 Tasks: 80 total, 1 running, 79 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,30100.0%id, 0.0%wa, 0.0%hi, 0.0%si, -1253874204672.0%st This can actually happen when a Xen PVHVM guest gets restored from hibernation, because such a restored guest is just a fresh domain from Xen perspective and the time information in runstate info starts over from scratch. Introduce xen_save_steal_clock() which saves current steal clock values of all present CPUs in runstate info into per-cpu variables during system core ops suspend callbacks. Its couterpart, xen_restore_steal_clock(), restores a boot CPU's steal clock in the system core resume callback. It sets offset if it found the current values in runstate info are smaller than previous ones. xen_steal_clock() is also modified to use the offset to ensure that scheduler only sees monotonically increasing number. For non-boot CPUs, restore after they're brought up, because runstate info for non-boot CPUs are not active until then. [Anchal Changelog: Merged patch xen/time: introduce xen_{save,restore}_steal_clock with this one for better code readability] Signed-off-by: Anchal Agarwal Signed-off-by: Munehisa Kamata --- arch/x86/xen/suspend.c | 13 ++++++++++++- arch/x86/xen/time.c | 3 +++ drivers/xen/time.c | 28 +++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 ++ 4 files changed, 44 insertions(+), 2 deletions(-) diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 784c4484100b..dae0f74f5390 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -91,12 +91,20 @@ void xen_arch_suspend(void) static int xen_syscore_suspend(void) { struct xen_remove_from_physmap xrfp; - int ret; + int cpu, ret; /* Xen suspend does similar stuffs in its own logic */ if (xen_suspend_mode_is_xen_suspend()) return 0; + for_each_present_cpu(cpu) { + /* + * Nonboot CPUs are already offline, but the last copy of + * runstate info is still accessible. + */ + xen_save_steal_clock(cpu); + } + xrfp.domid = DOMID_SELF; xrfp.gpfn = __pa(HYPERVISOR_shared_info) >> PAGE_SHIFT; @@ -118,6 +126,9 @@ static void xen_syscore_resume(void) pvclock_resume(); + /* Nonboot CPUs will be resumed when they're brought up */ + xen_restore_steal_clock(smp_processor_id()); + gnttab_resume(); } diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index befbdd8b17f0..8cf632dda605 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -537,6 +537,9 @@ static void xen_hvm_setup_cpu_clockevents(void) { int cpu = smp_processor_id(); xen_setup_runstate_info(cpu); + if (cpu) + xen_restore_steal_clock(cpu); + /* * xen_setup_timer(cpu) - snprintf is bad in atomic context. Hence * doing it xen_hvm_cpu_notify (which gets called by smp_init during diff --git a/drivers/xen/time.c b/drivers/xen/time.c index 0968859c29d0..3713d716070c 100644 --- a/drivers/xen/time.c +++ b/drivers/xen/time.c @@ -20,6 +20,8 @@ /* runstate info updated by Xen */ static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate); +static DEFINE_PER_CPU(u64, xen_prev_steal_clock); +static DEFINE_PER_CPU(u64, xen_steal_clock_offset); static DEFINE_PER_CPU(u64[4], old_runstate_time); @@ -149,7 +151,7 @@ bool xen_vcpu_stolen(int vcpu) return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable; } -u64 xen_steal_clock(int cpu) +static u64 __xen_steal_clock(int cpu) { struct vcpu_runstate_info state; @@ -157,6 +159,30 @@ u64 xen_steal_clock(int cpu) return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline]; } +u64 xen_steal_clock(int cpu) +{ + return __xen_steal_clock(cpu) + per_cpu(xen_steal_clock_offset, cpu); +} + +void xen_save_steal_clock(int cpu) +{ + per_cpu(xen_prev_steal_clock, cpu) = xen_steal_clock(cpu); +} + +void xen_restore_steal_clock(int cpu) +{ + u64 steal_clock = __xen_steal_clock(cpu); + + if (per_cpu(xen_prev_steal_clock, cpu) > steal_clock) { + /* Need to update the offset */ + per_cpu(xen_steal_clock_offset, cpu) = + per_cpu(xen_prev_steal_clock, cpu) - steal_clock; + } else { + /* Avoid unnecessary steal clock warp */ + per_cpu(xen_steal_clock_offset, cpu) = 0; + } +} + void xen_setup_runstate_info(int cpu) { struct vcpu_register_runstate_memory_area area; diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index 3b3992b5b0c2..12b3f4474a05 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -37,6 +37,8 @@ void xen_time_setup_guest(void); void xen_manage_runstate_time(int action); void xen_get_runstate_snapshot(struct vcpu_runstate_info *res); u64 xen_steal_clock(int cpu); +void xen_save_steal_clock(int cpu); +void xen_restore_steal_clock(int cpu); int xen_setup_shutdown_event(void); -- 2.15.3.AMZN