Date: Thu, 9 Jan 2020 11:25:04 +0000
From: Andrew Murray
To: Will Deacon
Cc: Catalin Marinas, kvm@vger.kernel.org, Marc Zyngier,
    linux-kernel@vger.kernel.org, Sudeep Holla, kvmarm, linux-arm-kernel
Subject: Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore
    full SPE profiling buffer controls
Message-ID: <20200109112504.GZ42593@e119886-lin.cambridge.arm.com>
In-Reply-To: <20200109112336.GY42593@e119886-lin.cambridge.arm.com>
References: <20191220143025.33853-1-andrew.murray@arm.com>
    <20191220143025.33853-10-andrew.murray@arm.com>
    <20191221141325.5a177343@why>
    <20200107151328.GW42593@e119886-lin.cambridge.arm.com>
    <20200108115816.GB15861@willie-the-truck>
    <745529f7e469b898b74dfc5153e3daf6@kernel.org>
    <20200108131020.GB16658@willie-the-truck>
    <20200109112336.GY42593@e119886-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 09, 2020 at 11:23:37AM +0000, Andrew Murray wrote:
> On Wed, Jan 08, 2020 at 01:10:21PM +0000, Will Deacon wrote:
> > On Wed, Jan 08, 2020 at 12:36:11PM +0000, Marc Zyngier wrote:
> > > On 2020-01-08 11:58, Will Deacon wrote:
> > > > On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
> > > > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > > > Looking at the vcpu_load and related code, I don't see a way of saying
> > > > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > > >
> > > > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > > that the physical CPU has SPE. If not, raise a request for that vcpu.
> > > > > In the run loop, check for that request and abort if raised, returning
> > > > > to userspace.
>
> I hadn't really noticed the kvm_make_request mechanism - however it's now
> clear how this could be implemented.
>
> This approach gives responsibility for which CPUs should be used to userspace
> and if userspace gets it wrong then the KVM_RUN ioctl won't do very much.
>
> > > > >
> > > > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > > > where to run that particular vcpu.
> > > >
> > > > It's also worth considering systems where there are multiple
> > > > implementations of SPE in play. Assuming we don't want to expose this to
> > > > a guest, then the right interface here is probably for userspace to pick
> > > > one SPE implementation and expose that to the guest.
>
> If I understand correctly then this implies the following:
>
> - If the host userspace indicates it wants support for SPE in the guest (via
>   KVM_SET_DEVICE_ATTR at start of day) - then we should check in vcpu_load that
>   the minimum version of SPE is present on the current CPU. 'minimum' because
>   we don't know why userspace has selected the given cpumask.
>
> - Userspace can get it wrong, i.e. it can create a CPU mask with CPUs that
>   have SPE with differing versions. If it does, and all CPUs have some form of
>   SPE then errors may occur in the guest. Perhaps this is OK and userspace
>   shouldn't get it wrong?

Actually this could be guarded against by emulating ID_AA64DFR0_EL1 so as to
cap the version to the minimum SPE version - if absolutely required.

Thanks,

Andrew Murray

>
> > > > That fits with your idea above, where you basically get an immediate
> > > > exit if we try to schedule a vCPU onto a CPU that isn't part of the SPE
> > > > mask.
> > >
> > > Then it means that the VM should be configured with a mask indicating
> > > which CPUs it is intended to run on, and setting such a mask is mandatory
> > > for SPE.
> >
> > Yeah, and this could probably all be wrapped up by userspace so you just
> > pass the SPE PMU name or something and it grabs the corresponding cpumask
> > for you.
> >
> > > > > > One solution could be to allow scheduling onto non-SPE CPUs but wrap
> > > > > > the SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready)
> > > > > > that reads the non-sanitised feature register. Therefore we don't go
> > > > > > bang, but we also increase the size of any black-holes in SPE
> > > > > > capturing. Though this feels like something that will cause grief
> > > > > > down the line.
> > > > > >
> > > > > > Is there something else that can be done?
> > > > >
> > > > > How does userspace deal with this?
> > > > > When SPE is only available on half of the CPUs, how does perf work in
> > > > > these conditions?
> > > >
> > > > Not sure about userspace, but the kernel driver works by instantiating
> > > > an SPE PMU instance only for the CPUs that have it and then that
> > > > instance profiles for only those CPUs. You also need to do something
> > > > similar if you had two CPU types with SPE, since the SPE configuration
> > > > is likely to be different between them.
> > >
> > > So that's closer to what Andrew was suggesting above (running a guest on a
> > > non-SPE CPU creates a profiling black hole). Except that we can't really
> > > run an SPE-enabled guest on a non-SPE CPU, as the SPE sysregs will UNDEF
> > > at EL1.
> >
> > Right. I wouldn't suggest the "black hole" approach for VMs, but it works
> > for userspace so that's why the driver does it that way.
> >
> > > Conclusion: we need a mix of a cpumask to indicate which CPUs we want to
> > > run on (generic, not SPE-related),
>
> If I understand correctly this mask isn't exposed to KVM (in the kernel) and
> KVM (in the kernel) is unaware of how the CPUs that have KVM_RUN called are
> selected.
>
> Thus this implies the cpumask is a feature of KVM tool or QEMU that would
> need to be added there. (E.g. kvm_cmd_run_work would set some affinity when
> creating pthreads - based on a CPU mask triggered by setting the --spe flag)?
>
> Thanks,
>
> Andrew Murray
>
> > > and a check for SPE-capable CPUs.
> > > If any of these conditions is not satisfied, the vcpu exits for userspace
> > > to sort out the affinity.
> > >
> > > I hate heterogeneous systems.
> >
> > They hate you too ;)
> >
> > Will
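
As a rough illustration of the userspace side discussed above (working out
which CPUs are in the SPE PMU's cpumask so that vCPU threads can be kept on
them), a minimal sketch might look like the following. It assumes the sysfs
file holds a CPU list such as "0-3,8" and that the system has fewer than 64
CPUs. parse_cpulist() and cpu_in_mask() are hypothetical helpers for
illustration only, not functions from KVM tool or QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,8" (optionally newline-terminated) into a
 * 64-bit mask where bit n set means CPU n is listed.
 * Returns false on malformed input or on CPU numbers >= 64.
 * Hypothetical helper; the 64-CPU limit is an assumption of this sketch.
 */
static bool parse_cpulist(const char *s, uint64_t *mask)
{
    uint64_t out = 0;

    assert(s != NULL && mask != NULL);

    while (*s != '\0' && *s != '\n') {
        char *end;
        unsigned long lo, hi;

        /* Each entry must start with a digit. */
        if (*s < '0' || *s > '9')
            return false;
        lo = strtoul(s, &end, 10);
        hi = lo;

        /* Optional "-hi" part turns the entry into a range. */
        if (*end == '-') {
            s = end + 1;
            if (*s < '0' || *s > '9')
                return false;
            hi = strtoul(s, &end, 10);
        }
        if (hi < lo || hi >= 64)
            return false;

        for (unsigned long cpu = lo; cpu <= hi; cpu++)
            out |= UINT64_C(1) << cpu;

        /* Entries are separated by commas. */
        if (*end == ',')
            end++;
        else if (*end != '\0' && *end != '\n')
            return false;
        s = end;
    }

    *mask = out;
    return true;
}

/* True if 'cpu' is one of the CPUs recorded in 'mask'. */
static bool cpu_in_mask(uint64_t mask, unsigned int cpu)
{
    return cpu < 64 && ((mask >> cpu) & 1);
}
```

A tool could read the contents of /sys/devices/arm_spe_0/cpumask, pass them to
parse_cpulist(), and use the resulting mask when choosing the affinity of each
vCPU thread (for example via pthread_setaffinity_np()), so that KVM_RUN is only
ever called on SPE-capable CPUs.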