Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp6153568ybx; Mon, 11 Nov 2019 04:47:34 -0800 (PST) X-Google-Smtp-Source: APXvYqzD55110qZl7w7jQvY8/AW4SIZw56x4c6ByZAimHpZAg+G7exQiBxBrjtPVfLVnmTCow78Q X-Received: by 2002:a50:cd53:: with SMTP id d19mr25899826edj.197.1573476453867; Mon, 11 Nov 2019 04:47:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573476453; cv=none; d=google.com; s=arc-20160816; b=pat2GCKAodMWP3kVyp80wZ5TZTSYs8UtRVEnQA0TNfo8J4Y/ZXUZ6utiPZtdnTp/dr XOp63XjgEbvxoIxMPxk8DIs7LoltGFXKShFCMOvJWeSzeDwQXYOvSu8jlYreRv5Yh+fZ lUNWLevSKP4FLYhpiQCTElWnqvF1LxXaAK5dO45kPmviMHoBc4DdAy3AH8ZOI8ib5c8d HhwVi0gmYJIB84WO2/0rLBGt0I5grPmQ7lzKwz7R7DJFBrJqRHFol7ZM+b32KZYZENYq NzHorJbIqNRlV68fWFJQpf1dtpMvwK7bFStwdH2N8xlOHLhliCMhZvEW1srv9W+nE5eF x3vQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=5FZEDMvL+J2f7HDeq/JXWWewtwG76vsBNmcxw8VJ7WQ=; b=JiTQmW1IYDcDA+s/dW54J8DI22oGKDLe5HSRtLuY/us9b1j6btdMFLPngMz99GU3aq hS3z9PrYXpkDg5RiDxl4asN8wpjfGXEvC8m7NNVuaZITf58UDi/N0hf+VIM4Xyo0pjBJ k4KOV7dDB1GkhnL9852nVKrI0+0XEuS8LS/RxJSBDjeG0tXEkxE+eiBciY+SuE5IiL8f aHTpqeSFptaLN0evIdNI+Xv/rvtQCXxX1mtsbSWC91LLGUra3zKLs5Oo0wLuZYdrDBLm qTdk76mqAvzOXUF5JCZHZIWBhewixEKv74rx5tMqnQIUfP1SQUxXz6nl4r0FXKe6rH7c cx2A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t30si11125067edt.330.2019.11.11.04.47.09; Mon, 11 Nov 2019 04:47:33 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727059AbfKKMpx (ORCPT + 99 others); Mon, 11 Nov 2019 07:45:53 -0500 Received: from 3.mo173.mail-out.ovh.net ([46.105.34.1]:46963 "EHLO 3.mo173.mail-out.ovh.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726834AbfKKMpw (ORCPT ); Mon, 11 Nov 2019 07:45:52 -0500 Received: from player788.ha.ovh.net (unknown [10.109.160.253]) by mo173.mail-out.ovh.net (Postfix) with ESMTP id 9ADE211D7E3 for ; Mon, 11 Nov 2019 12:26:39 +0100 (CET) Received: from kaod.org (lfbn-1-2229-223.w90-76.abo.wanadoo.fr [90.76.50.223]) (Authenticated sender: clg@kaod.org) by player788.ha.ovh.net (Postfix) with ESMTPSA id 9FB97BEE6275; Mon, 11 Nov 2019 11:26:26 +0000 (UTC) Subject: Re: [PATCH] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one To: Greg Kurz , Paul Mackerras Cc: Michael Ellerman , Benjamin Herrenschmidt , David Gibson , Lijun Pan , Satheesh Rajendran , kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, stable@vger.kernel.org, linux-kernel@vger.kernel.org References: <157346576671.818016.10401178701091199969.stgit@bahia.lan> From: =?UTF-8?Q?C=c3=a9dric_Le_Goater?= Message-ID: <3373a85a-09bb-3345-ef27-68177c360786@kaod.org> Date: Mon, 11 Nov 2019 12:26:25 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1 MIME-Version: 1.0 In-Reply-To: <157346576671.818016.10401178701091199969.stgit@bahia.lan> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Ovh-Tracer-Id: 5003217712743287575 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrgedufedruddvjedgvdelucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuqfggjfdqfffguegfifdpvefjgfevmfevgfenuceurghilhhouhhtmecuhedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujfgurhepuffvfhfhkffffgggjggtgfesthejredttdefjeenucfhrhhomhepveorughrihgtpgfnvggpifhorghtvghruceotghlgheskhgrohgurdhorhhgqeenucfkpheptddrtddrtddrtddpledtrdejiedrhedtrddvvdefnecurfgrrhgrmhepmhhouggvpehsmhhtphdqohhuthdphhgvlhhopehplhgrhigvrhejkeekrdhhrgdrohhvhhdrnhgvthdpihhnvghtpedtrddtrddtrddtpdhmrghilhhfrhhomheptghlgheskhgrohgurdhorhhgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgnecuvehluhhsthgvrhfuihiivgeptd Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/11/2019 10:49, Greg Kurz wrote: > The EQ page is allocated by the guest and then passed to the hypervisor > with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page > before handing it over to the HW. This reference is dropped either when > the guest issues the H_INT_RESET hcall or when the KVM device is released. > But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times > to reset the EQ (vCPU hot unplug) or set a new EQ (guest reboot). In both > cases the EQ page reference is leaked. This is especially visible when > the guest memory is backed with huge pages: start a VM up to the guest > userspace, either reboot it or unplug a vCPU, quit QEMU. The leak is > observed by comparing the value of HugePages_Free in /proc/meminfo before > and after the VM is run. > > Note that the EQ reset path seems to be calling put_page() but this is > done after xive_native_configure_queue() which clears the qpage field > in the XIVE queue structure, ie. the put_page() block is a nop and the > previous page pointer was just overwritten anyway. In the other case of > configuring a new EQ page, nothing seems to be done to release the old > one. Yes. Nice catch. I think we should try to fix the problem differently. The routine xive_native_configure_queue() is only suited for XIVE drivers doing their own EQ page allocation: Linux PowerNV and the KVM XICS-over-XIVE device. The KVM XIVE device acts as a proxy for the guest OS doing the allocation and it has different needs. Having a specific xive_native_configure_queue() for the KVM XIVE device seems overkill. May be, we could introduce a helper routine in KVM XIVE device calling xive_native_configure_queue() and handling the page reference how it should be ? That is to drop the previous page reference in case of a change on q->qpage. Also, we should try to preserve the previous setting until the whole configuration is in place. That seems possible up to the call to xive_native_configure_queue(). If kvmppc_xive_attach_escalation() fails I think it is too late, as the HW has been configured by xive_native_configure_queue(), and we should just cleanup everything. Thanks, C. > Fix both cases by always calling put_page() on the existing EQ page in > kvmppc_xive_native_set_queue_config(). This is a seemless change for the > EQ reset case. However this causes xive_native_configure_queue() to be > called twice for the new EQ page case: one time to reset the EQ and another > time to configure the new page. This is needed because we cannot release > the EQ page before calling xive_native_configure_queue() since it may still > be used by the HW. We cannot modify xive_native_configure_queue() to drop > the reference either because this function is also used by the XICS-on-XIVE > device which requires free_pages() instead of put_page(). This isn't a big > deal anyway since H_INT_SET_QUEUE_CONFIG isn't a hot path. > > Reported-by: Satheesh Rajendran > Cc: stable@vger.kernel.org # v5.2 > Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration") > Signed-off-by: Greg Kurz > --- > arch/powerpc/kvm/book3s_xive_native.c | 21 ++++++++++++--------- > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c > index 34bd123fa024..8ab908d23dc2 100644 > --- a/arch/powerpc/kvm/book3s_xive_native.c > +++ b/arch/powerpc/kvm/book3s_xive_native.c > @@ -570,10 +570,12 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive, > __func__, server, priority, kvm_eq.flags, > kvm_eq.qshift, kvm_eq.qaddr, kvm_eq.qtoggle, kvm_eq.qindex); > > - /* reset queue and disable queueing */ > - if (!kvm_eq.qshift) { > - q->guest_qaddr = 0; > - q->guest_qshift = 0; > + /* > + * Reset queue and disable queueing. It will be re-enabled > + * later on if the guest is configuring a new EQ page. > + */ > + if (q->guest_qshift) { > + page = virt_to_page(q->qpage); > > rc = xive_native_configure_queue(xc->vp_id, q, priority, > NULL, 0, true); > @@ -583,12 +585,13 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive, > return rc; > } > > - if (q->qpage) { > - put_page(virt_to_page(q->qpage)); > - q->qpage = NULL; > - } > + put_page(page); > > - return 0; > + if (!kvm_eq.qshift) { > + q->guest_qaddr = 0; > + q->guest_qshift = 0; > + return 0; > + } > } > > /* >