From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Satheesh Rajendran, Cédric Le Goater, Greg Kurz, Lijun Pan, Paul Mackerras
Subject: [PATCH 5.3 076/105] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
Date: Wed, 11 Dec 2019 16:06:05 +0100
Message-Id: <20191211150255.164740236@linuxfoundation.org>
In-Reply-To: <20191211150221.153659747@linuxfoundation.org>
References: <20191211150221.153659747@linuxfoundation.org>

From: Greg Kurz

commit 31a88c82b466d2f31a44e21c479f45b4732ccfd0 upstream.

The EQ page is allocated by the guest and then passed to the hypervisor
with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page
before handing it over to the HW. This reference is dropped either when
the guest issues the H_INT_RESET hcall or when the KVM device is released.
However, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several
times, either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest
reboot). In both cases the existing EQ page reference is leaked because
we simply overwrite it in the XIVE queue structure without calling
put_page().

This is especially visible when the guest memory is backed with huge
pages: start a VM up to the guest userspace, either reboot it or unplug
a vCPU, then quit QEMU. The leak is observed by comparing the value of
HugePages_Free in /proc/meminfo before and after the VM is run.

Ideally we'd want the XIVE code to handle the EQ page de-allocation at
the platform level. This isn't the case right now because the various
XIVE drivers have different allocation needs. It may be worth
introducing hooks for this purpose instead of exposing XIVE internals to
the drivers, but that is certainly a large piece of work to be done
later.

In the meantime, for easier backport, fix both the vCPU unplug and guest
reboot leaks by introducing a wrapper around
xive_native_configure_queue() that does the necessary cleanup.
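
The before/after comparison of HugePages_Free described above can be
scripted. The following is only a minimal user-space sketch, not part of
the patch: the helper names parse_hugepages_free() and
read_hugepages_free() are hypothetical, and the 8 KiB buffer is an
assumption about the size of /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse the "HugePages_Free:" value out of /proc/meminfo-style text.
 * Returns the page count, or -1 if the field is absent or malformed.
 * (Hypothetical helper, for illustration only.)
 */
long parse_hugepages_free(const char *meminfo)
{
	static const char key[] = "HugePages_Free:";
	const char *line = meminfo;
	long value;

	assert(meminfo != NULL);

	/* Walk the text line by line and match the key at line start. */
	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0) {
			if (sscanf(line + sizeof(key) - 1, "%ld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Read the given file (normally "/proc/meminfo") and return its
 * HugePages_Free value, or -1 on error. Assumes the file fits in 8 KiB.
 */
long read_hugepages_free(const char *path)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_hugepages_free(buf);
}
```

Calling read_hugepages_free("/proc/meminfo") once before starting the VM
and once after quitting QEMU gives the two values to compare; on an
otherwise idle host, a lower value after the run would be consistent with
the leak described here.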

Reported-by: Satheesh Rajendran
Cc: stable@vger.kernel.org # v5.2
Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Signed-off-by: Cédric Le Goater
Signed-off-by: Greg Kurz
Tested-by: Lijun Pan
Signed-off-by: Paul Mackerras
Signed-off-by: Greg Kroah-Hartman

---
 arch/powerpc/kvm/book3s_xive_native.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -50,6 +50,24 @@ static void kvmppc_xive_native_cleanup_q
 	}
 }
 
+static int kvmppc_xive_native_configure_queue(u32 vp_id, struct xive_q *q,
+					      u8 prio, __be32 *qpage,
+					      u32 order, bool can_escalate)
+{
+	int rc;
+	__be32 *qpage_prev = q->qpage;
+
+	rc = xive_native_configure_queue(vp_id, q, prio, qpage, order,
+					 can_escalate);
+	if (rc)
+		return rc;
+
+	if (qpage_prev)
+		put_page(virt_to_page(qpage_prev));
+
+	return rc;
+}
+
 void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
@@ -582,19 +600,14 @@ static int kvmppc_xive_native_set_queue_
 		q->guest_qaddr = 0;
 		q->guest_qshift = 0;
 
-		rc = xive_native_configure_queue(xc->vp_id, q, priority,
-						 NULL, 0, true);
+		rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
+							NULL, 0, true);
 		if (rc) {
 			pr_err("Failed to reset queue %d for VCPU %d: %d\n",
 			       priority, xc->server_num, rc);
 			return rc;
 		}
 
-		if (q->qpage) {
-			put_page(virt_to_page(q->qpage));
-			q->qpage = NULL;
-		}
-
 		return 0;
 	}
 
@@ -653,8 +666,8 @@ static int kvmppc_xive_native_set_queue_
 	 * OPAL level because the use of END ESBs is not supported by
 	 * Linux.
 	 */
-	rc = xive_native_configure_queue(xc->vp_id, q, priority,
-					 (__be32 *) qaddr, kvm_eq.qshift, true);
+	rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
+					(__be32 *) qaddr, kvm_eq.qshift, true);
 	if (rc) {
 		pr_err("Failed to configure queue %d for VCPU %d: %d\n",
 		       priority, xc->server_num, rc);
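
For readers who want to experiment with the pattern the wrapper uses
(remember the previous page, reconfigure, and drop the old reference only
on success), here is a small user-space model. It is a sketch under
stated assumptions, not kernel code: struct fake_page, struct fake_queue,
fake_put_page() and fake_configure_queue() are hypothetical stand-ins,
and the model assumes the underlying configure call stores the new page
in the queue only when it succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted page. */
struct fake_page {
	int refcount;
};

/* Hypothetical stand-in for the XIVE queue structure. */
struct fake_queue {
	struct fake_page *qpage;
};

/* Drop one reference, as put_page() would. */
void fake_put_page(struct fake_page *p)
{
	p->refcount--;
}

/*
 * Stand-in for the underlying configure call. Assumption: it overwrites
 * q->qpage on success and leaves the queue untouched on failure. The
 * 'fail' flag only exists to simulate an error.
 */
int fake_configure_queue(struct fake_queue *q, struct fake_page *qpage,
			 int fail)
{
	if (fail)
		return -1;
	q->qpage = qpage;
	return 0;
}

/*
 * Model of the wrapper: save the previous page before reconfiguring and
 * release its reference only if the reconfiguration succeeded.
 */
int configure_queue_and_release(struct fake_queue *q,
				struct fake_page *qpage, int fail)
{
	struct fake_page *prev;
	int rc;

	assert(q != NULL);
	prev = q->qpage;

	rc = fake_configure_queue(q, qpage, fail);
	if (rc)
		return rc;	/* old page stays referenced and in use */

	if (prev)
		fake_put_page(prev);

	return 0;
}
```

In this model, replacing one page with another drops exactly one
reference on the old page, a failed reconfiguration changes nothing, and
passing a NULL page (the reset case) releases the current page and leaves
the queue empty.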