From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
    David Hildenbrand, Tyler Sanderson, "Michael S. Tsirkin", Wei Wang,
    Alexander Duyck, David Rientjes, Nadav Amit, Michal Hocko
Subject: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
Date: Wed, 5 Feb 2020 17:34:02 +0100
Message-Id: <20200205163402.42627-4-david@redhat.com>
In-Reply-To: <20200205163402.42627-1-david@redhat.com>
References: <20200205163402.42627-1-david@redhat.com>

Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
changed the behavior when deflation happens automatically. Instead of
deflating when called by the OOM handler, the shrinker is used.

However, the balloon is not simply some slab cache that should be
shrunk when under memory pressure. The shrinker does not have a concept of
priorities, so this behavior cannot be configured.

There was a report that this results in undesired side effects when
inflating the balloon to shrink the page cache. [1]
	"When inflating the balloon against page cache (i.e. no free memory
	 remains) vmscan.c will both shrink page cache, but also invoke the
	 shrinkers -- including the balloon's shrinker. So the balloon
	 driver allocates memory which requires reclaim, vmscan gets this
	 memory by shrinking the balloon, and then the driver adds the
	 memory back to the balloon. Basically a busy no-op."

The name "deflate on OOM" makes it pretty clear when deflation should
happen - after other approaches to reclaim memory failed, not while
reclaiming. This allows minimizing the footprint of a guest - memory
will only be taken out of the balloon when really needed.

In particular, a drop_slab() will result in the whole balloon getting
deflated, which is undesired. While handling it via the OOM handler might
not be perfect, it keeps existing behavior. If we want a different
behavior, then we need a new feature bit and need to document it properly
(although there should be a clear use case and the intended effects
should be well described).

Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
this has no such side effects. Always register the shrinker with
VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
pages that are still to be processed by the host. The hypervisor takes
care of identifying and resolving possible races between processing a
hinting request and the guest reusing a page.

In contrast to the state before commit 71994620bb25 ("virtio_balloon:
replace oom notifier with shrinker"), don't add a module parameter to
configure the number of pages to deflate on OOM. It can be re-added if
really needed. Also, pay attention that leak_balloon() returns the number
of 4k pages - convert it properly in virtio_balloon_oom_notify().

Note1: using the OOM handler is frowned upon, but it really is what we
       need for this feature.

Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
       could actually skip sending deflation requests to our hypervisor,
       making the OOM path *very* simple: basically freeing pages and
       updating the balloon. This would be an option if the communication
       with the host ever becomes a problem on this call path.

[1] https://www.spinics.net/lists/linux-virtualization/msg40863.html

Reported-by: Tyler Sanderson
Cc: Michael S. Tsirkin
Cc: Wei Wang
Cc: Alexander Duyck
Cc: David Rientjes
Cc: Nadav Amit
Cc: Michal Hocko
Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
 1 file changed, 44 insertions(+), 63 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7e5d84caeb94..e7b18f556c5e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -27,7 +28,9 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+/* Maximum number of (4k) pages to deflate on OOM notifications. */
+#define VIRTIO_BALLOON_OOM_NR_PAGES 256
+#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
 
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
 					     __GFP_NOMEMALLOC)
@@ -112,8 +115,11 @@ struct virtio_balloon {
 	/* Memory statistics */
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-	/* To register a shrinker to shrink memory upon memory pressure */
+	/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
 	struct shrinker shrinker;
+
+	/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
+	struct notifier_block oom_nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
 	return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 }
 
-static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
-					  unsigned long pages_to_free)
-{
-	return leak_balloon(vb, pages_to_free * VIRTIO_BALLOON_PAGES_PER_PAGE) /
-		VIRTIO_BALLOON_PAGES_PER_PAGE;
-}
-
-static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
-					  unsigned long pages_to_free)
-{
-	unsigned long pages_freed = 0;
-
-	/*
-	 * One invocation of leak_balloon can deflate at most
-	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
-	 * multiple times to deflate pages till reaching pages_to_free.
-	 */
-	while (vb->num_pages && pages_freed < pages_to_free)
-		pages_freed += leak_balloon_pages(vb,
-						  pages_to_free - pages_freed);
-
-	update_balloon_size(vb);
-
-	return pages_freed;
-}
-
 static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
 						  struct shrink_control *sc)
 {
-	unsigned long pages_to_free, pages_freed = 0;
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
 
-	pages_to_free = sc->nr_to_scan;
-
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
-		pages_freed = shrink_free_pages(vb, pages_to_free);
-
-	if (pages_freed >= pages_to_free)
-		return pages_freed;
-
-	pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
-
-	return pages_freed;
+	return shrink_free_pages(vb, sc->nr_to_scan);
 }
 
 static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
@@ -837,26 +806,22 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
 {
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
-	unsigned long count;
-
-	count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
-	count += vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 
-	return count;
+	return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 }
 
-static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
+static int virtio_balloon_oom_notify(struct notifier_block *nb,
+				     unsigned long dummy, void *parm)
 {
-	unregister_shrinker(&vb->shrinker);
-}
+	struct virtio_balloon *vb = container_of(nb,
+						  struct virtio_balloon, oom_nb);
+	unsigned long *freed = parm;
 
-static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
-{
-	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
-	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
-	vb->shrinker.seeks = DEFAULT_SEEKS;
+	*freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
+		  VIRTIO_BALLOON_PAGES_PER_PAGE;
+	update_balloon_size(vb);
 
-	return register_shrinker(&vb->shrinker);
+	return NOTIFY_OK;
 }
 
 static int virtballoon_probe(struct virtio_device *vdev)
@@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device *vdev)
 			virtio_cwrite(vb->vdev, struct virtio_balloon_config,
 				      poison_val, &poison_val);
 		}
-	}
-	/*
-	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
-	 * shrinker needs to be registered to relieve memory pressure.
-	 */
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
-		err = virtio_balloon_register_shrinker(vb);
+
+		/*
+		 * We're allowed to reuse any free pages, even if they are
+		 * still to be processed by the host.
+		 */
+		vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
+		vb->shrinker.count_objects = virtio_balloon_shrinker_count;
+		vb->shrinker.seeks = DEFAULT_SEEKS;
+		err = register_shrinker(&vb->shrinker);
 		if (err)
 			goto out_del_balloon_wq;
 	}
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
+		vb->oom_nb.notifier_call = virtio_balloon_oom_notify;
+		vb->oom_nb.priority = VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY;
+		err = register_oom_notifier(&vb->oom_nb);
+		if (err < 0)
+			goto out_unregister_shrinker;
+	}
+
 	virtio_device_ready(vdev);
 
 	if (towards_target(vb))
 		virtballoon_changed(vdev);
 	return 0;
 
+out_unregister_shrinker:
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		unregister_shrinker(&vb->shrinker);
 out_del_balloon_wq:
 	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		destroy_workqueue(vb->balloon_wq);
@@ -987,8 +965,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-		virtio_balloon_unregister_shrinker(vb);
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+		unregister_oom_notifier(&vb->oom_nb);
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		unregister_shrinker(&vb->shrinker);
+
 	spin_lock_irq(&vb->stop_update_lock);
 	vb->stop_update = true;
 	spin_unlock_irq(&vb->stop_update_lock);
-- 
2.24.1