Date: Sun, 22 Jul 2018 17:48:01 +0300
From: "Michael S. Tsirkin"
To: Wei Wang
Cc: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, mhocko@kernel.org, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, pbonzini@redhat.com,
	liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
	quan.xu0@gmail.com, nilal@redhat.com, riel@redhat.com,
	peterx@redhat.com
Subject: Re: [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker
Message-ID: <20180722174125-mutt-send-email-mst@kernel.org>
References: <1532075585-39067-1-git-send-email-wei.w.wang@intel.com>
	<1532075585-39067-3-git-send-email-wei.w.wang@intel.com>
In-Reply-To: <1532075585-39067-3-git-send-email-wei.w.wang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 20, 2018 at 04:33:02PM +0800, Wei Wang wrote:
> The OOM notifier is getting deprecated to use for the reasons mentioned
> here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
>
> This patch replaces the virtio-balloon oom notifier with a shrinker
> to release balloon pages on memory pressure.
>
> In addition, the bug in the replaced virtballoon_oom_notify that only
> VIRTIO_BALLOON_ARRAY_PFNS_MAX (i.e 256) balloon pages can be freed
> though the user has specified more than that number is fixed in the
> shrinker_scan function.
>
> Signed-off-by: Wei Wang
> Cc: Michael S. Tsirkin
> Cc: Michal Hocko
> Cc: Andrew Morton
> Cc: Linus Torvalds
> ---
>  drivers/virtio/virtio_balloon.c | 113 +++++++++++++++++++++++----------------
>  1 file changed, 65 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 9356a1a..c6fd406 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -27,7 +27,6 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>  #include
>  #include
> @@ -40,12 +39,12 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> -#define OOM_VBALLOON_DEFAULT_PAGES 256
> +#define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>
> -static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> -module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> -MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> +static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
> +module_param(balloon_pages_to_shrink, ulong, 0600);
> +MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory presure");
>
>  #ifdef CONFIG_BALLOON_COMPACTION
>  static struct vfsmount *balloon_mnt;
> @@ -86,8 +85,8 @@ struct virtio_balloon {
>  	/* Memory statistics */
>  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>
> -	/* To register callback in oom notifier call chain */
> -	struct notifier_block nb;
> +	/* To register a shrinker to shrink memory upon memory pressure */
> +	struct shrinker shrinker;
>  };
>
>  static struct virtio_device_id id_table[] = {
> @@ -365,38 +364,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
>  		      &actual);
>  }
>
> -/*
> - * virtballoon_oom_notify - release pages when system is under severe
> - *			    memory pressure (called from out_of_memory())
> - * @self : notifier block struct
> - * @dummy: not used
> - * @parm : returned - number of freed pages
> - *
> - * The balancing of memory by use of the virtio balloon should not cause
> - * the termination of processes while there are pages in the balloon.
> - * If virtio balloon manages to release some memory, it will make the
> - * system return and retry the allocation that forced the OOM killer
> - * to run.
> - */
> -static int virtballoon_oom_notify(struct notifier_block *self,
> -				  unsigned long dummy, void *parm)
> -{
> -	struct virtio_balloon *vb;
> -	unsigned long *freed;
> -	unsigned num_freed_pages;
> -
> -	vb = container_of(self, struct virtio_balloon, nb);
> -	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -		return NOTIFY_OK;
> -
> -	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> -	update_balloon_size(vb);
> -	*freed += num_freed_pages;
> -
> -	return NOTIFY_OK;
> -}
> -
>  static void update_balloon_stats_func(struct work_struct *work)
>  {
>  	struct virtio_balloon *vb;
> @@ -548,6 +515,61 @@ static struct file_system_type balloon_fs = {
>
>  #endif /* CONFIG_BALLOON_COMPACTION */
>
> +static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
> +						  struct shrink_control *sc)
> +{
> +	unsigned long pages_to_free = balloon_pages_to_shrink,
> +		      pages_freed = 0;
> +	struct virtio_balloon *vb = container_of(shrinker,
> +					struct virtio_balloon, shrinker);
> +
> +	/*
> +	 * One invocation of leak_balloon can deflate at most
> +	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
> +	 * multiple times to deflate pages till reaching
> +	 * balloon_pages_to_shrink pages.
> +	 */
> +	while (vb->num_pages && pages_to_free) {
> +		pages_to_free = balloon_pages_to_shrink - pages_freed;
> +		pages_freed += leak_balloon(vb, pages_to_free);
> +	}
> +	update_balloon_size(vb);

Are you sure that this is never called if count returned 0?

> +
> +	return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
> +}
> +
> +static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
> +						   struct shrink_control *sc)
> +{
> +	struct virtio_balloon *vb = container_of(shrinker,
> +					struct virtio_balloon, shrinker);
> +
> +	/*
> +	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to handle the
> +	 * case when shrinker needs to be invoked to relieve memory pressure.
> +	 */
> +	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +		return 0;

So why not skip notifier registration when deflate on oom is clear?

> +
> +	return min_t(unsigned long, vb->num_pages, balloon_pages_to_shrink) /
> +	       VIRTIO_BALLOON_PAGES_PER_PAGE;
> +}
> +
> +static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
> +{
> +	unregister_shrinker(&vb->shrinker);
> +}
> +
> +static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
> +{
> +	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> +	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> +	vb->shrinker.batch = 0;
> +	vb->shrinker.seeks = DEFAULT_SEEKS;
> +
> +	return register_shrinker(&vb->shrinker);
> +}
> +
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb;
> @@ -580,17 +602,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	if (err)
>  		goto out_free_vb;
>
> -	vb->nb.notifier_call = virtballoon_oom_notify;
> -	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
> -	err = register_oom_notifier(&vb->nb);
> -	if (err < 0)
> -		goto out_del_vqs;
> -
>  #ifdef CONFIG_BALLOON_COMPACTION
>  	balloon_mnt = kern_mount(&balloon_fs);
>  	if (IS_ERR(balloon_mnt)) {
>  		err = PTR_ERR(balloon_mnt);
> -		unregister_oom_notifier(&vb->nb);
>  		goto out_del_vqs;
>  	}
>
> @@ -599,12 +614,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	if (IS_ERR(vb->vb_dev_info.inode)) {
>  		err = PTR_ERR(vb->vb_dev_info.inode);
>  		kern_unmount(balloon_mnt);
> -		unregister_oom_notifier(&vb->nb);
>  		vb->vb_dev_info.inode = NULL;
>  		goto out_del_vqs;
>  	}
>  	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
>  #endif
> +	err = virtio_balloon_register_shrinker(vb);
> +	if (err)
> +		goto out_del_vqs;
>

So we can get scans before device is ready. Leak will fail then.
Why not register later after device is ready?

>  	virtio_device_ready(vdev);
>
> @@ -637,7 +654,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb = vdev->priv;
>
> -	unregister_oom_notifier(&vb->nb);
> +	virtio_balloon_unregister_shrinker(vb);
>
>  	spin_lock_irq(&vb->stop_update_lock);
>  	vb->stop_update = true;
> --
> 2.7.4
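
For reference, the scan loop quoted above can be modelled in user space to see how many pages one scan would release. This is only a rough sketch, not driver code: struct balloon_model, model_leak(), model_scan() and model_selfcheck() are hypothetical names, and model_leak() assumes that leak_balloon() frees at most VIRTIO_BALLOON_ARRAY_PFNS_MAX pages per call and never fails, which leaves out the locking and host communication the real function does.

```c
#include <assert.h>

/* Mirrors VIRTIO_BALLOON_ARRAY_PFNS_MAX from the quoted patch. */
#define MODEL_ARRAY_PFNS_MAX 256UL

/* Hypothetical stand-in for struct virtio_balloon: only the page count. */
struct balloon_model {
	unsigned long num_pages;
};

/*
 * Stand-in for leak_balloon(): releases at most MODEL_ARRAY_PFNS_MAX
 * pages per call and never more than the balloon holds (assumption).
 */
static unsigned long model_leak(struct balloon_model *b, unsigned long num)
{
	if (num > MODEL_ARRAY_PFNS_MAX)
		num = MODEL_ARRAY_PFNS_MAX;
	if (num > b->num_pages)
		num = b->num_pages;
	b->num_pages -= num;
	return num;
}

/* Same loop shape as virtio_balloon_shrinker_scan() in the quoted hunk. */
static unsigned long model_scan(struct balloon_model *b, unsigned long target)
{
	unsigned long pages_to_free = target, pages_freed = 0;

	while (b->num_pages && pages_to_free) {
		pages_to_free = target - pages_freed;
		pages_freed += model_leak(b, pages_to_free);
	}
	return pages_freed;
}

/* Usage example: a target above the per-call cap is reached in several calls. */
void model_selfcheck(void)
{
	struct balloon_model big = { .num_pages = 2000 };
	struct balloon_model small = { .num_pages = 300 };

	assert(model_scan(&big, 1000) == 1000);	/* 256 + 256 + 256 + 232 */
	assert(big.num_pages == 1000);
	assert(model_scan(&small, 1000) == 300);	/* balloon runs empty first */
	assert(small.num_pages == 0);
}
```

In this model a target of 1000 pages is met in four capped calls, plus one final call that asks for zero pages before the loop condition is checked again. The model cannot say what happens when leak_balloon() makes no progress, for example before the device is ready.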