Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp5226903ybl; Tue, 27 Aug 2019 01:07:56 -0700 (PDT) X-Google-Smtp-Source: APXvYqxcSGxCIgjgreGCWppGYhB+u490Z9AIrOMygfEp318erleCMe57PvY1m6Rfbh58VmAaXkwU X-Received: by 2002:a65:4808:: with SMTP id h8mr19951141pgs.22.1566893276325; Tue, 27 Aug 2019 01:07:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1566893276; cv=none; d=google.com; s=arc-20160816; b=uCuGykLqkRgInepv0YQyKit370n+sELmCez/zsou5EH7rEcgEXgUqA51ZhoR9AJUfr iUIL9fKRyTdWuyc2trtNePqhlDMqALTUCPwjF41QphG6z8A0z04hm+38sO9mCogRLKy1 wtrP8OgXzieP53Hd4mkKV33QiK3eRljoJVH1Zy6VrsvjyIk28b0jKRuzcxoK3XiWNWMZ OLbviiyUFqErZGBMcx7/naOdxdbGX3nRCI4mpAitHKJh/UyQikEQPjZ2gD34kK2NcWY2 QkK4xddg2MRU6uWYUmF7WWR5Nby34OXLMgUZ79LBG2HDh2jrFYYQ7xli2Oauc+cE2QTO PbaA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=HZKwbUpwv/cNVTzZpJ2qZZg5i/6+Xt/CG70N7YXIlLU=; b=m5fhoovHlq0I+XrHrSUVy0NsZWIB5ZVtP2Z/33KNsQMhpumsJdr7dZRSFSykR6Qy1/ hjndXHOkQZWrba1CuPPVwpf8dbZbi/UPWbTKwXbdV+QC+5WMnbvFzgRw3tHAVmKRIPLj zl0fO/BKFWLsACGBnvXAVWV6fVLlHoF3nBv6q6XmZsU+0VtgdyTcK0IGwmCZB6+/AJ3/ AuHpXvZo/3Ttlk1/613gvOzzdJEqwsVi+w528MH4d6BAg6W2lJsWNVTywckEWPnYoIA8 mGyIhuKzXAdYJfbPTIr0StZulCmpdzsHLkRKzPWKMOrUNX+GR0vYr+Z5EMV4wDJQyK93 2PDg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=rOPv7XqD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a19si11169480pgw.234.2019.08.27.01.07.40; Tue, 27 Aug 2019 01:07:56 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=rOPv7XqD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732202AbfH0IGg (ORCPT + 99 others); Tue, 27 Aug 2019 04:06:36 -0400 Received: from mail.kernel.org ([198.145.29.99]:36362 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732242AbfH0IGW (ORCPT ); Tue, 27 Aug 2019 04:06:22 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id AB1A3206BF; Tue, 27 Aug 2019 08:06:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1566893181; bh=8HOcoIqM+xupIrDU6PtPTFta0MQBFueMPaSNpe1xc00=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rOPv7XqDnEE/0y1mD7qSQCFBEGKj7/DGd1yxkCUDi/RCcPjjtj3SoXSJQoie0B6fW ryawj27rInjiifKxEauGXGGijZjVpVlC+vaTb6A7HTydOQmM3agCKsIg7GgASoLFQ9 wpKTvw1rp0z5WuVGpJUnkTvHiXAHxE03ykdA+DtE= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Henry Burns , Vitaly Wool , Shakeel Butt , Jonathan Adams , Henry Burns , Andrew Morton , Linus Torvalds Subject: [PATCH 5.2 146/162] mm/z3fold.c: fix race between migration and destruction Date: Tue, 27 Aug 2019 09:51:14 +0200 Message-Id: <20190827072743.798323282@linuxfoundation.org> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190827072738.093683223@linuxfoundation.org> References: <20190827072738.093683223@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Henry Burns commit d776aaa9895eb6eb770908e899cb7f5bd5025b3c upstream. In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq). However, we have no guarantee that migration isn't happening in the background at that time. Migration directly calls queue_work_on(pool->compact_wq), if destruction wins that race we are using a destroyed workqueue. Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com Signed-off-by: Henry Burns Cc: Vitaly Wool Cc: Shakeel Butt Cc: Jonathan Adams Cc: Henry Burns Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/z3fold.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) --- a/mm/z3fold.c +++ b/mm/z3fold.c @@ -41,6 +41,7 @@ #include #include #include +#include #include /* @@ -144,6 +145,8 @@ struct z3fold_header { * @release_wq: workqueue for safe page release * @work: work_struct for safe page release * @inode: inode for z3fold pseudo filesystem + * @destroying: bool to stop migration once we start destruction + * @isolated: int to count the number of pages currently in isolation * * This structure is allocated at pool creation time and maintains metadata * pertaining to a particular z3fold pool. @@ -162,8 +165,11 @@ struct z3fold_pool { const struct zpool_ops *zpool_ops; struct workqueue_struct *compact_wq; struct workqueue_struct *release_wq; + struct wait_queue_head isolate_wait; struct work_struct work; struct inode *inode; + bool destroying; + int isolated; }; /* @@ -771,6 +777,7 @@ static struct z3fold_pool *z3fold_create goto out_c; spin_lock_init(&pool->lock); spin_lock_init(&pool->stale_lock); + init_waitqueue_head(&pool->isolate_wait); pool->unbuddied = __alloc_percpu(sizeof(struct list_head)*NCHUNKS, 2); if (!pool->unbuddied) goto out_pool; @@ -810,6 +817,15 @@ out: return NULL; } +static bool pool_isolated_are_drained(struct z3fold_pool *pool) +{ + bool ret; + + spin_lock(&pool->lock); + ret = pool->isolated == 0; + spin_unlock(&pool->lock); + return ret; +} /** * z3fold_destroy_pool() - destroys an existing z3fold pool * @pool: the z3fold pool to be destroyed @@ -819,6 +835,22 @@ out: static void z3fold_destroy_pool(struct z3fold_pool *pool) { kmem_cache_destroy(pool->c_handle); + /* + * We set pool-> destroying under lock to ensure that + * z3fold_page_isolate() sees any changes to destroying. This way we + * avoid the need for any memory barriers. + */ + + spin_lock(&pool->lock); + pool->destroying = true; + spin_unlock(&pool->lock); + + /* + * We need to ensure that no pages are being migrated while we destroy + * these workqueues, as migration can queue work on either of the + * workqueues. + */ + wait_event(pool->isolate_wait, !pool_isolated_are_drained(pool)); /* * We need to destroy pool->compact_wq before pool->release_wq, @@ -1309,6 +1341,28 @@ static u64 z3fold_get_pool_size(struct z return atomic64_read(&pool->pages_nr); } +/* + * z3fold_dec_isolated() expects to be called while pool->lock is held. + */ +static void z3fold_dec_isolated(struct z3fold_pool *pool) +{ + assert_spin_locked(&pool->lock); + VM_BUG_ON(pool->isolated <= 0); + pool->isolated--; + + /* + * If we have no more isolated pages, we have to see if + * z3fold_destroy_pool() is waiting for a signal. + */ + if (pool->isolated == 0 && waitqueue_active(&pool->isolate_wait)) + wake_up_all(&pool->isolate_wait); +} + +static void z3fold_inc_isolated(struct z3fold_pool *pool) +{ + pool->isolated++; +} + static bool z3fold_page_isolate(struct page *page, isolate_mode_t mode) { struct z3fold_header *zhdr; @@ -1335,6 +1389,33 @@ static bool z3fold_page_isolate(struct p spin_lock(&pool->lock); if (!list_empty(&page->lru)) list_del(&page->lru); + /* + * We need to check for destruction while holding pool->lock, as + * otherwise destruction could see 0 isolated pages, and + * proceed. + */ + if (unlikely(pool->destroying)) { + spin_unlock(&pool->lock); + /* + * If this page isn't stale, somebody else holds a + * reference to it. Let't drop our refcount so that they + * can call the release logic. + */ + if (unlikely(kref_put(&zhdr->refcount, + release_z3fold_page_locked))) { + /* + * If we get here we have kref problems, so we + * should freak out. + */ + WARN(1, "Z3fold is experiencing kref problems\n"); + return false; + } + z3fold_page_unlock(zhdr); + return false; + } + + + z3fold_inc_isolated(pool); spin_unlock(&pool->lock); z3fold_page_unlock(zhdr); return true; @@ -1408,6 +1489,10 @@ static int z3fold_page_migrate(struct ad queue_work_on(new_zhdr->cpu, pool->compact_wq, &new_zhdr->work); + spin_lock(&pool->lock); + z3fold_dec_isolated(pool); + spin_unlock(&pool->lock); + page_mapcount_reset(page); unlock_page(page); put_page(page); @@ -1428,10 +1513,14 @@ static void z3fold_page_putback(struct p INIT_LIST_HEAD(&page->lru); if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) { atomic64_dec(&pool->pages_nr); + spin_lock(&pool->lock); + z3fold_dec_isolated(pool); + spin_unlock(&pool->lock); return; } spin_lock(&pool->lock); list_add(&page->lru, &pool->lru); + z3fold_dec_isolated(pool); spin_unlock(&pool->lock); z3fold_page_unlock(zhdr); }