From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Henry Burns,
    Sergey Senozhatsky, Henry Burns, Minchan Kim, Shakeel Butt,
    Jonathan Adams, Andrew Morton, Linus Torvalds
Subject: [PATCH 4.14 59/62] mm/zsmalloc.c: fix race condition in zs_destroy_pool
Date: Tue, 27 Aug 2019 09:51:04 +0200
Message-Id: <20190827072703.875523743@linuxfoundation.org>
In-Reply-To: <20190827072659.803647352@linuxfoundation.org>
References: <20190827072659.803647352@linuxfoundation.org>

From: Henry Burns

commit 701d678599d0c1623aaf4139c03eea260a75b027 upstream.

In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
have no guarantee that migration isn't happening in the background at
that time. Since migration can't directly free pages, it relies on
free_work being scheduled to free the pages. But there's nothing
preventing an in-progress migration from queuing the work *after*
zs_unregister_migration() has called flush_work(), which would leave
pages still pointing at the inode when we free it.

Since we know that at destroy time all objects should be free, no new
migrations can come in (zs_page_isolate() fails for fully-free
zspages). This means it is sufficient to track a "# isolated zspages"
count for the pool, and have the destroy logic ensure all such pages
have drained before proceeding. Keeping that state under the class
spinlock keeps the logic straightforward.

Without this fix, the memory leak could lead to an eventual crash if
compaction hits the leaked page. The crash would only occur if users
change their zswap backend at runtime (which eventually triggers
destruction).

Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns
Reviewed-by: Sergey Senozhatsky
Cc: Henry Burns
Cc: Minchan Kim
Cc: Shakeel Butt
Cc: Jonathan Adams
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 mm/zsmalloc.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)
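
To make the scheme concrete before the diff itself, here is a minimal
userspace analogue of the drain-and-wait pattern the patch introduces.
It is only a sketch: a pthread mutex and condition variable stand in
for the kernel's wait queue and memory barriers, and all names here
(struct pool, pool_inc_isolated(), ...) are illustrative, not part of
zsmalloc.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct pool {
	pthread_mutex_t lock;
	pthread_cond_t  migration_wait;	/* ~ pool->migration_wait */
	long            isolated_pages;	/* ~ pool->isolated_pages */
	bool            destroying;	/* ~ pool->destroying */
};

/* ~ zs_page_isolate(): a migrator takes a reference on the pool. */
static void pool_inc_isolated(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->isolated_pages++;
	pthread_mutex_unlock(&p->lock);
}

/* ~ zs_pool_dec_isolated(): the last migrator wakes the destroyer. */
static void pool_dec_isolated(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	if (--p->isolated_pages == 0 && p->destroying)
		pthread_cond_broadcast(&p->migration_wait);
	pthread_mutex_unlock(&p->lock);
}

/* ~ zs_unregister_migration(): flag destruction, then drain. */
static void pool_unregister_migration(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->destroying = true;	/* the mutex supplies the ordering smp_mb() gives the kernel */
	while (p->isolated_pages != 0)
		pthread_cond_wait(&p->migration_wait, &p->lock);
	pthread_mutex_unlock(&p->lock);
	/* only now is it safe to flush free_work and drop the inode */
}

int main(void)
{
	struct pool p = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.migration_wait = PTHREAD_COND_INITIALIZER,
	};

	pool_inc_isolated(&p);
	pool_dec_isolated(&p);		/* counter drops back to zero */
	pool_unregister_migration(&p);	/* returns at once: already drained */
	puts("drained");
	return 0;
}

In the patch itself the counter is a lock-free atomic_long_t and the
required ordering comes from smp_mb(); the mutex above provides both,
at the cost of a lock the kernel code does not want on this path.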

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -52,6 +52,7 @@
 #include <linux/zpool.h>
 #include <linux/mount.h>
 #include <linux/migrate.h>
+#include <linux/wait.h>
 #include <linux/pagemap.h>
 
 #define ZSPAGE_MAGIC	0x58
@@ -267,6 +268,10 @@ struct zs_pool {
 #ifdef CONFIG_COMPACTION
 	struct inode *inode;
 	struct work_struct free_work;
+	/* A wait queue for when migration races with async_free_zspage() */
+	struct wait_queue_head migration_wait;
+	atomic_long_t isolated_pages;
+	bool destroying;
 #endif
 };
 
@@ -1890,6 +1895,19 @@ static void putback_zspage_deferred(stru
 
 }
 
+static inline void zs_pool_dec_isolated(struct zs_pool *pool)
+{
+	VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
+	atomic_long_dec(&pool->isolated_pages);
+	/*
+	 * There's no possibility of racing, since wait_for_isolated_drain()
+	 * checks the isolated count under &class->lock after enqueuing
+	 * on migration_wait.
+	 */
+	if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
+		wake_up_all(&pool->migration_wait);
+}
+
 static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 				struct page *newpage, struct page *oldpage)
 {
@@ -1959,6 +1977,7 @@ bool zs_page_isolate(struct page *page,
 	 */
 	if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
 		get_zspage_mapping(zspage, &class_idx, &fullness);
+		atomic_long_inc(&pool->isolated_pages);
 		remove_zspage(class, zspage, fullness);
 	}
 
@@ -2058,8 +2077,16 @@ int zs_page_migrate(struct address_space
 	 * Page migration is done so let's putback isolated zspage to
 	 * the list if @page is final isolated subpage in the zspage.
 	 */
-	if (!is_zspage_isolated(zspage))
+	if (!is_zspage_isolated(zspage)) {
+		/*
+		 * We cannot race with zs_destroy_pool() here because we wait
+		 * for isolation to hit zero before we start destroying.
+		 * Also, we ensure that everyone can see pool->destroying before
+		 * we start waiting.
+		 */
 		putback_zspage_deferred(pool, class, zspage);
+		zs_pool_dec_isolated(pool);
+	}
 
 	reset_page(page);
 	put_page(page);
@@ -2110,8 +2137,8 @@ void zs_page_putback(struct page *page)
 		 * so let's defer.
 		 */
 		putback_zspage_deferred(pool, class, zspage);
+		zs_pool_dec_isolated(pool);
 	}
-
 	spin_unlock(&class->lock);
 }
 
@@ -2134,8 +2161,36 @@ static int zs_register_migration(struct
 	return 0;
 }
 
+static bool pool_isolated_are_drained(struct zs_pool *pool)
+{
+	return atomic_long_read(&pool->isolated_pages) == 0;
+}
+
+/* Function for resolving migration */
+static void wait_for_isolated_drain(struct zs_pool *pool)
+{
+
+	/*
+	 * We're in the process of destroying the pool, so there are no
+	 * active allocations. zs_page_isolate() fails for completely free
+	 * zspages, so we need only wait for the zs_pool's isolated
+	 * count to hit zero.
+	 */
+	wait_event(pool->migration_wait,
+		   pool_isolated_are_drained(pool));
+}
+
 static void zs_unregister_migration(struct zs_pool *pool)
 {
+	pool->destroying = true;
+	/*
+	 * We need a memory barrier here to ensure global visibility of
+	 * pool->destroying. Thus pool->isolated pages will either be 0 in which
+	 * case we don't care, or it will be > 0 and pool->destroying will
+	 * ensure that we wake up once isolation hits 0.
+	 */
+	smp_mb();
+	wait_for_isolated_drain(pool); /* This can block */
 	flush_work(&pool->free_work);
 	iput(pool->inode);
 }
@@ -2376,6 +2431,8 @@ struct zs_pool *zs_create_pool(const cha
 	if (!pool->name)
 		goto err;
 
+	init_waitqueue_head(&pool->migration_wait);
+
 	if (create_cache(pool))
 		goto err;
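
A closing note on the ordering argument in zs_unregister_migration():
the handshake is correct because the destroyer publishes
pool->destroying before it reads the isolated count, while each
migrator decrements the count before it reads pool->destroying. The
C11 sketch below restates that two-sided protocol with sequentially
consistent atomics (deliberately stronger than the primitives the
kernel code relies on); the function names are again illustrative
only, not zsmalloc API.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_long isolated_pages;	/* ~ pool->isolated_pages */
static atomic_bool destroying;		/* ~ pool->destroying */

/*
 * Destroyer side (~ zs_unregister_migration): write the flag, then
 * read the counter. Returns true if it must sleep on the wait queue.
 */
static bool destroyer_must_wait(void)
{
	atomic_store(&destroying, true);	/* ~ pool->destroying = true; smp_mb(); */
	return atomic_load(&isolated_pages) != 0;
}

/*
 * Migrator side (~ zs_pool_dec_isolated): write the counter, then
 * read the flag. Returns true if it must deliver the final wake-up.
 */
static bool migrator_must_wake(void)
{
	long remaining = atomic_fetch_sub(&isolated_pages, 1) - 1;
	return remaining == 0 && atomic_load(&destroying);
}

Because each side writes before it reads, at least one of them must
observe the other: either the destroyer sees a non-zero count and
sleeps until the final migrator (which then sees destroying == true)
wakes it, or the count is already zero and the destroyer never sleeps.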