Date: Thu, 10 Jun 2021 14:57:03 +0800
From: Ming Lei
To: Roman Gushchin
Cc: Andrew Morton, Tejun Heo, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro, Jan Kara, Dennis Zhou, Dave Chinner, cgroups@vger.kernel.org, Jan Kara
Subject: Re: [PATCH v9 3/8] writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
References: <20210608230225.2078447-1-guro@fb.com> <20210608230225.2078447-4-guro@fb.com>

On Wed, Jun 09, 2021 at 05:21:14PM -0700, Roman Gushchin wrote:
> On Wed, Jun 09, 2021 at 11:32:44AM +0800, Ming Lei wrote:
> > On Tue, Jun 08, 2021 at 04:02:20PM -0700, Roman Gushchin wrote:
> > > isw_nr_in_flight is used to determine whether the inode switch queue
> > > should be flushed from the umount path. Currently it's increased
> > > after grabbing an inode and even scheduling the switch work. It means
> > > the umount path can be walked past cleanup_offline_cgwb() with active
> > > inode references, which can result in a "Busy inodes after unmount."
> > > message and use-after-free issues (with inode->i_sb which gets freed).
> > >
> > > Fix it by incrementing isw_nr_in_flight before doing anything with
> > > the inode and decrementing in the case when switching wasn't scheduled.
> > >
> > > The problem hasn't yet been seen in the real life and was discovered
> > > by Jan Kara by looking into the code.
> > >
> > > Suggested-by: Jan Kara
> > > Signed-off-by: Roman Gushchin
> > > Reviewed-by: Jan Kara
> > > ---
> > >  fs/fs-writeback.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index b6fc13a4962d..4413e005c28c 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -505,6 +505,8 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
> > >  	if (!isw)
> > >  		return;
> > >
> > > +	atomic_inc(&isw_nr_in_flight);
> >
> > smp_mb() may be required for ordering the WRITE in 'atomic_inc(&isw_nr_in_flight)'
> > and the following READ on 'inode->i_sb->s_flags & SB_ACTIVE'. Otherwise,
> > cgroup_writeback_umount() may observe zero of 'isw_nr_in_flight' because of
> > re-order of the two OPs, then miss the flush_workqueue().
> >
> > Also this barrier should serve as pair of the one added in cgroup_writeback_umount(),
> > so maybe this patch should be merged with 2/8.
>
> Hi Ming!
>
> Good point, I agree. How about a patch below?
>
> Thanks!
>
> --
>
> From 282861286074c47907759d80c01419f0d0630dae Mon Sep 17 00:00:00 2001
> From: Roman Gushchin
> Date: Wed, 9 Jun 2021 14:14:26 -0700
> Subject: [PATCH] cgroup, writeback: add smp_mb() to inode_prepare_wbs_switch()
>
> Add a memory barrier between incrementing isw_nr_in_flight
> and checking the sb's SB_ACTIVE flag and grabbing an inode in
> inode_prepare_wbs_switch(). It's required to prevent grabbing
> an inode before incrementing isw_nr_in_flight, otherwise
> 0 can be obtained as isw_nr_in_flight in cgroup_writeback_umount()
> and isw_wq will not be flushed, potentially leading to a memory
> corruption.
>
> Added smp_mb() will work in pair with smp_mb() in
> cgroup_writeback_umount().
>
> Suggested-by: Ming Lei
> Signed-off-by: Roman Gushchin
> ---
>  fs/fs-writeback.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 545fce68e919..6332b86ca4ed 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -513,6 +513,14 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
>  static bool inode_prepare_wbs_switch(struct inode *inode,
>  				     struct bdi_writeback *new_wb)
>  {
> +	/*
> +	 * Paired with smp_mb() in cgroup_writeback_umount().
> +	 * isw_nr_in_flight must be increased before checking SB_ACTIVE and
> +	 * grabbing an inode, otherwise isw_nr_in_flight can be observed as 0
> +	 * in cgroup_writeback_umount() and the isw_wq will be not flushed.
> +	 */
> +	smp_mb();
> +
>  	/* while holding I_WB_SWITCH, no one else can update the association */
>  	spin_lock(&inode->i_lock);
>  	if (!(inode->i_sb->s_flags & SB_ACTIVE) ||

Looks fine. You may have to merge this one with 2/8 & 3/8, so that the
memory barrier usage stays correct & intact for avoiding the race between
switching cgwb and generic_shutdown_super().

Thanks,
Ming
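
For reference, the barrier pairing discussed in this thread follows the
usual store-buffering pattern: each side writes one variable, issues a full
barrier, then reads the other variable. The following is only a minimal
userspace sketch in C11 atomics, not kernel code. The names nr_in_flight,
sb_active, prepare_switch() and umount_needs_flush() are hypothetical
stand-ins for isw_nr_in_flight, the SB_ACTIVE bit, and the two code paths,
and atomic_thread_fence(memory_order_seq_cst) is assumed to play the role
of smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for isw_nr_in_flight and the SB_ACTIVE bit. */
static atomic_int nr_in_flight;
static atomic_bool sb_active = true;

/*
 * Switcher side: publish the in-flight count first, then look at the
 * "superblock active" flag. Returns true if the switch may proceed, in
 * which case the caller keeps one in-flight reference.
 */
static bool prepare_switch(void)
{
	atomic_fetch_add_explicit(&nr_in_flight, 1, memory_order_relaxed);

	/* Assumed analogue of smp_mb(): orders the write above before the read below. */
	atomic_thread_fence(memory_order_seq_cst);

	if (!atomic_load_explicit(&sb_active, memory_order_relaxed)) {
		/* Switching was not scheduled, so drop the count again. */
		atomic_fetch_sub_explicit(&nr_in_flight, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/*
 * Umount side: clear the flag first, then look at the in-flight count.
 * Returns true if pending switch work would have to be flushed.
 */
static bool umount_needs_flush(void)
{
	atomic_store_explicit(&sb_active, false, memory_order_relaxed);

	/* Paired fence: orders the write above before the read below. */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&nr_in_flight, memory_order_relaxed) != 0;
}
```

With both fences in place, the two sides cannot both miss each other's
write: either prepare_switch() sees the cleared flag and backs out, or
umount_needs_flush() sees a non-zero count and flushes. Dropping either
fence allows the outcome where the switcher proceeds while the umount side
reads 0.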