MIME-Version: 1.0
References: <20180411084521.254006-1-gthelen@google.com>
In-Reply-To: <20180411084521.254006-1-gthelen@google.com>
From: Greg Thelen <gthelen@google.com>
Date: Wed, 11 Apr 2018 08:50:28 +0000
Subject: Re: [PATCH for-4.4] writeback: safer lock nesting
To: Wang Long, Michal Hocko, Andrew Morton, Johannes Weiner, Tejun Heo, Sasha Levin
Cc: npiggin@gmail.com, LKML, Linux MM, stable@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 11, 2018 at 1:45 AM Greg Thelen wrote:
> lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
> the page's memcg is undergoing move accounting, which occurs when a
> process leaves its memcg for a new one that has
> memory.move_charge_at_immigrate set.
>
> unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
> the given inode is switching writeback domains.  Switches occur when
> enough writes are issued from a new domain.
>
> This existing pattern is thus suspicious:
>   lock_page_memcg(page);
>   unlocked_inode_to_wb_begin(inode, &locked);
>   ...
>   unlocked_inode_to_wb_end(inode, locked);
>   unlock_page_memcg(page);
>
> If both inode switch and process memcg migration are in-flight then
> unlocked_inode_to_wb_end() will unconditionally enable interrupts while
> still holding the lock_page_memcg() irq spinlock.
> This suggests the possibility of deadlock if an interrupt occurs before
> unlock_page_memcg().
>
>   truncate
>   __cancel_dirty_page
>   lock_page_memcg
>   unlocked_inode_to_wb_begin
>   unlocked_inode_to_wb_end
>   <interrupts mistakenly enabled>
>                                  <interrupt>
>                                  end_page_writeback
>                                  test_clear_page_writeback
>                                  lock_page_memcg
>                                  <deadlock>
>   unlock_page_memcg
>
> Due to configuration limitations this deadlock is not currently possible
> because we don't mix cgroup writeback (a cgroupv2 feature) and
> memory.move_charge_at_immigrate (a cgroupv1 feature).
>
> If the kernel is hacked to always claim inode switching and memcg
> moving_account, then this script triggers lockup in less than a minute:
>
>   cd /mnt/cgroup/memory
>   mkdir a b
>   echo 1 > a/memory.move_charge_at_immigrate
>   echo 1 > b/memory.move_charge_at_immigrate
>   (
>     echo $BASHPID > a/cgroup.procs
>     while true; do
>       dd if=/dev/zero of=/mnt/big bs=1M count=256
>     done
>   ) &
>   while true; do
>     sync
>   done &
>   sleep 1h &
>   SLEEP=$!
>   while true; do
>     echo $SLEEP > a/cgroup.procs
>     echo $SLEEP > b/cgroup.procs
>   done
>
> The deadlock does not seem possible, so it's debatable if there's any
> reason to modify the kernel.  I suggest we do, to prevent future
> surprises.  And Wang Long said "this deadlock occurs three times in our
> environment", so there's more reason to apply this, even to stable.

Wang Long: I wasn't able to reproduce the 4.4 problem.  But tracing
suggests this 4.4 patch is effective.  If you can reproduce the problem
in your 4.4 environment, then it'd be nice to confirm this fixes it.
Thanks!

> [ This patch is only for 4.4 stable.  Newer stable kernels should be
>   able to cherry-pick the upstream "writeback: safer lock nesting"
>   patch.
> ]
>
> Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
> Cc: stable@vger.kernel.org # v4.2+
> Reported-by: Wang Long
> Signed-off-by: Greg Thelen
> Acked-by: Michal Hocko
> Acked-by: Wang Long
> ---
>  fs/fs-writeback.c                |  7 ++++---
>  include/linux/backing-dev-defs.h |  5 +++++
>  include/linux/backing-dev.h      | 31 +++++++++++++++++--------------
>  mm/page-writeback.c              | 18 +++++++++---------
>  4 files changed, 35 insertions(+), 26 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 22b30249fbcb..0fe667875852 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -747,11 +747,12 @@ int inode_congested(struct inode *inode, int cong_bits)
>  	 */
>  	if (inode && inode_to_wb_is_valid(inode)) {
>  		struct bdi_writeback *wb;
> -		bool locked, congested;
> +		struct wb_lock_cookie lock_cookie = {};
> +		bool congested;
>
> -		wb = unlocked_inode_to_wb_begin(inode, &locked);
> +		wb = unlocked_inode_to_wb_begin(inode, &lock_cookie);
>  		congested = wb_congested(wb, cong_bits);
> -		unlocked_inode_to_wb_end(inode, locked);
> +		unlocked_inode_to_wb_end(inode, &lock_cookie);
>  		return congested;
>  	}
>
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 140c29635069..a307c37c2e6c 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -191,6 +191,11 @@ static inline void set_bdi_congested(struct backing_dev_info *bdi, int sync)
>  	set_wb_congested(bdi->wb.congested, sync);
>  }
>
> +struct wb_lock_cookie {
> +	bool locked;
> +	unsigned long flags;
> +};
> +
>  #ifdef CONFIG_CGROUP_WRITEBACK
>
>  /**
> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> index 89d3de3e096b..361274ce5815 100644
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -366,7 +366,7 @@ static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
>  /**
>   * unlocked_inode_to_wb_begin - begin unlocked inode wb access transaction
>   * @inode: target inode
> - * @lockedp: temp bool output param, to be passed to the end function
> + * @cookie: output param, to be passed to the end function
>   *
>   * The caller wants to access the wb associated with @inode but isn't
>   * holding inode->i_lock, mapping->tree_lock or wb->list_lock.  This
>   * function determines the wb associated with @inode and ensures that the
>   * association doesn't change until the transaction is finished with
>   * unlocked_inode_to_wb_end().
>   *
> - * The caller must call unlocked_inode_to_wb_end() with *@lockdep
> - * afterwards and can't sleep during transaction.  IRQ may or may not be
> - * disabled on return.
> + * The caller must call unlocked_inode_to_wb_end() with *@cookie afterwards and
> + * can't sleep during the transaction.  IRQs may or may not be disabled on
> + * return.
>  */
>  static inline struct bdi_writeback *
> -unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
> +unlocked_inode_to_wb_begin(struct inode *inode, struct wb_lock_cookie *cookie)
>  {
>  	rcu_read_lock();
>
> @@ -387,10 +387,10 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
>  	 * Paired with store_release in inode_switch_wb_work_fn() and
>  	 * ensures that we see the new wb if we see cleared I_WB_SWITCH.
>  	 */
> -	*lockedp = smp_load_acquire(&inode->i_state) & I_WB_SWITCH;
> +	cookie->locked = smp_load_acquire(&inode->i_state) & I_WB_SWITCH;
>
> -	if (unlikely(*lockedp))
> -		spin_lock_irq(&inode->i_mapping->tree_lock);
> +	if (unlikely(cookie->locked))
> +		spin_lock_irqsave(&inode->i_mapping->tree_lock, cookie->flags);
>
>  	/*
>  	 * Protected by either !I_WB_SWITCH + rcu_read_lock() or tree_lock.
> @@ -402,12 +402,14 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
>  /**
>   * unlocked_inode_to_wb_end - end inode wb access transaction
>   * @inode: target inode
> - * @locked: *@lockedp from unlocked_inode_to_wb_begin()
> + * @cookie: @cookie from unlocked_inode_to_wb_begin()
>   */
> -static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
> +static inline void unlocked_inode_to_wb_end(struct inode *inode,
> +					    struct wb_lock_cookie *cookie)
>  {
> -	if (unlikely(locked))
> -		spin_unlock_irq(&inode->i_mapping->tree_lock);
> +	if (unlikely(cookie->locked))
> +		spin_unlock_irqrestore(&inode->i_mapping->tree_lock,
> +				       cookie->flags);
>
>  	rcu_read_unlock();
>  }
>
> @@ -454,12 +456,13 @@ static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
>  }
>
>  static inline struct bdi_writeback *
> -unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
> +unlocked_inode_to_wb_begin(struct inode *inode, struct wb_lock_cookie *cookie)
>  {
>  	return inode_to_wb(inode);
>  }
>
> -static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
> +static inline void unlocked_inode_to_wb_end(struct inode *inode,
> +					    struct wb_lock_cookie *cookie)
>  {
>  }
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 6d0dbde4503b..3309dbda7ffa 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2510,13 +2510,13 @@ void account_page_redirty(struct page *page)
>  	if (mapping && mapping_cap_account_dirty(mapping)) {
>  		struct inode *inode = mapping->host;
>  		struct bdi_writeback *wb;
> -		bool locked;
> +		struct wb_lock_cookie cookie = {};
>
> -		wb = unlocked_inode_to_wb_begin(inode, &locked);
> +		wb = unlocked_inode_to_wb_begin(inode, &cookie);
>  		current->nr_dirtied--;
>  		dec_zone_page_state(page, NR_DIRTIED);
>  		dec_wb_stat(wb, WB_DIRTIED);
> -		unlocked_inode_to_wb_end(inode, locked);
> +		unlocked_inode_to_wb_end(inode, &cookie);
>  	}
>  }
>  EXPORT_SYMBOL(account_page_redirty);
>
> @@ -2622,15 +2622,15 @@ void cancel_dirty_page(struct page *page)
>  		struct inode *inode = mapping->host;
>  		struct bdi_writeback *wb;
>  		struct mem_cgroup *memcg;
> -		bool locked;
> +		struct wb_lock_cookie cookie = {};
>
>  		memcg = mem_cgroup_begin_page_stat(page);
> -		wb = unlocked_inode_to_wb_begin(inode, &locked);
> +		wb = unlocked_inode_to_wb_begin(inode, &cookie);
>
>  		if (TestClearPageDirty(page))
>  			account_page_cleaned(page, mapping, memcg, wb);
>
> -		unlocked_inode_to_wb_end(inode, locked);
> +		unlocked_inode_to_wb_end(inode, &cookie);
>  		mem_cgroup_end_page_stat(memcg);
>  	} else {
>  		ClearPageDirty(page);
>
> @@ -2663,7 +2663,7 @@ int clear_page_dirty_for_io(struct page *page)
>  		struct inode *inode = mapping->host;
>  		struct bdi_writeback *wb;
>  		struct mem_cgroup *memcg;
> -		bool locked;
> +		struct wb_lock_cookie cookie = {};
>
>  		/*
>  		 * Yes, Virginia, this is indeed insane.
>
> @@ -2701,14 +2701,14 @@ int clear_page_dirty_for_io(struct page *page)
>  		 * exclusion.
>  		 */
>  		memcg = mem_cgroup_begin_page_stat(page);
> -		wb = unlocked_inode_to_wb_begin(inode, &locked);
> +		wb = unlocked_inode_to_wb_begin(inode, &cookie);
>  		if (TestClearPageDirty(page)) {
>  			mem_cgroup_dec_page_stat(memcg, MEM_CGROUP_STAT_DIRTY);
>  			dec_zone_page_state(page, NR_FILE_DIRTY);
>  			dec_wb_stat(wb, WB_RECLAIMABLE);
>  			ret = 1;
>  		}
> -		unlocked_inode_to_wb_end(inode, locked);
> +		unlocked_inode_to_wb_end(inode, &cookie);
>  		mem_cgroup_end_page_stat(memcg);
>  		return ret;
>  	}
> --
> 2.17.0.484.g0c8726318c-goog