Received: by 2002:a25:f815:0:0:0:0:0 with SMTP id u21csp2895989ybd; Thu, 27 Jun 2019 22:37:45 -0700 (PDT) X-Google-Smtp-Source: APXvYqx8FZW9MrXiLR92lT2yylIMtJOvZCWkd+VUykwXiVVqCJTnXSnag4sLh7ZVUydSRo45TnUF X-Received: by 2002:a17:902:d70a:: with SMTP id w10mr8786490ply.251.1561700265009; Thu, 27 Jun 2019 22:37:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1561700265; cv=none; d=google.com; s=arc-20160816; b=e3/v1BRxUbQhOEw8Q5uUSHmO5yMwVtFjqcJ6n80l1WCUE7u94TUWQKTRqVkqzIIp5V f8Fhvl7W8rfyBiMveJrYIVVH7Hyo7CHkFLiYrNuAgdCE4IH98gRv+drtdMz6DZq4v0xU gcehD9ZbnXCXX89lCqzBPQ7Use1LLKLFIcy+jO99agpNWO6pGal7kY/zvMu28AykJG9f igtFhtipwBwB5fg+/QVWvS7kmR4K9w7MIrgPqPJsjPygJT9UGo55UgptCaI0oBqgOwSn 0MucFSiReB/ZvxO7JpvaI2HxrgnN2XNdZ0w5RGl9VlNsmCRCbwqdOsxsjM1eRneBc6pm YoEA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=vprX6gOziN3Is0tDZGXBHkrbGu8gG54H4BxDM3EGVmU=; b=06Y9Dq9wKOJA740EbeFmdofCnDD1lJ0wWWNnlEBRzysjtn+xIHqwby+JtiunMZHi7q 3HbKsRcgnWzhzIWlCeL+eG9Pdnbb4syLPvG/z81azbzNcYlDMbtvrEMXqF+BweRLPX9f vogDwVTrL/J1tO3fLjry2gaVZ0ON1Nrtbc5xc92ishDB0XhtKtwdT07+Ssb7+fiJH460 s3U4AnRjkYUxad8U3i69r7+loCGSOjae/e3Aey8+wqg+btbIjGqqSxZYQQDEl9SQHIbf jxDVm7YpbZD/4WFroWHy8egHOiXcan+7B2MHJ67GWjUq4qHuRUvVz4Zekbd1B+SYXNe0 w7OQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 1si1170198pli.151.2019.06.27.22.37.28; Thu, 27 Jun 2019 22:37:44 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726707AbfF1FhV (ORCPT + 99 others); Fri, 28 Jun 2019 01:37:21 -0400 Received: from verein.lst.de ([213.95.11.210]:45054 "EHLO newverein.lst.de" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726240AbfF1FhV (ORCPT ); Fri, 28 Jun 2019 01:37:21 -0400 Received: by newverein.lst.de (Postfix, from userid 2407) id 350AF68C7B; Fri, 28 Jun 2019 07:37:17 +0200 (CEST) Date: Fri, 28 Jun 2019 07:37:17 +0200 From: Christoph Hellwig To: "Darrick J. Wong" Cc: Christoph Hellwig , Damien Le Moal , Andreas Gruenbacher , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 06/13] xfs: remove XFS_TRANS_NOFS Message-ID: <20190628053717.GB26902@lst.de> References: <20190627104836.25446-1-hch@lst.de> <20190627104836.25446-7-hch@lst.de> <20190627223030.GS5171@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190627223030.GS5171@magnolia> User-Agent: Mutt/1.5.17 (2007-11-01) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 27, 2019 at 03:30:30PM -0700, Darrick J. Wong wrote: > I think the wording of this is too indirect. The reason we need to set > NOFS is because we could be doing writeback as part of reclaiming > memory, which means that we cannot recurse back into filesystems to > satisfy the memory allocation needed to create a transaction. The NOFS > part applies to any memory allocation, of course. > > If you're fine with the wording below I'll just edit that into the > patch: > > /* > * We can allocate memory here while doing writeback on behalf of > * memory reclaim. To avoid memory allocation deadlocks set the > * task-wide nofs context for the following operations. > */ > nofs_flag = memalloc_nofs_save(); Fine with me. > > trace_xfs_end_io_direct_write(ip, offset, size); > > @@ -395,10 +396,11 @@ xfs_dio_write_end_io( > > */ > > XFS_STATS_ADD(ip->i_mount, xs_write_bytes, size); > > > > + nofs_flag = memalloc_nofs_save(); > > Hmm, do we need this here? I can't remember if there was a need for > setting NOFS under xfs_reflink_end_cow from a dio completion or if that > was purely the buffered writeback case... We certainly had to add it for the unwritten extent conversion, maybe the corner case just didn't manage to show up for COW yet: commit 80641dc66a2d6dfb22af4413227a92b8ab84c7bb Author: Christoph Hellwig Date: Mon Oct 19 04:00:03 2009 +0000 xfs: I/O completion handlers must use NOFS allocations When completing I/O requests we must not allow the memory allocator to recurse into the filesystem, as we might deadlock on waiting for the I/O completion otherwise. The only thing currently allocating normal GFP_KERNEL memory is the allocation of the transaction structure for the unwritten extent conversion. Add a memflags argument to _xfs_trans_alloc to allow controlling the allocator behaviour. Signed-off-by: Christoph Hellwig Reported-by: Thomas Neumann Tested-by: Thomas Neumann Reviewed-by: Alex Elder Signed-off-by: Alex Elder diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 2d0b3e1da9e6..6f83f58c099f 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -611,7 +611,7 @@ xfs_fs_log_dummy( xfs_inode_t *ip; int error; - tp = _xfs_trans_alloc(mp, XFS_TRANS_DUMMY1); + tp = _xfs_trans_alloc(mp, XFS_TRANS_DUMMY1, KM_SLEEP); error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp), 0, 0, 0); if (error) { xfs_trans_cancel(tp, 0); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 67ae5555a30a..7294abce6ef2 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -860,8 +860,15 @@ xfs_iomap_write_unwritten( * set up a transaction to convert the range of extents * from unwritten to real. Do allocations in a loop until * we have covered the range passed in. + * + * Note that we open code the transaction allocation here + * to pass KM_NOFS--we can't risk to recursing back into + * the filesystem here as we might be asked to write out + * the same inode that we complete here and might deadlock + * on the iolock. */ - tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE); + xfs_wait_for_freeze(mp, SB_FREEZE_TRANS); + tp = _xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE, KM_NOFS); tp->t_flags |= XFS_TRANS_RESERVE; error = xfs_trans_reserve(tp, resblks, XFS_WRITE_LOG_RES(mp), 0, diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 8b6c9e807efb..4d509f742bd2 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -1471,7 +1471,7 @@ xfs_log_sbcount( if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) return 0; - tp = _xfs_trans_alloc(mp, XFS_TRANS_SB_COUNT); + tp = _xfs_trans_alloc(mp, XFS_TRANS_SB_COUNT, KM_SLEEP); error = xfs_trans_reserve(tp, 0, mp->m_sb.sb_sectsize + 128, 0, 0, XFS_DEFAULT_LOG_COUNT); if (error) { diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 66b849358e62..237badcbac3b 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -236,19 +236,20 @@ xfs_trans_alloc( uint type) { xfs_wait_for_freeze(mp, SB_FREEZE_TRANS); - return _xfs_trans_alloc(mp, type); + return _xfs_trans_alloc(mp, type, KM_SLEEP); } xfs_trans_t * _xfs_trans_alloc( xfs_mount_t *mp, - uint type) + uint type, + uint memflags) { xfs_trans_t *tp; atomic_inc(&mp->m_active_trans); - tp = kmem_zone_zalloc(xfs_trans_zone, KM_SLEEP); + tp = kmem_zone_zalloc(xfs_trans_zone, memflags); tp->t_magic = XFS_TRANS_MAGIC; tp->t_type = type; tp->t_mountp = mp; diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index ed47fc77759c..a0574f593f52 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -924,7 +924,7 @@ typedef struct xfs_trans { * XFS transaction mechanism exported interfaces. */ xfs_trans_t *xfs_trans_alloc(struct xfs_mount *, uint); -xfs_trans_t *_xfs_trans_alloc(struct xfs_mount *, uint); +xfs_trans_t *_xfs_trans_alloc(struct xfs_mount *, uint, uint); xfs_trans_t *xfs_trans_dup(xfs_trans_t *); int xfs_trans_reserve(xfs_trans_t *, uint, uint, uint, uint, uint);