Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp998392ybl; Tue, 13 Aug 2019 06:00:18 -0700 (PDT) X-Google-Smtp-Source: APXvYqxCHK3zUuSxRGNojXydV6R31evfT6nbKAnV0tmPVty4oeuEQR85FCiIP08rzfXnJ9NMu3tP X-Received: by 2002:a63:6d8d:: with SMTP id i135mr13737124pgc.303.1565701218147; Tue, 13 Aug 2019 06:00:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565701218; cv=none; d=google.com; s=arc-20160816; b=zb5vYCp+WISHWRLP28oV5sYZ0JErh0m5GgW+rr+2vLi8m+/6mB2VfeOV4tK8McW0FM II1CtbDrt61HlJLdQq4aOEFFfybW4yu+VmA7nF2Wvc17nMtqSSk5i9FjArnzr74Ep0XI 2VHAHeNcJfPXQuVx8xExE4lFvby3cD0pibVDMUPBmt+bowBBdyWqQzcp3G8kemPIAv05 kxzLEl5o7IAD5O3ImEZiGLQ+vYMRMVYk87gjiXrxK6Go0s2mqjQsT4c6wKK86DH1xtFy N/DMTJuqn3U8uguXiPIx3T/OlRx9eG44DPUDKcEmC8JygrmCLGzSag6Mgn6Uds54u3pu uCrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=BtgHrm08etfJjpF5KZmN8TsxqVOwcQO1JBv4H+Dr5cg=; b=GsDZWdteBRYNdvWXwhu83nIRRhJUCLYUqlE6c8IpAJYJebu9uavsAZhUw0WlZAjmzF bo0lkXsoncENTy+s5sBB3cNH6+wkmJFmGTk+X5HlL+zts4vBGVCTkuZvKsbGZC2Qjs2N ws4ZdgIeHMgf2J37Ws2SOOwPDVkJevCpx/C2VxEmzlwN3d6j+CUD//4nGAkT+4z5HR7g xSKWQFaIHTtDa+EqXuQ2qpOvhfOR5/O/jk35ucBkwxtD3ZzhxCfgBRLEG0kSTgzXB+K4 0T+tpOzDYNVb681jrpVqqKdmjM9LvVCdVBH0BQNJxOTG9KbWuyZT9fX7aIKPZ3iQh87Y i0fg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@mbobrowski-org.20150623.gappssmtp.com header.s=20150623 header.b=wEmr3cZi; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x5si447523pln.274.2019.08.13.05.59.56; Tue, 13 Aug 2019 06:00:18 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@mbobrowski-org.20150623.gappssmtp.com header.s=20150623 header.b=wEmr3cZi; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728757AbfHMM7B (ORCPT + 99 others); Tue, 13 Aug 2019 08:59:01 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:44334 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728342AbfHMM7A (ORCPT ); Tue, 13 Aug 2019 08:59:00 -0400 Received: by mail-pg1-f196.google.com with SMTP id i18so51230542pgl.11 for ; Tue, 13 Aug 2019 05:58:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=BtgHrm08etfJjpF5KZmN8TsxqVOwcQO1JBv4H+Dr5cg=; b=wEmr3cZiUAc2JpvdtUVMLKLT9KycYfEQTCLPN6raNK0E8SO/r2D6MAsFB8sxobyYTB JKHsJgYhlJ71M2Rlf1ky2eEywEQE1JgqOuuGQm+6KVHDcnZn4XWKc8NshLfPWMTv8dHQ 9Ac288GNUhb8Cum4V/duyv4hUCR86xG5Uuv8PJaE8K6Wb/IkZ0Bt3TOExh2Da7X8EfGG zAFjRbflE8ZDCijWL3TdCjVc3Beqo0oi7dVa6rxuXlTTtpSNi9W40Hz+3i+6e/yjBJpb moSTb4d4/BQkNFf3UZP7KPmE8TYoL+t/56kin8CzTW45xpfBb3aM4+pUtYz0oq52jHZE zvJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=BtgHrm08etfJjpF5KZmN8TsxqVOwcQO1JBv4H+Dr5cg=; b=kHrIgYmsXkeJF6xQvTuhGDV/Q/g5k62fXkDBYWLDJMJeHD9qDOoPoPDh4RggTLzf3X IFENArbJwaU0ZAi7JvBIJJOLhgsCHqwwg2uH0KEbjfUwT4brFx2efd4NgtkP8uvIUYlR 4kr5SIo1tA2udoXP/dwp0v7ml4swcNaHIkutRdpI133pdzSzm3Pb4arfU5j2eCeFVDrf WDOdTvKrRcSlX6/dISosVliqwvN4S5QIPCpMQSW9SMahRgulq+MZ4DVAWSL3c4yvDrj5 W3/TW3bZ/xjEMyhNH85hAlw4i9UfpT1dCtKDuPi3PypIJZ9n3Ld70vZ0jZWmYXUIGP4p JQBQ== X-Gm-Message-State: APjAAAWZKxKH133W0C7/n1X8Yf6HZd29jvHuIrRQJMVmnR5bv73/9Jvy Ycj1g6hzhd/BtYIeu59+S15T X-Received: by 2002:a63:89c7:: with SMTP id v190mr33058801pgd.299.1565701134413; Tue, 13 Aug 2019 05:58:54 -0700 (PDT) Received: from neo.Home ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id x1sm22171627pfj.182.2019.08.13.05.58.51 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Tue, 13 Aug 2019 05:58:53 -0700 (PDT) Date: Tue, 13 Aug 2019 22:58:42 +1000 From: Matthew Bobrowski To: RITESH HARJANI Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, jack@suse.cz, tytso@mit.edu Subject: Re: [PATCH 4/5] ext4: introduce direct IO write code path using iomap infrastructure Message-ID: <20190813125840.GA10187@neo.Home> References: <581c3a2da89991e7ce5862d93dcfb23e1dc8ddc8.1565609891.git.mbobrowski@mbobrowski.org> <20190812170430.982E552051@d06av21.portsmouth.uk.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20190812170430.982E552051@d06av21.portsmouth.uk.ibm.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Mon, Aug 12, 2019 at 10:34:29PM +0530, RITESH HARJANI wrote: > > + if (offset + count > i_size_read(inode) || > > + offset + count > EXT4_I(inode)->i_disksize) { > > + ext4_update_i_disksize(inode, inode->i_size); > > + extend = true; > > + } > > + > > + ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, ext4_dio_write_end_io); > > + > > + /* > > + * Unaligned direct AIO must be the only IO in flight or else > > + * any overlapping aligned IO after unaligned IO might result > > + * in data corruption. > > + */ > > + if (ret == -EIOCBQUEUED && (unaligned_aio || extend)) > > + inode_dio_wait(inode); > > Could you please add explain & add a comment about why we wait in AIO DIO > case > when extend is true? As I see without iomap code this case was not present > earlier. Because while using the iomap infrastructure for AIO writes, we return to the caller prior to invoking the ->end_io() handler. This callback is responsible for performing the in-core/on-disk inode extension if it is deemed necessary. If we don't wait in the case of an extend, we run the risk of loosing inode size consistencies in addition to things leading to possible data corruption... > > + > > + if (ret >= 0 && iov_iter_count(from)) { > > + overwrite ? inode_unlock_shared(inode) : inode_unlock(inode); > > + return ext4_buffered_write_iter(iocb, from); > > + } > should not we copy code from "__generic_file_write_iter" which does below? > > 3436???????????????? /* > 3437????????????????? * We need to ensure that the page cache pages are > written to > 3438????????????????? * disk and invalidated to preserve the expected > O_DIRECT > 3439????????????????? * semantics. > 3440????????????????? */ Hm, I don't see why this would be required seeing as though the page cache invalidation semantics pre and post write are handled by iomap_dio_rw() and iomap_dio_complete(). But, I could be completely wrong here, so we may need to wait for some others to provide comments on this. > > + WARN_ON(!(flags & IOMAP_DIRECT)); > > + if (round_down(offset, i_blocksize(inode)) >= > > + i_size_read(inode)) { > > + ret = ext4_map_blocks(handle, inode, &map, > > + EXT4_GET_BLOCKS_CREATE); > > + } else if (!ext4_test_inode_flag(inode, > > + EXT4_INODE_EXTENTS)) { > > + /* > > + * We cannot fill holes in indirect > > + * tree based inodes as that could > > + * expose stale data in the case of a > > + * crash. Use magic error code to > > + * fallback to buffered IO. > > + */ > > + ret = ext4_map_blocks(handle, inode, &map, 0); > > + if (ret == 0) > > + ret = -ENOTBLK; > > + } else { > > + ret = ext4_map_blocks(handle, inode, &map, > > + EXT4_GET_BLOCKS_IO_CREATE_EXT); > > + } > > + } > > Could you please check & confirm on below points - > 1. Do you see a problem @above in case of *overwrite* with extents mapping? > It will fall into EXT4_GET_BLOCKS_IO_CREATE_EXT case. > So are we piggy backing on the fact that ext4_map_blocks first call > ext4_ext_map_blocks > with flags & EXT4_GET_BLOCKS_KEEP_SIZE. And so for overwrite case since it > will return > val > 0 then we will anyway not create any blocks and so we don't need to > check overwrite > case specifically here? > > > 2. For cases with flags passed is 0 to ext4_map_blocks (overwrite & > fallocate without extent case), > we need not start the journaling transaction. But in above we are doing > ext4_journal_start/stop unconditionally > & unnecessarily reserving dio_credits blocks. > We need to take care of that right? Hm, I think you raise valid points here. Jan, do you have any comments on the above? I vaguely remember having a discussion around dropping the overwrite checks in ext4_iomap_begin() as we're removing the inode_lock() early on in ext4_dio_write_iter(), so it woudln't be necessary to do so. But, now that Ritesh mentioned it again I'm thinking it may actually be required... > > if (ret < 0) { > > ext4_journal_stop(handle); > > if (ret == -ENOSPC && > > @@ -3581,10 +3611,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, > > iomap->type = delalloc ? IOMAP_DELALLOC : IOMAP_HOLE; > > iomap->addr = IOMAP_NULL_ADDR; > > } else { > > - if (map.m_flags & EXT4_MAP_MAPPED) { > > - iomap->type = IOMAP_MAPPED; > > - } else if (map.m_flags & EXT4_MAP_UNWRITTEN) { > > + if (map.m_flags & EXT4_MAP_UNWRITTEN) { > > iomap->type = IOMAP_UNWRITTEN; > > + } else if (map.m_flags & EXT4_MAP_MAPPED) { > > + iomap->type = IOMAP_MAPPED; > Maybe a comment as to explaining why checking UNWRITTEN before is necessary > for others. > So in case of fallocate & DIO write case we may get extent which is both > unwritten & mapped (right?). > so we need to check if we have an unwritten extent first so that it will > need the conversion in ->end_io > callback. Yes, that is essentially correct. --M