Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp4495169pxv; Tue, 20 Jul 2021 05:12:53 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyNL1uB1w55881qpRdugejW2fME5JOCBd4uElYiHbRFF9G3kiig1v7/ZHzxeQRqWtVgxCqE X-Received: by 2002:a02:ca58:: with SMTP id i24mr25731472jal.101.1626783173690; Tue, 20 Jul 2021 05:12:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1626783173; cv=none; d=google.com; s=arc-20160816; b=ylOChnZGA0J+XSgnampBzeqdhZYchCK2HRxGtcvblqrc9aPsmttvtgpxFJ5WvgDlDz 4cn2HXXS9Nl0KtmscniXlpNso17ECbeBlN6Uy4vwefx2+94jU1i4rvapF3fUq4u6lzkH 2Hd5e1I6BbLnLKixo48QAzBKN4DtWaPwwB5u8QfqwrAEiRXunxzXuB5oepyTy1R7rYIc t6p/+sfqU3LxcHHhfC2fWqQcEGg2wf8K0EZOU0hiCUDODcm1msFNZ7NVUNgytnDEXhZs YM4mfqhpudGQqS2ggKrF0G9t9qlDBh2j9ahao0v5ogjdZLXMrN1KiFZDTEyiEjXnb8bN hMSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:mail-followup-to :message-id:subject:cc:to:from:date; bh=8w3DqmEF48XcRCC92X7GZbOf//po2B7bmSJxVu13dJM=; b=OMz0TygAL84nao8SspPnmdBdIYOgpSbatLA3ETkN6d6LlyHysyY6g7OoYUZq1Xvb3x OlSwV9s/YdWoe1dr6z2lFpRn/CwGmIU7CAD7y0DfZs1krWhK25Ct8yb3rkt2YzpvwiBr gnz1TrdCGCxo6Bd1tFNClsgHqhGkmHSSv/7r/bgg8n7ExRPmvOM1s2u4toy4wotr3ZOX sAMgQYzFEf0Cspugn0vkUeFPMnaHsoAqNqnhQhVMBfS1yUlk4KSpcTonD8Ol7wWNTCOZ QgRmS6zpnOzD2bytS5lQf1oJBEgGuQHO/r1ba4/kiJzdrfmwXyHsl7JfPl4ZA1zOFP+2 PUnA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s16si22864294jan.83.2021.07.20.05.12.42; Tue, 20 Jul 2021 05:12:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235455AbhGTLaD (ORCPT + 99 others); Tue, 20 Jul 2021 07:30:03 -0400 Received: from out30-130.freemail.mail.aliyun.com ([115.124.30.130]:50389 "EHLO out30-130.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234895AbhGTL36 (ORCPT ); Tue, 20 Jul 2021 07:29:58 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01424;MF=hsiangkao@linux.alibaba.com;NM=1;PH=DS;RN=7;SR=0;TI=SMTPD_---0UgPxzhH_1626783033; Received: from B-P7TQMD6M-0146.local(mailfrom:hsiangkao@linux.alibaba.com fp:SMTPD_---0UgPxzhH_1626783033) by smtp.aliyun-inc.com(127.0.0.1); Tue, 20 Jul 2021 20:10:34 +0800 Date: Tue, 20 Jul 2021 20:10:33 +0800 From: Gao Xiang To: Andreas =?utf-8?Q?Gr=C3=BCnbacher?= Cc: linux-erofs@lists.ozlabs.org, Linux FS-devel Mailing List , LKML , Christoph Hellwig , "Darrick J . Wong" , Matthew Wilcox Subject: Re: [PATCH v3] iomap: support tail packing inline read Message-ID: Mail-Followup-To: Andreas =?utf-8?Q?Gr=C3=BCnbacher?= , linux-erofs@lists.ozlabs.org, Linux FS-devel Mailing List , LKML , Christoph Hellwig , "Darrick J . Wong" , Matthew Wilcox References: <20210719144747.189634-1-hsiangkao@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Andreas, On Tue, Jul 20, 2021 at 01:23:41PM +0200, Andreas Grünbacher wrote: > Am Mo., 19. Juli 2021 um 16:48 Uhr schrieb Gao Xiang > : > > This tries to add tail packing inline read to iomap, which can support > > several inline tail blocks. Similar to the previous approach, it cleans > > post-EOF in one iteration. > > > > The write path remains untouched since EROFS cannot be used for testing. > > It'd be better to be implemented if upcoming real users care rather than > > leave untested dead code around. > > > > Cc: Christoph Hellwig > > Cc: Darrick J. Wong > > Cc: Matthew Wilcox > > Cc: Andreas Gruenbacher > > Signed-off-by: Gao Xiang > > --- > > v2: https://lore.kernel.org/r/YPLdSja%2F4FBsjss%2F@B-P7TQMD6M-0146.local/ > > changes since v2: > > - update suggestion from Christoph: > > https://lore.kernel.org/r/YPVe41YqpfGLNsBS@infradead.org/ > > > > Hi Andreas, > > would you mind test on the gfs2 side? Thanks in advance! > > > > Thanks, > > Gao Xiang > > > > fs/iomap/buffered-io.c | 50 ++++++++++++++++++++++++++---------------- > > fs/iomap/direct-io.c | 11 ++++++---- > > 2 files changed, 38 insertions(+), 23 deletions(-) > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > > index 87ccb3438bec..cac8a88660d8 100644 > > --- a/fs/iomap/buffered-io.c > > +++ b/fs/iomap/buffered-io.c > > @@ -207,23 +207,22 @@ struct iomap_readpage_ctx { > > > > static void > > iomap_read_inline_data(struct inode *inode, struct page *page, > > - struct iomap *iomap) > > + struct iomap *iomap, loff_t pos) > > { > > - size_t size = i_size_read(inode); > > + unsigned int size, poff = offset_in_page(pos); > > void *addr; > > > > - if (PageUptodate(page)) > > - return; > > - > > - BUG_ON(page_has_private(page)); > > - BUG_ON(page->index); > > - BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > + /* inline source data must be inside a single page */ > > + BUG_ON(iomap->length > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > + /* handle tail-packing blocks cross the current page into the next */ > > + size = min_t(unsigned int, iomap->length + pos - iomap->offset, > > + PAGE_SIZE - poff); > > Hmm, so EROFS really does multi-page tail packing? This contradicts > the comment and code in iomap_dio_inline_actor. No, it doesn't really contradict anything. There are 2 different concepts, the one is the metapage of iomap->inline_data itself. It should be in the same page, so the inode itself and inline data can be in the same page since currently assumed we don't support partial page read. ___________________________________________________ | inode | inline data | |<-------------- metadata page ------------------->| (here inline data can be multiple blocks.) The other one is actual file tail blocks, I think it can cross pages due to multiple tail inline blocks. |<---------- inline data ------------->| _________________________________________________________________ | file block | file block | file block | file block | file block | |<---------------- page ---------------------->|<--- page Although EROFS currently only support page-sized block, but I will look into subpage-sized blocks after iomap work is done (due to PAGE_SIZE of some platform is large, e.g. 64kb rather than 4kb.) > > > addr = kmap_atomic(page); > > - memcpy(addr, iomap->inline_data, size); > > - memset(addr + size, 0, PAGE_SIZE - size); > > + memcpy(addr + poff, iomap->inline_data - iomap->offset + pos, size); > > + memset(addr + poff + size, 0, PAGE_SIZE - poff - size); > > kunmap_atomic(addr); > > - SetPageUptodate(page); > > + iomap_set_range_uptodate(page, poff, PAGE_SIZE - poff); > > } > > It's been said before, but iomap_read_inline_data should return > PAGE_SIZE - poff, and no (void) casting when the return value is > ignored. Ok, anyway, I could update it in the next version. > > > static inline bool iomap_block_needs_zeroing(struct inode *inode, > > @@ -246,18 +245,19 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > > unsigned poff, plen; > > sector_t sector; > > > > - if (iomap->type == IOMAP_INLINE) { > > - WARN_ON_ONCE(pos); > > - iomap_read_inline_data(inode, page, iomap); > > - return PAGE_SIZE; > > - } > > - > > - /* zero post-eof blocks as the page may be mapped */ > > iop = iomap_page_create(inode, page); > > + /* needs to skip some leading uptodated blocks */ > > "needs to skip some leading uptodate blocks" will update. > > > iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen); > > if (plen == 0) > > goto done; > > > > + if (iomap->type == IOMAP_INLINE) { > > + iomap_read_inline_data(inode, page, iomap, pos); > > + plen = PAGE_SIZE - poff; > > + goto done; > > + } > > + > > + /* zero post-eof blocks as the page may be mapped */ > > if (iomap_block_needs_zeroing(inode, iomap, pos)) { > > zero_user(page, poff, plen); > > iomap_set_range_uptodate(page, poff, plen); > > @@ -589,6 +589,18 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags, > > return 0; > > } > > > > +static int iomap_write_begin_inline(struct inode *inode, loff_t pos, > > + struct page *page, struct iomap *srcmap) > > +{ > > + /* needs more work for the tailpacking case, disable for now */ > > + if (WARN_ON_ONCE(pos != 0)) > > This should be a WARN_ON_ONCE(srcmap->offset != 0). Otherwise, something like: > > xfs_io -ft -c 'pwrite 1 2' > > will fail because pos will be 1. Yeah, will update. Thanks for pointing out! > > > + return -EIO; > > + if (PageUptodate(page)) > > + return 0; > > + iomap_read_inline_data(inode, page, srcmap, pos); > > The above means that passing pos to iomap_read_inline_data here won't > do the right thing, either. yeah, I think it should use 0 here instead. since iomap->offset == 0 Thanks, Gao Xiang > > > + return 0; > > +} > > + > > static int > > iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, > > struct page **pagep, struct iomap *iomap, struct iomap *srcmap) > > @@ -618,7 +630,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, > > } > > > > if (srcmap->type == IOMAP_INLINE) > > - iomap_read_inline_data(inode, page, srcmap); > > + status = iomap_write_begin_inline(inode, pos, page, srcmap); > > else if (iomap->flags & IOMAP_F_BUFFER_HEAD) > > status = __block_write_begin_int(page, pos, len, NULL, srcmap); > > else > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > index 9398b8c31323..ee6309967b77 100644 > > --- a/fs/iomap/direct-io.c > > +++ b/fs/iomap/direct-io.c > > @@ -379,22 +379,25 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length, > > { > > struct iov_iter *iter = dio->submit.iter; > > size_t copied; > > + void *dst = iomap->inline_data + pos - iomap->offset; > > > > - BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > + /* inline data must be inside a single page */ > > + BUG_ON(length > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > > > if (dio->flags & IOMAP_DIO_WRITE) { > > loff_t size = inode->i_size; > > > > if (pos > size) > > - memset(iomap->inline_data + size, 0, pos - size); > > - copied = copy_from_iter(iomap->inline_data + pos, length, iter); > > + memset(iomap->inline_data + size - iomap->offset, > > + 0, pos - size); > > + copied = copy_from_iter(dst, length, iter); > > if (copied) { > > if (pos + copied > size) > > i_size_write(inode, pos + copied); > > mark_inode_dirty(inode); > > } > > } else { > > - copied = copy_to_iter(iomap->inline_data + pos, length, iter); > > + copied = copy_to_iter(dst, length, iter); > > } > > dio->size += copied; > > return copied; > > -- > > 2.24.4 > > > > Thanks, > Andreas