Date: Sun, 17 Oct 2021 23:42:55 +0800
From: Gao Xiang
To: Chao Yu
Cc: Gao Xiang, linux-erofs@lists.ozlabs.org, LKML, Yue Hu, Gao Xiang
Subject: Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy
Message-ID: <20211017154253.GB4054@hsiangkao-HP-ZHAN-66-Pro-G1>
References: <20211008200839.24541-1-xiang@kernel.org> <20211008200839.24541-4-xiang@kernel.org> <8e39e5d1-285d-52b6-8fea-8bb9ff10bf5a@kernel.org>
In-Reply-To: <8e39e5d1-285d-52b6-8fea-8bb9ff10bf5a@kernel.org>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Sun, Oct 17, 2021 at 11:34:22PM +0800, Chao Yu wrote:
> On 2021/10/9 4:08, Gao Xiang wrote:
> > From: Gao Xiang
> > 
> > Previously, the EROFS decompression strategy strictly followed the
> > readahead window in order to minimize extra memory footprint.
> > However, that can become inefficient when only the partial requested
> > data is read from very big LZ4 pclusters, and for the upcoming LZMA
> > implementation.
> > 
> > As a first step for LZ4, let's try to request the leading data of a
> > pcluster without triggering memory reclaim. This boosts 100% randread
> > of large pclusters and has no real impact on low-memory scenarios.
> > 
> > It also introduces a way to expand read lengths in order to
> > decompress the whole pcluster, which is useful for LZMA since the
> > algorithm itself is relatively slow and CPU-bound, while LZ4 is not.
> > 
> > Signed-off-by: Gao Xiang
> > ---
> >  fs/erofs/internal.h | 13 ++++++
> >  fs/erofs/zdata.c    | 99 ++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 93 insertions(+), 19 deletions(-)
> > 
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 48bfc6eb2b02..7f96265ccbdb 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
> >  			      EROFS_I_DATALAYOUT_BITS);
> >  }
> > 
> > +/*
> > + * Different from grab_cache_page_nowait(), reclaiming is never triggered
> > + * when allocating new pages.
> > + */
> > +static inline
> > +struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> > +					  pgoff_t index)
> > +{
> > +	return pagecache_get_page(mapping, index,
> > +			FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
> > +			readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
> > +}
> > +
> >  extern const struct super_operations erofs_sops;
> > 
> >  extern const struct address_space_operations erofs_raw_access_aops;
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index 5c34ef66677f..febb018e10a7 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
> >  	z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
> >  }
> > 
> > +/*
> > + * Since partial uptodate is still unimplemented for now, we have to use
> > + * approximate readmore strategies as a start.
> > + */
> > +static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
> > +				      struct readahead_control *rac,
> > +				      erofs_off_t end,
> > +				      struct list_head *pagepool,
> > +				      bool backmost)
> > +{
> > +	struct inode *inode = f->inode;
> > +	struct erofs_map_blocks *map = &f->map;
> > +	erofs_off_t cur;
> > +	int err;
> > +
> > +	if (backmost) {
> > +		map->m_la = end;
> > +		/* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
> > +		err = z_erofs_map_blocks_iter(inode, map, 0);
> > +		if (err)
> > +			return;
> > +
> > +		/* expand ra for the trailing edge if readahead */
> > +		if (rac) {
> > +			loff_t newstart = readahead_pos(rac);
> > +
> > +			cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
> > +			readahead_expand(rac, newstart, cur - newstart);
> > +			return;
> > +		}
> > +		end = round_up(end, PAGE_SIZE);
> > +	} else {
> > +		end = round_up(map->m_la, PAGE_SIZE);
> > +
> > +		if (!map->m_llen)
> > +			return;
> > +	}
> > +
> > +	cur = map->m_la + map->m_llen - 1;
> > +	while (cur >= end) {
> > +		pgoff_t index = cur >> PAGE_SHIFT;
> > +		struct page *page;
> > +
> > +		page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
> > +		if (!page)
> > +			goto skip;
> > +
> > +		if (PageUptodate(page)) {
> > +			unlock_page(page);
> > +			put_page(page);
> > +			goto skip;
> > +		}
> > +
> > +		err = z_erofs_do_read_page(f, page, pagepool);
> > +		if (err)
> > +			erofs_err(inode->i_sb,
> > +				  "readmore error at page %lu @ nid %llu",
> > +				  index, EROFS_I(inode)->nid);
> > +		put_page(page);
> > +skip:
> > +		if (cur < PAGE_SIZE)
> > +			break;
> > +		cur = (index << PAGE_SHIFT) - 1;
> 
> Looks a little bit weird to readahead backward, any special reason here?

It's due to the do_read_page implementation: I'd like to avoid figuring
out the exact full extent length (FIEMAP-like) inside do_read_page and
instead only request the needed range, so the extra pages have to be
walked backward from the end of the pcluster. The submission chain can
then still be built in a forward way.
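
To illustrate the backward walk with concrete numbers (a standalone
userspace sketch of the offset arithmetic only, using made-up values
such as pcl_la/pcl_len; it's not the kernel code itself):

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)

int main(void)
{
	/* assume a 64KiB pcluster starting at logical offset 0 */
	unsigned long long pcl_la = 0, pcl_len = 16 * PAGE_SIZE;
	unsigned long long end = pcl_la;	/* stop boundary */
	unsigned long long cur = pcl_la + pcl_len - 1;

	while (cur >= end) {
		unsigned long long index = cur >> PAGE_SHIFT;

		/* pages are visited in descending order: 15, 14, ..., 0 */
		printf("readmore page index %llu\n", index);
		if (cur < PAGE_SIZE)
			break;
		cur = (index << PAGE_SHIFT) - 1;
	}
	return 0;
}

Each leading page found this way is attached in front of the pending
pcluster, which is why the submission chain itself still ends up going
forward.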
If the question is why we should read more data at all: as I said in
the commit message, big pclusters matter since we can read in more
leading data at once.

Thanks,
Gao Xiang

> 
> Thanks,