From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: linux-erofs@lists.ozlabs.org, Chao Yu
Cc: LKML <linux-kernel@vger.kernel.org>, Gao Xiang <hsiangkao@linux.alibaba.com>
Subject: [PATCH 16/16] erofs: introduce multi-reference pclusters (fully-referenced)
Date: Thu, 14 Jul 2022 21:20:51 +0800
Message-Id: <20220714132051.46012-17-hsiangkao@linux.alibaba.com>
In-Reply-To: <20220714132051.46012-1-hsiangkao@linux.alibaba.com>
References: <20220714132051.46012-1-hsiangkao@linux.alibaba.com>

Let's introduce multi-reference pclusters at runtime. In detail, if one
pcluster is requested by multiple extents at almost the same time (even
if they belong to different files), the longest extent will be
decompressed as the representative and the other extents are copied from
it.

After this patch, fully-referenced extents can be handled correctly, and
the full decoding check needs to be bypassed for partially-referenced
extents.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/compress.h     |   2 +-
 fs/erofs/decompressor.c |   2 +-
 fs/erofs/zdata.c        | 120 +++++++++++++++++++++++++++-------------
 fs/erofs/zdata.h        |   3 +
 4 files changed, 88 insertions(+), 39 deletions(-)

diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 19e6c56a9f47..26fa170090b8 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -17,7 +17,7 @@ struct z_erofs_decompress_req {
 
 	/* indicate the algorithm will be used for decompression */
 	unsigned int alg;
-	bool inplace_io, partial_decoding;
+	bool inplace_io, partial_decoding, fillgaps;
 };
 
 struct z_erofs_decompressor {
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 6dca1900c733..91b9bff10198 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -83,7 +83,7 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,
 			j = 0;
 
 		/* 'valid' bounced can only be tested after a complete round */
-		if (test_bit(j, bounced)) {
+		if (!rq->fillgaps && test_bit(j, bounced)) {
 			DBG_BUGON(i < lz4_max_distance_pages);
 			DBG_BUGON(top >= lz4_max_distance_pages);
 			availables[top++] = rq->out[i - lz4_max_distance_pages];
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 8dcfc2a9704e..601cfcb07c50 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -467,7 +467,8 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 	 * type 2, link to the end of an existing open chain, be careful
 	 * that its submission is controlled by the original attached chain.
 	 */
-	if (cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
+	if (*owned_head != &pcl->next && pcl != f->tailpcl &&
+	    cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
 		    *owned_head) == Z_EROFS_PCLUSTER_TAIL) {
 		*owned_head = Z_EROFS_PCLUSTER_TAIL;
 		f->mode = Z_EROFS_PCLUSTER_HOOKED;
@@ -480,20 +481,8 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 
 static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe)
 {
-	struct erofs_map_blocks *map = &fe->map;
 	struct z_erofs_pcluster *pcl = fe->pcl;
 
-	/* to avoid unexpected loop formed by corrupted images */
-	if (fe->owned_head == &pcl->next || pcl == fe->tailpcl) {
-		DBG_BUGON(1);
-		return -EFSCORRUPTED;
-	}
-
-	if (pcl->pageofs_out != (map->m_la & ~PAGE_MASK)) {
-		DBG_BUGON(1);
-		return -EFSCORRUPTED;
-	}
-
 	mutex_lock(&pcl->lock);
 	/* used to check tail merging loop due to corrupted images */
 	if (fe->owned_head == Z_EROFS_PCLUSTER_TAIL)
@@ -785,6 +774,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 	z_erofs_onlinepage_split(page);
 	/* bump up the number of spiltted parts of a page */
 	++spiltted;
+	fe->pcl->multibases =
+		(fe->pcl->pageofs_out != (map->m_la & ~PAGE_MASK));
 
 	if (fe->pcl->length < offset + end - map->m_la) {
 		fe->pcl->length = offset + end - map->m_la;
@@ -842,36 +833,90 @@ struct z_erofs_decompress_backend {
 	/* pages to keep the compressed data */
 	struct page **compressed_pages;
 
+	struct list_head decompressed_secondary_bvecs;
 	struct page **pagepool;
 	unsigned int onstack_used, nr_pages;
 };
 
-static int z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be,
-					struct z_erofs_bvec *bvec)
+struct z_erofs_bvec_item {
+	struct z_erofs_bvec bvec;
+	struct list_head list;
+};
+
+static void z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be,
+					 struct z_erofs_bvec *bvec)
 {
-	unsigned int pgnr = (bvec->offset + be->pcl->pageofs_out) >> PAGE_SHIFT;
-	struct page *oldpage;
+	struct z_erofs_bvec_item *item;
 
-	DBG_BUGON(pgnr >= be->nr_pages);
-	oldpage = be->decompressed_pages[pgnr];
-	be->decompressed_pages[pgnr] = bvec->page;
+	if (!((bvec->offset + be->pcl->pageofs_out) & ~PAGE_MASK)) {
+		unsigned int pgnr;
+		struct page *oldpage;
 
-	/* error out if one pcluster is refenenced multiple times. */
-	if (oldpage) {
-		DBG_BUGON(1);
-		z_erofs_page_mark_eio(oldpage);
-		z_erofs_onlinepage_endio(oldpage);
-		return -EFSCORRUPTED;
+		pgnr = (bvec->offset + be->pcl->pageofs_out) >> PAGE_SHIFT;
+		DBG_BUGON(pgnr >= be->nr_pages);
+		oldpage = be->decompressed_pages[pgnr];
+		be->decompressed_pages[pgnr] = bvec->page;
+
+		if (!oldpage)
+			return;
+	}
+
+	/* (cold path) one pcluster is requested multiple times */
+	item = kmalloc(sizeof(*item), GFP_KERNEL | __GFP_NOFAIL);
+	item->bvec = *bvec;
+	list_add(&item->list, &be->decompressed_secondary_bvecs);
+}
+
+static void z_erofs_fill_other_copies(struct z_erofs_decompress_backend *be,
+				      int err)
+{
+	unsigned int off0 = be->pcl->pageofs_out;
+	struct list_head *p, *n;
+
+	list_for_each_safe(p, n, &be->decompressed_secondary_bvecs) {
+		struct z_erofs_bvec_item *bvi;
+		unsigned int end, cur;
+		void *dst, *src;
+
+		bvi = container_of(p, struct z_erofs_bvec_item, list);
+		cur = bvi->bvec.offset < 0 ? -bvi->bvec.offset : 0;
+		end = min_t(unsigned int, be->pcl->length - bvi->bvec.offset,
+			    bvi->bvec.end);
+		dst = kmap_local_page(bvi->bvec.page);
+		while (cur < end) {
+			unsigned int pgnr, scur, len;
+
+			pgnr = (bvi->bvec.offset + cur + off0) >> PAGE_SHIFT;
+			DBG_BUGON(pgnr >= be->nr_pages);
+
+			scur = bvi->bvec.offset + cur -
+					((pgnr << PAGE_SHIFT) - off0);
+			len = min_t(unsigned int, end - cur, PAGE_SIZE - scur);
+			if (!be->decompressed_pages[pgnr]) {
+				err = -EFSCORRUPTED;
+				cur += len;
+				continue;
+			}
+			src = kmap_local_page(be->decompressed_pages[pgnr]);
+			memcpy(dst + cur, src + scur, len);
+			kunmap_local(src);
+			cur += len;
+		}
+		kunmap_local(dst);
+		if (err)
+			z_erofs_page_mark_eio(bvi->bvec.page);
+		z_erofs_onlinepage_endio(bvi->bvec.page);
+		list_del(p);
+		kfree(bvi);
 	}
-	return 0;
 }
 
-static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
+static void z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
 {
 	struct z_erofs_pcluster *pcl = be->pcl;
 	struct z_erofs_bvec_iter biter;
 	struct page *old_bvpage;
-	int i, err = 0;
+	int i;
 
 	z_erofs_bvec_iter_begin(&biter, &pcl->bvset, Z_EROFS_INLINE_BVECS, 0);
 	for (i = 0; i < pcl->vcnt; ++i) {
@@ -883,13 +928,12 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
 			z_erofs_put_shortlivedpage(be->pagepool, old_bvpage);
 
 		DBG_BUGON(z_erofs_page_is_invalidated(bvec.page));
-		err = z_erofs_do_decompressed_bvec(be, &bvec);
+		z_erofs_do_decompressed_bvec(be, &bvec);
 	}
 
 	old_bvpage = z_erofs_bvec_iter_end(&biter);
 	if (old_bvpage)
 		z_erofs_put_shortlivedpage(be->pagepool, old_bvpage);
-	return err;
 }
 
 static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
@@ -924,7 +968,7 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
 				err = -EIO;
 				continue;
 			}
-			err = z_erofs_do_decompressed_bvec(be, bvec);
+			z_erofs_do_decompressed_bvec(be, bvec);
 			*overlapped = true;
 		}
 	}
@@ -940,7 +984,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	struct erofs_sb_info *const sbi = EROFS_SB(be->sb);
 	struct z_erofs_pcluster *pcl = be->pcl;
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
-	unsigned int i, inputsize, err2;
+	unsigned int i, inputsize;
 	struct page *page;
 	bool overlapped;
 
@@ -970,10 +1014,8 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 		kvcalloc(pclusterpages, sizeof(struct page *),
 			 GFP_KERNEL | __GFP_NOFAIL);
 
-	err = z_erofs_parse_out_bvecs(be);
-	err2 = z_erofs_parse_in_bvecs(be, &overlapped);
-	if (err2)
-		err = err2;
+	z_erofs_parse_out_bvecs(be);
+	err = z_erofs_parse_in_bvecs(be, &overlapped);
 	if (err)
 		goto out;
 
@@ -993,6 +1035,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 					.alg = pcl->algorithmformat,
 					.inplace_io = overlapped,
 					.partial_decoding = pcl->partial,
+					.fillgaps = pcl->multibases,
 				 }, be->pagepool);
 
 out:
@@ -1016,6 +1059,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	if (be->compressed_pages < be->onstack_pages ||
 	    be->compressed_pages >= be->onstack_pages + Z_EROFS_ONSTACK_PAGES)
 		kvfree(be->compressed_pages);
+	z_erofs_fill_other_copies(be, err);
 
 	for (i = 0; i < be->nr_pages; ++i) {
 		page = be->decompressed_pages[i];
@@ -1052,6 +1096,8 @@ static void z_erofs_decompress_queue(const struct z_erofs_decompressqueue *io,
 	struct z_erofs_decompress_backend be = {
 		.sb = io->sb,
 		.pagepool = pagepool,
+		.decompressed_secondary_bvecs =
+			LIST_HEAD_INIT(be.decompressed_secondary_bvecs),
 	};
 	z_erofs_next_pcluster_t owned = io->head;
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index a7fd44d21d9e..515fa2b28b97 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -84,6 +84,9 @@ struct z_erofs_pcluster {
 	/* L: whether partial decompression or not */
 	bool partial;
 
+	/* L: indicate several pageofs_outs or not */
+	bool multibases;
+
 	/* A: compressed bvecs (can be cached or inplaced pages) */
 	struct z_erofs_bvec compressed_bvecs[];
 };
-- 
2.24.4
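
For readers who want the gist of the runtime behaviour without walking the whole diff, below is a minimal userspace sketch of the copy-from-the-representative idea described in the commit message: the longest extent is decompressed once into a primary buffer, every further reference to the same pcluster is queued as a secondary request, and those requests are later satisfied by memcpy() from the primary copy (roughly what z_erofs_do_decompressed_bvec() and z_erofs_fill_other_copies() do page by page in the patch). All names here (pcluster_sim, secondary_bvec and the two helpers) are made up for illustration and are not kernel APIs; the real code operates on struct page with kmap_local_page() and additionally handles partial pages, error marking and page endio.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* one extra extent that references an already decompressed pcluster */
struct secondary_bvec {
	size_t offset;			/* offset inside the decompressed data */
	size_t len;			/* bytes this extent needs */
	char *dst;			/* caller-supplied destination buffer */
	struct secondary_bvec *next;	/* simple singly-linked list */
};

/* simplified stand-in for a pcluster with one primary (longest) extent */
struct pcluster_sim {
	const char *primary;		/* decompressed output of the longest extent */
	size_t length;			/* number of valid decompressed bytes */
	struct secondary_bvec *secondary;
};

/* queue one more extent instead of decompressing the pcluster again */
static int pcluster_add_secondary(struct pcluster_sim *pcl, size_t offset,
				  size_t len, char *dst)
{
	struct secondary_bvec *bv = malloc(sizeof(*bv));

	if (!bv)
		return -1;
	bv->offset = offset;
	bv->len = len;
	bv->dst = dst;
	bv->next = pcl->secondary;
	pcl->secondary = bv;
	return 0;
}

/* satisfy every queued reference by copying from the primary copy */
static void pcluster_fill_other_copies(struct pcluster_sim *pcl)
{
	struct secondary_bvec *bv = pcl->secondary;

	while (bv) {
		struct secondary_bvec *next = bv->next;

		if (bv->offset + bv->len <= pcl->length)
			memcpy(bv->dst, pcl->primary + bv->offset, bv->len);
		else
			fprintf(stderr, "secondary request out of range\n");
		free(bv);
		bv = next;
	}
	pcl->secondary = NULL;
}

int main(void)
{
	/* pretend this is the longest extent, decompressed exactly once */
	const char primary[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	struct pcluster_sim pcl = {
		.primary = primary,
		.length = sizeof(primary) - 1,
		.secondary = NULL,
	};
	char out1[8] = { 0 }, out2[8] = { 0 };

	/* two more extents hit the same pcluster: no second decompression */
	pcluster_add_secondary(&pcl, 2, 5, out1);	/* expects "CDEFG" */
	pcluster_add_secondary(&pcl, 20, 6, out2);	/* expects "UVWXYZ" */
	pcluster_fill_other_copies(&pcl);

	printf("%s %s\n", out1, out2);
	return 0;
}

The design point mirrored here is that only one decompression pass is needed per pcluster no matter how many extents (possibly from different files) reference it; each extra reference costs only a list node and a memcpy() from the representative output.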