From: Gao Xiang
To: linux-erofs@lists.ozlabs.org, Chao Yu
Cc: LKML, Gao Xiang
Subject: [PATCH v2 00/16] erofs: prepare for folios, deduplication and kill PG_error
Date: Fri, 15 Jul 2022 23:41:47 +0800
Message-Id: <20220715154203.48093-1-hsiangkao@linux.alibaba.com>

Hi folks,

I've been working on this for almost 2 months. The main goal is to
support large folios and rolling hash deduplication for compressed
data.

This patchset is a start of that work, targeting the next 5.20. It
introduces a flexible range representation for (de)compressed buffers
instead of relying too much on page(s) directly, so large folios can
later be based on this work. Also, this patchset gets rid of all
PG_error flags in the decompression code, which is a cleanup as well.

In addition, this patchset kicks off rolling hash deduplication for
compressed data by first introducing fully-referenced multi-reference
pclusters, instead of reporting fs corruption if one pcluster is
referenced by several different extents. The full implementation is
expected to be finished in the merge window after the next. One of my
colleagues is actively working on the userspace part of this feature.

However, it's still easy to verify fully-referenced multi-reference
pclusters by constructing an image by hand (see attachment):

Dataset: 300M
seq-read (data-deduplicated, read_ahead_kb 8192): 1095MiB/s
seq-read (data-deduplicated, read_ahead_kb 4096):  771MiB/s
seq-read (data-deduplicated, read_ahead_kb 512):   577MiB/s
seq-read (vanilla, read_ahead_kb 8192):            364MiB/s

Finally, this patchset survives ro-fsstress on my side.
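
To make the "range representation" idea above a bit more concrete, here
is a minimal userspace sketch. It is not the code from this series: the
names `buf_range` and `scatter_to_ranges` are hypothetical, and a plain
byte pointer stands in for a page. The only point is that each
destination is described as a (buffer, offset, end) slice rather than
as a whole page, so the backing buffer can be larger than one page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical range descriptor: one slice of one buffer, not a whole page. */
struct buf_range {
	unsigned char *base;	/* start of the backing buffer */
	unsigned int offset;	/* first byte of the slice within base */
	unsigned int end;	/* one past the last byte of the slice */
};

/*
 * Scatter up to `len` bytes of decompressed data from `src` into `nr`
 * ranges, in order. Returns the number of bytes actually copied, which
 * is smaller than `len` if the ranges run out of room.
 */
static size_t scatter_to_ranges(const struct buf_range *vec, size_t nr,
				const unsigned char *src, size_t len)
{
	size_t copied = 0;

	for (size_t i = 0; i < nr && copied < len; i++) {
		size_t room;

		/* a well-formed range never ends before it starts */
		assert(vec[i].offset <= vec[i].end);

		room = vec[i].end - vec[i].offset;
		if (room > len - copied)
			room = len - copied;

		memcpy(vec[i].base + vec[i].offset, src + copied, room);
		copied += room;
	}
	return copied;
}
```

For example, with two 16-byte buffers described as the slices [4, 8) of
the first and [0, 16) of the second, scattering 10 bytes fills the
4-byte slice of the first buffer and then the first 6 bytes of the
second, leaving everything outside the slices untouched.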

Thanks,
Gao Xiang

Changes since v1:
 - rename left pagevec words to bvpage (Yue Hu);

Gao Xiang (16):
  erofs: get rid of unneeded `inode', `map' and `sb'
  erofs: clean up z_erofs_collector_begin()
  erofs: introduce `z_erofs_parse_out_bvecs()'
  erofs: introduce bufvec to store decompressed buffers
  erofs: drop the old pagevec approach
  erofs: introduce `z_erofs_parse_in_bvecs'
  erofs: switch compressed_pages[] to bufvec
  erofs: rework online page handling
  erofs: get rid of `enum z_erofs_page_type'
  erofs: clean up `enum z_erofs_collectmode'
  erofs: get rid of `z_pagemap_global'
  erofs: introduce struct z_erofs_decompress_backend
  erofs: try to leave (de)compressed_pages on stack if possible
  erofs: introduce z_erofs_do_decompressed_bvec()
  erofs: record the longest decompressed size in this round
  erofs: introduce multi-reference pclusters (fully-referenced)

 fs/erofs/compress.h     |   2 +-
 fs/erofs/decompressor.c |   2 +-
 fs/erofs/zdata.c        | 785 +++++++++++++++++++++++-----------------
 fs/erofs/zdata.h        | 119 +++---
 fs/erofs/zpvec.h        | 159 --------
 5 files changed, 496 insertions(+), 571 deletions(-)
 delete mode 100644 fs/erofs/zpvec.h

-- 
2.24.4