Subject: Re: [PATCH v2 00/10] erofs: add big pcluster compression support
To: Gao Xiang, linux-erofs@lists.ozlabs.org, Chao Yu
Cc: LKML
References: <20210401032954.20555-1-xiang@kernel.org>
From: Chao Yu
Message-ID: <18509211-374c-be19-bae7-f2ce852bfb15@kernel.org>
Date: Sat, 3 Apr 2021 11:52:06 +0800

On 2021/4/1 11:29, Gao Xiang wrote:
> Hi folks,
>
> This is the formal version of EROFS big pcluster support, which means
> EROFS can compress data into more than 1 fs block after this patchset.
>
> {l,p}cluster are EROFS-specific concepts, standing for `logical cluster'
> and `physical cluster' respectively. Logical cluster is the basic unit
> of compress indexes in file logical mapping, e.g. it can build compress
> indexes in 2 blocks rather than 1 block (currently only 1 block lcluster
> is supported). Physical cluster is a container of physical compressed
> blocks which contains compressed data, the size of which is a multiple
> of lclustersize.
>
> Different from previous thoughts, which had fixed-sized pclusterblks
> recorded in the on-disk compress index header, our on-disk design allows
> variable-sized pclusterblks now. The main reasons are
>  - user data varies in compression ratio locally, so a fixed-sized
>    clustersize approach is space-wasting and causes extra read
>    amplification for high CR cases;
>
>  - inplace decompression needs zero padding to guarantee its safe margin,
>    but we don't want to pad more than 1 fs block for big pcluster;
>
>  - end users can now customize the pcluster size according to data type
>    since various pclustersize can exist in a file, for example, using
>    different pcluster size for executable code and one-shot data. Such a
>    design should be more flexible than many other public compression fses
>    (Btw, each file in EROFS can have maximum 2 algorithms at the same time
>    by using HEAD1/2, which will be formally added with LZMA support.)
>
> In brief, EROFS can now compress from variable-sized input to
> variable-sized pcluster blocks, as illustrated below:
>
>   |<-_lcluster_->|________________________|<-_lcluster_->|
>   |____._________|_________ .. ___________|_______.______|
>         .                                        .
>          .                                     .
>           .__________________________________.
>           |______________| .. |______________|
>           |<- pcluster ->|
>
> The next step would be how to record the compressed block count in
> lclusters. In compress indexes, there are 2 concepts called HEAD and
> NONHEAD lclusters. The difference is that a HEAD lcluster starts a new
> pcluster in the lcluster, but a NONHEAD does not. It's easy to understand
> that big pclusters have at least 2 blocks, thus at least 2 lclusters
> as well.
>
> Therefore, let the delta0 (distance to its HEAD lcluster) of the first
> NONHEAD compress index store the compressed block count with a special
> flag, as a new compress index type called CBLKCNT. It's also easy to know
> its delta0 is constantly 1, as illustrated below:
>    ________________________________________________________
>   |_HEAD_|_CBLKCNT_|_NONHEAD_|_..._|_NONHEAD_|_HEAD | HEAD |
>   |<------ a pcluster with CBLKCNT --------->|<-- -->|
>                                                  ^ a pcluster with 1
>
> If another HEAD follows a HEAD lcluster, there is no room to record
> CBLKCNT, but it's easy to know the size of pcluster will be 1.
>
> More implementation details about this and compact indexes are in the
> commit message.
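
For illustration only (this is not code from the patchset), the lookup rule
described above can be sketched in C roughly as follows. The struct layout,
the names lc_index, is_cblkcnt and pcluster_blocks(), and the flat in-memory
array are all assumptions made for this sketch; the real on-disk index
format is the one defined in fs/erofs/erofs_fs.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of one lcluster compress index. */
enum lc_type { LC_HEAD, LC_NONHEAD };

struct lc_index {
	enum lc_type type;
	bool is_cblkcnt;	/* assumed flag: set only on the first NONHEAD */
	uint16_t delta0;	/* distance to HEAD, or block count if is_cblkcnt */
};

/*
 * Return the compressed block count of the pcluster starting at HEAD
 * lcluster `head`, or 0 if `head` is out of range or is not a HEAD.
 */
static unsigned int pcluster_blocks(const struct lc_index *v, size_t n,
				    size_t head)
{
	if (head >= n || v[head].type != LC_HEAD)
		return 0;
	/* HEAD followed by HEAD (or end of map): no room for CBLKCNT, so 1. */
	if (head + 1 >= n || v[head + 1].type != LC_NONHEAD)
		return 1;
	/* The first NONHEAD's delta0 is always 1, so it can hold the count. */
	if (v[head + 1].is_cblkcnt)
		return v[head + 1].delta0;
	/* Plain NONHEAD: an ordinary single-block pcluster. */
	return 1;
}
```

With a map of HEAD, CBLKCNT(4), NONHEAD, HEAD, HEAD, this sketch yields 4
blocks for the first pcluster and 1 block for each of the last two.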
>
> On the runtime performance side, the current EROFS test results are:
>  ________________________________________________________________
> |  file system  |   size    | seq read | rand read | rand9m read |
> |_______________|___________|_ MiB/s __|__ MiB/s __|___ MiB/s ___|
> |___erofs_4k____|_556879872_|_ 781.4 __|__ 55.3 ___|___ 25.3 ____|
> |___erofs_16k___|_452509696_|_ 864.8 __|_ 123.2 ___|___ 20.8 ____|
> |___erofs_32k___|_415223808_|_ 899.8 __|_ 105.8 _*_|___ 16.8 ____|
> |___erofs_64k___|_393814016_|_ 906.6 __|__ 66.6 _*_|___ 11.8 ____|
> |__squashfs_8k__|_556191744_|__ 64.9 __|__ 19.3 ___|____ 9.1 ____|
> |__squashfs_16k_|_502661120_|__ 98.9 __|__ 38.0 ___|____ 9.8 ____|
> |__squashfs_32k_|_458784768_|_ 115.4 __|__ 71.6 _*_|___ 10.0 ____|
> |_squashfs_128k_|_398204928_|_ 257.2 __|_ 253.8 _*_|___ 10.9 ____|
> |____ext4_4k____|____()_____|_ 786.6 __|__ 28.6 ___|___ 27.8 ____|
>
>  * Squashfs grabs more page cache to keep all decompressed data with
>    grab_cache_page_nowait() than the normal requested readahead (see
>    squashfs_copy_cache and squashfs_readpage_block).
>    In principle, EROFS can also cache such all decompressed data
>    if necessary, yet it's low priority for now and has little use
>    (rand9m is actually a better rand read workload, since the amount
>    of I/O is 9m rather than full-sized 1000m).
>
> More details are in
> https://lore.kernel.org/r/20210329053654.GA3281654@xiangao.remote.csb
>
> Also it's easy to know EROFS is not a fixed pcluster design, so users
> can make several optimized strategy according to data type when mkfs.
> And there is still room to optimize runtime performance for big pcluster
> even further.
>
> Finally, it passes ro_fsstress and can also successfully boot buildroot
> & Android system with android-mainline repo.
>
> current mkfs repo for big pcluster:
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b experimental-bigpcluster-compact
>
> Thanks for your time on reading this!

Nice job!

Acked-by: Chao Yu

Thanks,

>
> Thanks,
> Gao Xiang
>
> changes since v1:
>  - add a missing vunmap in erofs_pcpubuf_exit();
>  - refine comments and commit messages.
>
> (btw, I'll apply this patchset for -next first for further integration
>  test, which will be aimed to 5.13-rc1.)
>
> Gao Xiang (10):
>   erofs: reserve physical_clusterbits[]
>   erofs: introduce multipage per-CPU buffers
>   erofs: introduce physical cluster slab pools
>   erofs: fix up inplace I/O pointer for big pcluster
>   erofs: add big physical cluster definition
>   erofs: adjust per-CPU buffers according to max_pclusterblks
>   erofs: support parsing big pcluster compress indexes
>   erofs: support parsing big pcluster compact indexes
>   erofs: support decompress big pcluster for lz4 backend
>   erofs: enable big pcluster feature
>
>  fs/erofs/Kconfig        |  14 ---
>  fs/erofs/Makefile       |   2 +-
>  fs/erofs/decompressor.c | 216 +++++++++++++++++++++++++---------------
>  fs/erofs/erofs_fs.h     |  31 ++++--
>  fs/erofs/internal.h     |  31 ++----
>  fs/erofs/pcpubuf.c      | 134 +++++++++++++++++++++++++
>  fs/erofs/super.c        |   1 +
>  fs/erofs/utils.c        |  12 ---
>  fs/erofs/zdata.c        | 193 ++++++++++++++++++++++-------------
>  fs/erofs/zdata.h        |  14 +--
>  fs/erofs/zmap.c         | 155 ++++++++++++++++++++++------
>  11 files changed, 560 insertions(+), 243 deletions(-)
>  create mode 100644 fs/erofs/pcpubuf.c
>