Date: Thu, 19 Nov 2020 10:58:43 -0500
From: "Theodore Y. Ts'o"
To: Saranya Muruganandam
Cc: linux-ext4@vger.kernel.org, adilger.kernel@dilger.ca
Subject: Re: [RFC PATCH v3 00/61] Introduce parallel fsck to e2fsck pass1
Message-ID: <20201119155843.GB609857@mit.edu>
References: <20201118153947.3394530-1-saranyamohan@google.com>
In-Reply-To: <20201118153947.3394530-1-saranyamohan@google.com>

On Wed, Nov 18, 2020 at 07:38:46AM -0800, Saranya Muruganandam wrote:
> It is now common for a single disk to be multiple TiB in size (e.g.
> 16 TiB), and with this trend a single filesystem keeps growing and can
> easily reach the PiB range on a LUN-based system.
>
> A journaling filesystem like ext4 still needs to be taken offline for
> regular checking and repair from time to time.  The problem is that
> e2fsck still does this with a single thread, which is challenging at
> scale for two reasons:
>
> 1) even with readahead, I/O speed is limited to a few tens of MiB per
>    second;
> 2) a single thread cannot make use of multiple CPU cores.
>
> It would be challenging to use multiple threads for every phase of
> e2fsck, but as a first step we can try it for the most time-consuming
> phase, pass1, which according to our benchmarking accounts for about
> 80% of the total e2fsck run time.
>
> Pass1 scans every valid inode in the filesystem and checks it one by
> one.  The idea of this patchset is to split the inodes across different
> threads so they can be checked in parallel, and then to merge the
> inodes and their corresponding extent information once the threads
> finish.
>
> To keep the complexity down and make the code less error-prone, fixing
> is still serialized; most of the time a filesystem has only minor
> errors, and what matters most to us is parallel reading and checking.
>
> Here is a benchmark on our Lustre filesystem with a 1.2 PiB OSD
> ext4-based filesystem:
>
>   Storage server: DDN SFA18KE
>     DCR (DeClustering RAID) with 162 x HGST 10TB NL-SAS
>   Test server: a virtual machine running on the SFA18KE
>     8 x CPU cores (Xeon(R) Gold 6140)
>     150GB memory
>     CentOS 7.7 (Lustre-patched kernel)

This introductory text presumably came from the original patch series;
hence "our Lustre filesystem".  To avoid confusion, it's probably better
to state explicitly who ran which benchmarks.  And Saranya, you might
want to include your own benchmark results, since that will make it
easier for people to replicate them.

> I've tested the whole patch series using e2fsck's own 'make test', with
> the default number of threads manually set to 4.  Almost all of the
> test suite still passes; the failing cases are:
>
> f_h_badroot f_multithread f_multithread_logfile f_multithread_no f_multithread_ok
>
> f_h_badroot fails because the checking output comes out of order, and
> the others fail because of the extra log output from the multiple
> threads.

And this "I" is Saranya, yes?
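Stepping back to the design described in the cover letter: the shape, as
I understand it, is to split the block groups across worker threads,
have each thread scan and check its own slice of the inodes, and then
merge the per-thread results while keeping any repairs serialized.
Below is a rough, self-contained sketch of that structure, just so we
are talking about the same thing.  None of the names in it are the
actual functions or structures in this series, and the per-inode work is
obviously elided.

/*
 * Illustrative sketch only: split the inode scan across worker threads
 * by block-group range, keep the repair phase serialized, and merge the
 * per-thread results at the end.  None of these names come from the
 * actual pfsck patches.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_GROUPS   512	/* made-up filesystem geometry */
#define NUM_THREADS  4		/* e.g. the default of 4 used in the tests */

struct thread_ctx {
	unsigned start_group;		/* first block group for this thread */
	unsigned end_group;		/* one past the last block group */
	unsigned long groups_scanned;	/* per-thread result, merged at the end */
};

static void *scan_worker(void *arg)
{
	struct thread_ctx *ctx = arg;
	unsigned g;

	/* Each thread reads and checks only its own range of block groups. */
	for (g = ctx->start_group; g < ctx->end_group; g++) {
		/* ... read the inode table for group g, validate extents ... */
		ctx->groups_scanned++;
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NUM_THREADS];
	struct thread_ctx ctx[NUM_THREADS] = { { 0 } };
	unsigned per_thread = NUM_GROUPS / NUM_THREADS;
	unsigned long total = 0;
	int i, err;

	/* Split the block groups evenly; the last thread takes the remainder. */
	for (i = 0; i < NUM_THREADS; i++) {
		ctx[i].start_group = i * per_thread;
		ctx[i].end_group = (i == NUM_THREADS - 1) ?
			NUM_GROUPS : (i + 1) * per_thread;
		err = pthread_create(&threads[i], NULL, scan_worker, &ctx[i]);
		if (err) {
			fprintf(stderr, "pthread_create: %s\n", strerror(err));
			return 1;
		}
	}

	/* Merge the per-thread results after all workers finish; any actual
	 * repairs would also happen here, single-threaded. */
	for (i = 0; i < NUM_THREADS; i++) {
		pthread_join(threads[i], NULL);
		total += ctx[i].groups_scanned;
	}
	printf("scanned %lu block groups across %d threads\n", total, NUM_THREADS);
	return 0;
}

The interesting part is of course the merge step (dir_info, icount,
dblist, and the fs flags), which the later patches in the series handle.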
> Andreas Dilger (2):
>   e2fsck: fix f_multithread_ok test
>   e2fsck: misc cleanups for pfsck
>
> Li Xi (18):
>   e2fsck: add -m option for multithread
>   e2fsck: copy context when using multi-thread fsck
>   e2fsck: copy fs when using multi-thread fsck
>   e2fsck: add assert when copying context
>   e2fsck: copy bitmaps when copying context
>   e2fsck: open io-channel when copying fs
>   e2fsck: create logs for mult-threads
>   e2fsck: optionally configure one pfsck thread
>   e2fsck: add start/end group for thread
>   e2fsck: split groups to different threads
>   e2fsck: print thread log properly
>   e2fsck: do not change global variables
>   e2fsck: optimize the inserting of dir_info_db
>   e2fsck: merge dir_info after thread finishes
>   e2fsck: merge icounts after thread finishes
>   e2fsck: merge dblist after thread finishes
>   e2fsck: add debug codes for multiple threads
>   e2fsck: merge fs flags when threads finish

The fact that all of these patches are prefixed with "e2fsck:" hides the
fact that some of them include changes to libext2fs.  It's probably
better to separate out the libext2fs changes so we can pay special
attention to issues of preserving the ABI.  I'll talk more about this in
the individual patches.

					- Ted
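P.S.  On the ABI point, a minimal sketch of the pattern I have in mind,
using made-up names rather than real libext2fs functions: leave every
existing exported function's signature exactly as it is, and add a new
entry point for callers that need to pass extra (for example,
per-thread) information, with the old symbol becoming a thin wrapper.

/*
 * Purely illustrative, not actual libext2fs code.  "demo_read_bitmaps"
 * and the errcode_t stand-in are made up for this sketch.
 */
#include <stdio.h>

typedef long errcode_t;		/* stand-in for the library's error type */

/* Existing exported function: its signature must not change. */
errcode_t demo_read_bitmaps(void *fs);

/* New variant carrying an extra flags argument for threaded callers. */
errcode_t demo_read_bitmaps2(void *fs, int flags);

errcode_t demo_read_bitmaps2(void *fs, int flags)
{
	/* ... the real work would go here, honoring the new flags ... */
	(void) fs;
	(void) flags;
	return 0;
}

/* The old symbol becomes a thin wrapper, so existing callers and the
 * existing ABI are untouched. */
errcode_t demo_read_bitmaps(void *fs)
{
	return demo_read_bitmaps2(fs, 0);
}

int main(void)
{
	/* Both the old and the new entry points remain callable. */
	printf("old: %ld  new: %ld\n",
	       demo_read_bitmaps(NULL), demo_read_bitmaps2(NULL, 0));
	return 0;
}

Whatever mechanism we end up using, the point is that programs already
linked against libext2fs keep working unchanged.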