Date: Sat, 25 Mar 2023 23:54:02 -0400
From: "Theodore Ts'o"
To: Ojaswin Mujoo
Cc: Jan Kara, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Ritesh Harjani, Andreas Dilger
Subject: Re: [RFC 08/11] ext4: Don't skip prefetching BLOCK_UNINIT groups
Message-ID: <20230326035402.GA323408@mit.edu>
References: <4881693a4f5ba1fed367310b27c793e4e78520d3.1674822311.git.ojaswin@linux.ibm.com>
    <20230309141422.b2nbl554ngna327k@quack3>

On Fri, Mar 17, 2023 at 04:25:04PM +0530, Ojaswin Mujoo wrote:
> > > This improves the accuracy of CR0/1 allocation as earlier, we could have
> > > essentially empty BLOCK_UNINIT groups being ignored by CR0/1 due to their buddy
> > > not being initialized, leading to slower CR2 allocations. With this patch CR0/1
> > > will be able to discover these groups as well, thus improving performance.
> >
> > The patch looks good. I just somewhat wonder - this change may result in
> > uninitialized groups being initialized and used earlier (previously we'd
> > rather search in other already initialized groups) which may spread
> > allocations more. But I suppose that's fine and uninit groups are not
> > really a feature meant to limit fragmentation and as the filesystem ages
> > the differences should be minimal.
> > So feel free to add:
>
> Another point I wanted to discuss wrt this patch series was why were the
> BLOCK_UNINIT groups not being prefetched earlier. One point I can think
> of is that this might lead to memory pressure when we have too many
> empty BGs in a very large (say terabytes) disk.

Originally the prefetch logic was simply something to optimize I/O ---
that is, normally, all of the block bitmaps for a flex_bg are
contiguous, so why not just read them all in a single I/O which is
issued all at once, instead of doing them as separate 4k reads.

Skipping block groups that hadn't yet been prefetched was something
which was added later, in order to improve performance of the
allocator for freshly mounted file systems where the prefetch hadn't
yet had a chance to pull in block bitmaps. The problem was that if
the block groups hadn't been prefetched yet, then the cr0 scan would
fetch them. If you have a storage device where blocks with
monotonically increasing LBA numbers aren't necessarily stored
adjacently on disk (for example, on a dm-thin volume, but if one were
to do an experiment on certain emulated block devices on certain
hyperscaler cloud environments, one might find a similar performance
profile), then a cr0 scan potentially issuing a series of 16
sequential 4k I/O's could be substantially worse from a performance
standpoint than doing a single sequential 64k I/O.

When this change was made, the focus was on *initialized* bitmaps
taking a long time if they were issued as individual sequential 4k
I/O's; the fix was to skip scanning them initially, since the hope was
that the prefetch would pull them in fairly quickly, and a few bad
allocations when the file system was freshly mounted was an acceptable
tradeoff.

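To make the I/O arithmetic above concrete, here is a minimal userspace
sketch. It is not the kernel's code: merged_read_count(),
read_size_bytes() and MODEL_BLOCK_SIZE are names made up for this
illustration, and the 4k block size is simply the figure used in the
text. It only counts how many read requests are needed when runs of
consecutive bitmap blocks are merged into a single request.

```c
#include <stddef.h>

/* Hypothetical userspace model for illustration; not ext4 code. */
#define MODEL_BLOCK_SIZE 4096u	/* assumed 4k file system block */

/*
 * Count the read requests needed to fetch the bitmap blocks listed in
 * blk[0..n-1] when every run of consecutive block numbers is merged
 * into one request.  The array is assumed to be sorted ascending.
 */
static size_t merged_read_count(const unsigned long long *blk, size_t n)
{
	size_t reads = 0;

	for (size_t i = 0; i < n; i++) {
		/* A new request starts unless this block directly follows the previous one. */
		if (i == 0 || blk[i] != blk[i - 1] + 1)
			reads++;
	}
	return reads;
}

/* Size in bytes of one request that covers nblocks contiguous blocks. */
static size_t read_size_bytes(size_t nblocks)
{
	return (size_t)nblocks * MODEL_BLOCK_SIZE;
}
```

With 16 contiguous bitmap blocks, merged_read_count() returns 1 and
read_size_bytes(16) is 65536, i.e. the single 64k read; fetching the
same blocks one group at a time amounts to 16 separate 4k requests.
The model says nothing about how the device lays those blocks out,
which is where the dm-thin case above gets expensive.
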
But prefetching BLOCK_UNINIT groups makes sense; that should fix the
problem that you've identified (at least for BLOCK_UNINIT groups; for
initialized block bitmaps, we'll still have less optimal allocation
patterns until we've managed to prefetch those block groups).

Cheers,

					- Ted