Date: Wed, 20 Nov 2019 13:13:53 -0500
From: "Theodore Y. Ts'o"
To: Alex Zhuravlev
Cc: "linux-ext4@vger.kernel.org"
Subject: Re: [RFC] improve malloc for large filesystems
Message-ID: <20191120181353.GG4262@mit.edu>
In-Reply-To: <8738E8FF-820F-48A5-9150-7FF64219ED42@whamcloud.com>

Hi Alex,

A couple of comments.
First, please split this patch so that these two pieces of
functionality can be reviewed and tested separately:

> 1) mballoc tries too hard to find the best chunk which is
>    counterproductive - it makes sense to limit this process
> 2) during scanning the bitmaps are loaded one by one, synchronously
>    - it makes sense to prefetch few groups at once

As far as the prefetch is concerned, please note that the bitmap is
first read into the buffer cache via read_block_bitmap_nowait(), but
then it needs to be copied into the buddy bitmap pages, where it is
cached alongside the buddy bitmap.  (The copy in the buddy bitmap is a
combination of the on-disk block allocation bitmap plus any outstanding
preallocations.)  From that copy of the block bitmap, we then generate
the buddy bitmap and, as a side effect, initialize the statistics
(grp->bb_first_free, grp->bb_largest_free_order, grp->bb_counters[]).

It is these statistics that we need in order to make allocation
decisions for a particular block group.  So perhaps we should drive the
readahead of the bitmaps from ext4_mb_init_group() /
ext4_mb_init_cache(), and make sure that we actually initialize the
ext4_group_info structure, and not just read the bitmap into the buffer
cache and hope it gets used before memory pressure pushes it out of the
buffer cache.

Andreas has suggested going even further, and perhaps storing this
information derived from the allocation bitmaps someplace convenient on
disk.  This is an on-disk format change, so we would want to think very
carefully before going down that path.  Especially since, if we're going
to go this far, perhaps we should consider using an on-disk b-tree to
store the allocation information, which could be more efficient than
using allocation bitmaps plus buddy bitmaps.

Cheers,

					- Ted
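
For illustration only, here is a rough user-space sketch of why the
statistics are only available once the bitmap has actually been walked.
It is not the kernel's code: struct group_stats, generate_stats() and
MAX_ORDER are hypothetical names made up for this example, and it
assumes a bitmap where a set bit means "in use" and a group size that is
a power of two.  Each free run is split into aligned power-of-two
chunks, buddy style, and the per-order counts are recorded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ORDER 16	/* hypothetical cap, chosen for this sketch only */

/* Hypothetical, simplified stand-in for the per-group statistics. */
struct group_stats {
	int first_free;			/* first free block, or -1 */
	int free;			/* total number of free blocks */
	int largest_free_order;		/* largest free chunk order, or -1 */
	int counters[MAX_ORDER + 1];	/* free chunks per order */
};

/* Return 1 if block nr is marked in use in the bitmap. */
static int test_bit(const uint8_t *bitmap, int nr)
{
	return (bitmap[nr >> 3] >> (nr & 7)) & 1;
}

/*
 * Walk the block bitmap once and derive the statistics.  Each run of
 * free blocks is decomposed into the largest aligned power-of-two
 * chunks that fit, which is what a buddy bitmap would represent.
 */
static void generate_stats(const uint8_t *bitmap, int nblocks,
			   struct group_stats *st)
{
	int i = 0;

	assert(nblocks > 0 && (nblocks & (nblocks - 1)) == 0);
	assert(nblocks <= (1 << MAX_ORDER));

	memset(st, 0, sizeof(*st));
	st->first_free = -1;
	st->largest_free_order = -1;

	while (i < nblocks) {
		int start, len;

		if (test_bit(bitmap, i)) {
			i++;
			continue;
		}

		/* Found the start of a free run; find where it ends. */
		start = i;
		while (i < nblocks && !test_bit(bitmap, i))
			i++;
		len = i - start;

		if (st->first_free < 0)
			st->first_free = start;
		st->free += len;

		/* Split the run into aligned power-of-two chunks. */
		while (len > 0) {
			int order = 0;

			/* Grow while the chunk stays aligned and fits. */
			while (order < MAX_ORDER &&
			       (start & ((2 << order) - 1)) == 0 &&
			       (2 << order) <= len)
				order++;

			st->counters[order]++;
			if (order > st->largest_free_order)
				st->largest_free_order = order;
			start += 1 << order;
			len -= 1 << order;
		}
	}
}
```

The point of the sketch is that none of these fields can be filled in
from a bitmap that is merely sitting in the buffer cache; something has
to do the equivalent of this walk, which is why the readahead needs to
go as far as initializing the group info.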