Date: Mon, 29 Mar 2021 18:56:24 +0100
From: Matthew Wilcox
To: Johannes Weiner
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-cachefs@redhat.com,
    linux-afs@lists.infradead.org
Subject: Re: [PATCH v5 00/27] Memory Folios
Message-ID: <20210329175624.GI351017@casper.infradead.org>
References: <20210320054104.1300774-1-willy@infradead.org>
    <20210322184744.GU1719932@casper.infradead.org>
    <20210324062421.GQ1719932@casper.infradead.org>
    <20210329165832.GG351017@casper.infradead.org>
In-Reply-To: <20210329165832.GG351017@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 29, 2021 at 05:58:32PM +0100, Matthew Wilcox wrote:
> In broad strokes, I think that having a Power Of Two Allocator
> with Descriptor (POTAD) is a useful foundational allocator to have.
> The specific allocator that we call the buddy allocator is very clever for
> the 1990s, but touches too many cachelines to be good with today's CPUs.
> The generalisation of the buddy allocator to the POTAD lets us allocate
> smaller quantities (eg a 512 byte block) and allocate descriptors which
> differ in size from a struct page.  For an extreme example, see xfs_buf
> which is 360 bytes and is the descriptor for an allocation between 512
> and 65536 bytes.
>
> There are times when we need to get from the physical address to
> the descriptor, eg memory-failure.c or get_user_pages().  This is the
> equivalent of phys_to_page(), and it's going to have to be a lookup tree.
> I think this is a role for the Maple Tree, but it's not ready yet.
> I don't know if it'll be fast enough for this case.  There's also the
> need (particularly for memory-failure) to determine exactly what kind
> of descriptor we're dealing with, and also its size.  Even its owner,
> so we can notify them of memory failure.

A couple of things I forgot to mention ...

I'd like the POTAD to be not necessarily tied to allocating memory.
For example, I think it could be used to allocate swap space.  eg the swap
code could register the space in a swap file as allocatable through the
POTAD, and then later ask the POTAD to allocate a POT from the swap space.

The POTAD wouldn't need to be limited to MAX_ORDER.  It should be perfectly
capable of allocating 1TB if your machine has 1.5TB of RAM in it (... and
things haven't got too fragmented).

I think the POTAD can be used to replace the CMA.  The CMA supports weirdo
things like "Allocate 8MB of memory at a 1MB alignment", and I think that's
doable within the data structures that I'm thinking about for the POTAD.
It'd first try to allocate an 8MB chunk at 8MB alignment, and then if that's
not possible, try to allocate two adjacent 4MB chunks; continuing down until
it finds that there aren't 8x1MB chunks, at which point it can give up.
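
The fallback search described for the CMA-style request can be sketched in a
toy model.  This is only an illustration under assumptions that are not in
this mail: memory is modelled as a flat array with one "free" flag per unit
(think of a unit as 1MB), sizes and alignments are powers of two, and the
names potad_alloc_aligned() and run_is_free() are hypothetical.  The data
structures actually intended for the POTAD are not described here, so a real
implementation would presumably look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption): free_map[i] is true when unit i is free. */
static bool run_is_free(const bool *free_map, size_t start, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (!free_map[start + i])
			return false;
	return true;
}

/*
 * Allocate `size` contiguous units whose start is aligned to at least
 * `align` units.  Try the strictest alignment first (one chunk of `size`),
 * then two adjacent chunks of size/2, and so on, halving the chunk size
 * until it would drop below `align`.  Returns the start index, or -1 if
 * no suitable run exists.
 */
long potad_alloc_aligned(bool *free_map, size_t nunits,
			 size_t size, size_t align)
{
	assert(size > 0 && align > 0 && align <= size);

	for (size_t chunk = size; chunk >= align; chunk /= 2) {
		/* Candidate starts are the multiples of the current chunk size. */
		for (size_t start = 0; start + size <= nunits; start += chunk) {
			if (!run_is_free(free_map, start, size))
				continue;
			/* Claim the run by marking every unit busy. */
			for (size_t i = 0; i < size; i++)
				free_map[start + i] = false;
			return (long)start;
		}
	}
	return -1;	/* not even size/align adjacent chunks: give up */
}
```

With 16 units of which only unit 0 is busy, asking for 8 units at 1-unit
alignment returns index 8, because the naturally aligned run is still
available.  If unit 12 is busy as well, the 8-unit pass fails and the search
settles on index 4, i.e. two adjacent 4-unit chunks.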