Date: Tue, 28 Sep 2021 04:19:17 +0100
From: Matthew Wilcox
To: Kent Overstreet
Cc: Vlastimil
Babka, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Johannes Weiner, Linus Torvalds, Andrew Morton,
 "Darrick J. Wong", Christoph Hellwig, David Howells
Subject: Re: Struct page proposal
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 27, 2021 at 02:16:53PM -0400, Kent Overstreet wrote:
> On Mon, Sep 27, 2021 at 07:12:19PM +0100, Matthew Wilcox wrote:
> > On Mon, Sep 27, 2021 at 02:09:49PM -0400, Kent Overstreet wrote:
> > > On Mon, Sep 27, 2021 at 07:05:26PM +0100, Matthew Wilcox wrote:
> > > > On Mon, Sep 27, 2021 at 07:48:15PM +0200, Vlastimil Babka wrote:
> > > > > On 9/23/21 03:21, Kent Overstreet wrote:
> > > > > > So if we have this:
> > > > > >
> > > > > > struct page {
> > > > > >         unsigned long allocator;
> > > > > >         unsigned long allocatee;
> > > > > > };
> > > > > >
> > > > > > The allocator field would be used for either a pointer to slab/slub's state, if
> > > > > > it's a slab page, or if it's a buddy allocator page it'd encode the order of the
> > > > > > allocation - like compound order today, and probably whether or not the
> > > > > > (compound group of) pages is free.
> > > > >
> > > > > The "free page in buddy allocator" case will be interesting to implement.
> > > > > What the buddy allocator uses today is:
> > > > >
> > > > > - PageBuddy - determine if page is free; a page_type (part of mapcount
> > > > > field) today, could be a bit in "allocator" field that would have to be 0 in
> > > > > all other "page is allocated" contexts.
> > > > > - nid/zid - to prevent merging across node/zone boundaries, now part of
> > > > > page flags
> > > > > - buddy order
> > > > > - a list_head (reusing the "lru") to hold the struct page on the appropriate
> > > > > free list, which has to be double-linked so page can be taken from the
> > > > > middle of the list instantly
> > > > >
> > > > > Won't be easy to cram all that into two unsigned long's, or even a single
> > > > > one. We should avoid storing anything in the free page itself. Allocating
> > > > > some external structures to track free pages is going to have funny
> > > > > bootstrap problems. Probably a major redesign would be needed...
> > > > >
> > > > Wait, why do we want to avoid using the memory that we're allocating?
> > >
> > > The issue is where to stick the state for free pages. If that doesn't fit in two
> > > ulongs, then we'd need a separate allocation, which means slab needs to be up
> > > and running before free pages are initialized.
> >
> > But the thing we're allocating is at least PAGE_SIZE bytes in size.
> > Why is "We should avoid storing anything in the free page itself" true?
>
> Good point!
>
> Highmem and dax do complicate things though - would they make it too much of a
> hassle? You want to get rid of struct page for dax (what's the right term for
> that kind of memory?), but we're not there yet, right?

DAX is used for persistent memory, often abbreviated to pmem. "Getting
there" involves rooting out struct page from all kinds of data structures.
sg lists are the obvious place to start so that we can do I/O to memory
that's not backed by a struct page. Get that working and the rent-a-VM
companies will love you. Right now, we either pay the 1.6% tax twice
(once for the struct pages in the host, and once in the guest), or we
have horrendous hacks to create struct pages on the fly so the host can
do I/O to the guest's memory.
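
For readers following the quoted exchange, here is a minimal userspace sketch of the point "the thing we're allocating is at least PAGE_SIZE bytes in size": the doubly-linked free-list node and the order are written into the free page itself, so no separate allocation is needed to track it. This is not kernel code. Every name in it (SKETCH_PAGE_SIZE, struct free_node, struct free_list, the free_list_* helpers) is hypothetical, and it assumes 4 KiB pages that are directly addressable, which is exactly what highmem and dax may not give you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical bookkeeping stored in the first bytes of a free page. */
struct free_node {
    struct free_node *prev, *next; /* double-linked for O(1) removal */
    unsigned int order;            /* buddy order of this free block */
};

/* The node must fit inside the memory it describes. */
_Static_assert(sizeof(struct free_node) <= SKETCH_PAGE_SIZE,
               "free_node must fit in one page");

/* Circular list with a sentinel head, in the style of a list_head. */
struct free_list {
    struct free_node head;
    size_t count;
};

void free_list_init(struct free_list *fl)
{
    fl->head.prev = fl->head.next = &fl->head;
    fl->head.order = 0;
    fl->count = 0;
}

/* Mark a page free: the node is written into the page's own memory. */
void free_list_add(struct free_list *fl, void *page, unsigned int order)
{
    struct free_node *n = page;

    n->order = order;
    n->next = fl->head.next;
    n->prev = &fl->head;
    fl->head.next->prev = n;
    fl->head.next = n;
    fl->count++;
}

/* Unlink a page from anywhere in the list without walking it. */
void free_list_del(struct free_list *fl, void *page)
{
    struct free_node *n = page;

    assert(fl->count > 0);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    fl->count--;
}

/* Take the most recently freed page, or NULL if the list is empty. */
void *free_list_pop(struct free_list *fl)
{
    struct free_node *n = fl->head.next;

    if (n == &fl->head)
        return NULL;
    free_list_del(fl, n);
    return n;
}

/* Usage sketch: free three pages, then pull one out of the middle. */
size_t free_list_demo(void)
{
    struct free_list fl;
    void *pages[3];
    size_t remaining;

    free_list_init(&fl);
    for (int i = 0; i < 3; i++) {
        pages[i] = malloc(SKETCH_PAGE_SIZE); /* stand-in for a real page */
        if (!pages[i])
            abort();
        free_list_add(&fl, pages[i], 0);
    }
    free_list_del(&fl, pages[1]); /* middle of the list, no traversal */
    remaining = fl.count;
    for (int i = 0; i < 3; i++)
        free(pages[i]);
    return remaining;
}
```

The sketch leaves out the PageBuddy test and the nid/zid check from the list quoted above. Something outside the page still has to say whether a given page is free before its contents can be trusted as a node, which is the role suggested for a bit in the "allocator" field.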