Date: Fri, 22 Oct 2021 16:40:24 +0200
From: David Hildenbrand
Organization: Red Hat
To: Matthew Wilcox
Cc: Johannes Weiner, Kent Overstreet, "Kirill A. Shutemov",
    Linus Torvalds, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Andrew Morton, "Darrick J. Wong",
    Christoph Hellwig, David Howells, Hugh Dickins
Subject: Re: Folios for 5.15 request - Was: re: Folio discussion recap -
References: <20211018231627.kqrnalsi74bgpoxu@box.shutemov.name>
    <326b5796-6ef9-a08f-a671-4da4b04a2b4f@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 22.10.21 15:01, Matthew Wilcox wrote:
> On Fri, Oct 22, 2021 at 09:59:05AM +0200, David Hildenbrand wrote:
>> something like this would roughly express what I've been mumbling about:
>>
>> anon_mem    file_mem
>>    |            |
>>    ------|------
>>       lru_mem       slab
>>          |           |
>>          -------------
>>                |
>>              page
>>
>> I wouldn't include folios in this picture, because IMHO folios as of now
>> are actually what we want to be "lru_mem", just with a much clearer
>> name+description (again, IMHO).
>
> I think folios are a superset of lru_mem.  To enhance your drawing:
>

In the picture below we want "folio" to be the abstraction of "mappable
into user space", after reading your link below and reading your graph,
correct? Like calling it "user_mem" instead.

Because any of these types would imply that we're looking at the head
page (if it's a compound page). And we could (or even already do?) have
other types that cannot be mapped to user space that are actually a
compound page.

> page
>    folio
>       lru_mem
>          anon_mem
>             ksm
>          file_mem
>       netpool
>       devmem
>       zonedev
>    slab
>    pgtable
>    buddy
>    zsmalloc
>    vmalloc
>
> I have a little list of memory types here:
> https://kernelnewbies.org/MemoryTypes
>
> Let me know if anything is missing.

hugetlbfs pages might deserve a dedicated type, right?

>
>> Going from file_mem -> page is easy, just casting pointers.
>> Going from page -> file_mem requires going to the head page if it's a
>> compound page.
>>
>> But we expect most interfaces to pass around a proper type (e.g.,
>> lru_mem) instead of a page, which avoids having to look up the compound
>> head page. And each function can express which type it actually wants to
>> consume. The filemap API wants to consume file_mem, so it should use that.
>>
>> And IMHO, with something above in mind and not having a clue which
>> additional layers we'll really need, or which additional leaves we want
>> to have, we would start with the leaves (e.g., file_mem, anon_mem, slab)
>> and work our way towards the root. Just like we already started with slab.
>
> That assumes that the "root" layers already handle compound pages
> properly.  For example, nothing in mm/page-writeback.c does; it assumes
> everything is an order-0 page.  So working in the opposite direction
> makes sense because it tells us what has already been converted and is
> thus safe to call.

Right, as long as the lower layers receive a "struct page", they have to
assume it's "anything" -- IOW a random base page. We need some temporary
logic when transitioning from "typed" code into "struct page" code that
doesn't talk compound pages yet, I agree. And I think the different
types used actually would tell us what has been converted and what not.
Whenever you have to go from type -> "struct page" we have to be very
careful.

>
> And starting with file_mem makes the supposition that it's worth splitting
> file_mem from anon_mem.  I believe that's one or two steps further than
> it's worth, but I can be convinced otherwise.  For example, do we have
> examples of file pages being passed to routines that expect anon pages?

That would be a BUG, so I hope we don't have it ;)

> Most routines that I've looked at expect to see both file & anon pages,

Right, many of them do. Which tells me that they share a common type in
many places.
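
As a rough illustration of the typed-wrapper idea quoted above (typed ->
page is just a cast, page -> typed has to resolve the head page first),
a userspace sketch could look like the following. Everything in it is a
simplified model and an assumption for illustration: the cut-down
"struct page", the types "struct lru_mem" and "struct file_mem", and the
helpers file_mem_page(), page_file_mem() and model_compound_head() are
hypothetical names, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: this is NOT the kernel's struct page. */
struct page {
	struct page *head;	/* NULL for a base or head page, else the head */
	int is_file;		/* 1 if file-backed, 0 if anonymous */
};

/* Hypothetical typed views; holding one promises "not a tail page". */
struct lru_mem  { struct page page; };
struct file_mem { struct lru_mem lru; };

/* Typed -> page: the wrapper starts with the page, so this is free. */
static inline struct page *file_mem_page(struct file_mem *fm)
{
	return &fm->lru.page;
}

/* Model of a head-page lookup: tail pages point at their head. */
static inline struct page *model_compound_head(struct page *page)
{
	return page->head ? page->head : page;
}

/* page -> typed: resolve the head page first, then reinterpret it. */
static inline struct file_mem *page_file_mem(struct page *page)
{
	struct page *head = model_compound_head(page);

	assert(head->is_file);	/* an anon page here would be a bug */
	return (struct file_mem *)head;
}
```

A function that takes "struct file_mem *" can then skip the head-page
lookup entirely, and only code that starts from a bare "struct page *"
pays for it once, at the conversion.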

Let's consider LRU code

static inline int folio_is_file_lru(struct folio *folio)
{
	return !folio_swapbacked(folio);
}

I would say we don't really want to pass folios here. We actually want
to pass something reasonable, like "lru_mem". But yes, it's just doing
what "struct page" used to do via page_is_file_lru().

Let's consider folio_wait_writeback(struct folio *folio)

Do we actually want to pass in a folio here? Would we actually want to
pass in lru_mem here or even something else?

> and treat them either identically or do slightly different things.
> But those are just the functions I've looked at; your experience may be
> quite different.

I assume when it comes to LRU, writeback, ... the behavior is very
similar or at least the current functions just decide internally what to
do based on e.g., ..._is_file_lru().

I don't know if it's best to keep hiding that functionality within an
abstracted type or just provide two separate functions for anon and
file. folios mostly mimic what the old struct page used to do,
introducing similar functions. Maybe the reason we branch off within
these functions is because it just made sense when passing around
"struct page" and not having something clearer at hand that let the
caller do the branch. For the cases of LRU I looked at, it somewhat
makes sense to just do it internally.

Looking at some core MM code, like mm/huge_memory.c, and seeing all the
PageAnon() specializations, having a dedicated anon_mem type might be
valuable. But at this point it's hard to tell if splitting up these
functions would actually be desirable.

We're knee-deep in the type discussion now and I appreciate it. I can
understand that folios are currently really just a "not a tail page"
concept and mimic a lot of what we already inherited from the old
"struct page" world.

-- 
Thanks,

David / dhildenb
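
For the folio_is_file_lru() question raised above, the two options
(branch inside a function on a shared type vs. let the caller's type
decide) might be sketched in userspace like this. All types and helpers
here (struct lru_mem, struct anon_mem, struct file_mem,
lru_mem_is_file(), lru_mem_list(), anon_mem_list(), file_mem_list(),
enum lru_list_model) are hypothetical and only model the idea; the
sketch also ignores cases such as shmem, where file-backed memory is
swap-backed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; hypothetical types, not kernel definitions. */
struct lru_mem  { bool swapbacked; };
struct anon_mem { struct lru_mem lru; };
struct file_mem { struct lru_mem lru; };

enum lru_list_model { MODEL_LRU_ANON, MODEL_LRU_FILE };

/* Option A: one helper on the shared type, deciding at run time. */
static inline bool lru_mem_is_file(const struct lru_mem *lru)
{
	assert(lru != NULL);
	return !lru->swapbacked;
}

static inline enum lru_list_model lru_mem_list(const struct lru_mem *lru)
{
	return lru_mem_is_file(lru) ? MODEL_LRU_FILE : MODEL_LRU_ANON;
}

/* Option B: the parameter type carries the answer, no run-time check. */
static inline enum lru_list_model anon_mem_list(const struct anon_mem *a)
{
	(void)a;
	return MODEL_LRU_ANON;
}

static inline enum lru_list_model file_mem_list(const struct file_mem *f)
{
	(void)f;
	return MODEL_LRU_FILE;
}
```

Option A keeps a single entry point for callers that handle both kinds
of memory; option B moves the branch to the caller, which only helps if
the caller already knows which type it holds.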