Date: Mon, 27 Mar 2023 17:48:14 +0200
Subject: Re: What size anonymous folios should we allocate?
To: Ryan Roberts, Matthew Wilcox, Yang Shi
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <022e1c15-7988-9975-acbc-e661e989ca4a@suse.cz>
From: Vlastimil Babka

On 3/27/23 17:30, Ryan Roberts wrote:
> On 27/03/2023 13:41, Vlastimil Babka wrote:
>> On 2/22/23 04:52, Matthew Wilcox wrote:
>>> On Tue, Feb 21, 2023 at 03:05:33PM -0800, Yang Shi wrote:
>>>
>>>>> C. We add a new wrinkle to the LRU handling code. When our scan of the
>>>>> active list examines a folio, we look to see how many of the PTEs
>>>>> mapping the folio have been accessed. If it is fewer than half, and
>>>>> those half are all in either the first or last half of the folio, we
>>>>> split it. The active half stays on the active list and the inactive
>>>>> half is moved to the inactive list.
>>>>
>>>> With contiguous PTE, every PTE still maintains its own access bit (but
>>>> it is implementation defined, some implementations may just set access
>>>> bit once for one PTE in the contiguous region per the Arm ARM IIUC). But
>>>> anyway this is definitely feasible.
>>>
>>> If a CPU doesn't have separate access bits for PTEs, then we should just
>>> not use the contiguous bits. Knowing which parts of the folio are
>>> unused is more important than using the larger TLB entries.
>>
>> Hm but AFAIK the AMD aggregation is transparent, there are no bits. And IIUC
>> the "Hardware Page Aggregation (HPA)" Ryan was talking about elsewhere in
>> the thread, that sounds similar. So IIUC there will be a larger TLB entry
>> transparently, and then I don't expect the CPU to update individual bits as
>> that would defeat the purpose. So I'd expect it will either set them all to
>> active when forming the larger TLB entry, or set them on a single subpage
>> and leave the rest at whatever state they were. Hm I wonder if the exact
>> behavior is defined anywhere.
>
> For arm64, at least, there are 2 separate mechanisms:
>
> "The Contiguous Bit" (D8.6.1 in the Arm ARM) is a bit in the translation table
> descriptor that SW can set to indicate that a set of adjacent entries are
> contiguous and have same attributes and permissions etc. It is architectural.
> The order of the contiguous range is fixed and depends on the base page size
> that is in use. When in use, HW access and dirty reporting is only done at the
> granularity of the contiguous block.
>
> "HPA" is a micro-architectural feature on some Arm CPUs, which aims to do a
> similar thing, but is transparent to SW. In this case, the dirty and access bits
> remain per-page. But when they differ, this affects the performance of the feature.
>
> Typically HPA can coalesce up to 4 adjacent entries, whereas for a 4KB base page
> at least, the contiguous bit applies to 16 adjacent entries.

Hm, if it's 4 entries on arm64 and presumably 8 on AMD, maybe we only need
to care about how actively the individual "subpages" above that size are
accessed, to avoid dealing with the uncertainty of whether HW tracks them
individually. At such smallish sizes we shouldn't induce massive overhead?

> I'm hearing that there are workloads where being able to use the contiguous bit
> really does make a difference, so I would like to explore solutions that can
> work when we only have access/dirty at the folio level.

And on the higher orders, where we have explicit control via bits, we could
split the explicitly contiguous mappings once in a while to determine whether
the sub-folios are still accessed? Although with the limit of 16 x 4kB pages,
maybe it's still not worth the trouble?

> Thanks,
> Ryan
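
To make the granularity idea above a bit more concrete, here is a rough
userspace sketch (not kernel code) of the "fewer than half accessed, all in
one half" check from proposal C, applied to groups of PTEs rather than to
individual PTEs. The names folio_split_hint(), enum split_hint and the
"group" parameter are made up for illustration, and the accessed bits are
assumed to have been gathered into a plain bitmask already. Treating an
entirely unaccessed folio as "keep whole" is also just an assumption of the
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; nothing like this exists in the kernel. */
enum split_hint { KEEP_WHOLE, SPLIT_KEEP_FIRST, SPLIT_KEEP_LAST };

/*
 * accessed: bit i is set if the PTE mapping page i of the folio was accessed
 * nr_pages: pages in the folio (at most 64 in this sketch)
 * group:    PTEs the HW may coalesce transparently, e.g. 4 for HPA;
 *           accessed bits are only trusted at this granularity
 */
enum split_hint folio_split_hint(uint64_t accessed, unsigned int nr_pages,
                                 unsigned int group)
{
    assert(group > 0 && nr_pages <= 64 && nr_pages % group == 0);

    unsigned int nr_groups = nr_pages / group;
    unsigned int hot = 0, hot_first = 0;

    /* A folio with fewer than two groups has no halves to compare. */
    if (nr_groups < 2)
        return KEEP_WHOLE;

    for (unsigned int g = 0; g < nr_groups; g++) {
        /* group <= 32 here, so the shifts below cannot overflow. */
        uint64_t mask = ((1ULL << group) - 1) << (g * group);

        /* A group counts as accessed if any of its PTEs was accessed. */
        if (accessed & mask) {
            hot++;
            if (g < nr_groups / 2)
                hot_first++;
        }
    }

    /* Assumption: nothing accessed means there is no active half to keep. */
    if (hot == 0 || hot * 2 >= nr_groups)
        return KEEP_WHOLE;
    if (hot_first == hot)
        return SPLIT_KEEP_FIRST;
    if (hot_first == 0)
        return SPLIT_KEEP_LAST;
    return KEEP_WHOLE;
}
```

For a 16-page folio where only pages 0 and 4 were accessed (mask 0x0011),
group = 1 suggests a split keeping the first half, while group = 4 sees two
of four groups accessed and keeps the folio whole. So the coarser check
splits less eagerly, which is the cost of not relying on per-PTE bits.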