Date: Sat, 7 May 2022 01:58:55 +0900
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Mike Rapoport
Cc: linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Dave Hansen,
    Ira Weiny, Kees Cook, Mike Rapoport, Peter Zijlstra, Rick Edgecombe,
    Vlastimil Babka, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [RFC PATCH 0/3] Prototype for direct map awareness in page allocator
References: <20220127085608.306306-1-rppt@kernel.org>
    <20220430134415.GA25819@ip-172-31-27-201.ap-northeast-1.compute.internal>

On Mon, May 02, 2022 at 09:44:48PM -0700, Mike Rapoport wrote:
> On Sat, Apr 30, 2022 at 01:44:16PM +0000, Hyeonggon Yoo wrote:
> > On Tue, Apr 26, 2022 at 06:21:57PM +0300, Mike Rapoport wrote:
> > > Hello Hyeonggon,
> > >
> > > On Tue, Apr 26, 2022 at 05:54:49PM +0900, Hyeonggon Yoo wrote:
> > > > On Thu, Jan 27, 2022 at 10:56:05AM +0200, Mike Rapoport wrote:
> > > > > From: Mike Rapoport
> > > > >
> > > > > Hi,
> > > > >
> > > > > This is a second attempt to make the page allocator aware of the
> > > > > direct map layout and allow grouping of the pages that must be
> > > > > mapped at PTE level in the direct map.
> > > >
> > > > Hello Mike, it may be a silly question...
> > > >
> > > > Looking at the implementation of set_memory*(), they only split
> > > > PMD/PUD-sized entries. But why not _merge_ them when all entries
> > > > have the same permissions after changing the permission of an entry?
> > > >
> > > > I think grouping __GFP_UNMAPPED allocations would help reduce
> > > > direct map fragmentation, but IMHO merging split entries is better
> > > > done in those helpers than in the page allocator.
> > >
> > > Maybe. I didn't get as far as trying to merge split entries in the
> > > direct map. IIRC, Kirill sent a patch for collapsing huge pages in
> > > the direct map some time ago, but there still was something that had
> > > to initiate the collapse.
> >
> > But in this case the buddy allocator's view of the direct map is quite
> > limited. It cannot merge 2M entries into a 1G entry, as it does not
> > support such big allocations. It also cannot merge entries for pages
> > freed during the boot process, as those weren't allocated from the
> > page allocator.
> >
> > And it will become harder when pages in MIGRATE_UNMAPPED are borrowed
> > from another migrate type...
> >
> > So it would be nice if we could efficiently merge mappings in
> > change_page_attr_set(). That approach can handle the cases above.
> >
> > I think in this case grouping allocations and merging mappings
> > should be done separately.
>
> I've added the provision to merge the mappings in __free_one_page()
> because at that spot we know for sure we can replace multiple PTEs with
> a single PMD.

Actually, no external merging mechanism is needed if CPA supports merging
mappings.

Recently I started implementing an idea similar to what I described
above. The approach is slightly different: instead of scanning the page
table, it maintains a count of the mappings that have non-standard
protection bits ("non-standard" meaning the pgprot is not equal to
PAGE_KERNEL).

It increments split_count when a standard mapping becomes non-standard
and decrements it in the opposite case. The mappings are merged when the
count becomes zero. Updating the count and merging are invoked from
__change_page_attr(), which is called by set_memory_{rw,ro}(),
set_direct_map_{default,invalid}_noflush(), etc.
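To make the bookkeeping concrete, here is a toy userspace model of the
counting scheme (an illustration only, not the actual patch: PAGE_KERNEL
is modeled as a plain constant, and set_slot_prot() is a made-up stand-in
for the point in __change_page_attr() where one entry's protection
changes):

#include <stdbool.h>
#include <stdio.h>

#define PTRS_PER_PTE 512	/* PTE entries covered by one PMD */
#define PAGE_KERNEL  0x1u	/* stand-in for the "standard" pgprot */

static unsigned int prot[PTRS_PER_PTE];	/* per-entry protection bits */
static unsigned int split_count;	/* entries with prot != PAGE_KERNEL */

static void set_slot_prot(unsigned int idx, unsigned int newprot)
{
	bool was_standard = (prot[idx] == PAGE_KERNEL);
	bool is_standard  = (newprot == PAGE_KERNEL);

	prot[idx] = newprot;

	if (was_standard && !is_standard) {
		split_count++;		/* standard -> non-standard */
	} else if (!was_standard && is_standard) {
		split_count--;		/* non-standard -> standard */
		if (split_count == 0)
			printf("all %d entries standard: merge back to PMD\n",
			       PTRS_PER_PTE);
	}
}

int main(void)
{
	for (int i = 0; i < PTRS_PER_PTE; i++)
		prot[i] = PAGE_KERNEL;

	set_slot_prot(7, 0x2u);		/* e.g. set_memory_ro() on one page */
	set_slot_prot(7, PAGE_KERNEL);	/* restore: count hits zero, merge */
	return 0;
}

The point is that no page table scan is needed at merge time; the counter
alone tells us when the whole 2M range is back to PAGE_KERNEL.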
The implementation looks like the revert_page() function that existed in
arch/i386/mm/pageattr.c decades ago...

There are some issues, like 1) set_memory_4k()-ed memory should not be
merged, and 2) we need to be extremely sure that the count is always
valid.

But I think this approach is definitely worth trying.
I'll send an RFC version to the list after a bit more work.

And still, I think grouping allocations using a migrate type would work
well together with adding a merging feature to CPA.

Thanks!
Hyeonggon

> I'm not saying there should be no additional mechanism for collapsing
> direct map pages, but I don't know when and how it should be invoked.
>
> > > > For example:
> > > > 1) set_memory_ro() splits 1 RW PMD entry into 511 RW PTE
> > > > entries and 1 RO PTE entry.
> > > >
> > > > 2) before freeing the pages, we call set_memory_rw() and we have
> > > > 512 RW PTE entries. Then we can merge them into 1 RW PMD entry.
> > >
> > > For this we need to check the permissions of all 512 pages to make
> > > sure we can use a PMD entry to map them.
> >
> > Of course that may be slow. Maybe one way to optimize this is to use
> > some bits in struct page, something like: each bit of
> > page->direct_map_split (unsigned long) is set when at least one entry
> > in its group of (PTRS_PER_PTE = 512) / (BITS_PER_LONG = 64) = 8
> > entries has special permissions.
> >
> > Then we just need to set the corresponding bit when splitting mappings
> > and iterate over 8 entries when changing permissions back (and clear
> > the bit when all 8 entries have the usual permissions again). We can
> > decide to merge by checking whether page->direct_map_split is zero.
> >
> > When scanning, 8 entries would fit into one cacheline.
> >
> > Any other ideas?
> >
> > > Not sure that doing the scan in each set_memory call won't cause an
> > > overall slowdown.
> >
> > I think we can evaluate it by measuring boot time and bpf/module
> > load/unload time.
> >
> > Is there any other workload that is directly affected
> > by the performance of set_memory*()?
> >
> > > > 3) after 2) we can do the same thing with PMD-sized entries
> > > > and merge them into 1 PUD entry if 512 PMD entries have
> > > > the same permissions.
> > > > [...]
> > > > > Mike Rapoport (3):
> > > > >   mm/page_alloc: introduce __GFP_UNMAPPED and MIGRATE_UNMAPPED
> > > > >   mm/secretmem: use __GFP_UNMAPPED to allocate pages
> > > > >   EXPERIMENTAL: x86/module: use __GFP_UNMAPPED in module_alloc
> > > >
> > > > --
> > > > Thanks,
> > > > Hyeonggon
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
>
> --
> Sincerely yours,
> Mike.

--
Thanks,
Hyeonggon
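P.S. To illustrate the page->direct_map_split idea quoted above, a rough
userspace model of the per-word bitmap (again an illustration, not kernel
code: update_split_bit() is a made-up helper, PAGE_KERNEL a stand-in
constant, and a global stands in for the struct page field):

#include <stdbool.h>
#include <stdio.h>

#define PTRS_PER_PTE	512
#define BITS_PER_LONG	64
#define ENTRIES_PER_BIT	(PTRS_PER_PTE / BITS_PER_LONG)	/* 8 */
#define PAGE_KERNEL	0x1u	/* stand-in for the "usual" pgprot */

static unsigned int prot[PTRS_PER_PTE];
static unsigned long direct_map_split;	/* bit i covers entries [8i, 8i+8) */

/* Re-evaluate the bit covering entry 'idx' after a permission change. */
static void update_split_bit(unsigned int idx)
{
	unsigned int bit = idx / ENTRIES_PER_BIT;
	unsigned int base = bit * ENTRIES_PER_BIT;
	bool any_special = false;

	/* Scan only 8 entries -- one cacheline worth of PTEs. */
	for (unsigned int i = base; i < base + ENTRIES_PER_BIT; i++) {
		if (prot[i] != PAGE_KERNEL) {
			any_special = true;
			break;
		}
	}

	if (any_special)
		direct_map_split |= 1UL << bit;
	else
		direct_map_split &= ~(1UL << bit);

	/* Zero word means all 512 entries are standard: merging is safe. */
	if (direct_map_split == 0)
		printf("direct_map_split == 0: merge 512 PTEs into one PMD\n");
}

int main(void)
{
	for (int i = 0; i < PTRS_PER_PTE; i++)
		prot[i] = PAGE_KERNEL;

	prot[42] = 0x2u;	/* split: one entry gets special permissions */
	update_split_bit(42);

	prot[42] = PAGE_KERNEL;	/* restore the usual permissions */
	update_split_bit(42);	/* word becomes zero -> merge candidate */
	return 0;
}

This keeps the scan bounded to 8 entries per permission change instead of
all 512, at the cost of one unsigned long per 2M region.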