From: David Hildenbrand
Subject: Re: [PATCH] mm: huge_memory: a new debugfs interface for splitting THP tests.
Date: Mon, 8 Mar 2021 22:58:54 +0100
To: Yang Shi
Cc: David Hildenbrand, Zi Yan, Linux MM, Linux Kernel Mailing List, linux-kselftest@vger.kernel.org, "Kirill A . Shutemov", Andrew Morton, Shuah Khan, John Hubbard, Sandipan Das, David Rientjes, Alex Shi
X-Mailing-List: linux-kernel@vger.kernel.org

> On 08.03.2021 at 22:25, Yang Shi wrote:
>
> On Mon, Mar 8, 2021 at 12:36 PM David Hildenbrand wrote:
>>
>>
>>>> On 08.03.2021 at 21:18, Yang Shi wrote:
>>>
>>> On Mon, Mar 8, 2021 at 11:30 AM David Hildenbrand wrote:
>>>>
>>>>> On 08.03.21 20:11, Yang Shi wrote:
>>>>> On Mon, Mar 8, 2021 at 11:01 AM Zi Yan wrote:
>>>>>>
>>>>>> On 8 Mar 2021, at 13:11, David Hildenbrand wrote:
>>>>>>
>>>>>>> On 08.03.21 18:49, Zi Yan wrote:
>>>>>>>> On 8 Mar 2021, at 11:17, David Hildenbrand wrote:
>>>>>>>>
>>>>>>>>> On 08.03.21 16:22, Zi Yan wrote:
>>>>>>>>>> From: Zi Yan
>>>>>>>>>>
>>>>>>>>>> By writing ",," to
>>>>>>>>>> /split_huge_pages_in_range_pid, THPs in the process with the
>>>>>>>>>> given pid and virtual address range are split. It is used to test
>>>>>>>>>> split_huge_page function. In addition, a selftest program is added to
>>>>>>>>>> tools/testing/selftests/vm to utilize the interface by splitting
>>>>>>>>>> PMD THPs and PTE-mapped THPs.
>>>>>>>>>
>>>>>>>>> Won't something like
>>>>>>>>>
>>>>>>>>> 1. MADV_HUGEPAGE
>>>>>>>>>
>>>>>>>>> 2. Access memory
>>>>>>>>>
>>>>>>>>> 3. MADV_NOHUGEPAGE
>>>>>>>>>
>>>>>>>>> Have a similar effect? What's the benefit of this?
>>>>>>>>
>>>>>>>> Thanks for checking the patch.
>>>>>>>>
>>>>>>>> No, MADV_NOHUGEPAGE just replaces VM_HUGEPAGE with VM_NOHUGEPAGE,
>>>>>>>> nothing else will be done.
>>>>>>>
>>>>>>> Ah, okay - maybe my memory was tricking me. There is some s390x KVM code that forces MADV_NOHUGEPAGE and force-splits everything.
>>>>>>>
>>>>>>> I do wonder, though, if this functionality would be worth a proper user interface (e.g., madvise). There might be actual benefit in having this as a !debug interface.
>>>>>>>
>>>>>>> I think you are aware of the discussion in https://lkml.kernel.org/r/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com
>>>>>>
>>>>>> Yes. Thanks for bringing this up.
>>>>>>
>>>>>>>
>>>>>>> If there will be an interface to collapse a THP -- "this memory area is worth extra performance now by collapsing a THP if possible" -- it might also be helpful to have the opposite functionality -- "this memory area is not worth a THP, rather use that somewhere else".
>>>>>>>
>>>>>>> MADV_HUGE_COLLAPSE vs. MADV_HUGE_SPLIT
>>>>>>
>>>>>> I agree that MADV_HUGE_SPLIT would be useful as the opposite of COLLAPSE when user might just want PAGESIZE mappings.
>>>>>> Right now, HUGE_SPLIT is implicit from mapping changes like mprotect or MADV_DONTNEED.
>>>>>
>>>>> IMHO, it sounds not very useful. MADV_DONTNEED would split PMD for any
>>>>> partial THP. If the range covers the whole THP, the whole THP is going
>>>>> to be freed anyway. All other places in kernel which need split THP
>>>>> have been covered. So I didn't realize any usecase from userspace for
>>>>> just splitting PMD to PTEs.
>>>>
>>>> THP are a limited resource. So indicating which virtual memory regions
>>>> are not performance sensitive right now (e.g., cold pages in a database)
>>>> and not worth a THP might be quite valuable, no?
>>>
>>> Such functionality could be achieved by MADV_COLD or MADV_PAGEOUT,
>>> right? Then a subsequent call to MADV_NOHUGEPAGE would prevent from
>>> collapsing or allocating THP for that area.
>>>
>>
>> I remember these deal with optimizing swapping. Not sure how they interact with THP, especially on systems without swap - I would guess they don't as of now.
>
> Yes, MADV_PAGEOUT would just swap the THP or sub pages out. I think I
> just forgot to mention MADV_FREE which would be more suitable for this
> usecase.
>

Can you elaborate? MADV_FREE is destructive, just like a delayed MADV_DONTNEED.
How would that help here?

>>>> --
>>>> Thanks,
>>>>
>>>> David / dhildenb
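
To make the "destructive" point concrete, here is a minimal userspace sketch in C. It is not taken from the patch or from anyone's code in this thread; the helper name first_byte_after_dontneed() is made up for illustration. It relies on the documented Linux behaviour that MADV_DONTNEED on a private anonymous mapping drops the pages, so the next access faults in zero-filled memory. MADV_FREE is not exercised here because its discard is deferred until the kernel reclaims the pages, so a read afterwards may return either the old data or zeros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): fill a private anonymous
 * mapping with a marker byte, drop it with MADV_DONTNEED, and report
 * what the first byte reads back as afterwards.
 * Returns the byte value (0..255) on success, -1 on any failure.
 */
int first_byte_after_dontneed(size_t len, unsigned char marker)
{
	assert(len > 0);

	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Fault the pages in and dirty them. */
	memset(buf, marker, len);

	int ret = -1;
	/* The old contents are gone once this call succeeds. */
	if (madvise(buf, len, MADV_DONTNEED) == 0)
		ret = buf[0];	/* access refaults a fresh zero page */

	munmap(buf, len);
	return ret;
}
```

Calling first_byte_after_dontneed(sysconf(_SC_PAGESIZE), 0xAB) should give 0 rather than 0xAB on Linux, i.e. the data is lost and not merely remapped with smaller pages. That is the difference from a pure split, which would keep the contents and only change how they are mapped.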