From: Lance Yang
Date: Tue, 30 Jan 2024 10:37:26 +0800
Subject: Re: [PATCH 1/1] mm/khugepaged: bypassing unnecessary scans with MMF_DISABLE_THP check
To: "Zach O'Keefe"
Cc: Yang Shi, akpm@linux-foundation.org, mhocko@suse.com, david@redhat.com, songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240129054551.57728-1-ioworker0@gmail.com>

Hey Zach,

Thanks for taking time to review!

On Tue, Jan 30, 2024 at 3:04 AM Zach O'Keefe wrote:
[...]
> IIUC, there really isn't any correctness race. Claim is just that we

Yes, there is indeed no correctness race.

> can avoid a number of per-vma checks. AFAICT, any task w/
> MMF_DISABLE_THP set will always have each and every vma checked
> (albeit, with a very inexpensive ->vm_mm->flags check)
[...]

IMO, for any task with MMF_DISABLE_THP set, the per-VMA checks can be
skipped in favor of a single, very inexpensive ->mm->flags check, which
avoids redundant operations, especially in scenarios with a large
address space.

BR,
Lance

On Tue, Jan 30, 2024 at 3:04 AM Zach O'Keefe wrote:
>
> On Mon, Jan 29, 2024 at 10:53 AM Yang Shi wrote:
> >
> > On Sun, Jan 28, 2024 at 9:46 PM Lance Yang wrote:
> > >
> > > khugepaged scans the entire address space in the
> > > background for each given mm, looking for
> > > opportunities to merge sequences of basic pages
> > > into huge pages. However, when an mm is inserted
> > > to the mm_slots list, and the MMF_DISABLE_THP flag
> > > is set later, this scanning process becomes
> > > unnecessary for that mm and can be skipped to avoid
> > > redundant operations, especially in scenarios with
> > > a large address space.
> > >
> > > This commit introduces a check before each scanning
> > > process to test the MMF_DISABLE_THP flag for the
> > > given mm; if the flag is set, the scanning process
> > > is bypassed, thereby improving the efficiency of
> > > khugepaged.
> > >
> > > Signed-off-by: Lance Yang
> > > ---
> > >  mm/khugepaged.c | 18 ++++++++++++------
> > >  1 file changed, 12 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 2b219acb528e..d6a700834edc 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -410,6 +410,12 @@ static inline int hpage_collapse_test_exit(struct mm_struct *mm)
> > >  	return atomic_read(&mm->mm_users) == 0;
> > >  }
> > >
> > > +static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
> > > +{
> > > +	return hpage_collapse_test_exit(mm) ||
> > > +	       test_bit(MMF_DISABLE_THP, &mm->flags);
> > > +}
> > > +
> > >  void __khugepaged_enter(struct mm_struct *mm)
> > >  {
> > >  	struct khugepaged_mm_slot *mm_slot;
> > > @@ -1422,7 +1428,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
> > >
> > >  	lockdep_assert_held(&khugepaged_mm_lock);
> > >
> > > -	if (hpage_collapse_test_exit(mm)) {
> > > +	if (hpage_collapse_test_exit_or_disable(mm)) {
> > >  		/* free mm_slot */
> > >  		hash_del(&slot->hash);
> > >  		list_del(&slot->mm_node);
> > > @@ -2360,7 +2366,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > >  		goto breakouterloop_mmap_lock;
> > >
> > >  	progress++;
> > > -	if (unlikely(hpage_collapse_test_exit(mm)))
> > > +	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
> > >  		goto breakouterloop;
> > >
> > >  	vma_iter_init(&vmi, mm, khugepaged_scan.address);
> > > @@ -2368,7 +2374,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > >  		unsigned long hstart, hend;
> > >
> > >  		cond_resched();
> > > -		if (unlikely(hpage_collapse_test_exit(mm))) {
> > > +		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
> >
> > The later thp_vma_allowable_order() does check whether MMF_DISABLE_THP
> > is set or not. And the hugepage_vma_revalidate() after re-acquiring
> > mmap_lock does the same check too. The checking in khugepaged should
> > be already serialized with prctl, which takes mmap_lock in write.
>
> IIUC, there really isn't any correctness race. Claim is just that we
> can avoid a number of per-vma checks. AFAICT, any task w/
> MMF_DISABLE_THP set will always have each and every vma checked
> (albeit, with a very inexpensive ->vm_mm->flags check)
>
> Thanks,
> Zach
>
> > >  			progress++;
> > >  			break;
> > >  		}
> > > @@ -2390,7 +2396,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > >  			bool mmap_locked = true;
> > >
> > >  			cond_resched();
> > > -			if (unlikely(hpage_collapse_test_exit(mm)))
> > > +			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
> > >  				goto breakouterloop;
> > >
> > >  			VM_BUG_ON(khugepaged_scan.address < hstart ||
> > > @@ -2408,7 +2414,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > >  				fput(file);
> > >  				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> > >  					mmap_read_lock(mm);
> > > -					if (hpage_collapse_test_exit(mm))
> > > +					if (hpage_collapse_test_exit_or_disable(mm))
> > >  						goto breakouterloop;
> > >  					*result = collapse_pte_mapped_thp(mm,
> > >  						khugepaged_scan.address, false);
> > > @@ -2450,7 +2456,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > >  	 * Release the current mm_slot if this mm is about to die, or
> > >  	 * if we scanned all vmas of this mm.
> > >  	 */
> > > -	if (hpage_collapse_test_exit(mm) || !vma) {
> > > +	if (hpage_collapse_test_exit_or_disable(mm) || !vma) {
> > >  		/*
> > >  		 * Make sure that if mm_users is reaching zero while
> > >  		 * khugepaged runs here, khugepaged_exit will find
> > > --
> > > 2.33.1
> > >