Date: Fri, 19 Jan 2024 10:05:15 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: memory: move mem_cgroup_charge() into alloc_anon_folio()
To: Michal Hocko
CC: Andrew Morton, Matthew Wilcox, David Hildenbrand
References: <20240117103954.2756050-1-wangkefeng.wang@huawei.com>
From: Kefeng Wang

On 2024/1/18 23:59, Michal Hocko wrote:
> On Wed 17-01-24 18:39:54, Kefeng Wang wrote:
>> mem_cgroup_charge() uses the GFP flags in a fairly sophisticated way.
>> In addition to checking gfpflags_allow_blocking(), it pays attention
>> to __GFP_NORETRY and __GFP_RETRY_MAYFAIL to ensure that processes within
>> this memcg do not exceed their quotas. Using the same GFP flags ensures
>> that we handle large anonymous folios correctly, including falling back
>> to smaller orders when there is plenty of memory available in the system
>> but this memcg is close to its limits.
>
> The changelog is not really clear on the actual problem you are trying
> to fix. Is this a pure consistency fix, or have you actually seen any
> misbehavior? From the patch I suspect you are interested in THPs much
> more than regular order-0 pages, because those are GFP_KERNEL-like when
> it comes to charging. THPs have a variety of options on how aggressively
> the allocation should try. From that perspective NORETRY and
> RETRY_MAYFAIL are not all that interesting, because costly allocations
> (which THPs are) already do imply MAYFAIL and NORETRY.

I haven't hit an actual issue; this was found by code inspection.
mTHP was introduced by Ryan (19eaf44954df "mm: thp: support allocation of
anonymous multi-size THP"), so alloc_anon_folio() now has checks for mTHP
similar to those for PMD-sized THP: it tries to allocate a large-order
folio below PMD_ORDER and falls back to an order-0 folio if that fails.
It also gets its GFP flags from vma_thp_gfp_mask() according to the user
configuration, just like PMD-sized THP allocation. Therefore:

1) The memcg charge-failure check should be moved into the fallback
   logic, since that lets us allocate as large an order as possible
   even when the memcg's memory usage is close to its limit.

2) Using the same GFP flags for the allocation and the memcg charge is
   consistent with PMD-sized THP. In addition, per the flags returned by
   vma_thp_gfp_mask(), GFP_TRANSHUGE_LIGHT lets us skip direct reclaim,
   and __GFP_NORETRY lets us skip mem_cgroup_oom(), so no process is
   killed on a large-order folio charge failure.

> GFP_TRANSHUGE_LIGHT is more interesting though because those do not dive
> into the direct reclaim at all. With the current code they will reclaim
> charges to free up the space for the allocated THP page and that defeats
> the light mode. I have a vague recollection of preparing a patch to

We are interested in GFP_TRANSHUGE_LIGHT and __GFP_NORETRY, as mentioned
above.

> address that in the past. Let me have a look at the current code...

Yes, commit 3b3636924dfe ("mm, memcg: sync allocation and memcg charge
gfp flags for THP") for PMD-sized THP, from you :)

>
> ... So yes, we still do THP charging the way I remember
> (do_huge_pmd_anonymous_page). Your patch touches handle_pte_fault ->
> do_anonymous_page path which is not THP AFAICS. Or am I missing
> something?

mTHP is one kind of THP.

Thanks.
>
>> Signed-off-by: Kefeng Wang
>> ---
>> v2:
>> - fix build when !CONFIG_TRANSPARENT_HUGEPAGE
>> - update changelog suggested by Matthew Wilcox
>>
>>  mm/memory.c | 16 ++++++++--------
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 5e88d5379127..551f0b21bc42 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4153,8 +4153,8 @@ static bool pte_range_none(pte_t *pte, int nr_pages)
>>
>>  static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>>  {
>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  	struct vm_area_struct *vma = vmf->vma;
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  	unsigned long orders;
>>  	struct folio *folio;
>>  	unsigned long addr;
>> @@ -4206,15 +4206,21 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>>  		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>>  		folio = vma_alloc_folio(gfp, order, vma, addr, true);
>>  		if (folio) {
>> +			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>> +				folio_put(folio);
>> +				goto next;
>> +			}
>> +			folio_throttle_swaprate(folio, gfp);
>>  			clear_huge_page(&folio->page, vmf->address, 1 << order);
>>  			return folio;
>>  		}
>> +next:
>>  		order = next_order(&orders, order);
>>  	}
>>
>> fallback:
>> #endif
>> -	return vma_alloc_zeroed_movable_folio(vmf->vma, vmf->address);
>> +	return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
>>  }
>>
>>  /*
>> @@ -4281,10 +4287,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>  	nr_pages = folio_nr_pages(folio);
>>  	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>
>> -	if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
>> -		goto oom_free_page;
>> -	folio_throttle_swaprate(folio, GFP_KERNEL);
>> -
>>  	/*
>>  	 * The memory barrier inside __folio_mark_uptodate makes sure that
>>  	 * preceding stores to the page contents become visible before
>> @@ -4338,8 +4340,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>  release:
>>  	folio_put(folio);
>>  	goto unlock;
>> -oom_free_page:
>> -	folio_put(folio);
>>  oom:
>>  	return VM_FAULT_OOM;
>>  }
>> --
>> 2.27.0
>>
>