Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp4051327pxf; Tue, 23 Mar 2021 00:59:42 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyeLpxK5E56KkdfiSJc5Z9lH9vfF/ebq8qMUTUr7kIfTy2C/Y59FXMbkr5uo1scaJLzoFYf X-Received: by 2002:aa7:dd05:: with SMTP id i5mr3335228edv.300.1616486382647; Tue, 23 Mar 2021 00:59:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616486382; cv=none; d=google.com; s=arc-20160816; b=Purr/4SPxbny8ggcEZqjnVI0Q6LRrJg7eCjA9FvNH6tNtlVonNqMW0TrjPMXAyusJm QYGt6DBUx+Y4Tr6c1qJxC/S+kLy+lTc2Ay81Ub4iTkta5vK9kDGEGjGR0f+8S6VHUigw pDaU+PTT/2gu/pqBLVR6nIUJY6Y2ThUb3we94G9XLbwFnc3rB90TWnTp+7NOV/MQbaC+ tmdqgtnyCXi9ra3avhyqlm7FQ+CCJ50FmnHnA+WK6+D4HcIPaSJycRNWKvSRNUDrNmXp GxJ9StUVzkiL2+KOfq7tTUXMgiqt7TLnJo5rsd7dDBX4rZFoP+aJPumr2FplbqV0e+Z/ 8eZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=m+xTnYW1KWqoinuxGp7EGKTcA3wA3snicXhoUVTBaLY=; b=iPXK8sPm9MRCLDc3c2IsM+NjRwZ+qx+Yi4cFH8spRNlSc9V+YmCDHaJBATl2F5Nxd0 8XmqlpRCgLPp9go9N08aVsUsTsC2E7I+nZK5i0rAAu3R7WEtDl+F1adtRt0kluUD8KIC RaWmFmHnK/hV8Wv0V85lw2vvIm5cVoklJadl+yrHPy3HSCd2jPkuojcxroHAntlAzDga WPUuWEqhv+drFm50nMxLI6bW5hcZw2kHq3a4kyqzbmUqzPmhbtNztqjVtvVvaXP5pXK+ fCEs2scYd+F5OeAiA649OIp6G/+2chudBwxrnD8rwEMId7uJG6OtByReMm40f5ZX5ApY Akjw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=gBI7u2tx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=suse.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id kf17si12872412ejc.308.2021.03.23.00.59.18; Tue, 23 Mar 2021 00:59:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=gBI7u2tx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229472AbhCWH6Q (ORCPT + 99 others); Tue, 23 Mar 2021 03:58:16 -0400 Received: from mx2.suse.de ([195.135.220.15]:45384 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229437AbhCWH5y (ORCPT ); Tue, 23 Mar 2021 03:57:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1616486273; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=m+xTnYW1KWqoinuxGp7EGKTcA3wA3snicXhoUVTBaLY=; b=gBI7u2txerAlccab8XBKnv33BbpVCouLB++S/f+B0QjRW5IiiVv71zU/PRXmDO98S/q+Ji 0nL/hGlX1rc1pdUc4EAME83m79HAcLyE5NnE8M+f32cTcreLVwooL4ljboejCZxirnWUDx rlnb8xLk9dADwrdKE9+UHE1cLuzBLPc= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 86F56AC1D; Tue, 23 Mar 2021 07:57:53 +0000 (UTC) Date: Tue, 23 Mar 2021 08:57:46 +0100 From: Michal Hocko To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt , Oscar Salvador , David Hildenbrand , Muchun Song , David Rientjes , Miaohe Lin , Peter Zijlstra , Matthew Wilcox , HORIGUCHI NAOYA , "Aneesh Kumar K . V" , Waiman Long , Peter Xu , Mina Almasry , Andrew Morton Subject: Re: [RFC PATCH 5/8] hugetlb: change free_pool_huge_page to remove_pool_huge_page Message-ID: References: <20210319224209.150047-1-mike.kravetz@oracle.com> <20210319224209.150047-6-mike.kravetz@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 22-03-21 16:28:07, Mike Kravetz wrote: > On 3/22/21 7:31 AM, Michal Hocko wrote: > > On Fri 19-03-21 15:42:06, Mike Kravetz wrote: > > [...] > >> @@ -2090,9 +2084,15 @@ static void return_unused_surplus_pages(struct hstate *h, > >> while (nr_pages--) { > >> h->resv_huge_pages--; > >> unused_resv_pages--; > >> - if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1)) > >> + page = remove_pool_huge_page(h, &node_states[N_MEMORY], 1); > >> + if (!page) > >> goto out; > >> - cond_resched_lock(&hugetlb_lock); > >> + > >> + /* Drop lock and free page to buddy as it could sleep */ > >> + spin_unlock(&hugetlb_lock); > >> + update_and_free_page(h, page); > >> + cond_resched(); > >> + spin_lock(&hugetlb_lock); > >> } > >> > >> out: > > > > This is likely a matter of taste but the repeated pattern of unlock, > > update_and_free_page, cond_resched and lock seems rather clumsy. > > Would it be slightly better/nicer to remove_pool_huge_page into a > > list_head under a single lock invocation and then free up the whole lot > > after the lock is dropped? > > Yes, we can certainly do that. > One downside I see is that the list can contain a bunch of pages not > accounted for in hugetlb and not free in buddy (or cma). Ideally, we > would want to keep those in sync if possible. Also, the commit that > added the cond_resched talked about freeing up 12 TB worth of huge pages > and it holding the lock for 150 seconds. The new code is not holding > the lock while calling free to buddy, but I wonder how long it would > take to remove 12 TB worth of huge pages and add them to a separate list? Well, the remove_pool_huge_page is just a accounting part and that should be pretty invisible even when the number of pages is large. The lockless nature (from hugetlb POV) of the final page release is the heavy weight operation and whether you do it in chunks or in a single go (with cond_resched) should be visible either. We already do the same thing when uncharging memcg pages (mem_cgroup_uncharge_list). So I would agree with you that this would be a much bigger problem if both the hugetlb and freeing path were equally heavy weight and the delay between first pages uncaccounted and freed would be noticeable. But I do not want to push for this. I just hated the hugetlb_lock dances as this is ugly and repetitive pattern. -- Michal Hocko SUSE Labs