Date: Wed, 17 Feb 2021 16:00:11 +0100
From: Michal Hocko
To: Oscar Salvador
Cc: Andrew Morton, Mike Kravetz, David Hildenbrand, Muchun Song, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages
References: <20210217100816.28860-1-osalvador@suse.de> <20210217100816.28860-2-osalvador@suse.de>
In-Reply-To: <20210217100816.28860-2-osalvador@suse.de>

On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
[...]
> +static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
> +{
> +	gfp_t gfp_mask = htlb_alloc_mask(h);
> +	nodemask_t *nmask = &node_states[N_MEMORY];
> +	struct page *new_page;
> +	bool ret = false;
> +	int nid;
> +
> +	spin_lock(&hugetlb_lock);
> +	/*
> +	 * Check one more time to make race-window smaller.
> +	 */
> +	if (!PageHuge(page)) {
> +		/*
> +		 * Dissolved from under our feet.
> +		 */
> +		spin_unlock(&hugetlb_lock);
> +		return true;
> +	}

Is this really necessary? dissolve_free_huge_page will take care of this
and the race window you are covering is really tiny.

> +
> +	nid = page_to_nid(page);
> +	spin_unlock(&hugetlb_lock);
> +
> +	/*
> +	 * Before dissolving the page, we need to allocate a new one,
> +	 * so the pool remains stable.
> +	 */
> +	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL);

Wrt. fallback to other zones, I hadn't realized that the primary use case
is a form of memory offlining (from virt-mem). I am not yet sure what the
proper behavior is in that case, but if breaking hugetlb pools, similar to
the normal hotplug operation, is viable then this needs a special mode.
We do not want a random alloc_contig_range user to do the same. So for
starters I would go with __GFP_THISNODE here.

> +	if (new_page) {
> +		/*
> +		 * Ok, we got a new free hugepage to replace this one. Try to
> +		 * dissolve the old page.
> +		 */
> +		if (!dissolve_free_huge_page(page)) {
> +			ret = true;
> +		} else if (dissolve_free_huge_page(new_page)) {
> +			/*
> +			 * Seems the old page could not be dissolved, so try to
> +			 * dissolve the freshly allocated page. If that fails
> +			 * too, let us count the new page as a surplus. Doing so
> +			 * allows the pool to be re-balanced when pages are freed
> +			 * instead of enqueued again.
> +			 */
> +			spin_lock(&hugetlb_lock);
> +			h->surplus_huge_pages++;
> +			h->surplus_huge_pages_node[nid]++;
> +			spin_unlock(&hugetlb_lock);
> +		}
> +		/*
> +		 * Free it into the hugepage allocator
> +		 */
> +		put_page(new_page);
> +	}
> +
> +	return ret;
> +}
> +
> +bool isolate_or_dissolve_huge_page(struct page *page)
> +{
> +	struct hstate *h = NULL;
> +	struct page *head;
> +	bool ret = false;
> +
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(page)) {
> +		head = compound_head(page);
> +		h = page_hstate(head);
> +	}
> +	spin_unlock(&hugetlb_lock);
> +
> +	if (!h)
> +		/*
> +		 * The page might have been dissolved from under our feet.
> +		 * If that is the case, return success as if we dissolved it
> +		 * ourselves.
> +		 */
> +		return true;

Nit: I would put the comment above the condition in both cases. It reads
more easily that way, at least without { }.

> +
> +	if (hstate_is_gigantic(h))
> +		/*
> +		 * Fence off gigantic pages as there is a cyclic dependency
> +		 * between alloc_contig_range and them.
> +		 */
> +		return ret;
> +
> +	if(!page_count(head) && alloc_and_dissolve_huge_page(h, head))
> +		ret = true;
> +
> +	return ret;
> +}
> +
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				    unsigned long addr, int avoid_reserve)
>  {

Other than that I haven't noticed any surprises.
-- 
Michal Hocko
SUSE Labs
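
To illustrate the __GFP_THISNODE point above, here is a small userspace model of the difference between an allocation that may fall back to another node and one that is pinned to the requested node. This is only a sketch, not kernel code: struct model_pool, MODEL_THISNODE and model_alloc_fresh() are hypothetical names made up for the illustration, and the per-node counters merely stand in for memory from which a fresh huge page could be built.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_NR_NODES	2
#define MODEL_THISNODE	0x1u	/* stands in for __GFP_THISNODE */

struct model_pool {
	/* Number of huge pages that could still be allocated on each node. */
	int free_pages[MODEL_NR_NODES];
};

/*
 * Try to allocate one page, preferring node @nid.
 * Returns the node the page came from, or -1 if nothing was allocated.
 */
static int model_alloc_fresh(struct model_pool *p, unsigned int flags, int nid)
{
	int n;

	if (nid < 0 || nid >= MODEL_NR_NODES)
		return -1;

	/* The preferred node is always tried first. */
	if (p->free_pages[nid] > 0) {
		p->free_pages[nid]--;
		return nid;
	}

	/* With MODEL_THISNODE set, no other node may be used. */
	if (flags & MODEL_THISNODE)
		return -1;

	/* Otherwise fall back to any node that still has memory. */
	for (n = 0; n < MODEL_NR_NODES; n++) {
		if (p->free_pages[n] > 0) {
			p->free_pages[n]--;
			return n;
		}
	}

	return -1;
}
```

With node 0 exhausted and one page left on node 1, a request for node 0 without the flag is served from node 1, which silently moves the pool's page to a different node. With the flag set the same request fails and node 1 is left untouched, which is the conservative behavior suggested for a generic alloc_contig_range caller.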