Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751881AbdLNAqy (ORCPT ); Wed, 13 Dec 2017 19:46:54 -0500 Received: from userp2120.oracle.com ([156.151.31.85]:58024 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751762AbdLNAqv (ORCPT ); Wed, 13 Dec 2017 19:46:51 -0500 Subject: Re: [RFC PATCH 4/5] mm, hugetlb: get rid of surplus page accounting tricks To: Michal Hocko , linux-mm@kvack.org Cc: Naoya Horiguchi , Andrew Morton , LKML , Michal Hocko References: <20171204140117.7191-1-mhocko@kernel.org> <20171204140117.7191-5-mhocko@kernel.org> From: Mike Kravetz Message-ID: <8b07431a-547e-4330-4276-570ef861bb35@oracle.com> Date: Wed, 13 Dec 2017 16:45:55 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <20171204140117.7191-5-mhocko@kernel.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8744 signatures=668646 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1712140008 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5060 Lines: 134 On 12/04/2017 06:01 AM, Michal Hocko wrote: > From: Michal Hocko > > alloc_surplus_huge_page increases the pool size and the number of > surplus pages opportunistically to prevent from races with the pool size > change. See d1c3fb1f8f29 ("hugetlb: introduce nr_overcommit_hugepages > sysctl") for more details. > > The resulting code is unnecessarily hairy, cause code duplication and > doesn't allow to share the allocation paths. Moreover pool size changes > tend to be very seldom so optimizing for them is not really reasonable. > Simplify the code and allow to allocate a fresh surplus page as long as > we are under the overcommit limit and then recheck the condition after > the allocation and drop the new page if the situation has changed. This > should provide a reasonable guarantee that an abrupt allocation requests > will not go way off the limit. > > If we consider races with the pool shrinking and enlarging then we > should be reasonably safe as well. In the first case we are off by one > in the worst case and the second case should work OK because the page is > not yet visible. We can waste CPU cycles for the allocation but that > should be acceptable for a relatively rare condition. > > Signed-off-by: Michal Hocko > --- > mm/hugetlb.c | 60 +++++++++++++++++++++--------------------------------------- > 1 file changed, 21 insertions(+), 39 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index a1b8b2888ec9..0c7dc269b6c0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1538,62 +1538,44 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, > int nid, nodemask_t *nmask) > { > - struct page *page; > - unsigned int r_nid; > + struct page *page = NULL; > > if (hstate_is_gigantic(h)) > return NULL; > > - /* > - * Assume we will successfully allocate the surplus page to > - * prevent racing processes from causing the surplus to exceed > - * overcommit > - * > - * This however introduces a different race, where a process B > - * tries to grow the static hugepage pool while alloc_pages() is > - * called by process A. B will only examine the per-node > - * counters in determining if surplus huge pages can be > - * converted to normal huge pages in adjust_pool_surplus(). A > - * won't be able to increment the per-node counter, until the > - * lock is dropped by B, but B doesn't drop hugetlb_lock until > - * no more huge pages can be converted from surplus to normal > - * state (and doesn't try to convert again). Thus, we have a > - * case where a surplus huge page exists, the pool is grown, and > - * the surplus huge page still exists after, even though it > - * should just have been converted to a normal huge page. This > - * does not leak memory, though, as the hugepage will be freed > - * once it is out of use. It also does not allow the counters to > - * go out of whack in adjust_pool_surplus() as we don't modify > - * the node values until we've gotten the hugepage and only the > - * per-node value is checked there. > - */ > spin_lock(&hugetlb_lock); > - if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) { > - spin_unlock(&hugetlb_lock); > - return NULL; > - } else { > - h->nr_huge_pages++; > - h->surplus_huge_pages++; > - } > + if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) > + goto out_unlock; > spin_unlock(&hugetlb_lock); > > page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask); > + if (!page) > + goto out_unlock; > > spin_lock(&hugetlb_lock); > - if (page) { > + /* > + * We could have raced with the pool size change. > + * Double check that and simply deallocate the new page > + * if we would end up overcommiting the surpluses. Abuse > + * temporary page to workaround the nasty free_huge_page > + * codeflow > + */ > + if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) { > + SetPageHugeTemporary(page); > + put_page(page); > + page = NULL; > + } else { > + h->surplus_huge_pages_node[page_to_nid(page)]++; > + h->surplus_huge_pages++; > INIT_LIST_HEAD(&page->lru); > r_nid = page_to_nid(page); > set_compound_page_dtor(page, HUGETLB_PAGE_DTOR); > set_hugetlb_cgroup(page, NULL); > - /* > - * We incremented the global counters already > - */ > h->nr_huge_pages_node[r_nid]++; > h->surplus_huge_pages_node[r_nid]++; > - } else { > - h->nr_huge_pages--; > - h->surplus_huge_pages--; In the case of a successful surplus allocation, the following counters are incremented: h->surplus_huge_pages_node[page_to_nid(page)]++; h->surplus_huge_pages++; h->nr_huge_pages_node[r_nid]++; h->surplus_huge_pages_node[r_nid]++; Looks like per-node surplus_huge_pages_node is incremented twice, and global nr_huge_pages is not incremented at all. Also, you removed r_nid so I'm guessing this will not compile? -- Mike Kravetz > } > + > +out_unlock: > spin_unlock(&hugetlb_lock); > > return page; >