Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757486AbcJXUln (ORCPT ); Mon, 24 Oct 2016 16:41:43 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:40007 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755001AbcJXUll (ORCPT ); Mon, 24 Oct 2016 16:41:41 -0400 Subject: Re: [PATCH 1/1] mm/hugetlb: fix huge page reservation leak in private mapping error paths To: "Aneesh Kumar K.V" , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <1476933077-23091-1-git-send-email-mike.kravetz@oracle.com> <1476933077-23091-2-git-send-email-mike.kravetz@oracle.com> <87vawjthzr.fsf@linux.vnet.ibm.com> Cc: Naoya Horiguchi , Michal Hocko , "Kirill A . Shutemov" , Hillf Danton , Dave Hansen , Jan Stancek , stable@vger.kernel.org From: Mike Kravetz Message-ID: <4ec1a385-396a-7aab-1981-6dce58a12109@oracle.com> Date: Mon, 24 Oct 2016 13:40:23 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <87vawjthzr.fsf@linux.vnet.ibm.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: aserv0022.oracle.com [141.146.126.234] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3650 Lines: 90 On 10/23/2016 04:36 AM, Aneesh Kumar K.V wrote: > Mike Kravetz writes: > >> Error paths in hugetlb_cow() and hugetlb_no_page() may free a newly >> allocated huge page. If a reservation was associated with the huge >> page, alloc_huge_page() consumed the reservation while allocating. >> When the newly allocated page is freed in free_huge_page(), it will >> increment the global reservation count. However, the reservation entry >> in the reserve map will remain. This is not an issue for shared >> mappings as the entry in the reserve map indicates a reservation exists. >> But, an entry in a private mapping reserve map indicates the reservation >> was consumed and no longer exists. This results in an inconsistency >> between the reserve map and the global reservation count. This 'leaks' >> a reserved huge page. >> >> Create a new routine restore_reserve_on_error() to restore the reserve >> entry in these specific error paths. This routine makes use of a new >> function vma_add_reservation() which will add a reserve entry for a >> specific address/page. >> >> In general, these error paths were rarely (if ever) taken on most >> architectures. However, powerpc contained arch specific code that >> that resulted in an extra fault and execution of these error paths >> on all private mappings. >> >> Fixes: 67961f9db8c4 ("mm/hugetlb: fix huge page reserve accounting for private mappings) >> >> Cc: stable@vger.kernel.org >> Reported-by: Jan Stancek >> Signed-off-by: Mike Kravetz >> --- >> mm/hugetlb.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 66 insertions(+) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index ec49d9e..418bf01 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1826,11 +1826,17 @@ static void return_unused_surplus_pages(struct hstate *h, >> * is not the case is if a reserve map was changed between calls. It >> * is the responsibility of the caller to notice the difference and >> * take appropriate action. >> + * >> + * vma_add_reservation is used in error paths where a reservation must >> + * be restored when a newly allocated huge page must be freed. It is >> + * to be called after calling vma_needs_reservation to determine if a >> + * reservation exists. >> */ >> enum vma_resv_mode { >> VMA_NEEDS_RESV, >> VMA_COMMIT_RESV, >> VMA_END_RESV, >> + VMA_ADD_RESV, >> }; >> static long __vma_reservation_common(struct hstate *h, >> struct vm_area_struct *vma, unsigned long addr, >> @@ -1856,6 +1862,14 @@ static long __vma_reservation_common(struct hstate *h, >> region_abort(resv, idx, idx + 1); >> ret = 0; >> break; >> + case VMA_ADD_RESV: >> + if (vma->vm_flags & VM_MAYSHARE) >> + ret = region_add(resv, idx, idx + 1); >> + else { >> + region_abort(resv, idx, idx + 1); >> + ret = region_del(resv, idx, idx + 1); >> + } > > It is confusing to find ADD_RESV doing region_del, but I don't have > suggestion for a better name. Thanks for the review Aneesh. Of course, this naming is the result of shared and private mappings having completely opposite reserve map semantics. In shared mappings, an entry in the reserve map indicates a reservation exists. For private mappings, the absence of an entry in the reserve map indicates a reservation exists. -- Mike Kravetz > > -aneesh > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org >