Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751297AbbEYT6b (ORCPT ); Mon, 25 May 2015 15:58:31 -0400 Received: from cantor2.suse.de ([195.135.220.15]:45038 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751001AbbEYT6a (ORCPT ); Mon, 25 May 2015 15:58:30 -0400 Message-ID: <1432583887.2185.53.camel@stgolabs.net> Subject: Re: [PATCH v2 0/2] alloc_huge_page/hugetlb_reserve_pages race From: Davidlohr Bueso To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Naoya Horiguchi , David Rientjes , Luiz Capitulino , Andrew Morton In-Reply-To: <1432353304-12767-1-git-send-email-mike.kravetz@oracle.com> References: <1432353304-12767-1-git-send-email-mike.kravetz@oracle.com> Content-Type: text/plain; charset="UTF-8" Date: Mon, 25 May 2015 12:58:07 -0700 Mime-Version: 1.0 X-Mailer: Evolution 3.12.11 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3971 Lines: 94 On Fri, 2015-05-22 at 20:55 -0700, Mike Kravetz wrote: > This updated patch set includes new documentation for the region/ > reserve map routines. Since I am not the original author of this > code, comments would be appreciated. > > While working on hugetlbfs fallocate support, I noticed the following > race in the existing code. It is unlikely that this race is hit very > often in the current code. Have you actually run into this issue? Can you produce a testcase? > However, if more functionality to add and > remove pages to hugetlbfs mappings (such as fallocate) is added the > likelihood of hitting this race will increase. > > alloc_huge_page and hugetlb_reserve_pages use information from the > reserve map to determine if there are enough available huge pages to > complete the operation, as well as adjust global reserve and subpool > usage counts. The order of operations is as follows: > - call region_chg() to determine the expected change based on reserve map > - determine if enough resources are available for this operation > - adjust global counts based on the expected change > - call region_add() to update the reserve map > The issue is that reserve map could change between the call to region_chg > and region_add. In this case, the counters which were adjusted based on > the output of region_chg will not be correct. > > In order to hit this race today, there must be an existing shared hugetlb > mmap created with the MAP_NORESERVE flag. A page fault to allocate a huge > page via this mapping must occur at the same another task is mapping the > same region without the MAP_NORESERVE flag. In the past file regions were serialized by either mmap_sem (exclusive) or the hugetlb instantiation mutex (when mmap_sem was shared). With finer grained locking, however, we now rely on the resv_map->lock. So I guess you are referring to something like this, no? CPU0 (via vma_[needs/commit]_reservation) CPU1 hugetlb_fault mutex_lock(hash_A) hugetlb_no_page alloc_huge_page shm_get region_chg hugetlb_file_setup hugetlb_reserve_pages region_chg region_add region_add Couldn't this race also occur upon concurrent faults on two different hashes backed by the same vma? Anyway, it's memorial day, so I'll take a closer look during the week, but you seem to be correct. An alternative could be to continue holding the spinlock until the after region_add, but I like your "fixup" approach. > The patch set does not prevent the race from happening. Rather, it adds > simple functionality to detect when the race has occurred. If a race is > detected, then the incorrect counts are adjusted. > > v2: > Added documentation for the region/reserve map routines Thanks for doing this, as akpm mentioned, it is much needed. However, this should be a new, separate patch. > Created common routine for vma_commit_reservation and > vma_commit_reservation to help prevent them from drifting > apart in the future. > > Mike Kravetz (2): > mm/hugetlb: compute/return the number of regions added by region_add() > mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages Ah, so these two patches are duplicates from your fallocate series, right? You should drop those from that patchset then, as bugfixes should be separate. Could you rename patch 2 to something more meaningful? ie: mm/hugetlb: account for races between region_chg and region_add Also, gosh those function names are nasty and unclear -- I would change them to region_prepare and region_commit, or something like that where the purpose is more obvious. Thanks, Davidlohr -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/