Date: Mon, 30 Sep 2013 16:47:44 +0900
From: Joonsoo Kim
To: David Gibson
Cc: Andrew Morton, Rik van Riel, Mel Gorman, Michal Hocko,
	"Aneesh Kumar K.V", KAMEZAWA Hiroyuki, Hugh Dickins,
	Davidlohr Bueso, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Wanpeng Li, Naoya Horiguchi, Hillf Danton
Subject: Re: [PATCH v2 19/20] mm, hugetlb: retry if failed to allocate and
	there is concurrent user
Message-ID: <20130930074744.GA15351@lge.com>
References: <1376040398-11212-1-git-send-email-iamjoonsoo.kim@lge.com>
	<1376040398-11212-20-git-send-email-iamjoonsoo.kim@lge.com>
	<20130905011553.GA10158@voom.redhat.com>
	<20130905054357.GA23597@lge.com>
	<20130916120909.GA2706@voom.fritz.box>
In-Reply-To: <20130916120909.GA2706@voom.fritz.box>

On Mon, Sep 16, 2013 at 10:09:09PM +1000, David Gibson wrote:
> > >
> > > > +	*do_dequeue = false;
> > > >  	spin_unlock(&hugetlb_lock);
> > > >  	page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> > > >  	if (!page) {
> > >
> > > I think the counter also needs to be incremented in the case where we
> > > call alloc_buddy_huge_page() from alloc_huge_page(). Even though it's
> > > new, it gets added to the hugepage pool at this point and could still
> > > be a contended page for the last allocation, unless I'm missing
> > > something.
> >
> > Your comment has reasonable point to me, but I have a different opinion.
> >
> > As I already mentioned, the point is that we want to avoid the race
> > which kill the legitimate users of hugepages by out of resources.
> > I increase 'h->nr_dequeue_users' when the hugepage allocated by
> > administrator is dequeued. It is because what the hugepage I want to
> > protect from the race is the one allocated by administrator via
> > kernel param or /proc interface. Administrator may already know how many
> > hugepages are needed for their application so that he may set nr_hugepage
> > to reasonable value. I want to guarantee that these hugepages can be used
> > for his application without any race, since he assume that the application
> > would work fine with these hugepages.
> >
> > To protect hugepages returned from alloc_buddy_huge_page() from the race
> > is different for me. Although it will be added to the hugepage pool, this
> > doesn't guarantee certain application's success more. If certain
> > application's success depends on the race of this new hugepage, it's death
> > by the race doesn't matter, since nobody assume that it works fine.
>
> Hrm. I still think this path should be included. Although I'll agree
> that failing in this case is less bad.
>
> However, it can still lead to a situation where with two processes or
> threads, faulting on exactly the same shared page we have one succeed
> and the other fail. That's a strange behaviour and I think we want to
> avoid it in this case too.

Hello, David.

I don't think this is strange behaviour. A similar situation can occur
even when we use the mutex: the first process's hugepage allocation can
fail while the second process is blocked on the mutex. The second
process then enters the fault handler, and at that point its allocation
can succeed. So the result is again that one succeeds and the other
fails. It is slightly different from the case you mention, but I think
the effect for the user is the same. The interleaving I have in mind is
sketched below.
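To make it concrete, here is a minimal userspace sketch of that
interleaving (hypothetical stand-ins only, not the real hugetlb code:
the pthread mutex plays the role of the hugetlb instantiation mutex,
the free_huge_pages int stands in for the real pool counter, and
sleep() fakes "blocked behind the first fault" rather than real mutex
contention):

/*
 * Two "processes" fault on the same shared hugepage; a page is freed
 * back to the pool in between.  Even with every fault serialized by
 * the mutex, the first fault fails and the second succeeds.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t instantiation_mutex = PTHREAD_MUTEX_INITIALIZER;
static int free_huge_pages;		/* pool starts empty */

static int fault_handler(const char *who)
{
	int ret;

	pthread_mutex_lock(&instantiation_mutex);
	if (free_huge_pages > 0) {
		free_huge_pages--;	/* dequeue succeeds */
		ret = 0;
	} else {
		ret = -1;		/* no page: SIGBUS to the user */
	}
	pthread_mutex_unlock(&instantiation_mutex);

	printf("%s fault: %s\n", who, ret ? "failed" : "succeeded");
	return ret;
}

static void *second_process(void *arg)
{
	sleep(1);			/* was blocked behind the first */
	fault_handler("second");	/* pool refilled by now: succeeds */
	return arg;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, second_process, NULL);
	fault_handler("first");		/* pool empty: fails */

	/* meanwhile some other task frees a hugepage back to the pool */
	pthread_mutex_lock(&instantiation_mutex);
	free_huge_pages++;
	pthread_mutex_unlock(&instantiation_mutex);

	pthread_join(t, NULL);
	return 0;
}

Both faults run strictly one at a time under the mutex, yet one user
sees a failure and the other a success, because the pool was refilled
in between.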
We cannot avoid this kind of race completely, and I think that avoiding
it for the administrator-managed hugepage pool is good enough.

Thanks.