Date: Thu, 1 Nov 2012 10:33:25 +0900
From: Minchan Kim
To: Paul Turner
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, John Stultz, Christoph Lameter, Android Kernel Team, Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel, Dave Chinner, Neil Brown, Mike Hommey, Taras Glek, KOSAKI Motohiro, KAMEZAWA Hiroyuki, sanjay@google.com, David Rientjes
Subject: Re: [RFC v2] Support volatile range for anon vma
Message-ID: <20121101013325.GD26256@bbox>
References: <1351560594-18366-1-git-send-email-minchan@kernel.org> <20121031143524.0509665d.akpm@linux-foundation.org> <20121101005052.GB26256@bbox>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 31, 2012 at 06:22:58PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 5:50 PM, Minchan Kim wrote:
> > Hello,
> >
> > On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
> >> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton wrote:
> >> >
> >> > On Tue, 30 Oct 2012 10:29:54 +0900
> >> > Minchan Kim wrote:
> >> >
> >> > > This patch introduces new madvise behaviors, MADV_VOLATILE and
> >> > > MADV_NOVOLATILE, for anonymous pages. It differs from John
> >> > > Stultz's version, which considers only tmpfs, while this patch
> >> > > considers only anonymous pages, so it cannot cover John's case.
> >> > > If the idea below proves reasonable, I hope we can unify both
> >> > > concepts via madvise/fadvise.
> >> > >
> >> > > The rationale is as follows.
> >> > > Many allocators call munmap(2) when the user calls free(3) on a
> >> > > ptr in an mmapped area. But munmap isn't cheap because it has to
> >> > > clean up all pte entries and unlink the vma, so the overhead grows
> >> > > linearly with the size of the mmapped area.
> >> >
> >> > Presumably the userspace allocator will internally manage memory in
> >> > large chunks, so the munmap() call frequency will be much lower than
> >> > the free() call frequency. So the performance gains from this change
> >> > might be very small.
> >>
> >> I don't think I strictly understand the motivation from a
> >> malloc standpoint here.
> >>
> >> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
> >> to perform discards on Linux. For any reasonable allocator (short
> >> of binding malloc --> mmap, free --> unmap) this seems a better
> >> choice.
> >>
> >> Note also that from a performance standpoint I doubt any allocator
> >> (which cares about performance) is going to want to pay the cost of
> >> even a null syscall around typical malloc/free usage (consider: a tcmalloc
> >
> > Good point.
> >
> >> malloc/free pair is currently <20ns). Given, then, that this cost is
> >> amortized once you start doing discards on larger blocks,
> >> MADV_DONTNEED seems a preferable interface:
> >> - You don't need to reconstruct an arena when you do want to allocate,
> >>   since the region is never munmap()ed and mmap()ed again.
> >> - There are no syscalls involved in later reallocating the block.
> >
> > The above benefits apply to MADV_VOLATILE, too.
> > But as you pointed out, there is a bit more overhead than with DONTNEED
> > because the allocator has to call madvise(MADV_NOVOLATILE) before allocation.
> > Although madvise(NOVOLATILE) just marks a vma flag, it does need
> > mmap_sem, which could be a problem for parallel malloc/free workloads,
> > as KOSAKI pointed out.
> >
> > In that case, we could change the semantics so malloc doesn't need to
> > call madvise(NOVOLATILE) before allocating. Then the page fault handler
> > would have to check whether the fault was caused by an access to a
> > volatile vma. If so, it could return a zero page instead of SIGBUS and
> > mark the vma as no longer volatile.
>
> I think being able to determine whether the backing was discarded
> (as part of an atomic transition to non-volatile) would be a required
> property to make this useful for non-malloc use-cases.
>
Absolutely.

-- 
Kind regards,
Minchan Kim
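A minimal sketch of the MADV_DONTNEED discard pattern Paul describes for tcmalloc, assuming Linux, glibc headers and a private anonymous mapping. The function name dontneed_discard_demo is hypothetical, and MADV_VOLATILE/MADV_NOVOLATILE are deliberately left out: they are only proposed in this RFC and cannot be assumed to exist in <sys/mman.h>. The sketch shows the two properties from Paul's list: the region is never munmap()ed, and reusing it after the discard needs no further syscall, because the kernel faults in zero-filled pages on the next access.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS and madvise() in glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, Linux-only assumption: discard the contents of a
 * private anonymous region with madvise(MADV_DONTNEED) while keeping the
 * mapping itself, then reuse it without any further syscall.
 *
 * Returns 1 if the region reads back as all zeroes after the discard,
 * 0 if it did not, and -1 if mmap() or madvise() failed.
 */
int dontneed_discard_demo(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && npages > 0);
    size_t len = npages * (size_t)page;

    /* The "arena": one private anonymous mapping, created once. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Simulate a freed block that still holds stale data. */
    memset(buf, 0xAB, len);

    /* Drop the backing pages; the vma stays in place (no munmap). */
    if (madvise(buf, len, MADV_DONTNEED) != 0) {
        munmap(buf, len);
        return -1;
    }

    /*
     * "Reallocating" needs no syscall: these reads fault in fresh
     * zero-filled pages for the private anonymous mapping.
     */
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    /* Write to it again, as an allocator handing the block out would. */
    buf[0] = 1;

    munmap(buf, len);
    return all_zero;
}
```

On a Linux system where mmap() succeeds, dontneed_discard_demo(4) should return 1. Under the VOLATILE proposal discussed above, the allocator would instead call madvise(MADV_NOVOLATILE) before reuse and, per Paul's point, would want that transition to report whether the pages had been discarded.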