Date: Thu, 1 Nov 2012 09:50:52 +0900
From: Minchan Kim
To: Paul Turner
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    John Stultz, Christoph Lameter, Android Kernel Team, Robert Love,
    Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel, Dave Chinner,
    Neil Brown, Mike Hommey, Taras Glek, KOSAKI Motohiro,
    KAMEZAWA Hiroyuki, sanjay@google.com, David Rientjes
Subject: Re: [RFC v2] Support volatile range for anon vma
Message-ID: <20121101005052.GB26256@bbox>
References: <1351560594-18366-1-git-send-email-minchan@kernel.org>
    <20121031143524.0509665d.akpm@linux-foundation.org>

Hello,

On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton wrote:
> >
> > On Tue, 30 Oct 2012 10:29:54 +0900
> > Minchan Kim wrote:
> >
> > > This patch introduces new madvise behavior MADV_VOLATILE and
> > > MADV_NOVOLATILE for anonymous pages. It's different with
> > > John Stultz's version which considers only tmpfs while this patch
> > > considers only anonymous pages so this cannot cover John's one.
> > > If below idea is proved as reasonable, I hope we can unify both
> > > concepts by madvise/fadvise.
> > >
> > > Rationale is following as.
> > > Many allocators call munmap(2) when user call free(3) if ptr is
> > > in mmaped area. But munmap isn't cheap because it have to clean up
> > > all pte entries and unlinking a vma so overhead would be increased
> > > linearly by mmaped area's size.
> >
> > Presumably the userspace allocator will internally manage memory in
> > large chunks, so the munmap() call frequency will be much lower than
> > the free() call frequency.  So the performance gains from this change
> > might be very small.
>
> I don't think I strictly understand the motivation from a
> malloc-standpoint here.
>
> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
> to perform discards on Linux.  For any reasonable allocator (short
> of binding malloc --> mmap, free --> unmap) this seems a better
> choice.
>
> Note also from a performance stand-point I doubt any allocator (which
> care about performance) is going to want to pay the cost of even a
> null syscall about typical malloc/free usage (consider: a tcmalloc

Good point.

> malloc/free pair is currently <20ns).  Given then that this cost is
> amortized once you start doing discards on larger blocks MADV_DONTNEED
> seems a preferable interface:
> - You don't need to reconstruct an arena when you do want to allocate
>   since there's no munmap/mmap for the region to change about
> - There are no syscalls involved in later reallocating the block.

The above benefits apply to MADV_VOLATILE, too.
But as you pointed out, it has a little more overhead than DONTNEED because
the allocator must call madvise(MADV_NOVOLATILE) before allocation.
Although madvise(NOVOLATILE) only marks a vma flag, it does need mmap_sem,
which could be a problem for parallel malloc/free workloads, as KOSAKI
pointed out.

In that case, we can change the semantics so malloc doesn't need to call
madvise(NOVOLATILE) before allocating. Then the page fault handler has to
check whether the fault was caused by an access to a volatile vma.
If so, it could return a zero page instead of SIGBUS and mark the vma as
no longer volatile.

>
> The only real additional cost is address-space.  Are you strongly
> concerned about the 32-bit case?

No. I believe allocators have logic to clean them up once the address space
is almost full.

Thanks, Paul.

-- 
Kind regards,
Minchan Kim
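
For reference, the MADV_DONTNEED discard pattern Paul describes above can be
sketched roughly as below. This is only a minimal illustration, not tcmalloc's
actual code: the helper name discard_and_reread() is made up for this example,
and it relies on the Linux behaviour that a private anonymous mapping is
zero-filled on the first touch after MADV_DONTNEED. The proposed
MADV_VOLATILE/MADV_NOVOLATILE flags are not used, since they exist only in the
RFC patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (name invented for this sketch): map `len` bytes of
 * anonymous memory, dirty it, discard it with MADV_DONTNEED, then read it
 * back without remapping.
 *
 * Returns the first byte seen after the discard (0 is expected on Linux
 * for a private anonymous mapping), or -1 on error.
 */
static int discard_and_reread(size_t len)
{
    assert(len > 0);

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Simulate a block that was handed out and used. */
    memset(p, 0xAB, len);

    /* "free": drop the pages but keep the mapping, so no munmap(). */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* "malloc" again: no syscall, the first touch faults in a zero page. */
    int first = p[0];

    munmap(p, len);
    return first;
}
```

Calling discard_and_reread(4096) should show the old 0xAB contents gone while
the address range stays usable, which is the point of the two bullets quoted
above: no arena to reconstruct and no syscall on reallocation.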