From: Andy Lutomirski
Date: Wed, 4 Nov 2015 10:23:13 -0800
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
To: Daniel Micay
Cc: Minchan Kim, Hugh Dickins, Andrew Morton, Michael Kerrisk,
    Michal Hocko, "linux-mm@kvack.org", KOSAKI Motohiro,
    "Kirill A. Shutemov", Rik van Riel, Johannes Weiner, Linux API,
    Jason Evans, Shaohua Li, "linux-kernel@vger.kernel.org",
    yalin wang, Mel Gorman

On Tue, Nov 3, 2015 at 9:50 PM, Daniel Micay wrote:
>> Does this set the write protect bit?
>>
>> What happens on architectures without hardware dirty tracking?
>
> It's supposed to avoid needing page faults when the data is accessed
> again, but it can just be implemented via page faults on architectures
> without a way to check for access or writes. MADV_DONTNEED is also a
> valid implementation of MADV_FREE if it comes to that (which is what it
> does on swapless systems for now).

I wonder whether arches without the requisite tracking should just
turn it off. While it might be faster than MADV_DONTNEED or munmap on
those arches, it doesn't really deserve to be faster.

>
>> Using the dirty bit for these semantics scares me.
>> This API creates a page that can have visible nonzero contents and
>> then can asynchronously and magically zero itself thereafter. That
>> makes me nervous. Could we use the accessed bit instead? Then the
>> observable semantics would be equivalent to having MADV_FREE either
>> zero the page or do nothing, except that it doesn't make up its mind
>> until the next read.
>
> FWIW, those are already basically the semantics provided by GCC and LLVM
> for data the compiler considers uninitialized (they could be more
> aggressive since C just says it's undefined, but in practice they allow
> it but can produce inconsistent results even if it isn't touched).
>
> http://llvm.org/docs/LangRef.html#undefined-values

But C isn't the only thing in the world.

Also, I think that a C optimizer should be free to turn:

  if ([complicated condition])
    *ptr = 1;

into:

  if (*ptr != 1 && [complicated condition])
    *ptr = 1;

as long as [complicated condition] has no side effects. The MADV_FREE
semantics in this patch set break that.

>
> It doesn't seem like there would be an advantage to checking if the data
> was written to vs. whether it was accessed if checking for both of those
> is comparable in performance. I don't know enough about that.

I'd imagine that there would be no performance difference whatsoever
on hardware that has a real accessed bit. The only thing that changes
is the choice of which bit to use.

>
>>> + ptent = pte_mkold(ptent);
>>> + ptent = pte_mkclean(ptent);
>>> + set_pte_at(mm, addr, pte, ptent);
>>> + tlb_remove_tlb_entry(tlb, pte, addr);
>>
>> It looks like you are flushing the TLB. In a multithreaded program,
>> that's rather expensive. Potentially silly question: would it be
>> better to just zero the page immediately in a multithreaded program
>> and then, when swapping out, check the page is zeroed and, if so, skip
>> swapping it out? That could be done without forcing an IPI.
>
> In the common case it will be passed many pages by the allocator.
> There will still be a layer of purging logic on top of MADV_FREE but it
> can be much thinner than the current workarounds for MADV_DONTNEED. So
> the allocator would still be coalescing dirty ranges and only purging
> when the ratio of dirty:clean pages rises above some threshold. It would
> be able to weight the largest ranges for purging first rather than logic
> based on stuff like aging as is used for MADV_DONTNEED.
>

With enough pages at once, though, munmap would be fine, too.

Maybe what's really needed is a MADV_FREE variant that takes an
iovec. On an all-cores multithreaded mm, the TLB shootdown broadcast
takes thousands of cycles on each core more or less regardless of how
much of the TLB gets zapped.

--Andy
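The purging policy Daniel describes above (coalesce dirty ranges, purge only once the dirty:clean ratio crosses some threshold, take the largest ranges first, and treat MADV_DONTNEED as an acceptable stand-in for MADV_FREE) might look roughly like the following C sketch. It is a minimal illustration under stated assumptions, not code from this patch set or from any real allocator: struct dirty_range, purge_range(), purge_dirty() and the ratio_shift threshold are hypothetical names, and the sketch assumes headers that may or may not define MADV_FREE, falling back to MADV_DONTNEED when the constant is missing or the kernel rejects it.

```c
#define _DEFAULT_SOURCE          /* expose madvise() and MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping: one coalesced run of freed-but-mapped pages. */
struct dirty_range {
    char  *addr;   /* page-aligned start */
    size_t len;    /* length in bytes, a multiple of the page size */
};

/* Hand one range back to the kernel. */
static int purge_range(char *addr, size_t len)
{
#ifdef MADV_FREE
    /* Lazy free: contents may survive until the kernel reclaims the pages. */
    if (madvise(addr, len, MADV_FREE) == 0)
        return 0;
    /* Rejected (e.g. EINVAL on kernels before 4.5): fall back below. */
#endif
    /* Eager free: later reads of private anonymous pages see zeroes. */
    return madvise(addr, len, MADV_DONTNEED);
}

/* qsort comparator: larger ranges sort first. */
static int by_len_desc(const void *a, const void *b)
{
    const struct dirty_range *x = a, *y = b;
    return (x->len < y->len) - (x->len > y->len);
}

/*
 * Hypothetical policy: purge only while dirty bytes exceed
 * clean_bytes >> ratio_shift, taking the largest ranges first.
 * Returns the number of bytes purged, or -1 if madvise() failed.
 */
static long purge_dirty(struct dirty_range *r, size_t n,
                        size_t clean_bytes, unsigned ratio_shift)
{
    size_t dirty = 0;
    for (size_t i = 0; i < n; i++)
        dirty += r[i].len;

    size_t limit = clean_bytes >> ratio_shift;
    if (dirty <= limit)
        return 0;                       /* below threshold: keep everything */

    qsort(r, n, sizeof *r, by_len_desc);

    long purged = 0;
    for (size_t i = 0; i < n && dirty > limit; i++) {
        if (purge_range(r[i].addr, r[i].len) != 0)
            return -1;
        dirty -= r[i].len;
        purged += (long)r[i].len;
        r[i].len = 0;                   /* mark the range as purged */
    }
    return purged;
}
```

Note that each madvise() call here covers a single range, so on a multithreaded mm each call may pay for its own TLB shootdown; that per-call overhead is what a vectored, iovec-taking MADV_FREE variant like the one suggested above would amortize.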