Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751684AbbD2TXK (ORCPT ); Wed, 29 Apr 2015 15:23:10 -0400
Received: from mail-la0-f45.google.com ([209.85.215.45]:34159 "EHLO
        mail-la0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750791AbbD2TXD (ORCPT );
        Wed, 29 Apr 2015 15:23:03 -0400
MIME-Version: 1.0
Date: Wed, 29 Apr 2015 20:23:01 +0100
Subject: Re: Regression: Requiring CAP_SYS_ADMIN for /proc/<pid>/pagemap
        causes application-level breakage
From: Mark Williamson
To: Mark Seaborn
Cc: kernel list, "Kirill A. Shutemov", Pavel Emelyanov,
        Konstantin Khlebnikov, Andrew Morton, Linus Torvalds,
        Andy Lutomirski, Linux API, Finn Grimwood, Daniel James
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4115
Lines: 88

Hi again,

On Wed, Apr 29, 2015 at 7:44 PM, Mark Williamson wrote:
> We've been investigating further and found a snag with the PFN-hiding
> approach discussed last week - looks like it won't be enough on all
> the architectures we support. Our product runs on x86_32, x86_64 and
> ARM. For now, it looks like soft-dirty is only available on x86_64.
> A patch that simply zeros out the physical addresses in
> /proc/PID/pagemap will therefore help us on x86_64 but we'll still
> have problems on other platforms[1].

Another thought occurs - although we *strictly* want to know "what got
written to", we might be able to get by with a superset of that, such
as "what got accessed, read or write"... Thus, we could investigate
clearing the Referenced bit (which I understand we can do through
/proc/PID/clear_refs) and then just treat any subsequently-referenced
pages as being potentially modified.

It's not ideal but it might be enough to get by... I still feel a
little nervous with this, since we support distros (e.g. RHEL5) that
are too old to have clear_refs.
Still, it would result in less disruption to the format of pagemap.

Thanks,
Mark

> For context, we were previously using pagemap as a cross-platform way
> to get soft-dirty-like functionality. Specifically, to ask "did a
> process write to any pages since fork()" by comparing addresses and
> deducing where CoW must have occurred. In the absence of soft-dirty
> and the physical addresses, it looks like we can't figure that out
> with the remaining information in pagemap.
>
> If the pagemap file included the "writeable" bit from the PTE, we
> think we'd have all the information required to deduce what we need
> (although I realise that's a bit of a nasty workaround). If I
> proposed including the PTE protection bits in pagemap, would that be
> controversial? I'm guessing yes but thought it was worth a shot ;-)
> Would anybody be able to suggest a more tasteful approach?
>
> Thanks,
> Mark
>
> [1] I'd note that using soft-dirty is clearly the right approach for
> us on x64, where available and that ideally we'd use it on other
> architectures - cross-arch support for soft-dirty is a slightly
> different discussion, which I hope to post another thread for.
>
> On Fri, Apr 24, 2015 at 5:43 PM, Mark Williamson
> wrote:
>> Hi Mark,
>>
>> On Fri, Apr 24, 2015 at 4:26 PM, Mark Seaborn wrote:
>>> I'm curious, what do you use the physical page addresses for?
>>>
>>> Since you pointed to http://undo-software.com, which talks about
>>> reversible debugging tools, I can guess you would use the soft-dirty
>>> flag to implement copy-on-write snapshotting. I'm guessing you might
>>> use physical page addresses for determining when the same page is
>>> mapped twice (in the same process or different processes)?
>>
>> That's pretty much it. Actually, we're effectively using the physical
>> addresses to emulate soft-dirty. For certain operations (e.g. some
>> system calls) we need to track what memory has changed since we last
>> looked at the process state.
>> We have a mechanism that forks a child
>> process, runs the system call, then refers to pagemap to figure out
>> what's been modified.
>>
>> Currently, our mechanism compares the physical addresses of pages
>> before and after the syscall so that we can see which pages got CoWed.
>> This is perhaps a slightly "unconventional" use of the interface but
>> we support kernels that predate the soft-dirty mechanism and (as far
>> as we know) this is probably the best way we can answer "What got
>> changed?" on those releases.
>>
>> Using the soft-dirty mechanism where available should make our code
>> both cleaner and faster, so if we can fix the pagemap file to allow
>> that then we'll be quite happy!
>>
>> Cheers,
>> Mark
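
For readers following the thread, the PFN-comparison idea described above
can be sketched roughly as below. This is only an illustration under
stated assumptions, not the code discussed in the thread: the helper
names (decode_entry, read_entries, cow_candidates, soft_dirty_pages) are
hypothetical, and the bit positions are taken from the kernel's pagemap
documentation (bit 63 present, bit 62 swapped, bit 55 soft-dirty, bits
0-54 PFN). On a kernel that zeros the PFN field for unprivileged readers,
cow_candidates() would report nothing, which is exactly the breakage
under discussion.

```python
import os
import struct

# Assumed layout of one 64-bit pagemap entry (see the kernel pagemap docs).
PM_PRESENT = 1 << 63          # page is present in RAM
PM_SWAPPED = 1 << 62          # page is swapped out
PM_SOFT_DIRTY = 1 << 55       # soft-dirty bit, where the kernel supports it
PM_PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number if present


def decode_entry(entry):
    """Split a raw 64-bit pagemap entry into the fields used here."""
    present = bool(entry & PM_PRESENT)
    return {
        "present": present,
        "swapped": bool(entry & PM_SWAPPED),
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
        # The PFN is only meaningful for pages that are present.
        "pfn": (entry & PM_PFN_MASK) if present else None,
    }


def read_entries(pid, start_vaddr, num_pages):
    """Read raw entries for num_pages pages starting at start_vaddr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open("/proc/%d/pagemap" % pid, "rb") as f:
        # One 8-byte entry per virtual page, indexed by page number.
        f.seek((start_vaddr // page_size) * 8)
        data = f.read(num_pages * 8)
    data = data[: len(data) - len(data) % 8]  # drop any partial entry
    return [e for (e,) in struct.iter_unpack("=Q", data)]


def cow_candidates(before, after):
    """Indexes of pages whose backing frame changed between two snapshots."""
    changed = []
    for index, (old, new) in enumerate(zip(before, after)):
        old_pfn = decode_entry(old)["pfn"]
        new_pfn = decode_entry(new)["pfn"]
        # A present page with a different frame than before was probably
        # copied (CoW) or newly faulted in since the first snapshot.
        if new_pfn is not None and new_pfn != old_pfn:
            changed.append(index)
    return changed


def soft_dirty_pages(entries):
    """Indexes of pages with the soft-dirty bit set."""
    return [i for i, e in enumerate(entries) if e & PM_SOFT_DIRTY]
```

A caller would take one read_entries() snapshot before the forked child
runs the system call and one after, then pass both to cow_candidates().
Where soft-dirty is available, soft_dirty_pages() on a single snapshot
(after clearing the bits through /proc/PID/clear_refs) answers the same
question without needing physical addresses.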