Date: Wed, 29 Apr 2015 19:44:57 +0100
Subject: Re: Regression: Requiring CAP_SYS_ADMIN for /proc/<pid>/pagemap
 causes application-level breakage
From: Mark Williamson
To: Mark Seaborn
Cc: kernel list, "Kirill A. Shutemov", Pavel Emelyanov,
 Konstantin Khlebnikov, Andrew Morton, Linus Torvalds,
 Andy Lutomirski, Linux API, Finn Grimwood, Daniel James
X-Mailing-List: linux-kernel@vger.kernel.org

Hi all,

We've been investigating further and found a snag with the PFN-hiding
approach discussed last week - looks like it won't be enough on all
the architectures we support.

Our product runs on x86_32, x86_64 and ARM. For now, it looks like
soft-dirty is only available on x86_64. A patch that simply zeros out
the physical addresses in /proc/PID/pagemap will therefore help us on
x86_64 but we'll still have problems on other platforms[1].

For context, we were previously using pagemap as a cross-platform way
to get soft-dirty-like functionality. Specifically, to ask "did a
process write to any pages since fork()" by comparing addresses and
deducing where CoW must have occurred.

In the absence of soft-dirty and the physical addresses, it looks like
we can't figure that out with the remaining information in pagemap. If
the pagemap file included the "writeable" bit from the PTE, we think
we'd have all the information required to deduce what we need
(although I realise that's a bit of a nasty workaround).

If I proposed including the PTE protection bits in pagemap, would that
be controversial? I'm guessing yes but thought it was worth a shot ;-)

Would anybody be able to suggest a more tasteful approach?

Thanks,
Mark

[1] I'd note that using soft-dirty is clearly the right approach for
us on x64, where available and that ideally we'd use it on other
architectures - cross-arch support for soft-dirty is a slightly
different discussion, which I hope to post another thread for.

On Fri, Apr 24, 2015 at 5:43 PM, Mark Williamson wrote:
> Hi Mark,
>
> On Fri, Apr 24, 2015 at 4:26 PM, Mark Seaborn wrote:
>> I'm curious, what do you use the physical page addresses for?
>>
>> Since you pointed to http://undo-software.com, which talks about
>> reversible debugging tools, I can guess you would use the soft-dirty
>> flag to implement copy-on-write snapshotting. I'm guessing you might
>> use physical page addresses for determining when the same page is
>> mapped twice (in the same process or different processes)?
>
> That's pretty much it. Actually, we're effectively using the physical
> addresses to emulate soft-dirty. For certain operations (e.g. some
> system calls) we need to track what memory has changed since we last
> looked at the process state. We have a mechanism that forks a child
> process, runs the system call, then refers to pagemap to figure out
> what's been modified.
>
> Currently, our mechanism compares the physical addresses of pages
> before and after the syscall so that we can see which pages got CoWed.
> This is perhaps a slightly "unconventional" use of the interface but
> we support kernels that predate the soft-dirty mechanism and (as far
> as we know) this is probably the best way we can answer "What got
> changed?" on those releases.
>
> Using the soft-dirty mechanism where available should make our code
> both cleaner and faster, so if we can fix the pagemap file to allow
> that then we'll be quite happy!
>
> Cheers,
> Mark
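
For readers following the thread, here is a rough sketch of the two
ways of answering "what got changed?" that the message describes:
comparing page frame numbers before and after an operation, and
reading the soft-dirty bit. It is not the code used by the product
mentioned above. The bit positions follow the kernel's pagemap
documentation; the names Entry, decode, read_entries, cow_pages and
soft_dirty_pages are made up for illustration.

```python
import mmap
import struct
from typing import List, NamedTuple

# Layout of one 64-bit pagemap entry (kernel pagemap documentation).
PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number when present
SOFT_DIRTY = 1 << 55       # bit 55: PTE is soft-dirty
SWAPPED = 1 << 62          # bit 62: page is in swap
PRESENT = 1 << 63          # bit 63: page is present in RAM


class Entry(NamedTuple):
    present: bool
    swapped: bool
    soft_dirty: bool
    pfn: int  # 0 when not present, or when the kernel hides the PFN


def decode(raw: int) -> Entry:
    """Split one raw 64-bit pagemap value into its fields."""
    present = bool(raw & PRESENT)
    pfn = raw & PFN_MASK if present else 0
    return Entry(present, bool(raw & SWAPPED), bool(raw & SOFT_DIRTY), pfn)


def read_entries(pid, start_addr: int, n_pages: int) -> List[Entry]:
    """Read n_pages entries starting at virtual address start_addr.

    pid may be an integer or the string "self". Linux only.
    """
    offset = (start_addr // mmap.PAGESIZE) * 8  # one 8-byte entry per page
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        data = f.read(8 * n_pages)
    data = data[: len(data) // 8 * 8]  # guard against a short read
    return [decode(raw) for (raw,) in struct.iter_unpack("=Q", data)]


def cow_pages(before: List[Entry], after: List[Entry]) -> List[int]:
    """Indexes of pages present in both snapshots whose frame changed.

    Simplification: pages that only became present are ignored, since
    a read fault alone can cause that.
    """
    return [
        i
        for i, (b, a) in enumerate(zip(before, after))
        if b.present and a.present and b.pfn != a.pfn
    ]


def soft_dirty_pages(entries: List[Entry]) -> List[int]:
    """Indexes of pages whose soft-dirty bit is set."""
    return [i for i, e in enumerate(entries) if e.soft_dirty]
```

On a kernel that zeros the PFN field for unprivileged readers, as
discussed in this thread, every present page decodes to pfn 0, so
cow_pages() finds nothing; that is the breakage being reported.
soft_dirty_pages() does not depend on the PFN, but it only helps on
architectures where the kernel implements soft-dirty.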