From: David Herrmann
To: Andy Lutomirski
Cc: "linux-kernel@vger.kernel.org", Michael Kerrisk, Ryan Lortie,
    Linus Torvalds, Andrew Morton, "linux-mm@kvack.org", Linux FS Devel,
    Linux API, Greg Kroah-Hartman, John Stultz, Lennart Poettering,
    Daniel Mack, Kay Sievers, Hugh Dickins, Tony Battersby
Subject: Re: [RFC v3 7/7] shm: isolate pinned pages when sealing files
Date: Fri, 13 Jun 2014 17:27:14 +0200
References: <1402655819-14325-1-git-send-email-dh.herrmann@gmail.com>
            <1402655819-14325-8-git-send-email-dh.herrmann@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi

On Fri, Jun 13, 2014 at 5:06 PM, Andy Lutomirski wrote:
> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann wrote:
>> When setting SEAL_WRITE, we must make sure nobody has a writable reference
>> to the pages (via GUP or similar). We currently check references and wait
>> some time for them to be dropped. This, however, might fail for several
>> reasons, including:
>>  - the page is pinned for longer than we wait
>>  - while we wait, someone takes an already pinned page for read-access
>>
>> Therefore, this patch introduces page-isolation. When sealing a file with
>> SEAL_WRITE, we copy all pages that have an elevated ref-count. The newpage
>> is put in place atomically, the old page is detached and left alone. It
>> will get reclaimed once the last external user dropped it.
>>
>> Signed-off-by: David Herrmann
>
> Won't this have unexpected effects?
>
> Thread 1: start read into mapping backed by fd
>
> Thread 2: SEAL_WRITE
>
> Thread 1: read finishes.  now the page doesn't match the sealed page

Just to be clear: you're talking about read() calls that write into
the memfd? (like my FUSE example does) Your language might be
ambiguous to others, as "read into" actually implies a write.

No, this does not have unexpected effects. But yes, your conclusion is
right. To be clear, this behavior would be part of the API. Any
asynchronous write might be cut off by SEAL_WRITE _iff_ you unmap your
buffer before the write finishes. But you actually have to extend your
example:

  Thread 1: p = mmap(memfd, SIZE);
  Thread 1: h = async_read(some_fd, p, SIZE);
  Thread 1: munmap(p, SIZE);
  Thread 2: SEAL_WRITE
  Thread 1: async_wait(h);

If you don't do the munmap(), then SEAL_WRITE will fail due to an
elevated i_mmap_writable. I think this is fine. In fact, I remember
reading that async-IO is not required to resolve user-space addresses
at the time of the syscall, but might delay it to the time of the
actual write. But you're right, it would be misleading that the AIO
operation returns success. This would be part of the memfd-API,
though. And if you mess with your address space while running an
async-IO operation on it, you're screwed anyway.

Btw., your sealing use-case is really odd. No one guarantees that the
SEAL_WRITE happens _after_ you schedule your async-read. If you have
some synchronization there, you just have to move it to after the
point where you wait for your async-IO to finish.

Does that clear things up?

Thanks
David
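The "SEAL_WRITE fails while the buffer is still mapped" part can be sketched in user space. This is only an illustration, and it assumes the interface as it was later merged (memfd_create(2) with MFD_ALLOW_SEALING, and fcntl(2) with F_ADD_SEALS / F_SEAL_WRITE) rather than the exact names used in this RFC revision. The helper try_seal_write() is a hypothetical name made up for the example. It does not exercise the async-IO / pinned-page case discussed above, only the writable-mapping check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd, map it shared and writable, then
 * try to add F_SEAL_WRITE either while the mapping still exists
 * (keep_mapped != 0) or after munmap() (keep_mapped == 0).
 *
 * Returns 0 if the seal was added, the errno value of the failed fcntl()
 * otherwise, or -1 if setting up the memfd or the mapping failed.
 */
int try_seal_write(int keep_mapped)
{
    const size_t size = 4096;
    assert(size > 0);

    int fd = memfd_create("seal-demo", MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Drop the writable mapping before sealing, if requested. */
    if (!keep_mapped)
        munmap(p, size);

    int rc = 0;
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0)
        rc = errno;

    /* Clean up whatever is still held. */
    if (keep_mapped)
        munmap(p, size);
    close(fd);
    return rc;
}
```

On a kernel and libc that provide memfd sealing, try_seal_write(1) should report EBUSY (a writable shared mapping still exists) and try_seal_write(0) should report 0.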