From: Jann Horn
Date: Mon, 17 Sep 2018 18:12:05 +0200
Subject: [BUG] mm: direct I/O (using GUP) can write to COW anonymous pages
To: Linux-MM, Dan Williams, Andrew Morton, Michal Hocko, Hugh Dickins, Rik van Riel
Cc: kernel list
X-Mailing-List: linux-kernel@vger.kernel.org

[I'm not sure who the best people to ask about this are, I hope the
recipient list resembles something reasonable...]

I have noticed that the dup_mmap() logic on fork() doesn't handle
pages with active direct I/O properly: dup_mmap() seems to assume that
making the PTE referencing a page readonly will always prevent future
writes to the page, but if the kernel has acquired a direct reference
to the page before (e.g. via get_user_pages_fast()), writes can still
happen that way.

The worst-case effect of this - as far as I can tell - is that when a
multithreaded process forks while one thread is in the middle of
sys_read() on a file that uses direct I/O with get_user_pages_fast(),
the read data can become visible in the child while the parent's
buffer stays uninitialized if the parent writes to a relevant page
post-fork before either the I/O completes or the child writes to it.
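
The sequence described above can be replayed in a toy userspace model.
This is only a sketch of the reasoning, not kernel code: struct
toy_page, struct toy_proc, toy_write() and toy_replay() are all
hypothetical names, a "page" is reduced to two bytes, and the pinned
pointer stands in for the reference taken by get_user_pages_fast().

```c
#include <stddef.h>

/* Toy userspace model of the race; every name here is hypothetical. */
struct toy_page {
  char data[2]; /* [0]: byte targeted by the read, [1]: byte the parent dirties */
};

struct toy_proc {
  struct toy_page *page; /* the page this process's PTE points at */
  int writable;          /* 0 once fork() has write-protected the PTE */
};

/* Model of a write: a write-protected PTE takes a CoW fault and gets a copy. */
static void toy_write(struct toy_proc *p, struct toy_page *spare,
                      size_t off, char v)
{
  if (!p->writable) {
    *spare = *p->page; /* copy the shared page */
    p->page = spare;   /* the writer now owns the copy */
    p->writable = 1;
  }
  p->page->data[off] = v;
}

/* Replays the scenario; reports byte 0 as seen by parent and child. */
static void toy_replay(char *parent_sees, char *child_sees)
{
  struct toy_page original = { { 0, 0 } }, copy;
  struct toy_proc parent = { &original, 1 }, child;

  toy_write(&parent, &copy, 0, 1);       /* parent dirties the buffer */
  struct toy_page *pinned = parent.page; /* direct I/O pins the page */
  parent.writable = 0;                   /* fork() write-protects the PTE ... */
  child = parent;                        /* ... and the child shares the page */
  toy_write(&parent, &copy, 1, 2);       /* parent de-CoWs and gets the copy */
  pinned->data[0] = 0;                   /* I/O completes into the original */

  *parent_sees = parent.page->data[0];
  *child_sees = child.page->data[0];
}
```

In this model the parent ends up looking at the copy, which still holds
the pre-read byte, while the child is left with the original page that
the I/O wrote into.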

Reproducer code:

====== START hello.c ======
#define FUSE_USE_VERSION 26

/* The header names were lost from the archived copy of this mail;
 * the ones below are what the code needs. */
#include <fuse.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

static const char *hello_path = "/hello";

static int hello_getattr(const char *path, struct stat *stbuf)
{
  int res = 0;
  memset(stbuf, 0, sizeof(struct stat));
  if (strcmp(path, "/") == 0) {
    stbuf->st_mode = S_IFDIR | 0755;
    stbuf->st_nlink = 2;
  } else if (strcmp(path, hello_path) == 0) {
    stbuf->st_mode = S_IFREG | 0666;
    stbuf->st_nlink = 1;
    stbuf->st_size = 0x1000;
    stbuf->st_blocks = 0;
  } else
    res = -ENOENT;
  return res;
}

static int hello_readdir(const char *path, void *buf,
                         fuse_fill_dir_t filler, off_t offset,
                         struct fuse_file_info *fi)
{
  filler(buf, ".", NULL, 0);
  filler(buf, "..", NULL, 0);
  filler(buf, hello_path + 1, NULL, 0);
  return 0;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
  return 0;
}

static int hello_read(const char *path, char *buf, size_t size,
                      off_t offset, struct fuse_file_info *fi)
{
  /* delay the reply so that the fork happens while the read is pending */
  sleep(3);
  size_t len = 0x1000;
  if (offset < len) {
    if (offset + size > len)
      size = len - offset;
    memset(buf, 0, size);
  } else
    size = 0;
  return size;
}

static int hello_write(const char *path, const char *buf, size_t size,
                       off_t offset, struct fuse_file_info *fi)
{
  while(1) pause();
}

static struct fuse_operations hello_oper = {
  .getattr = hello_getattr,
  .readdir = hello_readdir,
  .open = hello_open,
  .read = hello_read,
  .write = hello_write,
};

int main(int argc, char *argv[])
{
  return fuse_main(argc, argv, &hello_oper, NULL);
}
====== END hello.c ======

====== START simple_mmap.c ======
#define _GNU_SOURCE

/* The header names were lost from the archived copy of this mail;
 * the ones below are what the code needs. */
#include <pthread.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>

__attribute__((aligned(0x1000))) char data_buffer_[0x10000];
#define data_buffer (data_buffer_ + 0x8000)

void *fuse_thread(void *dummy)
{
  /* step 2: start direct I/O on data_buffer */
  int fuse_fd = open("mount/hello", O_RDWR);
  if (fuse_fd == -1)
    err(1, "unable to open FUSE fd");
  printf("char in parent (before): %hhd\n", data_buffer[0]);
  int res = read(fuse_fd, data_buffer, 0x1000);
  /* step 6: read completes, show post-read state */
  printf("fuse read result: %d\n", res);
  printf("char in parent (after): %hhd\n", data_buffer[0]);
  return NULL;
}

int main(void)
{
  /* step 1: make data_buffer dirty */
  data_buffer[0] = 1;

  pthread_t thread;
  if (pthread_create(&thread, NULL, fuse_thread, NULL))
    errx(1, "pthread_create");

  sleep(1);
  /* step 3: fork a child */
  pid_t child = fork();
  if (child == -1)
    err(1, "fork");
  if (child == 0) {
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    sleep(1);

    /* step 5: show pre-read state in the child */
    printf("char in child (before): %hhd\n", data_buffer[0]);
    sleep(3);
    /* step 7: read is complete, show post-read state in child */
    printf("char in child (after): %hhd\n", data_buffer[0]);
    return 0;
  }

  /* step 4: de-CoW data_buffer in the parent */
  data_buffer[0x800] = 2;

  int status;
  if (wait(&status) != child)
    err(1, "wait");
}
====== END simple_mmap.c ======

Repro steps:

In one terminal:

$ mkdir mount
$ gcc -o hello hello.c -Wall -std=gnu99 `pkg-config fuse --cflags --libs`
hello.c: In function ‘hello_write’:
hello.c:67:1: warning: no return statement in function returning
non-void [-Wreturn-type]
 }
 ^
$ ./hello -d -o direct_io mount
FUSE library version: 2.9.7
[...]

In a second terminal:

$ gcc -pthread -o simple_mmap simple_mmap.c
$ ./simple_mmap
char in parent (before): 1
char in child (before): 1
fuse read result: 4096
char in parent (after): 1
char in child (after): 0

I have tested that this still works on 4.19.0-rc3+.

As far as I can tell, the fix would be to immediately copy pages with
`refcount - mapcount > N` in dup_mmap(), or something like that?
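
The proposed check can be written down as a small predicate. This is a
minimal sketch under assumptions, not a patch: toy_should_copy_at_fork()
is a hypothetical helper that does not exist in the kernel, the counts
are passed in as plain integers rather than read from a struct page,
and the threshold n is left as a parameter because the mail does not
say what N should be.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not an existing kernel function.
 *
 * refcount: total number of references held on the page
 * mapcount: number of page table entries mapping the page
 * n:        how many non-mapping references to tolerate (the "N" above)
 *
 * Returns true if fork should copy the page right away instead of
 * sharing it copy-on-write.
 */
static bool toy_should_copy_at_fork(int refcount, int mapcount, int n)
{
  return refcount - mapcount > n;
}
```

With n = 0, a page that is mapped once and has one additional reference
(for example from a pending direct I/O) would be copied at fork time,
while a page whose only references are its mappings would still be
shared. References that are not mappings can presumably come from other
sources too, which would be a reason to leave the threshold open.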