From: john.hubbard@gmail.com
To: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe,
    Dan Williams, Jan Kara
Cc: linux-mm@kvack.org, LKML, linux-rdma, linux-fsdevel@vger.kernel.org,
    John Hubbard
Subject: [PATCH v2 0/6] mm/fs: gup: don't unmap or drop filesystem buffers
Date: Sun, 1 Jul 2018 17:56:48 -0700
Message-Id: <20180702005654.20369-1-jhubbard@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: John Hubbard

This fixes a few problems that came up when using devices (NICs, GPUs,
for example) that want to have direct access to a chunk of system (CPU)
memory, so that they can DMA to/from that memory. Problems [1] come up
if that memory is backed by persistent storage; for example, an ext4
file system. I've been working on several customer bugs that are hitting
this, and this patchset fixes those bugs.

The bugs happen via:

-- get_user_pages() on some ext4-backed pages
-- device does DMA for a while to/from those pages
-- Somewhere in here, some of the pages get disconnected from the
   file system, via try_to_unmap() and eventually drop_buffers()
-- device is all done, device driver calls set_page_dirty_lock(), then
   put_page()

And then at some point, we see this BUG():

    kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
    backtrace:
        ext4_writepage
        __writepage
        write_cache_pages
        ext4_writepages
        do_writepages
        __writeback_single_inode
        writeback_sb_inodes
        __writeback_inodes_wb
        wb_writeback
        wb_workfn
        process_one_work
        worker_thread
        kthread
        ret_from_fork

...which is due to the file system asserting that there are still buffer
heads attached:

        ({                                                      \
                BUG_ON(!PagePrivate(page));                     \
                ((struct buffer_head *)page_private(page));     \
        })

How to fix this:

If a page is pinned by any of the get_user_pages() ("gup", here)
variants, then there is no need for that page to be on an LRU. So, this
patchset removes such pages from their LRU, thus leaving the page->lru
fields *mostly* available for tracking gup pages. (The lowest bit of
page->lru.next is used as PageTail, and that flag has to be checked even
when we don't know whether the page really is a tail page, so this
patchset avoids that bit.)

After that, the page is reference-counted via page->dma_pinned_count,
and flagged via page->dma_pinned_flags. The PageDmaPinned flag is
cleared when the reference count hits zero, and the reference count is
only used when the flag is set.

All of the above provides a reliable PageDmaPinned flag, which is then
used to decide when to abort or wait for operations such as:

    try_to_unmap()
    page_mkclean()

In order to handle page_mkclean(), new information had to be plumbed
down from the filesystems, so that page_mkclean() can decide whether to
skip dma-pinned pages, or to wait for them.

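To make the flag-plus-refcount rule above concrete, here is a minimal
user-space sketch in C. It is only a model, not code from the patches:
struct model_page and the model_*() helpers are hypothetical names
invented for illustration, the fields merely stand in for
page->dma_pinned_count and the PageDmaPinned flag, and locking (the
series introduces a zone_gup_lock) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct model_page {
    bool dma_pinned;       /* models the PageDmaPinned flag */
    int  dma_pinned_count; /* models page->dma_pinned_count */
    bool mapped;           /* models "still attached to the file system" */
};

/* Models a gup-style pin: the first pin sets the flag. */
static void model_pin_page(struct model_page *page)
{
    if (page->dma_pinned_count == 0)
        page->dma_pinned = true;
    page->dma_pinned_count++;
}

/* Models dropping a pin: the last unpin clears the flag. */
static void model_unpin_page(struct model_page *page)
{
    /* The count is only meaningful while the flag is set. */
    assert(page->dma_pinned && page->dma_pinned_count > 0);
    if (--page->dma_pinned_count == 0)
        page->dma_pinned = false;
}

/*
 * Models the abort decision for an unmap attempt: refuse while the
 * page is pinned. Returns true only if the page was unmapped.
 */
static bool model_try_to_unmap(struct model_page *page)
{
    if (page->dma_pinned)
        return false;
    page->mapped = false;
    return true;
}
```

In this model, a page pinned twice stays protected from
model_try_to_unmap() until both pins are dropped, which mirrors the
intent described above: the buffers stay attached for as long as a
device may still be doing DMA to the page.
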
Thanks to Matthew Wilcox for suggesting re-using page->lru fields for a
new refcount and flag, and to Jan Kara for explaining the rest of the
design details (how to deal with page_mkclean() and try_to_unmap(),
especially). Also thanks to Dan Williams for design advice and DAX,
long-term pinning, and page flag thoughts.

References:

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

Changes since v1:

-- Use page->lru and full reference counting, instead of a single page
   flag.
-- Proper handling of page_mkclean().

John Hubbard (6):
  mm: get_user_pages: consolidate error handling
  mm: introduce page->dma_pinned_flags, _count
  mm: introduce zone_gup_lock, for dma-pinned pages
  mm/fs: add a sync_mode param for clear_page_dirty_for_io()
  mm: track gup pages with page->dma_pinned_* fields
  mm: page_mkclean, ttu: handle pinned pages

 drivers/video/fbdev/core/fb_defio.c |  3 +-
 fs/9p/vfs_addr.c                    |  2 +-
 fs/afs/write.c                      |  6 +-
 fs/btrfs/extent_io.c                | 14 ++---
 fs/btrfs/file.c                     |  2 +-
 fs/btrfs/free-space-cache.c         |  2 +-
 fs/btrfs/ioctl.c                    |  2 +-
 fs/ceph/addr.c                      |  4 +-
 fs/cifs/cifssmb.c                   |  3 +-
 fs/cifs/file.c                      |  5 +-
 fs/ext4/inode.c                     |  5 +-
 fs/f2fs/checkpoint.c                |  4 +-
 fs/f2fs/data.c                      |  2 +-
 fs/f2fs/dir.c                       |  2 +-
 fs/f2fs/gc.c                        |  4 +-
 fs/f2fs/inline.c                    |  2 +-
 fs/f2fs/node.c                      | 10 ++--
 fs/f2fs/segment.c                   |  3 +-
 fs/fuse/file.c                      |  2 +-
 fs/gfs2/aops.c                      |  2 +-
 fs/nfs/write.c                      |  2 +-
 fs/nilfs2/page.c                    |  2 +-
 fs/nilfs2/segment.c                 | 10 ++--
 fs/ubifs/file.c                     |  2 +-
 fs/xfs/xfs_aops.c                   |  2 +-
 include/linux/mm.h                  | 22 ++++++-
 include/linux/mm_types.h            | 22 +++++--
 include/linux/mmzone.h              |  7 +++
 include/linux/page-flags.h          | 50 ++++++++++++++++
 include/linux/rmap.h                |  4 +-
 mm/gup.c                            | 93 +++++++++++++++++++++++------
 mm/memcontrol.c                     |  7 +++
 mm/memory-failure.c                 |  3 +-
 mm/migrate.c                        |  2 +-
 mm/page-writeback.c                 | 14 +++--
 mm/page_alloc.c                     |  1 +
 mm/rmap.c                           | 71 ++++++++++++++++++++--
 mm/swap.c                           | 48 +++++++++++++++
 mm/truncate.c                       |  3 +-
 mm/vmscan.c                         |  2 +-
 40 files changed, 361 insertions(+), 85 deletions(-)

--
2.18.0