From: Dan Williams
Date: Thu, 19 Apr 2018 20:00:09 -0700
Subject: Re: [PATCH v8 15/18] mm, fs, dax: handle layout changes to pinned dax mappings
To: Jan Kara
Cc: linux-nvdimm, Jeff Moyer, Dave Chinner, Matthew Wilcox, Alexander Viro,
 "Darrick J. Wong", Ross Zwisler, Dave Hansen, Andrew Morton,
 Christoph Hellwig, linux-fsdevel, linux-xfs, Linux Kernel Mailing List,
 Mike Snitzer, Paul McKenney, Josh Triplett
In-Reply-To: <20180419104432.7lzk7nbjmwav6ojl@quack2.suse.cz>
References: <152246892890.36038.18436540150980653229.stgit@dwillia2-desk3.amr.corp.intel.com>
 <152246901060.36038.4487158506830998280.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20180404094656.dssixqvvdcp5jff2@quack2.suse.cz>
 <20180409164944.6u7i4wgbp6yihvin@quack2.suse.cz>
 <20180419104432.7lzk7nbjmwav6ojl@quack2.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 19, 2018 at 3:44 AM, Jan Kara wrote:
> On Fri 13-04-18 15:03:51, Dan Williams wrote:
>> On Mon, Apr 9, 2018 at 9:51 AM, Dan Williams wrote:
>> > On Mon, Apr 9, 2018 at 9:49 AM, Jan Kara wrote:
>> >> On Sat 07-04-18 12:38:24, Dan Williams wrote:
>> > [..]
>> >>> I wonder if this can be trivially solved by using srcu. I.e. we don't
>> >>> need to wait for a global quiescent state, just a
>> >>> get_user_pages_fast() quiescent state. ...or is that an abuse of the
>> >>> srcu api?
>> >>
>> >> Well, I'd rather use the percpu rwsemaphore (linux/percpu-rwsem.h) than
>> >> SRCU. It is a more-or-less standard locking mechanism rather than relying
>> >> on implementation properties of SRCU which is a data structure protection
>> >> method. And the overhead of percpu rwsemaphore for your use case should be
>> >> about the same as that of SRCU.
>> >
>> > I was just about to ask that. Yes, it seems they would share similar
>> > properties and it would be better to use the explicit implementation
>> > rather than a side effect of srcu.
>>
>> ...unfortunately:
>>
>>  BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
>>  [..]
>>  Call Trace:
>>   dump_stack+0x85/0xcb
>>   ___might_sleep+0x15b/0x240
>>   dax_layout_lock+0x18/0x80
>>   get_user_pages_fast+0xf8/0x140
>>
>> ...and thinking about it more srcu is a better fit. We don't need the
>> 100% exclusion provided by an rwsem; we only need the guarantee that
>> all cpus that might have been running get_user_pages_fast() have
>> finished it at least once.
>>
>> In my tests synchronize_srcu is a bit slower than unpatched for the
>> trivial 100 truncate test, but certainly not the 200x latency you were
>> seeing with synchronize_rcu.
>>
>> Elapsed time:
>> 0.006149178 unpatched
>> 0.009426360 srcu
>
> Hum, right. Yesterday I was looking into KSM for a different reason and
> I've noticed it also write-protects pages and deals with races with GUP.
> And what KSM relies on is:
>
> write_protect_page()
>   ...
>   entry = ptep_clear_flush(vma, pvmw.address, pvmw.pte);
>   /*
>    * Check that no O_DIRECT or similar I/O is in progress on the
>    * page
>    */
>   if (page_mapcount(page) + 1 + swapped != page_count(page)) {
>           page used -> bail

Slick.

>   }
>
> And this really works because gup_pte_range() does:
>
>           page = pte_page(pte);
>           head = compound_head(page);
>
>           if (!page_cache_get_speculative(head))
>                   goto pte_unmap;
>
>           if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>                   bail

Need to add a similar check to __gup_device_huge_pmd.

>           }
>
> So either write_protect_page() sees the elevated reference or
> gup_pte_range() bails because it will see the pte changed.
>
> In the truncate path things are a bit different but in principle the same
> should work - once truncate blocks page faults and unmaps pages from page
> tables, we can be sure GUP will not grab the page anymore or we'll see
> elevated page count. So IMO there's no need for any additional locking
> against the GUP path (but a comment explaining this is highly desirable I
> guess).

Yes, those "pte_val(pte) != pte_val(*ptep)" checks should be documented
for the same reason we require comments on rmb/wmb pairs. I'll take a
look, thanks Jan.
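The pairing Jan describes (GUP takes its reference before re-reading the pte; the writer clears the pte before reading the reference count) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct model_page, struct model_pte, ref_get_unless_zero(), gup_snapshot(), gup_pin(), gup_fast() and writer_try_unmap() are hypothetical names, and C11 sequentially consistent atomics are assumed to stand in for the ordering the kernel relies on around page_cache_get_speculative() and ptep_clear_flush(). The comments mark each half of the pair the way an rmb/wmb pair would be annotated.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the GUP-fast vs. unmap race; none of
 * these names are kernel APIs. All atomics use the default
 * memory_order_seq_cst, which is what makes the pairing below hold.
 */
struct model_page {
	atomic_int refcount;			/* stands in for page_count() */
};

struct model_pte {
	_Atomic(struct model_page *) page;	/* NULL means "not mapped" */
};

/* Like get_page_unless_zero(): never take a reference on a dying page. */
static bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* GUP step 1: lockless read of the pte ("pte = *ptep"). */
static struct model_page *gup_snapshot(struct model_pte *ptep)
{
	return atomic_load(&ptep->page);
}

/*
 * GUP step 2: take a speculative reference, THEN re-read the pte.
 * Pairs with writer_try_unmap(), which clears the pte, THEN reads the
 * count: with both orderings in place, at least one side sees the other.
 */
static bool gup_pin(struct model_pte *ptep, struct model_page *snap)
{
	if (!snap || !ref_get_unless_zero(&snap->refcount))
		return false;
	if (atomic_load(&ptep->page) != snap) {
		/* pte changed under us: drop our reference and bail */
		int prev = atomic_fetch_sub(&snap->refcount, 1);
		assert(prev >= 1);	/* we held one of these references */
		(void)prev;
		return false;
	}
	return true;
}

static bool gup_fast(struct model_pte *ptep)
{
	return gup_pin(ptep, gup_snapshot(ptep));
}

/*
 * Writer (truncate or write-protect): clear the pte FIRST, then compare
 * the count with the references it can account for, in the spirit of
 * "page_mapcount(page) + 1 + swapped != page_count(page)". Returns true
 * if the change may proceed; on an extra (pinned) reference it restores
 * the pte and returns false.
 */
static bool writer_try_unmap(struct model_pte *ptep, int expected_refs)
{
	struct model_page *page = atomic_exchange(&ptep->page, NULL);

	if (!page)
		return true;			/* nothing was mapped */
	if (atomic_load(&page->refcount) != expected_refs) {
		atomic_store(&ptep->page, page);	/* pinned: back off */
		return false;
	}
	return true;
}
```

The only outcome the model rules out is gup_fast() pinning the page while writer_try_unmap() also proceeds. Both sides failing is allowed: the reader drops its speculative reference and the writer can retry or wait for the pin to go away.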