Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1304875imu; Wed, 16 Jan 2019 16:49:52 -0800 (PST) X-Google-Smtp-Source: ALg8bN6OKS+saSmPIbJYphZ3WmKNNtcIepY757iEJ1vBirkJn72ilWq2czWb0F5vfd+tKjO9QVO9 X-Received: by 2002:a62:1f9d:: with SMTP id l29mr12779651pfj.14.1547686192736; Wed, 16 Jan 2019 16:49:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547686192; cv=none; d=google.com; s=arc-20160816; b=c/e/SsbK20MrgS7aabo2GLFC384omOJewZWdn7vqEUR84hdVZ7VKOIHp96TMXyVzm9 Cat2wt7V5HOLCJB0Pcq/1IdeRXq6FzTGoGUmZ+d0X8abNfHKRTKCkcK1sxpM89GCMbJP jfCSIQOv1rqIwZ7oZwKfAwy67lJhZCWqpw2+DkbpSHHSJMIDn54VKTu+5b1OJibCNgqw q1zy8DeTnH23I1UOtJG/1BrGrTRmEf1/9B3d5MmkLdAHqROHQSTL7Wfrp6xsnu+cgC2v +f8o2kbbWEmA4zly5+HXt4PeUwAmuUGKAdawgdEFVwjQc/8vRYqC1HKme4Gdrq9g3TdY KavA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=M/GjfyIfaRHW907PDNTphiP7T2kXeQi5Y/lgmjkMAtA=; b=oPiMNcVb2aqyq3wt0B3+Ngot5qkXpm8ZUBaByuv7JoxdIrz+B5oGWbjhZFasoQmj07 sL2jXTRpPafsGNFqJ85pttjLHunScKXwAW6AqMYbNCopIcHuWmNLIWCsYgOQaGDRma6g pVJfVm49+B8tz0915zIAvVhG/dMTdwVja6KjJNxV7eOa/ceoJpdtyBDsGVGObKUeUuUN WMbbEpCpPXWhpIl924v3QTY8p1vcAScMNd8puPYXpD5SRtp6ujOJ2I9JSSHAQq6sYvdv PvWnUbKo4g1M88alOwhVqU6khr3Po0ED/O3ekKG1eKTVh7ieG9OjzaxkJAtzjv3DGpw2 ZOow== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o3si7738407pgm.441.2019.01.16.16.49.34; Wed, 16 Jan 2019 16:49:52 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387682AbfAPWvd (ORCPT + 99 others); Wed, 16 Jan 2019 17:51:33 -0500 Received: from ipmail03.adl6.internode.on.net ([150.101.137.143]:37693 "EHLO ipmail03.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728877AbfAPWvd (ORCPT ); Wed, 16 Jan 2019 17:51:33 -0500 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail03.adl6.internode.on.net with ESMTP; 17 Jan 2019 09:21:28 +1030 Received: from dave by dastard with local (Exim 4.80) (envelope-from ) id 1gju1v-0003AG-1d; Thu, 17 Jan 2019 09:51:27 +1100 Date: Thu, 17 Jan 2019 09:51:27 +1100 From: Dave Chinner To: Jerome Glisse Cc: Dan Williams , John Hubbard , Jan Kara , Matthew Wilcox , John Hubbard , Andrew Morton , Linux MM , tom@talpey.com, Al Viro , benve@cisco.com, Christoph Hellwig , Christopher Lameter , "Dalessandro, Dennis" , Doug Ledford , Jason Gunthorpe , Michal Hocko , Mike Marciniszyn , rcampbell@nvidia.com, Linux Kernel Mailing List , linux-fsdevel Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions Message-ID: <20190116225126.GS4205@dastard> References: <20190115080759.GC29524@quack2.suse.cz> <20190115171557.GB3696@redhat.com> <752839e6-6cb3-a6aa-94cb-63d3d4265934@nvidia.com> <20190115221205.GD3696@redhat.com> <99110c19-3168-f6a9-fbde-0a0e57f67279@nvidia.com> <20190116015610.GH3696@redhat.com> <20190116022312.GJ3696@redhat.com> <20190116043455.GP4205@dastard> <20190116145016.GB3617@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190116145016.GB3617@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 16, 2019 at 09:50:16AM -0500, Jerome Glisse wrote: > On Wed, Jan 16, 2019 at 03:34:55PM +1100, Dave Chinner wrote: > > On Tue, Jan 15, 2019 at 09:23:12PM -0500, Jerome Glisse wrote: > > > On Tue, Jan 15, 2019 at 06:01:09PM -0800, Dan Williams wrote: > > > > On Tue, Jan 15, 2019 at 5:56 PM Jerome Glisse wrote: > > > > > On Tue, Jan 15, 2019 at 04:44:41PM -0800, John Hubbard wrote: > > > > [..] > > > > > To make it clear. > > > > > > > > > > Lock code: > > > > > GUP() > > > > > ... > > > > > lock_page(page); > > > > > if (PageWriteback(page)) { > > > > > unlock_page(page); > > > > > wait_stable_page(page); > > > > > goto retry; > > > > > } > > > > > atomic_add(page->refcount, PAGE_PIN_BIAS); > > > > > unlock_page(page); > > > > > > > > > > test_set_page_writeback() > > > > > bool pinned = false; > > > > > ... > > > > > pinned = page_is_pin(page); // could be after TestSetPageWriteback > > > > > TestSetPageWriteback(page); > > > > > ... > > > > > return pinned; > > > > > > > > > > Memory barrier: > > > > > GUP() > > > > > ... > > > > > atomic_add(page->refcount, PAGE_PIN_BIAS); > > > > > smp_mb(); > > > > > if (PageWriteback(page)) { > > > > > atomic_add(page->refcount, -PAGE_PIN_BIAS); > > > > > wait_stable_page(page); > > > > > goto retry; > > > > > } > > > > > > > > > > test_set_page_writeback() > > > > > bool pinned = false; > > > > > ... > > > > > TestSetPageWriteback(page); > > > > > smp_wmb(); > > > > > pinned = page_is_pin(page); > > > > > ... > > > > > return pinned; > > > > > > > > > > > > > > > One is not more complex than the other. One can contend, the other > > > > > will _never_ contend. > > > > > > > > The complexity is in the validation of lockless algorithms. It's > > > > easier to reason about locks than barriers for the long term > > > > maintainability of this code. I'm with Jan and John on wanting to > > > > explore lock_page() before a barrier-based scheme. > > > > > > How is the above hard to validate ? > > > > Well, if you think it's so easy, then please write the test cases so > > we can add them to fstests and make sure that we don't break it in > > future. > > > > If you can't write filesystem test cases that exercise these race > > conditions reliably, then the answer to your question is "it is > > extremely hard to validate" and the correct thing to do is to start > > with the simple lock_page() based algorithm. > > > > Premature optimisation in code this complex is something we really, > > really need to avoid. > > Litmus test shows that this never happens, i am attaching 2 litmus > test one with barrier and one without. Without barrier we can see > the double negative !PageWriteback in GUP and !page_pinned() in > test_set_page_writeback() (0:EAX = 0; 1:EAX = 0; below) That's not a regression test, nor does it actually test the code that the kernel runs. It's just an extremely simplified model of a small part of the algorithm. Sure, that specific interaction is fine, but that in no way reflects the complexity of the code or the interactions with other code that interacts with that state. And it's not something we can use to detect that some future change has broken gup vs writeback synchronisation. Memory barriers might be fast, but they are hell for anyone but the person who wrote the algorithm to understand. If a simple page lock is good enough and doesn't cause performance problems, then use the simple page lock mechanism and stop trying to be clever. Other people who will have to understand and debug issues in this code are much less accepting of such clever algorithms. We've been badly burnt on repeated occasions by broken memory barriers in code heavily optimised for performance (*cough* rwsems *cough*), so I'm extremely wary of using memory ordering dependent algorithms in places where they are not necessary. Cheers, Dave. -- Dave Chinner david@fromorbit.com