Date: Sat, 9 Jan 2021 22:24:52 -0500
From: Andrea Arcangeli
To: Linus Torvalds
Cc: Andrew Morton, Linux-MM, Linux Kernel Mailing List, Yu Zhao,
    Andy Lutomirski, Peter Xu, Pavel Emelyanov, Mike Kravetz,
    Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra,
    Hugh Dickins, "Kirill A. Shutemov", Matthew Wilcox, Oleg Nesterov,
    Jann Horn, Kees Cook, John Hubbard, Leon Romanovsky,
    Jason Gunthorpe, Jan Kara, Kirill Tkhai, Nadav Amit, Jens Axboe
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
References: <20210110004435.26382-1-aarcange@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jan 09, 2021 at 05:37:09PM -0800, Linus Torvalds wrote:
> On Sat, Jan 9, 2021 at 5:19 PM Linus Torvalds wrote:
> >
> > And no, I didn't make the UFFDIO_WRITEPROTECT code take the mmap_sem
> > for writing. For whoever wants to look at that, it's
> > mwriteprotect_range() in mm/userfaultfd.c and the fix is literally to
> > turn the read-lock (and unlock) into a write-lock (and unlock).
>
> Oh, and if it wasn't obvious, we'll have to debate what to do with
> trying to mprotect() a pinned page. Do we just ignore the pinned page
> (the way my clear_refs patch did)? Or do we turn it into -EBUSY? Or
> what?

Agreed, I assume mprotect would have the same effect.

mprotect in parallel with a read or recvmsg may be undefined, so I
didn't bring it up, but it was pretty clear.

The moment the write bit is cleared (no matter why or by whom) and the
PG lock is released, if there's any GUP pin, GUP currently loses
synchrony.

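To make the loss of synchrony concrete, here is a toy userspace sketch,
not kernel code. It assumes the write-fault path reuses a page only
when its total reference count is 1, and compares that with a rule
keyed on the mapping count alone. The struct and function names are
hypothetical, and a pin is modelled as a single extra reference, which
simplifies the real pin accounting.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the two counters under discussion. */
struct toy_page {
	int refcount;	/* all references: mappings plus GUP pins */
	int mapcount;	/* page-table mappings only */
};

/* Reuse rule keyed on the total count: any extra reference forces a copy. */
static bool reuse_by_page_count(const struct toy_page *p)
{
	return p->refcount == 1;
}

/* Reuse rule keyed on mappings only: a pin alone does not force a copy. */
static bool reuse_by_mapcount(const struct toy_page *p)
{
	return p->mapcount == 1;
}

/*
 * After a wrprotect, the next write fault either reuses the page (the
 * pinning device keeps seeing the writes) or copies it (the process
 * moves to a new page while the pin stays on the old one).
 */
static bool pin_stays_coherent(const struct toy_page *p,
			       bool (*can_reuse)(const struct toy_page *))
{
	return can_reuse(p);
}
```

With one mapping and one pin (refcount 2, mapcount 1), the page_count
rule copies the page and the pin goes stale, while the mapcount rule
reuses the page in place.
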
In any case I intended to help exercise the new page_count logic with
the testcase, possibly to make it behave better somehow, no matter how.

I admit I'm also wondering about the exact semantics of O_DIRECT on
clear_refs or uffd-wp tracking, but the point is that losing reads and
getting unexpected data in the page still doesn't look like good
behavior, and it had to be at least checked.

To me, ultimately, the useful use case that has become impossible with
page_count isn't even clear_refs or uffd-wp.

The useful case, in which I can see zero fundamental flaws, is RDMA or
some other device computing in pure readonly DMA on the data while a
program runs normally and produces it. It could even be a framebuffer
that doesn't care about coherency. You may want to occasionally
wrprotect the memory under a readonly long-term GUP pin for
consistency, even against bugs of the program itself.

Why should wrprotecting make the device lose synchrony?

And what kind of performance do we gain in the normal useful cases by
breaking the special case? Is there a benchmark showing it?

> So it's not *just* the locking that needs to be fixed. But just take a
> look at that suggested clear_refs patch of mine - it sure isn't
> complicated.

If we can skip the wrprotection it's fairly easy, I fully agree. Even
then it still looks more difficult than using page_mapcount in
do_wp_page in my view, so I also don't see the simplification. And
overall the amount of kernel code had a net increase as a result.

Thanks,
Andrea