Date: Tue, 22 Dec 2020 15:26:03 -0500
From: Andrea Arcangeli
To: Matthew Wilcox
Cc: Andy Lutomirski, Linus Torvalds, Peter Xu, Nadav Amit, Yu Zhao,
    linux-mm, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable,
    Minchan Kim, Will Deacon, Peter Zijlstra, Kent Overstreet
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
References: <9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com>
    <1FCC8F93-FF29-44D3-A73A-DF943D056680@gmail.com>
    <20201221223041.GL6640@xz-x1>
    <20201222201553.GM874@casper.infradead.org>
In-Reply-To: <20201222201553.GM874@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 22, 2020 at 08:15:53PM +0000, Matthew Wilcox wrote:
> On Tue, Dec 22, 2020 at 02:31:52PM -0500, Andrea Arcangeli wrote:
> > My previous suggestion to use a mutex to serialize
> > userfaultfd_writeprotect with a mutex will still work, but we can run
> > as many wrprotect and un-wrprotect as we want in parallel, as long as
> > they're not simultaneous, we can do much better than a mutex.
> >
> > Ideally we would need a new two_group_semaphore, where each group can
> > run as many parallel instances as it wants, but no instance of one
> > group can run in parallel with any instance of the other group. AFIK
> > such a kind of lock doesn't exist right now.
>
> Kent and I worked on one for a bit, and we called it a red-black mutex.
> If team red had the lock, more members of team red could join in.
> If team black had the lock, more members of team black could join in.
> I forget what our rule was around fairness (if team red has the lock,
> and somebody from team black is waiting, can another member of team red
> take the lock, or must they block?)

In this case they would need to block, to provide full fairness. Well,
maybe just a bit of unfairness (to let a few more through the door
before it shuts) wouldn't be a deal breaker, but it would need to be
bounded or it'd starve the other color/side indefinitely.

Otherwise an ioctl with mode_wp = true would block forever if more
ioctls with mode_wp = false keep coming in on other CPUs (or the other
way around).

The approximation with rwsem and two atomics provides full fairness in
both read and write mode (originally the read would starve the write
IIRC, which was an issue for all mprotect etc., not anymore thankfully).

> It was to solve the direct-IO vs buffered-IO problem (you can have as many
> direct-IO readers/writers at once or you can have as many buffered-IO
> readers/writers at once, but exclude a mix of direct and buffered I/O).
> In the end, we decided it didn't work all that well.

Well, mixing buffered and direct-IO is certainly not a good practice, so
it's reasonable to leave it up to userland to serialize if such a mix is
needed; the kernel behavior is undefined if the mix is concurrent and
out of order.
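
To make the fairness rule above concrete, here is a minimal userspace
model of such a two-group lock. It is only a sketch under assumptions:
the class name TwoGroupLock, its methods and the 0/1 group numbering are
hypothetical, it is not the rwsem-and-two-atomics approximation and not
any existing kernel API. It implements the strict variant: as soon as
the other group is waiting, newcomers of the owning group queue up too,
and the last holder hands the lock to the whole waiting batch.

```python
import threading


class TwoGroupLock:
    """Userspace model of a two-group lock; hypothetical, not a kernel API.

    Holders of the same group (0 or 1) may overlap, holders of different
    groups never do. Once the other group is waiting, newcomers of the
    owning group queue up too, so neither side can be starved.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._owner = None       # group owning the lock, None when free
        self._holders = 0        # active holders of the owning group
        self._waiting = [0, 0]   # queued callers per group
        self._batch = [0, 0]     # bumped each time a group's queue is admitted

    def acquire(self, group, blocking=True):
        other = 1 - group
        with self._cond:
            free = self._owner is None
            can_join = self._owner == group and self._waiting[other] == 0
            if free or can_join:
                self._owner = group
                self._holders += 1
                return True
            if not blocking:
                return False
            # Queue up. release() admits the whole queue in one batch and
            # already counts us in _holders before waking us.
            self._waiting[group] += 1
            ticket = self._batch[group]
            self._cond.wait_for(lambda: self._batch[group] != ticket)
            return True

    def release(self):
        with self._cond:
            self._holders -= 1
            if self._holders > 0:
                return
            # Last holder out. Our own group only ever queues behind a
            # waiting member of the other group, so if nobody from the
            # other group waits, nobody waits at all and the lock is free.
            nxt = 1 - self._owner
            if self._waiting[nxt] == 0:
                self._owner = None
                return
            # Hand the lock over to every queued member of the other group.
            self._owner = nxt
            self._holders = self._waiting[nxt]
            self._waiting[nxt] = 0
            self._batch[nxt] += 1
            self._cond.notify_all()
```

In this model a caller with mode_wp = true would do acquire(1) and a
caller with mode_wp = false acquire(0), each paired with release(). The
bounded-unfairness variant would let a limited number of same-group
newcomers in before shutting the door; that is left out here to keep the
sketch short.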