Date: Mon, 4 Jan 2021 15:19:54 -0500
From: Andrea Arcangeli
To: Nadav Amit
Cc: Peter Zijlstra, linux-mm, lkml, Yu Zhao, Andy Lutomirski, Peter Xu,
    Pavel Emelyanov, Mike Kravetz, Mike Rapoport, Minchan Kim, Will Deacon,
    Mel Gorman
Subject: Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect
References: <20201225092529.3228466-1-namit@vmware.com> <20201225092529.3228466-2-namit@vmware.com> <20210104122227.GL3021@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 04, 2021 at 07:35:06PM +0000, Nadav Amit wrote:
> > On Jan 4, 2021, at 11:24 AM, Andrea Arcangeli wrote:
> >
> > Hello,
> >
> > On Mon, Jan 04, 2021 at 01:22:27PM +0100, Peter Zijlstra wrote:
> >> On Fri, Dec 25, 2020 at 01:25:28AM -0800, Nadav Amit wrote:
> >>
> >>> The scenario that happens in selftests/vm/userfaultfd is as follows:
> >>>
> >>> cpu0                        cpu1                  cpu2
> >>> ----                        ----                  ----
> >>>                                                   [ Writable PTE
> >>>                                                     cached in TLB ]
> >>> userfaultfd_writeprotect()
> >>> [ write-*unprotect* ]
> >>> mwriteprotect_range()
> >>> mmap_read_lock()
> >>> change_protection()
> >>>
> >>> change_protection_range()
> >>> ...
> >>> change_pte_range()
> >>> [ *clear* “write”-bit ]
> >>> [ defer TLB flushes ]
> >>>                             [ page-fault ]
> >>>                             ...
> >>>                             wp_page_copy()
> >>>                              cow_user_page()
> >>>                               [ copy page ]
> >>>                                                   [ write to old
> >>>                                                     page ]
> >>>                             ...
> >>>                              set_pte_at_notify()
> >>
> >> Yuck!
> >
> > Note, the above was posted before we figured out the details so it
> > wasn't showing the real deferred tlb flush that caused problems (the
> > one showed on the left causes zero issues).
>
> Actually it was posted after (note that this is v2). The aforementioned
> scenario that Peter regards to is the one that I actually encountered (not
> the second scenario that is “theoretical”). This scenario that Peter regards
> is indeed more “stupid” in the sense that we should just not write-protect
> the PTE on userfaultfd write-unprotect.
>
> Let me know if I made any mistake in the description.

I didn't say there is a mistake. I said it is not showing the real
deferred tlb flush that causes problems.

The issue here is that we have a "defer tlb flush" that runs after
"write to old page".

If you look at the above, you're led to think the "defer tlb flush"
that causes issues is the one in cpu0. It's not. That one is totally
harmless.

>
> > The problematic one not pictured is the one of the wrprotect that has
> > to be running in another CPU which is also isn't picture above. More
> > accurate traces are posted later in the thread.
>
> I think I included this scenario as well in the commit log (of v2). Let me
> know if I screwed up and the description is not clear.

Instead of not showing the real "defer tlb flush" in the trace and
then fixing it up in the comment, why don't you take the trace showing
the real problematic "defer tlb flush"? No need to reinvent it.

https://lkml.kernel.org/r/X+JJqK91plkBVisG@redhat.com

See here the detail underlined:

	deferred tlb flush <- too late
	XXXXXXXXXXXXXX BUG RACE window close here

This shows the real deferred tlb flush; your v2 does not include it.
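
To make the ordering problem concrete ("defer tlb flush" running after "write to old page"), here is a rough toy model in Python. It is only a sketch and not kernel code: `simulate`, `flush_before_copy`, `pages`, `pte` and `stale_tlb` are hypothetical names made up for the illustration, and the model is reduced to one PTE, one stale TLB entry and two pages. It is not a transcription of the trace linked above.

```python
def simulate(flush_before_copy: bool) -> str:
    """Return the value visible through the PTE after the sequence of events."""
    pages = {"old": "A"}                       # physical pages and their contents
    pte = {"page": "old", "writable": True}    # the single page-table entry
    stale_tlb = dict(pte)                      # writer CPU cached a writable translation

    # Write-protecting CPU: clear the write bit; the TLB flush is deferred.
    pte["writable"] = False
    if flush_before_copy:
        stale_tlb = None                       # the flush reaches the writer CPU in time

    # Faulting CPU: copy-on-write copies the old page ...
    pages["new"] = pages["old"]

    # Writer CPU stores "B" after the copy but before the PTE is switched.
    if stale_tlb is not None and stale_tlb["writable"]:
        pages[stale_tlb["page"]] = "B"         # lands in the old page, after the copy
        write_done = True
    else:
        write_done = False                     # no cached translation: the store faults

    # Faulting CPU: ... and installs the copy.
    pte.update(page="new", writable=True)
    stale_tlb = None                           # the deferred flush runs only now

    if not write_done:
        pages[pte["page"]] = "B"               # the retried store goes to the new page

    return pages[pte["page"]]


print(simulate(flush_before_copy=False))       # racy ordering: the store is lost
print(simulate(flush_before_copy=True))        # flush in time: the store survives
```

In the racy ordering the value read back through the PTE is still the old content, because the store went to the page that had already been copied. With the flush delivered before the copy, the store faults and ends up in the new page.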