Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp4454457pxu; Mon, 21 Dec 2020 12:57:48 -0800 (PST) X-Google-Smtp-Source: ABdhPJxDyyhwmH0P0PRvlagvX9eidqGX/csXeewuVNssZole82esBmom5ruhxO7fqa9Zhx/oVjts X-Received: by 2002:a17:906:6449:: with SMTP id l9mr11201127ejn.320.1608584268021; Mon, 21 Dec 2020 12:57:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1608584268; cv=none; d=google.com; s=arc-20160816; b=KZM6lr2HzENFqt1g0D19ibV3umFZcldh9XuUM/qyaYgL+jjpj3vf9VMC8OxWZDuosl k0ntWsvtBw+PM1ibf6YTEgF4+u12kk+fxNUhvX9S1f3U39Pw2Hns3LunJaPv5fouaRLG cz2Nhr9paAC8pP2PSiHgSRG/UkyTBLJZZEEiCIgcItO6TK3014IwiF6tyPbGdLJ3+Bbn tlUkJEQFR/2hsnI3ym61N/1F41Vg35+Tk9rtcgLNqaRTfC/VcYmcvnwl7UdcSIwXqKY4 ww6EAX3lgvALMloN5O9aT/lMj7KPjajrrrFDJVL9SDk0OLRpLmK5R6+7ZbypQypQa5yB W4Bg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=YISt5j4b8XAb8GNT7dnsvmnkE22RNgIdTlImdupifmU=; b=bMQAYUUOVT/XxTvKHDpSQ8kk4SNAfLz3q6nGms32ybVRA1Ha+JuNLS9fpKLiKz/Pp/ SoHrqfkCkWnYJZyQ3K3dw0EsMpP0Kn5XFjRDQlYsMRgEa9uYh5tPnbMEKqh8AeI8t6hj ic42uYA6F3UIJSGfBx5dJMyhepSp2Gj9eq2yvlsEhqgCnp+57zAyKDXNgMwKO65rO/Gn B/AcD2TigjqxkuA7IpfDjNahGbpusYvjBeReXgcNKlFOf45s3toR4FoAwoaF5pY67vQ0 RUBSgsvowkk9+u/bdDLoXH5M6mPPkrid6BJjxsON0XLOFhhnbHJ+YxPsOeHTlXWPCotB BBLw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=DGBQZcf0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j20si1435233ejt.403.2020.12.21.12.57.25; Mon, 21 Dec 2020 12:57:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=DGBQZcf0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726094AbgLUUyS (ORCPT + 99 others); Mon, 21 Dec 2020 15:54:18 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:54200 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725780AbgLUUyR (ORCPT ); Mon, 21 Dec 2020 15:54:17 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1608583970; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YISt5j4b8XAb8GNT7dnsvmnkE22RNgIdTlImdupifmU=; b=DGBQZcf0pJ3iJ7VJ1xWubLRp0evF4DvpT7ZVpYcIXwMUgdKZowv5dbhsoz9Ce4zYucFA+h VexD8Bjo3+rNyXJqkKD6j9cQOmGP5/MI2WPC0M2MCWsoILnEvvIpl4NFp8augtxQmuS82M ARKBOYULMiJnvGTVNQU36AtbdN5Yf+w= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-452-Y-LlExs1OneXozbAn18_zQ-1; Mon, 21 Dec 2020 15:52:48 -0500 X-MC-Unique: Y-LlExs1OneXozbAn18_zQ-1 Received: by mail-qt1-f197.google.com with SMTP id v9so8712119qtw.12 for ; Mon, 21 Dec 2020 12:52:48 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=YISt5j4b8XAb8GNT7dnsvmnkE22RNgIdTlImdupifmU=; b=Vi2B6INP+sAolHP27pHw04ApKT4RORMs1EKiMTWPSYyMMIAJJrGcGRPS6EKEnRRHhT ljaTa6yK0DFsR2j5MgDAponUB79T36LwoWZZd7Cw0aQSvI+jO39/iAFqHKSdk3M8Pqj2 X4tFKb+E3y/JLif7bFtjq5n+vRtYcx1uEweATZ/+MeRC6puKzmY/UWYpzSIhPbR8a0hn f+3H/f+pVAdvQVoIJRRvc5eXnrkxTJprMxV3YVwJ9HusA2lWa8RhSAbjj4knQjOd0r2j G/HZFNPIYckMCfs7YVjNJzTeFp3hKatUcDpIiZ6OWDk5dxjJ8aW0uxlaq2ZY4gIinZce lEIw== X-Gm-Message-State: AOAM532BhuDKSjvU1LXm81nOxNd8Ja1nhZe8tZnnggNQm3KwFi1poTV4 58SKtKmU8rYm+hmB1DELLwaBb458c/HzoWkeTr8CmHLcn8ipMJv+SjY7cDudUFKsSWcNcefGVoG hBXvaeD2nISgOQVbFMjCwW23X X-Received: by 2002:ac8:1c6a:: with SMTP id j39mr18262160qtk.341.1608583968347; Mon, 21 Dec 2020 12:52:48 -0800 (PST) X-Received: by 2002:ac8:1c6a:: with SMTP id j39mr18262143qtk.341.1608583968088; Mon, 21 Dec 2020 12:52:48 -0800 (PST) Received: from xz-x1 ([142.126.83.202]) by smtp.gmail.com with ESMTPSA id y67sm1521029qka.68.2020.12.21.12.52.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Dec 2020 12:52:47 -0800 (PST) Date: Mon, 21 Dec 2020 15:52:45 -0500 From: Peter Xu To: Nadav Amit Cc: "linux-fsdevel@vger.kernel.org" , Jens Axboe , Andrea Arcangeli , Alexander Viro , "io-uring@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" Subject: Re: [RFC PATCH 03/13] selftests/vm/userfaultfd: wake after copy failure Message-ID: <20201221205245.GJ6640@xz-x1> References: <20201129004548.1619714-1-namit@vmware.com> <20201129004548.1619714-4-namit@vmware.com> <20201221192846.GH6640@xz-x1> <2B08ECCA-A7D2-4743-8956-571CB8788FDA@vmware.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <2B08ECCA-A7D2-4743-8956-571CB8788FDA@vmware.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 21, 2020 at 07:51:52PM +0000, Nadav Amit wrote: > > On Dec 21, 2020, at 11:28 AM, Peter Xu wrote: > > > > On Sat, Nov 28, 2020 at 04:45:38PM -0800, Nadav Amit wrote: > >> From: Nadav Amit > >> > >> When userfaultfd copy-ioctl fails since the PTE already exists, an > >> -EEXIST error is returned and the faulting thread is not woken. The > >> current userfaultfd test does not wake the faulting thread in such case. > >> The assumption is presumably that another thread set the PTE through > >> copy/wp ioctl and would wake the faulting thread or that alternatively > >> the fault handler would realize there is no need to "must_wait" and > >> continue. This is not necessarily true. > >> > >> There is an assumption that the "must_wait" tests in handle_userfault() > >> are sufficient to provide definitive answer whether the offending PTE is > >> populated or not. However, userfaultfd_must_wait() test is lockless. > >> Consequently, concurrent calls to ptep_modify_prot_start(), for > >> instance, can clear the PTE and can cause userfaultfd_must_wait() > >> to wrongly assume it is not populated and a wait is needed. > > > > Yes userfaultfd_must_wait() is lockless, however my understanding is that we'll > > enqueue before reading the page table, which seems to me that we'll always get > > notified even the race happens. Should apply to either UFFDIO_WRITEPROTECT or > > UFFDIO_COPY, iiuc, as long as we follow the order of (1) modify pgtable (2) > > wake sleeping threads. Then it also means that when must_wait() returned true, > > it should always get waked up when fault resolved. > > > > Taking UFFDIO_COPY as example, even if UFFDIO_COPY happen right before > > must_wait() calls: > > > > worker thread uffd thread > > ------------- ----------- > > > > handle_userfault > > spin_lock(fault_pending_wqh) > > enqueue() > > set_current_state(INTERRUPTIBLE) > > spin_unlock(fault_pending_wqh) > > must_wait() > > lockless walk page table > > UFFDIO_COPY > > fill in the hole > > wake up threads > > (this will wake up worker thread too?) > > schedule() > > (which may return immediately?) > > > > While here fault_pending_wqh is lock protected. I just feel like there's some > > other reason to cause the thread to stall. Or did I miss something? > > But what happens if the copy completed before the enqueuing? Assume > the page is write-protected during UFFDIO_COPY: > > > cpu0 cpu1 > ---- ---- > handle_userfault > UFFDIO_COPY > [ write-protected ] > fill in the hole > wake up threads > [nothing to wake] > > UFFD_WP (unprotect) > logically marks as unprotected > [nothing to wake] > > spin_lock(fault_pending_wqh) > enqueue() > set_current_state(INTERRUPTIBLE) > spin_unlock(fault_pending_wqh) > must_wait() > > [ #PF on the same PTE > due to write-protection ] > > ... > wp_page_copy() > ptep_clear_flush_notify() > [ PTE is clear ] > > lockless walk page table > pte_none(*pte) -> must wait > > Note that additional scenarios are possible. For instance, instead of > wp_page_copy(), we can have other change_pte_range() (due to worker’s > mprotect() or NUMA balancing), calling ptep_modify_prot_start() and clearing > the PTE. > > Am I missing something? Ah I see your point, thanks. I think you're right: Reviewed-by: Peter Xu Would you mind adding something like above into the commit message if you're going to repost? IMHO it would even be nicer to mention why UFFDIO_WRITEPROTECT does not need this extra wakeup (I think it's because it'll do the wakeup unconditionally anyway). -- Peter Xu