From: Linus Torvalds
Date: Fri, 29 Oct 2021 11:47:33 -0700
Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks
To: Catalin Marinas
Cc: Andreas Gruenbacher, Paul Mackerras, Alexander Viro, Christoph Hellwig, "Darrick J. Wong", Jan Kara, Matthew Wilcox, cluster-devel, linux-fsdevel, Linux Kernel Mailing List, ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org, linux-btrfs, Tony Luck, Andy Lutomirski

On Fri, Oct 29, 2021 at 10:50 AM Catalin Marinas wrote:
>
> First of all, a uaccess in interrupt should not force such signal as it
> had nothing to do with the interrupted context. I guess we can do an
> in_task() check in the fault handler.

Yeah.

It ends up being similar to the thread flag in that you still end up
having to protect against NMI and other users of asynchronous page
faults.
So the suggestion was more of a "mindset" difference and modified
version of the task flag rather than anything fundamentally different.

> Second, is there a chance that we enter the fault-in loop with a SIGSEGV
> already pending? Maybe it's not a problem, we just bail out of the loop
> early and deliver the signal, though unrelated to the actual uaccess in
> the loop.

If we ever run in user space with a pending per-thread SIGSEGV, that
would already be a fairly bad bug. The intent of "force_sig()" is not
only to make sure you can't block the signal, but also that it targets
the particular thread that caused the problem: unlike other random
"send signal to process", a SIGSEGV caused by a bad memory access is
really local to that _thread_, not the signal thread group.

So somebody else sending a SIGSEGV asynchronously is actually very
different - it goes to the thread group (although you can specify
individual threads too - but once you do that you're already outside
of POSIX).

That said, the more I look at it, the more I think I was wrong. I
think the "we have a SIGSEGV pending" could act as the per-thread
flag, but the complexity of the signal handling is probably an
argument against it.

Not because a SIGSEGV could already be pending, but because so many
other situations could be pending.

In particular, the signal code won't send new signals to a thread if
that thread group is already exiting. So another thread may have
already started the exit and core dump sequence, and is in the process
of killing the shared signal threads, and if one of those threads is
now in the kernel and goes through the copy_from_user() dance, that
whole "thread group is exiting" will mean that the signal code won't
add a new SIGSEGV to the queue.

So the signal could conceptually be used as the flag to stop looping,
but it ends up being such a complicated flag that I think it's
probably not worth it after all. Even if it semantically would be
fairly nice to use pre-existing machinery.

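[Illustration, not from the original mail: a minimal userspace model of the point above, assuming a retry loop that keeps faulting on an address that can never be faulted in. It is not kernel code. Every name in it (struct model_task, model_fault(), model_retry_loop()) is hypothetical, and the "signal" is just a boolean that is not set when the thread group is marked as exiting, which is the behaviour described above.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a task; these are not kernel types. */
struct model_task {
    bool group_exiting;    /* another thread already started exit/core dump */
    bool sigsegv_pending;  /* stands in for a queued per-thread SIGSEGV */
    bool fault_flag;       /* stands in for a dedicated per-thread flag */
};

/* Model of a uaccess fault on an address that can never be faulted in. */
static void model_fault(struct model_task *t)
{
    t->fault_flag = true;        /* a plain flag is always set */
    if (!t->group_exiting)       /* the signal is not queued when exiting */
        t->sigsegv_pending = true;
}

/*
 * Model of the fault-in/retry loop. Returns the number of attempts made,
 * capped at max_tries so that the model itself always terminates.
 */
static int model_retry_loop(struct model_task *t, bool use_signal_as_flag,
                            int max_tries)
{
    int tries = 0;

    assert(max_tries > 0);
    while (tries < max_tries) {
        tries++;
        model_fault(t);          /* the copy fails every time in this model */
        if (use_signal_as_flag ? t->sigsegv_pending : t->fault_flag)
            break;               /* stop condition observed */
    }
    return tries;
}
```

[In this model, a task with group_exiting set never sees sigsegv_pending become true, so the loop that uses the signal as its flag only stops at the artificial max_tries cap, while the loop that uses the dedicated flag stops after the first failed attempt. A real loop would need an additional check to cover that case.]
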
Could it be worked around? Sure. That kernel loop probably has to
check for fatal_signal_pending() anyway, so it would all work even in
the presence of the above kinds of issues. But just the fact that I
went and looked at just how exciting the signal code is made me think
"ok, conceptually nice, but we take a lot of locks and we do a lot of
special things even in the 'simple' force_sig() case".

> Third is the sigcontext.pc presented to the signal handler. Normally for
> SIGSEGV it points to the address of a load/store instruction and a
> handler could disable MTE and restart from that point. With a syscall we
> don't want it to point to the syscall place as it shouldn't be restarted
> in case it copied something.

I think this is actually independent of the whole "how to return
errors". We'll still need to return an error from the system call,
even if we also have a signal pending.

                  Linus