Received: by 10.213.65.68 with SMTP id h4csp1000168imn; Wed, 4 Apr 2018 10:48:33 -0700 (PDT) X-Google-Smtp-Source: AIpwx48rNIRq+8D3htIM+KQzaXfW5NHrbGXMnQt2craNfNFluwv4abx2ChvmPG3mluJNoUsSjuI7 X-Received: by 2002:a17:902:3341:: with SMTP id a59-v6mr19511651plc.68.1522864113743; Wed, 04 Apr 2018 10:48:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522864113; cv=none; d=google.com; s=arc-20160816; b=jEculX2MsVr0pEXmdYmOQcXeR7BEtqjXp4NVywzKl5BqCOs8PDOqf7T/7HYsfVfSCs CBAC4iHJYSoG3yjvbA/nyja634vpFPka51cL/B7xe+MSRHzIpLoojPe4lT1fijLu2yzN XhWd/TRhkgYLdiRuCTnS+++gcoCDgA+bsQCxm8c2Wt46SjfH2Ke9hY/4Lx7z5TQc8Xu9 G8dwaxpNhoYWKSFILdj8EkCa0dVwIRKpEYb7chfuyEdycqjxerXoYHiEqD/QOJaB5nqj o/QoRubA4IzV/SkwksTjqCbchcYlz3+Yy825yq9Id49fiX2aoeTINmy8rhdS2ybYSbJN D+Yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=EJpj5SkaCyVnzr9hbe5WwYANDwf0rWLTMt4uIn2Bpjk=; b=oA04bgbXra0WkRoI6VzOjRKb6QvgJH/8ngiwb6IFzg6dQCt1PeV8w5H8VL6O6Izz3s N4RMobJgdCR2ZbqPT1C8QrvgEmGGKr0m/+AnTkamv1nnPOYUIpRpBSIGoudtkjx6PURj C5yglii6UCachfbqAk/TLe8c6A5O4BzUNBb67UhcduiZcR1xvchlpzq1PbG42x0p/Cw4 IWe47zIiup4CG5nuqhSrhgCesxQtFO5wfJR+RDnSOW19xw88cn3Rn2ZcGkOCIjO29cDq 8TTXO/9hDyK88sfk1m8f8ojS5fzXiQjakrV3is9iWiuTkyhKR7JgpE5B8P92GeAdJGQ9 NAXA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=U79QHU3o; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p77si4439659pfk.294.2018.04.04.10.48.19; Wed, 04 Apr 2018 10:48:33 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=U79QHU3o; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751410AbeDDRp1 (ORCPT + 99 others); Wed, 4 Apr 2018 13:45:27 -0400 Received: from mail-io0-f181.google.com ([209.85.223.181]:45712 "EHLO mail-io0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750915AbeDDRpZ (ORCPT ); Wed, 4 Apr 2018 13:45:25 -0400 Received: by mail-io0-f181.google.com with SMTP id 141so27319557iou.12 for ; Wed, 04 Apr 2018 10:45:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=EJpj5SkaCyVnzr9hbe5WwYANDwf0rWLTMt4uIn2Bpjk=; b=U79QHU3ouIk3RpLt8Ljp0iEBvbU5spKrXNBjueqHyJ1Ha7or2HBRXOm4i5w+oRSbFv b84rtkWSgl7EUw/vETKkhq7xBdOE2WZCkVa2Jp4Bru+wqPK03EleVtL3J17RsqxB7Q2f FJ9C8oveXEHI83WClY1W5E+c2wroABjfecHik= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=EJpj5SkaCyVnzr9hbe5WwYANDwf0rWLTMt4uIn2Bpjk=; b=ZKNhP/3Bb3mwOi8Knu9UBufZxHCW2QNOEAnuFaXrKdKNBgwH8x8kb/Bz4S3LEzMxW4 EKsnS/AW/78NlJo4TVm/w6QJ0x6t7paY9QtB5KjqTCQ8BUIsbJn6eTq5tW9EDZJKAWyd 2O2Cy/eWE+u6GMRnBPV3Wv8WQfTODvzfQIdae7JRabHAcIDsUlsPKmD8gekMhMOGSNMI 1o6f0iBwkf29g6jq2FyznJODfbNUhJ2gHbdqv18RhhYQbRJamBYSGVvzzkGZ08jkfEot MYKwV7RSJiCfDDbBW2UJLORzlEo6w3pYgQDoYlI7HbE4q77ifcr9KrubWs/9qVstrav0 INXw== X-Gm-Message-State: ALQs6tBO/10B+Ay+VtBdcmShcsoUCR1UekOEKWaB3eLXDh5eL4t+k3eL IuLcddp6PeJF/QdRGlhucP1/kzVN+W8pHSq8j7SdtA== X-Received: by 10.107.164.199 with SMTP id d68mr18057638ioj.34.1522863924543; Wed, 04 Apr 2018 10:45:24 -0700 (PDT) MIME-Version: 1.0 Received: by 10.79.40.129 with HTTP; Wed, 4 Apr 2018 10:45:23 -0700 (PDT) X-Originating-IP: [212.51.149.109] In-Reply-To: <20180404162433.GB16142@bombadil.infradead.org> References: <20180402141058.GL13332@bombadil.infradead.org> <20180404093254.GC3881@phenom.ffwll.local> <20180404143900.GA1777@bombadil.infradead.org> <20180404162433.GB16142@bombadil.infradead.org> From: Daniel Vetter Date: Wed, 4 Apr 2018 19:45:23 +0200 Message-ID: Subject: Re: Signal handling in a page fault handler To: Matthew Wilcox Cc: dri-devel , Linux MM , Souptick Joarder , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 4, 2018 at 6:24 PM, Matthew Wilcox wrote: > On Wed, Apr 04, 2018 at 05:15:46PM +0200, Daniel Vetter wrote: >> On Wed, Apr 4, 2018 at 4:39 PM, Matthew Wilcox wrote: >> > I actually have plans to allow mutex_lock_{interruptible,killable} to >> > return -EWOULDBLOCK if a flag is set. So this doesn't seem entirely >> > unrelated. Something like this perhaps: >> > >> > struct task_struct { >> > + unsigned int sleep_state; >> > }; >> > >> > static noinline int __sched >> > -__mutex_lock_interruptible_slowpath(struct mutex *lock) >> > +__mutex_lock_slowpath(struct mutex *lock, long state) >> > { >> > - return __mutex_lock(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_); >> > + if (state == TASK_NOBLOCK) >> > + return -EWOULDBLOCK; >> > + return __mutex_lock(lock, state, 0, NULL, _RET_IP_); >> > } >> > >> > +int __sched mutex_lock_state(struct mutex *lock, long state) >> > +{ >> > + might_sleep(); >> > + >> > + if (__mutex_trylock_fast(lock)) >> > + return 0; >> > + >> > + return __mutex_lock_slowpath(lock, state); >> > +} >> > +EXPORT_SYMBOL(mutex_lock_state); >> > >> > Then the page fault handler can do something like: >> > >> > old_state = current->sleep_state; >> > current->sleep_state = TASK_INTERRUPTIBLE; >> > ... >> > current->sleep_state = old_state; >> > >> > >> > This has the page-fault-in-a-signal-handler problem. I don't know if >> > there's a way to determine if we're already in a signal handler and use >> > a different sleep_state ...? >> >> Not sure what problem you're trying to solve, but I don't think that's >> the one we have. The only way what we do goes wrong is if the fault >> originates from kernel context. For faults from the signal handler I >> think you just get to keep the pieces. Faults form kernel we can >> detect through FAULT_FLAG_USER. > > Gah, I didn't explain well enough ;-( > > From the get_user_pages (and similar) handlers, we'd do > > old_state = current->sleep_state; > current->sleep_state = TASK_KILLABLE; > ... > current->sleep_state = old_state; > > So you wouldn't need to discriminate on whether FAULT_FLAG_USER was set, > but could just use current->sleep_state. > >> The issue I'm seeing is the following: >> 1. Some kernel code does copy_*_user, and it points at a gpu mmap region. >> 2. We fault and go into the gpu driver fault handler. That refuses to >> insert the pte because a signal is pending (because of all the >> interruptible waits and locks). >> 3. Fixup section runs, which afaict tries to do the copy once more >> using copy_user_handle_tail. >> 4. We fault again, because the pte is still not present. >> 5. GPU driver is still refusing to install the pte because signals are pending. >> 6. Fixup section for copy_user_handle_tail just bails out. >> 7. copy_*_user returns and indicates that that not all bytes have been copied. >> 8. syscall (or whatever it is) bails out and returns to userspace, >> most likely with -EFAULT (but this ofc depends upon the syscall and >> what it should do when userspace access faults. >> 9. Signal finally gets handled, but the syscall already failed, and no >> one will restart it. If userspace is prudent, it might fail (or maybe >> hit an assert or something). > > I think my patch above fixes this. It makes the syscall killable rather > than interruptible, so it can never observe the short read / -EFAULT > return if it gets a fatal signal, and the non-fatal signal will be held > off until the syscall completes. > >> Or maybe I'm confused by your diff, since nothing seems to use >> current->sleep_state. The problem is also that it's any sleep we do >> (they all tend to be interruptible, at least when waiting for the gpu >> or taking any locks that might be held while waiting for the gpu, or >> anything else that might be blocked waiting for the gpu really). So >> only patching mutex_lock won't fix this. > > Sure, I was only patching mutex_lock_state in as an illustration. > I've also got a 'state' equivalent for wait_on_page_bit() (although > I'm not sure you care ...). > > Looks like you'd need wait_for_completion_state() and > wait_event_state_timeout() as well. Yeah, plus ww_mutex_lock_state, and same for the dma_fence wait functions, but we can patch those once the core stuff has landed. The part I'm wondering about, that you didn't really illustrate: Are mutex_lock_interruptible&friends supposed to automatically use current->sleep_state, or are we somehow supposed to pass that through the call stack? Or is the idea that we'd mass-convert all the gpu code that can be reached from the fault handlers to use the new _state variants, which will internally consult sleep_state? Asking since the former would be best for us: Most paths can do interruptible, but some can't, so changing the current->state logic to take the most limiting of both the explicitly passed sleep_state and the one set in current->sleep_state would be best. Otrherwise if you have a path that can at most do killable waits, but it's context might either allow fully interruptible or killable or even neither, then there's still some fragile code around. Bonus points if you could even nest assignemtns to current->sleep_state (even when there's not really all that many options), similar to how preempt_disable and other atomic sections nest. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch