Date: Wed, 4 Apr 2018 09:24:33 -0700
From: Matthew Wilcox
To: Daniel Vetter
Cc: dri-devel, Linux MM, Souptick Joarder, Linux Kernel Mailing List
Subject: Re: Signal handling in a page fault handler
Message-ID: <20180404162433.GB16142@bombadil.infradead.org>
References: <20180402141058.GL13332@bombadil.infradead.org> <20180404093254.GC3881@phenom.ffwll.local> <20180404143900.GA1777@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 04, 2018 at 05:15:46PM +0200, Daniel Vetter wrote:
> On Wed, Apr 4, 2018 at 4:39 PM, Matthew Wilcox wrote:
> > I actually have plans to allow mutex_lock_{interruptible,killable} to
> > return -EWOULDBLOCK if a flag is set.  So this doesn't seem entirely
> > unrelated.  Something like this perhaps:
> >
> >  struct task_struct {
> > +        unsigned int sleep_state;
> >  };
> >
> >  static noinline int __sched
> > -__mutex_lock_interruptible_slowpath(struct mutex *lock)
> > +__mutex_lock_slowpath(struct mutex *lock, long state)
> >  {
> > -        return __mutex_lock(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
> > +        if (state == TASK_NOBLOCK)
> > +                return -EWOULDBLOCK;
> > +        return __mutex_lock(lock, state, 0, NULL, _RET_IP_);
> >  }
> >
> > +int __sched mutex_lock_state(struct mutex *lock, long state)
> > +{
> > +        might_sleep();
> > +
> > +        if (__mutex_trylock_fast(lock))
> > +                return 0;
> > +
> > +        return __mutex_lock_slowpath(lock, state);
> > +}
> > +EXPORT_SYMBOL(mutex_lock_state);
> >
> > Then the page fault handler can do something like:
> >
> >         old_state = current->sleep_state;
> >         current->sleep_state = TASK_INTERRUPTIBLE;
> >         ...
> >         current->sleep_state = old_state;
> >
> >
> > This has the page-fault-in-a-signal-handler problem.  I don't know if
> > there's a way to determine if we're already in a signal handler and use
> > a different sleep_state ...?
>
> Not sure what problem you're trying to solve, but I don't think that's
> the one we have. The only way what we do goes wrong is if the fault
> originates from kernel context. For faults from the signal handler I
> think you just get to keep the pieces. Faults from kernel we can
> detect through FAULT_FLAG_USER.

Gah, I didn't explain well enough ;-(

From the get_user_pages (and similar) handlers, we'd do

        old_state = current->sleep_state;
        current->sleep_state = TASK_KILLABLE;
        ...
        current->sleep_state = old_state;

So you wouldn't need to discriminate on whether FAULT_FLAG_USER was set,
but could just use current->sleep_state.

> The issue I'm seeing is the following:
> 1. Some kernel code does copy_*_user, and it points at a gpu mmap region.
> 2. We fault and go into the gpu driver fault handler. That refuses to
> insert the pte because a signal is pending (because of all the
> interruptible waits and locks).
> 3. Fixup section runs, which afaict tries to do the copy once more
> using copy_user_handle_tail.
> 4. We fault again, because the pte is still not present.
> 5. GPU driver is still refusing to install the pte because signals are pending.
> 6. Fixup section for copy_user_handle_tail just bails out.
> 7. copy_*_user returns and indicates that not all bytes have been copied.
> 8. syscall (or whatever it is) bails out and returns to userspace,
> most likely with -EFAULT (but this ofc depends upon the syscall and
> what it should do when userspace access faults).
> 9. Signal finally gets handled, but the syscall already failed, and no
> one will restart it. If userspace is prudent, it might fail (or maybe
> hit an assert or something).

I think my patch above fixes this.  It makes the syscall killable rather
than interruptible, so it can never observe the short read / -EFAULT
return if it gets a fatal signal, and the non-fatal signal will be held
off until the syscall completes.
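
To make the difference between the two cases concrete, here is a rough
userspace model of the nine steps above.  It is not kernel code and it
is not part of the patch: struct task, model_fault(), model_copy_user()
and model_copy_user_killable() are all names made up for this sketch,
and the "fault handler" is reduced to a single check of the pending
signal against the task's sleep state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel definitions. */
enum sleep_state { SLEEP_INTERRUPTIBLE, SLEEP_KILLABLE };

struct task {
    enum sleep_state sleep_state;
    bool signal_pending;        /* some signal is queued */
    bool fatal_signal_pending;  /* a fatal signal is queued */
};

/* Models the driver fault handler: true if the pte would be installed. */
static bool model_fault(const struct task *t)
{
    if (t->sleep_state == SLEEP_INTERRUPTIBLE && t->signal_pending)
        return false;   /* steps 2 and 5: any pending signal aborts */
    if (t->sleep_state == SLEEP_KILLABLE && t->fatal_signal_pending)
        return false;   /* only a fatal signal aborts */
    return true;
}

/* Models copy_*_user: returns the number of bytes NOT copied. */
static size_t model_copy_user(const struct task *t, size_t len)
{
    if (model_fault(t))
        return 0;       /* first attempt succeeded */
    if (model_fault(t))
        return 0;       /* steps 3-4: fixup retries once */
    return len;         /* steps 6-7: short copy reported to the caller */
}

/* Models a get_user_pages-style caller switching to killable sleeps. */
static size_t model_copy_user_killable(struct task *t, size_t len)
{
    enum sleep_state old_state = t->sleep_state;
    size_t left;

    t->sleep_state = SLEEP_KILLABLE;
    left = model_copy_user(t, len);
    t->sleep_state = old_state;     /* restore, as in the snippet above */
    return left;
}
```

With a non-fatal signal pending, model_copy_user() on an interruptible
task reports the whole length as uncopied, while
model_copy_user_killable() copies everything and leaves the signal for
later.  Only a fatal signal produces a short copy in the killable case,
and then the task never gets to look at the result.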

> Or maybe I'm confused by your diff, since nothing seems to use
> current->sleep_state. The problem is also that it's any sleep we do
> (they all tend to be interruptible, at least when waiting for the gpu
> or taking any locks that might be held while waiting for the gpu, or
> anything else that might be blocked waiting for the gpu really). So
> only patching mutex_lock won't fix this.

Sure, I was only patching mutex_lock_state in as an illustration.
I've also got a 'state' equivalent for wait_on_page_bit() (although
I'm not sure you care ...).

Looks like you'd need wait_for_completion_state() and
wait_event_state_timeout() as well.
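
As a sketch of what a 'state' variant of a wait primitive might decide,
here is another userspace model.  Everything in it is an assumption made
for illustration: the enum values, struct model_task, struct
model_completion and model_wait_for_completion_state() are invented
names, -EINTR stands in for whatever error the kernel would really
return, and MODEL_WOULD_SLEEP marks the point where a real
implementation would block.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; names and values are illustrative only. */
enum model_state {
    MODEL_UNINTERRUPTIBLE,
    MODEL_INTERRUPTIBLE,
    MODEL_KILLABLE,
    MODEL_NOBLOCK,
};

struct model_task {
    bool signal_pending;        /* some signal is queued */
    bool fatal_signal_pending;  /* a fatal signal is queued */
};

struct model_completion {
    bool done;
};

/* Positive sentinel: the caller would go to sleep here. */
#define MODEL_WOULD_SLEEP 1

/*
 * Decide what a state-parameterised wait would do without sleeping:
 * 0 if already complete, a negative errno if the wait must be
 * abandoned, MODEL_WOULD_SLEEP if it would block.
 */
static int model_wait_for_completion_state(const struct model_completion *c,
                                           const struct model_task *t,
                                           enum model_state state)
{
    if (c->done)
        return 0;               /* fast path, like the trylock in the diff */

    switch (state) {
    case MODEL_NOBLOCK:
        return -EWOULDBLOCK;    /* never sleep in this state */
    case MODEL_INTERRUPTIBLE:
        if (t->signal_pending)
            return -EINTR;      /* any signal ends the wait */
        break;
    case MODEL_KILLABLE:
        if (t->fatal_signal_pending)
            return -EINTR;      /* only a fatal signal ends the wait */
        break;
    case MODEL_UNINTERRUPTIBLE:
        break;                  /* signals are ignored */
    }
    return MODEL_WOULD_SLEEP;
}
```

The point of the single 'state' argument is that one slow path serves
every flavour of wait, so the caller picks the behaviour by passing a
different state instead of calling a differently named function.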