Date: Wed, 25 Apr 2018 09:14:44 +0200
From: Daniel Vetter
To: Andrey Grodzovsky
Cc: "Eric W. Biederman", David.Panariti@amd.com, Michel Dänzer,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 oleg@redhat.com, amd-gfx@lists.freedesktop.org,
 Alexander.Deucher@amd.com, akpm@linux-foundation.org,
 Christian.Koenig@amd.com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Message-ID: <20180425071444.GM25142@phenom.ffwll.local>
In-Reply-To: <27d7d15b-f7c3-2a0a-af85-eb243526ac88@amd.com>

On Tue, Apr 24, 2018 at 05:37:08PM -0400, Andrey Grodzovsky wrote:
>
>
> On 04/24/2018 05:21 PM, Eric W. Biederman wrote:
> > Andrey Grodzovsky writes:
> >
> > > On 04/24/2018 03:44 PM, Daniel Vetter wrote:
> > > > On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
> > > > > Adding the dri-devel list, since this is driver independent code.
> > > > >
> > > > >
> > > > > On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
> > > > > > Avoid calling wait_event_killable when you are possibly being called
> > > > > > from get_signal routine since in that case you end up in a deadlock
> > > > > > where you are alreay blocked in singla processing any trying to wait
> > > > > Multiple typos here, "[...] already blocked in signal processing and [...]"?
> > > > >
> > > > >
> > > > > > on a new signal.
> > > > > >
> > > > > > Signed-off-by: Andrey Grodzovsky
> > > > > > ---
> > > > > >  drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
> > > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > index 088ff2b..09fd258 100644
> > > > > > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
> > > > > >  		return;
> > > > > >  	/**
> > > > > >  	 * The client will not queue more IBs during this fini, consume existing
> > > > > > -	 * queued IBs or discard them on SIGKILL
> > > > > > +	 * queued IBs or discard them when in death signal state since
> > > > > > +	 * wait_event_killable can't receive signals in that state.
> > > > > >  	 */
> > > > > > -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
> > > > > > +	if (current->flags & PF_SIGNALED)
> > > > You want fatal_signal_pending() here, instead of inventing your own broken
> > > > version.
> > > I rely on current->flags & PF_SIGNALED because this being set from
> > > within get_signal,
> > It doesn't mean that. Unless you are called by do_coredump (you
> > aren't).
>
> Looking in latest code here
> https://elixir.bootlin.com/linux/v4.17-rc2/source/kernel/signal.c#L2449
> I see that current->flags |= PF_SIGNALED; is outside of
> if (sig_kernel_coredump(signr)) {...} scope

Ok I read some more about this, and I guess you go through process exit
and then eventually close. But I'm not sure.

The code in drm_sched_entity_fini also looks strange: You unpark the
scheduler thread before you remove all the IBs. At least from the comment
that doesn't sound like what you want to do.

But in general, PF_SIGNALED is really something deeply internal to the
core (used for some book-keeping and accounting). The drm scheduler is the
only thing looking at it, so it smells like a layering violation. I suspect
(but without knowing what you're actually trying to achieve here I can't be
sure) you want to look at something else.

E.g. PF_EXITING seems to be used in a lot more places to cancel stuff
that's no longer relevant when a task exits, not PF_SIGNALED. There's the
TIF_MEMDIE flag if you're hacking around issues with the oom-killer.

This here on the other hand looks really fragile, and probably only does
what you want to do by accident.
-Daniel

>
> Andrey
>
> > The closing of files does not happen in do_coredump.
> > Which means you are being called from do_exit.
> > In fact you are being called after exit_files which closes
> > the files. The actual __fput processing happens in task_work_run.
> >
> > > meaning I am within signal processing? in which case I want to avoid
> > > any signal based wait for that task,
> > > From what i see in the code, task_struct.pending.signal is being set
> > > for other threads in same
> > > group (zap_other_threads) or for other scenarios, those task are still
> > > able to receive signals
> > > so calling wait_event_killable there will not have problem.
> > Except that you are being called after from do_exit and after exit_files
> > which is after exit_signal. Which means that PF_EXITING has been set.
> > Which implies that the kernel signal handling machinery has already
> > started being torn down.
> >
> > Not as much as I would like to happen at that point as we are still
> > left with some old CLONE_PTHREAD messes in the code that need to be
> > cleaned up.
> >
> > Still given the fact you are task_work_run it is quite possible even
> > release_task has been run on that task before the f_op->release method
> > is called. So you simply can not count on signals working.
> >
> > Which in practice leaves a timeout for ending your wait. That code can
> > legitimately be in a context that is neither interruptible nor killable.
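A rough illustration of the timeout point above, modelled in plain
userspace C rather than kernel code. The PF_EXITING and PF_SIGNALED values
match include/linux/sched.h; struct task_model, signals_can_end_wait(),
wait_idle_bounded() and queue_drained() are hypothetical names made up for
this sketch. In the scheduler itself the closest real primitive would
presumably be wait_event_timeout(), but that is an assumption about where
the patch could go, not something settled in this thread.

```c
#include <stdbool.h>

/* Same bit values as include/linux/sched.h; everything else is made up. */
#define PF_EXITING  0x00000004u  /* task is being shut down */
#define PF_SIGNALED 0x00000400u  /* task was killed by a signal */

/* Hypothetical stand-in for the one task_struct field used here. */
struct task_model {
	unsigned int flags;
};

/* Once PF_EXITING is set, signal delivery can no longer end a wait. */
bool signals_can_end_wait(const struct task_model *t)
{
	return !(t->flags & PF_EXITING);
}

/*
 * Poll is_idle() at most max_polls times. Returns true if the queue
 * went idle, false if the budget ran out. No signal is involved, so
 * this terminates regardless of the caller's task state.
 */
bool wait_idle_bounded(bool (*is_idle)(void *ctx), void *ctx,
		       unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (is_idle(ctx))
			return true;
	}
	return false;
}

/* Hypothetical queue: each poll retires one pending job. */
bool queue_drained(void *ctx)
{
	unsigned int *pending = ctx;

	if (*pending == 0)
		return true;
	(*pending)--;
	return false;
}
```

The bounded loop stands in for a sleep with a deadline: the release path
gives the queue a fixed budget to drain and then falls back to discarding
the remaining jobs, instead of waiting for a signal that may never arrive.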
> >
> > > > > >  		entity->fini_status = -ERESTARTSYS;
> > > > > >  	else
> > > > > >  		entity->fini_status = wait_event_killable(sched->job_scheduled,
> > > > But really this smells like a bug in wait_event_killable, since
> > > > wait_event_interruptible does not suffer from the same bug. It will return
> > > > immediately when there's a signal pending.
> > > Even when wait_event_interruptible is called as following -
> > > ...->do_signal->get_signal->....->wait_event_interruptible ?
> > > I haven't tried it but wait_event_interruptible is very much alike to
> > > wait_event_killable so I would assume it will also
> > > not be interrupted if called like that. (Will give it a try just out
> > > of curiosity anyway)
> > As PF_EXITING is set want_signal should fail and the signal state of the
> > task should not be updatable by signals.
> >
> > Eric

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch