Date: Tue, 1 May 2018 16:35:24 +0200
From: Oleg Nesterov
To: Andrey Grodzovsky
Cc: christian.koenig@amd.com, "Eric W. Biederman", David.Panariti@amd.com, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, Alexander.Deucher@amd.com, akpm@linux-foundation.org
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Message-ID: <20180501143524.GA13017@redhat.com>
References: <877eowa5qh.fsf@xmission.com> <20180425135552.GD7592@redhat.com> <20180425171757.GA10441@redhat.com> <874ljyu98e.fsf@xmission.com> <20180430160006.GB10583@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/30, Andrey Grodzovsky wrote:
>
> On 04/30/2018 12:00 PM, Oleg Nesterov wrote:
> >On 04/30, Andrey Grodzovsky wrote:
> >>What about changing PF_SIGNALED to PF_EXITING in
> >>drm_sched_entity_do_release
> >>
> >>-       if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
> >>+       if ((current->flags & PF_EXITING) && current->exit_code == SIGKILL)
> >let me repeat, please don't use task->exit_code. And in fact this check is racy
> >
> >But this doesn't matter. Say, we can trivially add SIGNAL_GROUP_KILLED_BY_SIGKILL,
> >or do something else,
>
> Can you explain where is the race and what is a possible alternative then ?

Oh. I mentioned this race automatically, because I am a pedant ;) Let me repeat
that this doesn't really matter, and let me remind you that the caller of
fop->release can be a completely unrelated process, say $cat /proc/pid/fdinfo.
And in any case ->exit_code should not be used outside of ptrace/exit paths.

OK, the race. Consider a process P with a main thread M and a sub-thread T.

T does pthread_exit(), enters do_exit() and gets a preemption before exit_files().

The process is killed by SIGKILL. M calls do_group_exit(), do_exit() and passes
exit_files(). However, it doesn't call close_files() because T has another reference.

T resumes, calls close_files(), fput(), etc, and then exit_task_work(), so it can
finally call ->release() with current->exit_code == 0 despite the fact that the
process was killed.

Again, again, this doesn't matter. We can distinguish killed-or-not, by SIGKILL-or-not.
But I still do not think we actually need this. At least in ->release() paths,
->flush() may differ.

Oleg.
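
The interleaving above can be replayed in a small userspace model. This is only a sketch under stated assumptions, not kernel code: toy_task, toy_files, toy_exit_files() and simulate_exit() are hypothetical names made up for illustration, and a plain counter stands in for the shared reference that M and T hold on the open files. The steps run sequentially in the order described, so the result is deterministic.

```c
#include <assert.h>
#include <signal.h>   /* SIGKILL */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the files shared by all threads of P. */
struct toy_files {
    int count;              /* threads still holding a reference */
    int release_exit_code;  /* exit_code of the task that ran ->release(), -1 if none yet */
};

/* Hypothetical stand-in for a task: only the fields the race needs. */
struct toy_task {
    int exit_code;
    struct toy_files *files;
};

/*
 * Models exit_files(): drop this task's reference. Only the task that
 * drops the last reference closes the files, so ->release() runs in
 * that task's context and sees that task's exit_code.
 */
static void toy_exit_files(struct toy_task *task)
{
    struct toy_files *files = task->files;

    assert(files != NULL && files->count > 0);
    task->files = NULL;
    if (--files->count == 0)
        files->release_exit_code = task->exit_code;
}

/*
 * Replays the scenario and returns the exit_code observed by ->release().
 * t_preempted == true:  T stalls before exit_files(), M is killed and
 *                       drops its reference first, T drops the last one.
 * t_preempted == false: T finishes exit_files() before the kill, so M
 *                       drops the last reference.
 */
int simulate_exit(bool t_preempted)
{
    struct toy_files files = { .count = 2, .release_exit_code = -1 };
    struct toy_task m = { .exit_code = 0, .files = &files };
    struct toy_task t = { .exit_code = 0, .files = &files };

    /* T called pthread_exit(): it is already in do_exit() with code 0. */
    t.exit_code = 0;

    if (!t_preempted)
        toy_exit_files(&t);      /* T gets through before the kill */

    /* SIGKILL arrives: M goes through do_group_exit() and do_exit(). */
    m.exit_code = SIGKILL;
    toy_exit_files(&m);

    if (t_preempted)
        toy_exit_files(&t);      /* T resumes and drops the last reference */

    return files.release_exit_code;
}
```

In this model simulate_exit(true) returns 0 even though the process was killed, while simulate_exit(false) returns SIGKILL, which is the sense in which a check on current->exit_code in ->release() depends on which thread happens to drop the last reference.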