From: ebiederm@xmission.com (Eric W. Biederman)
To: Christian König
Cc: Andrey Grodzovsky, christian.koenig@amd.com, David.Panariti@amd.com, Oleg Nesterov, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, Alexander.Deucher@amd.com, akpm@linux-foundation.org
Date: Mon, 30 Apr 2018 11:25:47 -0500
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.

Christian König writes:

> Hi Eric,
>
> sorry for the late response, was on vacation last week.
>
> On 26.04.2018 at 02:01, Eric W. Biederman wrote:
>> Andrey Grodzovsky writes:
>>
>>> On 04/25/2018 01:17 PM, Oleg Nesterov wrote:
>>>> On 04/25, Andrey Grodzovsky wrote:
>>>>> here (drm_sched_entity_fini) is also a bad idea, but we still want to
>>>>> be able to exit immediately and not wait for GPU jobs completion when
>>>>> the reason for reaching this code is because of KILL signal to the
>>>>> user process who opened the device file.
>>>> Can you hook f_op->flush method?
>
> THANKS! That sounds like a really good idea to me and we haven't investigated
> into that direction yet.

For the backwards compatibility concerns you cite below, the flush method
seems a much better place to introduce the wait. At least you really will
be in process context there. You might still be in exit, but at least you
will legitimately be in a process.

>>> But this one is called for each task releasing a reference to the file,
>>> so not sure I see how this solves the problem.
>> The big question is why do you need to wait during the final close of a
>> file?
>
> As always it's because of historical reasons. Initially user space pushed
> commands directly to a hardware queue and when a process finished we didn't
> need to wait for anything.
>
> Then the GPU scheduler was introduced which delayed pushing the jobs to the
> hardware queue to a later point in time.
>
> This wait was then added to maintain backward compatibility and not break
> userspace (but see below).

That makes sense.

>> The wait can be terminated so the wait does not appear to be simply a
>> matter of correctness.
>
> Well when the process is killed we don't care about correctness any more, we
> just want to get rid of it as quickly as possible (OOM situation etc...).
>
> But it is perfectly possible that a process submits some render commands and
> then calls exit() or terminates because of a SIGTERM, SIGINT etc. In this case
> we need to wait here to make sure that all rendering is pushed to the hardware
> because the scheduler might need resources/settings from the file
> descriptor.
>
> For example if you just remove that wait you could close firefox and get garbage
> on the screen for a millisecond because the remaining rendering commands were
> not executed.
>
> So what we essentially need is to distinguish between a SIGKILL (which means
> stop processing as soon as possible) and any other reason, because then we
> don't want to annoy the user with garbage on the screen (even if it's just
> for a few milliseconds).

I see three issues.

- Running the code in release rather than in flush.

  Using flush will catch every close, so it should be more backwards
  compatible. f_op->flush always runs in process context, so looking at
  current makes sense.

- Distinguishing between death by SIGKILL and other process exit deaths.

  In f_op->flush the code can test
  "((current->flags & PF_EXITING) && (current->exit_code == SIGKILL))"
  to see if it was SIGKILL that terminated the process.

- Dealing with stuck queues (where this patchset came in).

  For stuck queues you are going to need a timeout instead of the current
  indefinite wait after PF_EXITING is set. From what you have described, a
  few milliseconds should be enough. If PF_EXITING is not set you can still
  just make the wait killable and skip the timeout, if that will give a
  better backwards compatible user experience.

What can't be done is to try to catch SIGKILL after a process has called
do_exit. A dead process is a dead process.

Eric