Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
From: Christian König
Reply-To: christian.koenig@amd.com
To: Andrey Grodzovsky, christian.koenig@amd.com, "Eric W. Biederman"
Cc: David.Panariti@amd.com, Oleg Nesterov, amd-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, Alexander.Deucher@amd.com,
 akpm@linux-foundation.org
Date: Mon, 30 Apr 2018 17:25:35 +0200
Message-ID: <407d81f4-54d4-64f4-c1a4-0095aa90bfb6@gmail.com>
References: <1524583836-12130-1-git-send-email-andrey.grodzovsky@amd.com>
 <1524583836-12130-3-git-send-email-andrey.grodzovsky@amd.com>
 <87muxsbmkp.fsf@xmission.com>
 <8840ac96-50c4-f94d-eb7c-f007940163f3@amd.com>
 <877eowa5qh.fsf@xmission.com>
 <20180425135552.GD7592@redhat.com>
 <20180425171757.GA10441@redhat.com>
 <874ljyu98e.fsf@xmission.com>

On 30.04.2018 at 16:32, Andrey Grodzovsky wrote:
>
>
> On 04/30/2018 08:08 AM, Christian König wrote:
>> Hi Eric,
>>
>> sorry for the late response, was on vacation last week.
>>
>> On 26.04.2018 at 02:01, Eric W. Biederman wrote:
>>> Andrey Grodzovsky writes:
>>>
>>>> On 04/25/2018 01:17 PM, Oleg Nesterov wrote:
>>>>> On 04/25, Andrey Grodzovsky wrote:
>>>>>> here (drm_sched_entity_fini) is also a bad idea, but we still
>>>>>> want to be able to exit immediately and not wait for GPU jobs
>>>>>> completion when the reason for reaching this code is because
>>>>>> of KILL signal to the user process who opened the device file.
>>>>> Can you hook f_op->flush method?
>>
>> THANKS! That sounds like a really good idea to me and we haven't
>> investigated that direction yet.
>>
>>>> But this one is called for each task releasing a reference to the
>>>> file, so not sure I see how this solves the problem.
>>> The big question is why do you need to wait during the final closing
>>> of a file?
>>
>> As always it's because of historical reasons. Initially user space
>> pushed commands directly to a hardware queue and when a process
>> finished we didn't need to wait for anything.
>>
>> Then the GPU scheduler was introduced which delayed pushing the jobs
>> to the hardware queue to a later point in time.
>>
>> This wait was then added to maintain backward compatibility and not
>> break userspace (but see below).
>>
>>> The wait can be terminated so the wait does not appear to be simply a
>>> matter of correctness.
>>
>> Well when the process is killed we don't care about correctness any
>> more, we just want to get rid of it as quickly as possible (OOM
>> situation etc.).
>>
>> But it is perfectly possible that a process submits some render
>> commands and then calls exit() or terminates because of a SIGTERM,
>> SIGINT etc. In this case we need to wait here to make sure that all
>> rendering is pushed to the hardware because the scheduler might need
>> resources/settings from the file descriptor.
>>
>> For example if you just remove that wait you could close firefox and
>> get garbage on the screen for a millisecond because the remaining
>> rendering commands were not executed.
>>
>> So what we essentially need is to distinguish between a SIGKILL (which
>> means stop processing as soon as possible) and any other reason
>> because then we don't want to annoy the user with garbage on the
>> screen (even if it's just for a few milliseconds).
>>
>> Constructive ideas how to handle this would be very welcome, because I
>> completely agree that what we have at the moment by checking
>> PF_SIGNALED is just a very very hacky workaround.
>
> What about changing PF_SIGNALED to PF_EXITING in
> drm_sched_entity_do_release
>
> -       if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
> +       if ((current->flags & PF_EXITING) && current->exit_code == SIGKILL)
>
> From looking into do_exit and its callers, current->exit_code will
> get assigned the signal which was delivered to the task. If SIGINT was
> sent then it's SIGINT, if SIGKILL then SIGKILL.

That's at least a band-aid to stop us from abusing PF_SIGNALED.

But in addition to that change, can you investigate when
f_op->flush() is called when the process exits normally, because of
SIGKILL, or because of some other signal?

It could be that this is closer to what we are searching for,
Christian.

>
> Andrey
>
>
>>
>> Thanks,
>> Christian.
>
>
>>
>>>
>>> Eric
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
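
As a rough userspace illustration of the distinction discussed above (SIGKILL versus any other way of ending), the sketch below forks a child, optionally signals it, and reads the terminating signal back from the wait status. This is not the scheduler code and not a proposed patch: terminating_signal() and should_skip_wait() are hypothetical names made up for this example, and the wait status is only assumed here to be a reasonable stand-in for what the kernel side sees in current->exit_code.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): fork a child, send it `sig`
 * (or let it exit normally when sig == 0), and return the signal that
 * terminated it. Returns 0 for a normal exit and -1 on error.
 */
static int terminating_signal(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		if (sig == 0)
			_exit(0);		/* normal exit path */
		signal(sig, SIG_DFL);		/* no effect for SIGKILL */
		for (;;)
			pause();		/* sleep until signalled */
	}

	if (sig != 0 && kill(pid, sig) != 0)
		return -1;

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	/* WTERMSIG() is only meaningful when WIFSIGNALED() is true. */
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/*
 * Hypothetical policy mirroring the intent of the thread: only a
 * SIGKILL justifies skipping the wait for outstanding work.
 */
static int should_skip_wait(int termsig)
{
	return termsig == SIGKILL;
}
```

With this, should_skip_wait(terminating_signal(SIGKILL)) is true, while the same call with SIGTERM or with 0 (normal exit) is false, which is the split the wait in the scheduler would need to make.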