From: "Panariti, David"
To: "Grodzovsky, Andrey", "linux-kernel@vger.kernel.org", "amd-gfx@lists.freedesktop.org"
CC: "Deucher, Alexander", "Koenig, Christian", "oleg@redhat.com", "akpm@linux-foundation.org", "ebiederm@xmission.com"
Subject: Re: [PATCH 3/3] drm/amdgpu: Switch to interrupted wait to recover from ring hang.
Date: Tue, 24 Apr 2018 16:20:16 +0000
References: <1524583836-12130-1-git-send-email-andrey.grodzovsky@amd.com> <1524583836-12130-4-git-send-email-andrey.grodzovsky@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> Kind of dma_fence_wait_killable, except that we don't have such API
> (maybe worth adding ?)

Depends on how many places it would be called from, or where you think it might be called.
You can always factor it out the 2nd time it's needed.
Factoring, IMO, rarely hurts. The factored function can easily be visited using `M-.' ;->

Also, if the wait could be very long, would a log message, something like "xxx has run for Y seconds." help?
I personally hate hanging w/no info.

regards,
davep

________________________________________
From: Grodzovsky, Andrey
Sent: Tuesday, April 24, 2018 11:58:19 AM
To: Panariti, David; linux-kernel@vger.kernel.org; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian; oleg@redhat.com; akpm@linux-foundation.org; ebiederm@xmission.com
Subject: Re: [PATCH 3/3] drm/amdgpu: Switch to interrupted wait to recover from ring hang.

On 04/24/2018 11:52 AM, Panariti, David wrote:
> Hi,
>
> It looks like there can be an infinite loop if neither of the if()'s becomes true.
> Is that an impossible condition?

That's intended: we want to wait until either the fence signals or a fatal
signal is received. We don't want to terminate the wait if the fence is not
signaled, even when interrupted by a non-fatal signal.
Kind of dma_fence_wait_killable, except that we don't have such API
(maybe worth adding ?)

Andrey

>
> -----Original Message-----
> From: Andrey Grodzovsky
> Sent: Tuesday, April 24, 2018 11:31 AM
> To: linux-kernel@vger.kernel.org; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander; Koenig, Christian; Panariti, David; oleg@redhat.com; akpm@linux-foundation.org; ebiederm@xmission.com; Grodzovsky, Andrey
> Subject: [PATCH 3/3] drm/amdgpu: Switch to interrupted wait to recover from ring hang.
>
> If the ring is hanging for some reason allow to recover the waiting by sending fatal signal.
>
> Originally-by: David Panariti
> Signed-off-by: Andrey Grodzovsky
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index eb80edf..37a36af 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -421,10 +421,16 @@ int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx, unsigned ring_id)
>
>  	if (other) {
>  		signed long r;
> -		r = dma_fence_wait_timeout(other, false, MAX_SCHEDULE_TIMEOUT);
> -		if (r < 0) {
> -			DRM_ERROR("Error (%ld) waiting for fence!\n", r);
> -			return r;
> +
> +		while (true) {
> +			if ((r = dma_fence_wait_timeout(other, true,
> +				MAX_SCHEDULE_TIMEOUT)) >= 0)
> +				return 0;
> +
> +			if (fatal_signal_pending(current)) {
> +				DRM_ERROR("Error (%ld) waiting for fence!\n", r);
> +				return r;
> +			}
>  		}
>  	}
>
> --
> 2.7.4
>
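
For reference, the "wait until the fence signals or a fatal signal arrives" loop discussed in this thread could be factored roughly as below. This is only a userspace sketch, not kernel code: fence_wait_killable(), struct fake_fence, fake_fence_wait_interruptible() and fatal_pending are hypothetical stand-ins invented for illustration (for a possible dma_fence_wait_killable, for dma_fence_wait_timeout() called with intr == true, and for fatal_signal_pending(current)). No such helper exists in the dma-fence API as far as this thread states.

```c
#include <stdbool.h>
#include <stdio.h>

/* Kernel-internal errno value, redefined here only for the sketch. */
#define ERESTARTSYS 512

/* Hypothetical stand-in for a fence: counts the non-fatal signal
 * interruptions that will occur before the fence finally signals. */
struct fake_fence {
	int interruptions_left;
};

/* Hypothetical stand-in for fatal_signal_pending(current). */
static bool fatal_pending;

/* Hypothetical stand-in for an interruptible dma_fence_wait_timeout():
 * returns -ERESTARTSYS when woken by a signal, > 0 once signaled. */
static long fake_fence_wait_interruptible(struct fake_fence *f)
{
	if (f->interruptions_left > 0) {
		f->interruptions_left--;
		return -ERESTARTSYS;
	}
	return 1;
}

/* Sketch of the factored helper: keep waiting across non-fatal
 * interruptions, give up only when a fatal signal is pending. */
static long fence_wait_killable(struct fake_fence *f)
{
	long r;

	for (;;) {
		r = fake_fence_wait_interruptible(f);
		if (r >= 0)
			return 0;	/* fence signaled */

		if (fatal_pending) {
			fprintf(stderr, "Error (%ld) waiting for fence!\n", r);
			return r;	/* killed while waiting */
		}
		/* Non-fatal interruption: restart the wait. */
	}
}
```

With fatal_pending left false the helper retries until the fake fence signals and returns 0; with fatal_pending set it returns the negative error from the first interrupted wait. A progress message of the "xxx has run for Y seconds" kind would need a finite per-iteration timeout, which this sketch leaves out.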