Date: Wed, 10 Jan 2018 15:58:49 -0600
Message-ID: <20180110155849.Horde.DDGbi3ysasL2eHmvZ4k8adb@gator4166.hostgator.com>
From: "Gustavo A. R. Silva"
To: Felix Kuehling
Cc: Oded Gabbay, Alex Deucher, Christian König, David Airlie,
    amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereferences
References: <20180110165008.GA10691@embeddedor.com>

Hi Felix,

Quoting Felix Kuehling:

> Hi Gustavo,
>
> Thanks for catching that.
> When returning a fault, I think you also need
> to srcu_read_unlock(&kfd_processes_srcu, idx).
>
> However, instead of returning an error, I think I'd prefer to skip PDDs
> that can't be found with continue statements. That way others would
> still suspend and resume successfully. Maybe just print a WARN_ON for
> PDDs that aren't found, because that's an unexpected situation,
> currently. Maybe in the future it could be normal thing if we ever
> support GPU hotplug.
>

I got it. In that case, what do you think about the following patch instead?

index a22fb071..4ff5f0f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -461,7 +461,8 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev)
 	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
 		mutex_lock(&p->mutex);
 		pdd = kfd_get_process_device_data(dev, p);
-		if (pdd->bound != PDD_BOUND_SUSPENDED) {
+
+		if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
 			mutex_unlock(&p->mutex);
 			continue;
 		}
@@ -501,6 +502,11 @@ void kfd_unbind_processes_from_device(struct kfd_dev *dev)
 		mutex_lock(&p->mutex);
 		pdd = kfd_get_process_device_data(dev, p);
 
+		if (WARN_ON(!pdd)) {
+			mutex_unlock(&p->mutex);
+			continue;
+		}
+
 		if (pdd->bound == PDD_BOUND)
 			pdd->bound = PDD_BOUND_SUSPENDED;
 		mutex_unlock(&p->mutex);

Thank you for the feedback.
--
Gustavo

> Regards,
>   Felix
>
>
> On 2018-01-10 11:50 AM, Gustavo A. R. Silva wrote:
>> In case kfd_get_process_device_data returns null, there are some
>> null pointer dereferences in functions kfd_bind_processes_to_device
>> and kfd_unbind_processes_from_device.
>>
>> Fix this by null checking pdd before dereferencing it.
>>
>> Addresses-Coverity-ID: 1463794 ("Dereference null return value")
>> Addresses-Coverity-ID: 1463772 ("Dereference null return value")
>> Signed-off-by: Gustavo A. R. Silva
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index a22fb071..29d51d5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -461,6 +461,13 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev)
>>  	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>  		mutex_lock(&p->mutex);
>>  		pdd = kfd_get_process_device_data(dev, p);
>> +
>> +		if (!pdd) {
>> +			pr_err("Process device data doesn't exist\n");
>> +			mutex_unlock(&p->mutex);
>> +			return -EFAULT;
>> +		}
>> +
>>  		if (pdd->bound != PDD_BOUND_SUSPENDED) {
>>  			mutex_unlock(&p->mutex);
>>  			continue;
>> @@ -501,6 +508,11 @@ void kfd_unbind_processes_from_device(struct kfd_dev *dev)
>>  		mutex_lock(&p->mutex);
>>  		pdd = kfd_get_process_device_data(dev, p);
>>
>> +		if (!pdd) {
>> +			mutex_unlock(&p->mutex);
>> +			return;
>> +		}
>> +
>>  		if (pdd->bound == PDD_BOUND)
>>  			pdd->bound = PDD_BOUND_SUSPENDED;
>>  		mutex_unlock(&p->mutex);
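
For reference, the skip-and-continue pattern discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the amdkfd code: struct proc, struct pdd, mock_lock(), mock_unlock() and suspend_all() are hypothetical stand-ins for the kernel structures, the mutex and the loop in kfd_unbind_processes_from_device. It shows that an entry with a missing pdd is skipped with its lock released, while the remaining entries are still processed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the amdkfd definitions. */
enum bound_state { UNBOUND, BOUND, BOUND_SUSPENDED };

struct pdd {
	enum bound_state bound;
};

struct proc {
	int lock_depth;		/* mock mutex: 1 while held, 0 otherwise */
	struct pdd *pdd;	/* NULL models a missing per-device entry */
};

static void mock_lock(struct proc *p)   { p->lock_depth++; }
static void mock_unlock(struct proc *p) { p->lock_depth--; }

/*
 * Suspend every bound process. Entries without a pdd are skipped instead
 * of aborting the walk. Returns the number of skipped entries.
 */
static size_t suspend_all(struct proc *procs, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		struct proc *p = &procs[i];

		mock_lock(p);
		if (!p->pdd) {
			/* Release the lock on the skip path as well. */
			mock_unlock(p);
			skipped++;
			continue;
		}
		if (p->pdd->bound == BOUND)
			p->pdd->bound = BOUND_SUSPENDED;
		mock_unlock(p);
		/* The normal path must leave the lock released. */
		assert(p->lock_depth == 0);
	}
	return skipped;
}
```

Calling suspend_all() on an array where one entry has a NULL pdd suspends the bound entries and reports one skipped entry. An early return at the same point would leave the later entries untouched, which is the behaviour the continue-based version avoids.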