Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752239AbeAJW0M (ORCPT + 1 other); Wed, 10 Jan 2018 17:26:12 -0500 Received: from mail-co1nam03on0058.outbound.protection.outlook.com ([104.47.40.58]:51856 "EHLO NAM03-CO1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750782AbeAJW0K (ORCPT ); Wed, 10 Jan 2018 17:26:10 -0500 Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Felix.Kuehling@amd.com; Subject: Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereferences To: "Gustavo A. R. Silva" Cc: Oded Gabbay , Alex Deucher , =?UTF-8?Q?Christian_K=c3=b6nig?= , David Airlie , amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org References: <20180110165008.GA10691@embeddedor.com> <20180110155849.Horde.DDGbi3ysasL2eHmvZ4k8adb@gator4166.hostgator.com> From: Felix Kuehling Organization: AMD Inc. Message-ID: <413c85ee-aca0-fd86-4792-0d620b9251e2@amd.com> Date: Wed, 10 Jan 2018 17:26:05 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: <20180110155849.Horde.DDGbi3ysasL2eHmvZ4k8adb@gator4166.hostgator.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Content-Language: en-CA X-Originating-IP: [165.204.55.251] X-ClientProxiedBy: DM5PR0401CA0028.namprd04.prod.outlook.com (10.167.104.169) To CY4PR1201MB0229.namprd12.prod.outlook.com (10.172.78.150) X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: fc4c376d-2252-44f5-5c43-08d5587925a5 X-MS-Office365-Filtering-HT: Tenant X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:(5600026)(4604075)(4534020)(4602075)(4627115)(201703031133081)(201702281549075)(48565401081)(2017052603307)(7153060)(7193020);SRVR:CY4PR1201MB0229; X-Microsoft-Exchange-Diagnostics: 1;CY4PR1201MB0229;3:6sh8RU/pjdqzKahg9dQDLFED+DTAgCRniNbArTPuGyA5Irrz5+Fw6e6L//xBEg6WDCvAwhwULVhzBDwjUJZ4cIWP7vO5gBBf/kQDWhvCkkToaxRuDa1t97ecl0eWq0w6YUKYppCK7hHBgHu0sWRhFwt/PiZt8kpC0RMye46VqOXY5gSXaOgt0DDxmY4ITlvxHCcVE8VgZIep+mpKTVUKGZ8egXMC1JMUu+cneyfJMNW3knoGUUs4n6B0f+l0YMbS;25:9egrcjpcY8nB+xVFdPshUqTtn3qioVTDIVngwjcLnJK0DcMZ0QNiYoFTyIuLhkCVbj7hnPadZ18J+aOTVNf8SVz2tvQcyTeJA2SO6LUlwSLSeEys0mbQxFNPFC5OteDuUS1np87LkFg9vx4AMcGZILhw84JxhukKYGxEAW61+Vo8/RkRp+4J9oO1I2b+spKcDZ/FfwqbyCO8E59mAmrv9MM1qjLk76WnahtlJl90AP7TwKWZpWBKmjENCHX2ETLl4ufSxTq2W3j4ZKaSmL6fwPXE/pjvbrPospjZRtfXiX9H7ahQczD9DY1nT5B10+AsopN6fXelv0JekepoObD/sA==;31:IQZrfDX99ZtVEBkUrxGRyy+92cuiEsOMLx2ZlHDCiRv4mqPXH7bbjqtkO1lGOCmc0CrV0knSccUYMGeWtJJNsrTHiDYoPQK52WjPP40je8HsbB8jXi3Hi/Mxa1JO8eqaI4C+4cdYuPGNZi01lqCGnV2ey9KEioBhMGWOgtMHg/Ir+AyE9AG/sM41ftSwIzO3yA1KfueZSCxQdX3vzyneZBd0aUxgmGGuW4AZ5l+Mb0M= X-MS-TrafficTypeDiagnostic: CY4PR1201MB0229: X-Microsoft-Exchange-Diagnostics: 1;CY4PR1201MB0229;20:OGgHozcf+OuM3EX8uihP32/Gl559ngV2M9rlYaOQbHcP1ibRZACRM2BMWMhbM0bSnpoMfobl5tNIGnbXPoWONOmgFw9su0tEjWpfO+krwtNoyHUS+wzlYd4klBQ9kPIVIGdJ7E0rkA1fnrcCtSEggN61L1jigXoPKxBoSgXS6YmSk42NyrSifNQQR12P9olovCK1l57rouPoP0L+S5cN04Tqgtm8jGace8tuZlJF0vyoPweNshWb8cPEOrF77AjlAq6KDkjfwI5e72/a0o16J/LCNndelvwYfVmcvbBQtHmmkVr+OgV6jbx2OwCoy9zTDFe345MTFwmk0WFFZbLfs/AInaeEXirpXRdY2uO4hnZ2AuZB4qo5ohqJSWBWgGYEOx3KHE4+OeplRkOZYPrV7Z18qUymeMyKoPsZoXq+uI7fB1DyrwHhP1j8UjCSh9WFv97wsUeBg07obG4+7X4dPFl1qi1BU/bDbGOLN83kOuzwkubLrbWVtWKCNQM8hpCC;4:Vc7+UEF+EyoccKO6zueEsnDm7fTercMK+4Kp6Kr89RdvRVZgRHPXw35KcEXt/+canR9L5Pt/7d+QhUenjd+vatoaODr0X4bjMdIvqTUXWgTRkWmimoNUc1HrKl4bQjFIFpYAMfzCfLDGoTk0tMuNyBVVV+m7pJq68xN6tRukIukE47CPDCu1QZrMQMwcw5aBEQ6hsitDFYhd7aog2PO9bkPqqa00FYH1ooQBlmPy9/Y1KZDpKI9wz25FCaIWMNr0zvtb0GYHbrEzM0Let+ix4GPPot1dvWLg/q216grQho4XnfTflSaaexVf0Zr6d0eU X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(767451399110); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040470)(2401047)(8121501046)(5005006)(93006095)(93001095)(3002001)(10201501046)(3231023)(944501075)(6055026)(6041268)(20161123562045)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123564045)(20161123560045)(20161123558120)(6072148)(201708071742011);SRVR:CY4PR1201MB0229;BCL:0;PCL:0;RULEID:(100000803101)(100110400095);SRVR:CY4PR1201MB0229; X-Forefront-PRVS: 0548586081 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10009020)(6049001)(396003)(39380400002)(39860400002)(376002)(346002)(366004)(189003)(76104003)(377424004)(199004)(24454002)(3846002)(8676002)(2906002)(386003)(6116002)(81166006)(97736004)(81156014)(72206003)(52116002)(2486003)(478600001)(575784001)(76176011)(52146003)(59450400001)(86362001)(23676004)(36756003)(31686004)(36916002)(53546011)(16576012)(316002)(8936002)(65806001)(66066001)(105586002)(345774005)(47776003)(65956001)(7736002)(305945005)(16526018)(54906003)(50466002)(229853002)(58126008)(77096006)(83506002)(68736007)(2950100002)(6666003)(6916009)(64126003)(65826007)(90366009)(39060400002)(5660300001)(2870700001)(6486002)(4326008)(31696002)(106356001)(25786009)(6246003)(53936002);DIR:OUT;SFP:1101;SCL:1;SRVR:CY4PR1201MB0229;H:[172.27.225.16];FPR:;SPF:None;PTR:InfoNoRecords;MX:1;A:1;LANG:en; X-Microsoft-Exchange-Diagnostics: =?utf-8?B?MTtDWTRQUjEyMDFNQjAyMjk7MjM6WjVDYzhKMWJYa3YxcUZlbXFSZnRGcVBO?= =?utf-8?B?Z3NRV080b1N4ZEw2VEgrUUM2ZFRrK05wTEJ4TnpKU0gwNUp4cXJDV0Y2TlYr?= =?utf-8?B?SGQyS29SU2ppUDBDVTRGQVNIdkVsZWk2RVNrWVN1YjlLeUVoVTFCdlVHaDdI?= =?utf-8?B?eHI3b0h2RXJLR1lkN2gvcU04MmFyZHRiem02R2ppbzdaUlZMQjdhZ3dNYzc3?= =?utf-8?B?VGtvR0N4NkxuM0pybGZDTnhXSElGdStYaU5zVEsybWZ4R291dEI3Ty9VSVY1?= =?utf-8?B?bWRvWFM1MlBwRWJMWVIzM3BaSzUzbFdVZVpKTDVjU0dVUVk4QjVNeGU3TlVx?= =?utf-8?B?VEM5MGZEZUxENEptRkZ1V1BXVDFRVmF1cVNIamdHa0hxcEg2NUdnUjl2eUFL?= =?utf-8?B?YXFqUUZRRndyY1UvTTNvYVp0d0hCV2U4UlZBaUhkWkExVU9YWkhiZXVzVzNN?= =?utf-8?B?U2VMK0V0NWk3TnVYc0VQS1NuV2J3ZDk5Yk5YWlVPMTVvWGNCV2VMVnRQT2dJ?= =?utf-8?B?ZWNPVTROcEhJWllpdHBpR2p6dFptUTcxcGZxTHk1ZnVrcVlUK1lUYjFiZXBk?= =?utf-8?B?cWtCd1V5TGJUK01hQzVyYTNibXlnc2w1aDJGSUp2ZW9wMGdjNnMzd2VFRHlp?= =?utf-8?B?R3NWcG02R2ZLa3dqbm5SMC9tVmV6T3pIKzF6emllWE14RlhJYm1WRXpHTFEr?= =?utf-8?B?N3owcDkrUVMyejJKWWJGdWFTcjNKMklkSG43NlBRQWZvQ3ozeWNIRFNtL2hw?= =?utf-8?B?eWtGQ0xBaGRSaXoxUVZxSkQyWnZlZ2tOQmVUVlNPSHRZZWlZaE02YU5ZS0FF?= =?utf-8?B?aE9Ccm84LzRBM2FsQ0xaL09DM1MrZWVQUzFFaE1hMXdzbXZJVVF2azlncDZ0?= =?utf-8?B?TUxWWDAvY1pIZVJyeklqZ3ZrazkvTU5tUTFlZU9RVFRuMmYyUklRMGxZNDRD?= =?utf-8?B?OFFQSFFVeklsV2UvUDA0V1dtVTJYQVpKYUl5MkN4bTNobUpUejJPNlRXR3Az?= =?utf-8?B?cm91SlF5eXJDbVBzYi9uYWFrdVFQVXBkNllMQjRrb1RFcjdiRFNnSUh0NjIv?= =?utf-8?B?ZmxNbk5IRzNLWXJmNEQ4dE8zY3IvNGx1NldzcmdsVEp6N2JMbG50TGJsTWxp?= =?utf-8?B?SjR3bGRwWDd4QllTL3E3ekxlWU9ZK3VlYVV0MXdGdFVSb0R6WUFHaWo5VmZr?= =?utf-8?B?bXI0ak1mbFl5VW5xaGZKdmw4M2hVV0Jrc2RrT1N3alFlMWJ5Q0p4RVdvRklN?= =?utf-8?B?QmFPejc4bTlHby9VNGtIaC9sVElFeWNlRVJYUkx5V0s2ZmsvSEZTMmxEVmNn?= =?utf-8?B?TVZJMXB3d1V2bzVkTWpnemhITmZPSlBvMmQ1S3FtdzN2aVU4RkUyRy9BYlBq?= =?utf-8?B?R3ZaZTllZXBkQVM3Wms4ZGNjM3oxVnFGM2hkU1RBZFJzem1KMGZ6N1ZlZFNR?= =?utf-8?B?TFR3czlGekh0UnBXc0ZFZm1oOERla2hzY1ljUXRjcTZETk45dDNSN3RUOWN4?= =?utf-8?B?KzlpNTRhUnJyLzlLaEM5M29yZitYOGlRY0R2UHFObmdkSDRjRTBHaitUMEVx?= =?utf-8?B?TnVHMXgzbFN4TVdhL2hINnIraWpTaHVpS21zUlowOFNXeGg5VVkvTkJmOE5F?= =?utf-8?B?d1RKdlp3anYrVnVRVlZHTTJUZk5QWmZnU1kxNTFxNWY4ei80NmhOYUN6aTR0?= =?utf-8?B?VXRtb1NXampHMGJ6Z2JxWXRpUUtsYnJKbkovaXo1YUVxMEk0RTl0ZVE4M3hy?= =?utf-8?B?bVFEczhCdGdYNHRiZ1ZPL1g0SGdBdkI5dzNNRitQckVaWHliQlpwTldWNXFu?= =?utf-8?B?SElNOC8ycXNuWGFWcGpZZ3BmTDhwQmkrYkZjM1UyT2xwLzdQc1lDdm84R0RO?= =?utf-8?B?RkF5bFVSMXpIdG9ESmZ1VkFiY25KSkxGdzhJQXkyMnZjSU5jUFN2ci9mQ2tr?= =?utf-8?B?ZnVMUEdEQzJsM2VLYk1QeERUejA0TXhlNWtFdy9oMGo2RW01VmN2NWxrSWFm?= =?utf-8?B?Q2R3Z1RJMktveUdmcldCcTFYaXVmWnhXWDk2RlVpRitxSTQwVW16cEt4YXdF?= =?utf-8?B?S0d2eFM5eFlZM3ZKUzBiL2dZcENIZ1hySisxbzNMdllJWVVtNVh2aXBTbWpj?= =?utf-8?Q?YsoAqVZp2lrNp0axaUwcU+DCqQNf0S+TrDDqUR676LrNvT?= X-Microsoft-Exchange-Diagnostics: 1;CY4PR1201MB0229;6:3C10q2fiPPoQl9oRs9EDvXQU550D8nYgJ3BBXF0KEydJ7lgJBdrpBAT/hcLmtZfeyxSd1Tlrx3xG2/iC2d9sEOr1DPdV8QS2jItMfyIXXSMnbQOQYnd4OHOkuk7z49P2fyJmY7gP36PbMlyvg/PwSeymsitS4KlEtYlQz1WkzTjMn99aMkw80T1kQA31mGBGcYyKIxVJJxjVTmtps14kza/6XcOaQW0Li08tPMB2/0So4JSaBlU964Kgywv4wC+pJ4ybSwispHKtjYNJQNFA4aFO4/cL2wI8b4ASgfmXpXEiGUcgExEwORdhZeRHStPwrD0ptU8LOKANcVtuX17Sq5b1t+Hbj3/FpY33PEWPUnQ=;5:K4dQxoes1cMQ/3jlh/CbLy42+5rK8Wn7aGNL0U0mZbcL0Jx0OoE2mmHmrosP124hFXdyBeiGmgvf5wrM/v+j68+20rqe0j78xtTb+mrGPDZEJSWR2m24rtiuS3eQALcf1511PlImNuDX665DlKWhVdFdfq8hOh+ltyylmVxNZ54=;24:Dm/gUKWGZ8CtdRgP3zt/ljIG+kqIXalIya/DvZ2870H78EdGEk++xZES9QDb6lYO2+qWUj5d2c5ZUK9mjEMcFAYnUoZ+LKWmgHWSIWCNd4Y=;7:EywiBvbjZVu12xqwwCQIl3MxrAkrvplKooZ8Hpx5T0EMqXmIDUlCNRO/vfhlGm9KM4qiOe91ta9CMZm9+So0dJWTNRGr3ZGI3SuphvZMyNjuV+kXSSPxQq5MV8nc38J5eGd50JNYLB8D53+ctQp3JDwodslsc4plSSSdn8qkZHXpAtxBV8BRmOQvYIsdTRjGDiszyZ2ro0sBF2G9+DoJQtkL/wFykMSUxm2NRKiexiOj1AF9m7Ajdfp8EDPVW/c0 SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1;CY4PR1201MB0229;20:hvjS+nt0oYntdMK9LGudxiSFz//pJfOCuVFq/AJpp1bLBgBIfS7L3oAz346mxyov0mTlgG+QzimyCE09ZRtiHqGtyN3sH/Iw5NzxGXvTBaHmBhyZindQRZZ/9SwRXK+ehULfAu66+5bUdD7J2JQ7ojfI9c/uDj07p/Fb2b40IHcPk+T2IFPiARXCspZwrkrLoBjq4Ork3BIYjjeUh9NRzMWK0Argxq9Ucmos9XE2XGP880IKWkDa9XiwzTn/D1Sa X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Jan 2018 22:26:08.2829 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: fc4c376d-2252-44f5-5c43-08d5587925a5 X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR1201MB0229 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: Yeah, this looks good to me. Regards,   Felix On 2018-01-10 04:58 PM, Gustavo A. R. Silva wrote: > Hi Felix, > > Quoting Felix Kuehling : > >> Hi Gustavo, >> >> Thanks for catching that. When returning a fault, I think you also need >> to srcu_read_unlock(&kfd_processes_srcu, idx). >> >> However, instead of returning an error, I think I'd prefer to skip PDDs >> that can't be found with continue statements. That way others would >> still suspend and resume successfully. Maybe just print a WARN_ON for >> PDDs that aren't found, because that's an unexpected situation, >> currently. Maybe in the future it could be normal thing if we ever >> support GPU hotplug. >> > > I got it. In that case, what do you think about the following patch > instead? > > index a22fb071..4ff5f0f 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c > @@ -461,7 +461,8 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev) >         hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) { >                 mutex_lock(&p->mutex); >                 pdd = kfd_get_process_device_data(dev, p); > -               if (pdd->bound != PDD_BOUND_SUSPENDED) { > + > +               if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) { >                         mutex_unlock(&p->mutex); >                         continue; >                 } > @@ -501,6 +502,11 @@ void kfd_unbind_processes_from_device(struct > kfd_dev *dev) >                 mutex_lock(&p->mutex); >                 pdd = kfd_get_process_device_data(dev, p); > > +               if (WARN_ON(!pdd)) { > +                       mutex_unlock(&p->mutex); > +                       continue; > +               } > + >                 if (pdd->bound == PDD_BOUND) >                         pdd->bound = PDD_BOUND_SUSPENDED; >                 mutex_unlock(&p->mutex); > > > Thank you for the feedback. > -- > Gustavo > >> Regards, >>   Felix >> >> >> On 2018-01-10 11:50 AM, Gustavo A. R. Silva wrote: >>> In case kfd_get_process_device_data returns null, there are some >>> null pointer dereferences in functions kfd_bind_processes_to_device >>> and kfd_unbind_processes_from_device. >>> >>> Fix this by null checking pdd before dereferencing it. >>> >>> Addresses-Coverity-ID: 1463794 ("Dereference null return value") >>> Addresses-Coverity-ID: 1463772 ("Dereference null return value") >>> Signed-off-by: Gustavo A. R. Silva >>> --- >>>  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++++++++++ >>>  1 file changed, 12 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> index a22fb071..29d51d5 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> @@ -461,6 +461,13 @@ int kfd_bind_processes_to_device(struct kfd_dev >>> *dev) >>>      hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) { >>>          mutex_lock(&p->mutex); >>>          pdd = kfd_get_process_device_data(dev, p); >>> + >>> +        if (!pdd) { >>> +            pr_err("Process device data doesn't exist\n"); >>> +            mutex_unlock(&p->mutex); >>> +            return -EFAULT; >>> +        } >>> + >>>          if (pdd->bound != PDD_BOUND_SUSPENDED) { >>>              mutex_unlock(&p->mutex); >>>              continue; >>> @@ -501,6 +508,11 @@ void kfd_unbind_processes_from_device(struct >>> kfd_dev *dev) >>>          mutex_lock(&p->mutex); >>>          pdd = kfd_get_process_device_data(dev, p); >>> >>> +        if (!pdd) { >>> +            mutex_unlock(&p->mutex); >>> +            return; >>> +        } >>> + >>>          if (pdd->bound == PDD_BOUND) >>>              pdd->bound = PDD_BOUND_SUSPENDED; >>>          mutex_unlock(&p->mutex); > > > > > >