From: Qiang Yu
Date: Thu, 17 Feb 2022 18:13:24 +0800
Subject: Re: [PATCH] drm/amdgpu: check vm bo eviction valuable at last
To: Christian König
Cc: Qiang Yu, Alex Deucher, "Pan, Xinhui", David Airlie, Daniel Vetter, Sumit Semwal, linaro-mm-sig@lists.linaro.org, linux-media@vger.kernel.org, dri-devel, amd-gfx@lists.freedesktop.org, Linux Kernel Mailing List
References: <20220217090440.4468-1-qiang.yu@amd.com> <5d3fdd2c-e74a-49f4-2b28-32c06483236f@amd.com>

On Thu, Feb 17, 2022 at 5:46 PM Christian König wrote:
>
> On 17.02.22 at 10:40, Qiang Yu wrote:
> > On Thu, Feb 17, 2022 at 5:15 PM Christian König
> > wrote:
> >> On 17.02.22 at 10:04, Qiang Yu wrote:
> >>> Workstation application ANSA/META get this error dmesg:
> >>> [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
> >>>
> >>> This is caused by:
> >>> 1. create a 256MB buffer in invisible VRAM
> >>> 2. CPU map the buffer and access it causes vm_fault and try to move
> >>>    it to visible VRAM
> >>> 3. force visible VRAM space and traverse all VRAM bos to check if
> >>>    evicting this bo is valuable
> >>> 4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
> >>>    will set amdgpu_vm->evicting, but later due to not in visible
> >>>    VRAM, won't really evict it so not add it to amdgpu_vm->evicted
> >>> 5. before next CS to clear the amdgpu_vm->evicting, user VM ops
> >>>    ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted)
> >>>    but fail in amdgpu_vm_bo_update_mapping() (check
> >>>    amdgpu_vm->evicting) and get this error log
> >>>
> >>> This error won't affect functionality as next CS will finish the
> >>> waiting VM ops. But we'd better make the amdgpu_vm->evicting
> >>> correctly reflect the vm status and clear the error log.
> >> Well NAK, that is intentional behavior.
> >>
> >> The VM page tables were considered for eviction, so setting the flag is
> >> correct even when the page tables later on are not actually evicted.
> >>
> > But this will unnecessarily stop later user VM ops in ioctl before CS
> > even when the VM bos are not evicted.
> > Won't this have any negative effect when could do better?
>
> No, this will have a positive effect. See the VM was already considered
> for eviction because it is idle.
>
> Updating it immediately doesn't necessarily make sense, we should wait
> with that until its next usage.
>
> Additional to that this patch doesn't really fix the problem, it just
> mitigates it.
>
> Eviction can fail later on for a couple of reasons and we absolutely
> need to check the flag instead of the list in amdgpu_vm_ready().
The flag only, or both the flag and the list? It looks like it should be
both, as the list indicates that some VM page tables need to be updated,
and the user update could be delayed by the same logic you described above.

Regards,
Qiang

>
> Regards,
> Christian.
>
> >
> > Regards,
> > Qiang
> >
> >> What we should rather do is to fix amdgpu_vm_ready() to take a look at
> >> the flag instead of the linked list.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> Signed-off-by: Qiang Yu
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 85 ++++++++++++++-----------
> >>>  1 file changed, 47 insertions(+), 38 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> index 5a32ee66d8c8..88a27911054f 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> @@ -1306,45 +1306,11 @@ uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device *adev, struct ttm_tt *ttm,
> >>>         return flags;
> >>>  }
> >>>
> >>> -/*
> >>> - * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
> >>> - * object.
> >>> - *
> >>> - * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
> >>> - * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
> >>> - * it can find space for a new object and by ttm_bo_force_list_clean() which is
> >>> - * used to clean out a memory space.
> >>> - */
> >>> -static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>> -                                            const struct ttm_place *place)
> >>> +static bool amdgpu_ttm_mem_eviction_valuable(struct ttm_buffer_object *bo,
> >>> +                                             const struct ttm_place *place)
> >>>  {
> >>>         unsigned long num_pages = bo->resource->num_pages;
> >>>         struct amdgpu_res_cursor cursor;
> >>> -       struct dma_resv_list *flist;
> >>> -       struct dma_fence *f;
> >>> -       int i;
> >>> -
> >>> -       /* Swapout? */
> >>> -       if (bo->resource->mem_type == TTM_PL_SYSTEM)
> >>> -               return true;
> >>> -
> >>> -       if (bo->type == ttm_bo_type_kernel &&
> >>> -           !amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
> >>> -               return false;
> >>> -
> >>> -       /* If bo is a KFD BO, check if the bo belongs to the current process.
> >>> -        * If true, then return false as any KFD process needs all its BOs to
> >>> -        * be resident to run successfully
> >>> -        */
> >>> -       flist = dma_resv_shared_list(bo->base.resv);
> >>> -       if (flist) {
> >>> -               for (i = 0; i < flist->shared_count; ++i) {
> >>> -                       f = rcu_dereference_protected(flist->shared[i],
> >>> -                               dma_resv_held(bo->base.resv));
> >>> -                       if (amdkfd_fence_check_mm(f, current->mm))
> >>> -                               return false;
> >>> -               }
> >>> -       }
> >>>
> >>>         switch (bo->resource->mem_type) {
> >>>         case AMDGPU_PL_PREEMPT:
> >>> @@ -1377,10 +1343,53 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>>                 return false;
> >>>
> >>>         default:
> >>> -               break;
> >>> +               return ttm_bo_eviction_valuable(bo, place);
> >>>         }
> >>> +}
> >>>
> >>> -       return ttm_bo_eviction_valuable(bo, place);
> >>> +/*
> >>> + * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
> >>> + * object.
> >>> + *
> >>> + * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
> >>> + * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
> >>> + * it can find space for a new object and by ttm_bo_force_list_clean() which is
> >>> + * used to clean out a memory space.
> >>> + */
> >>> +static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>> +                                            const struct ttm_place *place)
> >>> +{
> >>> +       struct dma_resv_list *flist;
> >>> +       struct dma_fence *f;
> >>> +       int i;
> >>> +
> >>> +       /* Swapout? */
> >>> +       if (bo->resource->mem_type == TTM_PL_SYSTEM)
> >>> +               return true;
> >>> +
> >>> +       /* If bo is a KFD BO, check if the bo belongs to the current process.
> >>> +        * If true, then return false as any KFD process needs all its BOs to
> >>> +        * be resident to run successfully
> >>> +        */
> >>> +       flist = dma_resv_shared_list(bo->base.resv);
> >>> +       if (flist) {
> >>> +               for (i = 0; i < flist->shared_count; ++i) {
> >>> +                       f = rcu_dereference_protected(flist->shared[i],
> >>> +                               dma_resv_held(bo->base.resv));
> >>> +                       if (amdkfd_fence_check_mm(f, current->mm))
> >>> +                               return false;
> >>> +               }
> >>> +       }
> >>> +
> >>> +       /* Check by different mem type. */
> >>> +       if (!amdgpu_ttm_mem_eviction_valuable(bo, place))
> >>> +               return false;
> >>> +
> >>> +       /* VM bo should be checked at last because it will mark VM evicting. */
> >>> +       if (bo->type == ttm_bo_type_kernel)
> >>> +               return amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo));
> >>> +
> >>> +       return true;
> >>>  }
> >>>
> >>>  static void amdgpu_ttm_vram_mm_access(struct amdgpu_device *adev, loff_t pos,
>
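
Editorial sketch (not part of the thread): the disagreement above comes down to two pieces of VM state, the evicting flag and the evicted list, being checked in different places. The toy model below is a user-space illustration only, not amdgpu code. Every name in it (toy_vm, toy_consider_eviction, toy_va_ioctl, TOY_DEFERRED and so on) is hypothetical, and it assumes only what the commit message states: the readiness check looks at the list, the mapping update looks at the flag, and the next CS clears the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_DEFERRED 1 /* hypothetical "update postponed" result */

/* Toy stand-in for the two pieces of VM state discussed in the thread. */
struct toy_vm {
	bool evicting;        /* set once the page tables are considered for eviction */
	unsigned int evicted; /* number of BOs actually placed on the evicted list */
};

/* Considering a VM BO sets the flag even if nothing is evicted afterwards. */
static void toy_consider_eviction(struct toy_vm *vm, bool really_evict)
{
	vm->evicting = true;
	if (really_evict)
		vm->evicted++;
}

/* Readiness check that looks only at the list, as described in step 5. */
static bool toy_ready_list_only(const struct toy_vm *vm)
{
	return vm->evicted == 0;
}

/* Readiness check that looks at both the flag and the list. */
static bool toy_ready_flag_and_list(const struct toy_vm *vm)
{
	return !vm->evicting && vm->evicted == 0;
}

/* The mapping update refuses to run while the flag is set. */
static int toy_update_mapping(const struct toy_vm *vm)
{
	return vm->evicting ? -EBUSY : 0;
}

/* User VM op: defer quietly if not ready, otherwise try the update. */
static int toy_va_ioctl(const struct toy_vm *vm,
			bool (*ready)(const struct toy_vm *))
{
	if (!ready(vm))
		return TOY_DEFERRED;
	return toy_update_mapping(vm);
}

/* Assumption: the next command submission clears both pieces of state. */
static void toy_next_cs(struct toy_vm *vm)
{
	vm->evicting = false;
	vm->evicted = 0;
}

/* Walks through steps 4 and 5 of the commit message. */
static void toy_demo(void)
{
	struct toy_vm vm = { .evicting = false, .evicted = 0 };

	toy_consider_eviction(&vm, false); /* considered, but not evicted */
	assert(toy_va_ioctl(&vm, toy_ready_list_only) == -EBUSY);
	assert(toy_va_ioctl(&vm, toy_ready_flag_and_list) == TOY_DEFERRED);
	toy_next_cs(&vm);
	assert(toy_va_ioctl(&vm, toy_ready_flag_and_list) == 0);
}
```

In this model the list-only check lets the update start and then fail with -EBUSY (16 on Linux, matching the -16 in the log), while a check that also consults the flag defers the update until the next CS. Whether the real amdgpu_vm_ready() should test the flag alone or the flag together with the list is exactly the open question in the reply above.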