Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp4362761pxf; Tue, 30 Mar 2021 06:12:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzF+K7RmsfxyT2qNCXJLN62hCeHXcx9X9FxtHb1uT+uCt+04fFmy51kTBOkBg2/Capr5UWl X-Received: by 2002:a17:906:a1c8:: with SMTP id bx8mr32586210ejb.381.1617109933064; Tue, 30 Mar 2021 06:12:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1617109933; cv=none; d=google.com; s=arc-20160816; b=HN3ufFBlrj3RDLg3iAtXTYxLVbsteu1Ujlgkcu6gAIIQT25qIhLwvzm5O2oEs58huc DJ4Gp8BNwVv0yGz+4vKrz2qIqIWn1nrl5r9gc3p+XK2xBUtykHQPLYi8DXlwQJ7dpFJR 7nts+8oU9H10kBia4epmW3iZqYtNwEH9WSW30XfpV1VbqrvqjPlzmWE81vyojMvU+AOg /qMKTfY34YUKUXOh1nUJkXtZe+WMiA31qeXl7dz2JHg3CPO3FODfq/hhk3tAZEoPFT7L 1PeQRmOzPSkgMAhwiaHK/wGPxa+yQ/obCTJKayQFxnAydRqnLJJ0qJs2Dm66ySAYDNwU FEwQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date; bh=ytEQr57LsoFeSHuQr3uswn2OCrv8xq8V7UP+85NuBXE=; b=ldxuFhh+sXnAV17+Rt+LdgWvNRaQ/N+Dmt22/cROdQfVcGKIYLtp55ECicgLcTlgZE MNsci73CeIowIxWKyuzRQG/maERghJDnIDky7lmjVoU5abB7/Svu/nEGJ5j5ZdJKmBc3 CVitE/GMOlOxhbeLI+w3i0N487OwzyU1u5fmGLaUygWlHc5Aywugb53f/SzKlV3noIBe YodrfAOFKNzM5fTw+pILJegASBGmbmC5yQg4REMrmAgQFEO9sH2X7XlLYdrwPm5dkipr QRmfeCU2MIYgUt2+bfKRHZKozyF34U/k9pzV6+uLwHqSEN3HImuY81Y3MZKxfFyUCYo+ HB5w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j14si9708592edw.465.2021.03.30.06.11.49; Tue, 30 Mar 2021 06:12:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232120AbhC3NIb (ORCPT + 99 others); Tue, 30 Mar 2021 09:08:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232000AbhC3NIA (ORCPT ); Tue, 30 Mar 2021 09:08:00 -0400 X-Greylist: delayed 449 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Tue, 30 Mar 2021 06:07:59 PDT Received: from redcrew.org (redcrew.org [IPv6:2a02:2b88:2:1::1cde:1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72158C061574; Tue, 30 Mar 2021 06:07:59 -0700 (PDT) Received: from server.danny.cz (19.161.broadband4.iol.cz [85.71.161.19]) by redcrew.org (Postfix) with ESMTP id D101369; Tue, 30 Mar 2021 15:00:06 +0200 (CEST) Received: from talos.danny.cz (talos.danny.cz [192.168.160.68]) by server.danny.cz (Postfix) with SMTP id 88900DA004; Tue, 30 Mar 2021 15:00:06 +0200 (CEST) Date: Tue, 30 Mar 2021 15:00:04 +0200 From: Dan =?UTF-8?B?SG9yw6Fr?= To: Christian =?UTF-8?B?S8O2bmln?= Cc: Xi Ruoyao , Christian =?UTF-8?B?S8O2bmln?= , Alex Deucher , David Airlie , Felix Kuehling , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, Daniel Vetter , stable@vger.kernel.org Subject: Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems Message-Id: <20210330150004.857ae73704c3533692cf79f0@danny.cz> In-Reply-To: References: <20210329175348.26859-1-xry111@mengyan1223.wang> <9a11c873-a362-b5d1-6d9c-e937034e267d@gmail.com> <84b3911173ad6beb246ba0a77f93d888ee6b393e.camel@mengyan1223.wang> <97c520ce107aa4d5fd96e2c380c8acdb63d45c37.camel@mengyan1223.wang> <7701fb71-9243-2d90-e1e1-d347a53b7d77@gmail.com> <368b9b1b7343e35b446bb1028ccf0ae75dc2adc4.camel@mengyan1223.wang> <71e3905a5b72c5b97df837041b19175540ebb023.camel@mengyan1223.wang> X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.32; powerpc64le-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 30 Mar 2021 14:55:01 +0200 Christian König wrote: > Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > >> On 2021-03-29 21:36 +0200, Christian König wrote: > >>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > >>>> Hi Christian, > >>>> > >>>> I don't think there is any constraint implemented to ensure `num_entries % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`.  For example, in `amdgpu_vm_bo_map()`: > >>>> > >>>>          /* validate the parameters */ > >>>>          if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > >>>>              size == 0 || size & AMDGPU_GPU_PAGE_MASK) > >>>>                  return -EINVAL; > >>>> > >>>> /* snip */ > >>>> > >>>>          saddr /= AMDGPU_GPU_PAGE_SIZE; > >>>>          eaddr /= AMDGPU_GPU_PAGE_SIZE; > >>>> > >>>> /* snip */ > >>>> > >>>>          mapping->start = saddr; > >>>>          mapping->last = eaddr; > >>>> > >>>> > >>>> If we really want to ensure (mapping->last - mapping->start + 1) % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > >>>> "AMDGPU_GPU_PAGE_MASK" > >>>> in "validate the parameters" with "PAGE_MASK". > > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > > "AMDGPU_GPU_PAGE_MASK" :(. > > > >>> Yeah, good point. > >>> > >>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with > >>>> this > >>>> change. > >>> Well in theory it is possible that we always fill the GPUVM on a 4k > >>> basis while the native page size of the CPU is larger. Let me double > >>> check the code. > > On my platform, there are two issues both causing the VM corruption. One is > > fixed by agd5f/linux@fe001e7. > > What is fe001e7? A commit id? If yes then that is to short and I can't > find it. it's a gitlab shortcut for https://gitlab.freedesktop.org/agd5f/linux/-/commit/fe001e70a55d0378328612be1fdc3abfc68b9ccc Dan > > > Another is in Mesa from userspace: it uses > > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > > expects it to use `dev_info->virtual_address_alignment`. > > Mhm, looking at the kernel code I would rather say Mesa is correct and > the kernel driver is broken. > > The gart_page_size is limited by the CPU page size, but the > virtual_address_alignment isn't. > > > If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue > > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the > > userspace will just get an EINVAL, instead of a slient VM corruption. And > > someone should tell Mesa developers to fix the code in this case. > > I rather see this as a kernel bug. Can you test if this code fragment > fixes your issue: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 64beb3399604..e1260b517e1b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) >                 } >                 dev_info->virtual_address_alignment = > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); >                 dev_info->pte_fragment_size = (1 << > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > -               dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > +               dev_info->gart_page_size = > dev_info->virtual_address_alignment; >                 dev_info->cu_active_number = adev->gfx.cu_info.number; >                 dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; >                 dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > Thanks, > Christian. > > > -- > > Xi Ruoyao > > School of Aerospace Science and Technology, Xidian University > > >