Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp4318464pxf; Tue, 30 Mar 2021 05:06:40 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwX5ln/2v9wLUz4qo5frsLcezPZPMlycPhzcfPd4EyBVSl2aiji8Icvx+B4IggPtuHKq3YB X-Received: by 2002:a17:907:5090:: with SMTP id fv16mr18512773ejc.90.1617106000520; Tue, 30 Mar 2021 05:06:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1617106000; cv=none; d=google.com; s=arc-20160816; b=qe6AwezrUl4We72QvnRRsIqxU6c2NYTmB1x5/VVvGQL4aGG+cUa93X68cDODZCSH4D FrGVgllrg+r0/8SeFZu2VVvv3IzADD6FsWC+04WKA8NrdEzkM7/Clx1XYhKnsZPUMEiA 0PLU+iT6j0RONKcAYT+s+adXGn818x57XMyFjjoq5I9JSYDbGpjNgzAI7eoSSp9aBNRd hhRfvJ2x1p55HXMGvOj90e3gnBgGROjdPDxmNuyp90i7GCt3aqjsmq3AAKnV+v5vY+j4 1PgOUYF+WXWj0Ev6r1Hn07MpRnVwRUwwfbKVKRAheja4NpTUZw/wtEZ61F383o+eez4/ 4EIw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:date:cc:to:from:subject :message-id:dkim-signature; bh=9n4noZb5E2aEglOTZKnCbCPj/j/5SXMSR6/nQrd4Z1Y=; b=Fvogf2OVeif2YhzRVgNz3//vmRFUe/vJ/APF1rK5roHXB46+5Xjh3awfcehE/YGqIS flR4Mg4n31gKkj37kRGkUh3dOB4YFvubSnSWKR0UkaVdjCsHNFLDClhopa8yrZUq7ycK fTZ7CTUY0TLYOMrttyD2PHsScCVckuTrJY4t0eiGqqnfdYbPQcaBwt0tMLjZ5Q6XB8xC S+qAN3kCy81h1Eps59mJ7S8ys1xNxfwgRrwfxaRgswh9RLaPNU78sq3olN2OWEQU/4Tq T1sager8jf0mkp4Nrbptl0u4njsbDRHVJ2h6Gaw5kE2/tEWY2cwZJXooLKg1jLbSL0G4 MVeg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@mengyan1223.wang header.s=mail header.b=kxJCRiya; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=mengyan1223.wang Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u19si15863613edo.583.2021.03.30.05.06.17; Tue, 30 Mar 2021 05:06:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@mengyan1223.wang header.s=mail header.b=kxJCRiya; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=mengyan1223.wang Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232024AbhC3MFQ (ORCPT + 99 others); Tue, 30 Mar 2021 08:05:16 -0400 Received: from mengyan1223.wang ([89.208.246.23]:59640 "EHLO mengyan1223.wang" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231977AbhC3MEz (ORCPT ); Tue, 30 Mar 2021 08:04:55 -0400 Received: from [IPv6:240e:35a:1037:8a00:70b2:e35d:833c:af3e] (unknown [IPv6:240e:35a:1037:8a00:70b2:e35d:833c:af3e]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature ECDSA (P-384) server-digest SHA384) (Client did not present a certificate) (Authenticated sender: xry111@mengyan1223.wang) by mengyan1223.wang (Postfix) with ESMTPSA id 221F36594D; Tue, 30 Mar 2021 08:04:40 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mengyan1223.wang; s=mail; t=1617105894; bh=9n4noZb5E2aEglOTZKnCbCPj/j/5SXMSR6/nQrd4Z1Y=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=kxJCRiyaZuIykRdtEdnLeNCB1VYo85ccJt3fleCbfgrj3r1qyGVrwAZUVJU3uF4kI i2iXYTcWqsr3T9rnfMMri2Y9+l/4fwm39xu9PFgdZMh25sqJY+J5RRAGRRPMeQSJiq DxbKp7oM/F44iT7w/Y+z7ZvV+2CvBu3whU2JmmsQ0oneTHcOmyv+KCH1046dUZ8vdI PIYzQ5rPlInEZLchxh0a6QnGkiQ86o1QIbbEWB6UVGdnuQEx79v8JDWtPT31SMob5Y eDRbFnIiSC1UXko1ZhyEc5kwmZqMr/2PJenlzN1xhgcigd87MpfgzPlYhjhhKZ7Ppm Sy8Tfjjy6aBsQ== Message-ID: <71e3905a5b72c5b97df837041b19175540ebb023.camel@mengyan1223.wang> Subject: Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems From: Xi Ruoyao To: Christian =?ISO-8859-1?Q?K=F6nig?= , Alex Deucher , Christian =?ISO-8859-1?Q?K=F6nig?= Cc: David Airlie , Felix Kuehling , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Dan =?ISO-8859-1?Q?Hor=E1k?= , amd-gfx@lists.freedesktop.org, Daniel Vetter , stable@vger.kernel.org Date: Tue, 30 Mar 2021 20:04:32 +0800 In-Reply-To: <368b9b1b7343e35b446bb1028ccf0ae75dc2adc4.camel@mengyan1223.wang> References: <20210329175348.26859-1-xry111@mengyan1223.wang> <9a11c873-a362-b5d1-6d9c-e937034e267d@gmail.com> <84b3911173ad6beb246ba0a77f93d888ee6b393e.camel@mengyan1223.wang> <97c520ce107aa4d5fd96e2c380c8acdb63d45c37.camel@mengyan1223.wang> <7701fb71-9243-2d90-e1e1-d347a53b7d77@gmail.com> <368b9b1b7343e35b446bb1028ccf0ae75dc2adc4.camel@mengyan1223.wang> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.40.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > On 2021-03-29 21:36 +0200, Christian König wrote: > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > > Hi Christian, > > > > > > I don't think there is any constraint implemented to ensure `num_entries % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`.  For example, in `amdgpu_vm_bo_map()`: > > > > > >          /* validate the parameters */ > > >          if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > > > > > > > >              size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > >                  return -EINVAL; > > > > > > /* snip */ > > > > > >          saddr /= AMDGPU_GPU_PAGE_SIZE; > > >          eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > /* snip */ > > > > > >          mapping->start = saddr; > > >          mapping->last = eaddr; > > > > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > > "AMDGPU_GPU_PAGE_MASK" > > > in "validate the parameters" with "PAGE_MASK". It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of "AMDGPU_GPU_PAGE_MASK" :(. > > Yeah, good point. > > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with > > > this > > > change. > > > > Well in theory it is possible that we always fill the GPUVM on a 4k > > basis while the native page size of the CPU is larger. Let me double > > check the code. On my platform, there are two issues both causing the VM corruption. One is fixed by agd5f/linux@fe001e7. Another is in Mesa from userspace: it uses `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver expects it to use `dev_info->virtual_address_alignment`. If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue and make virtual_address_alignment = 4K. Otherwise, we should fortify the parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the userspace will just get an EINVAL, instead of a slient VM corruption. And someone should tell Mesa developers to fix the code in this case. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University