Received: by 2002:a05:7412:ba23:b0:fa:4c10:6cad with SMTP id jp35csp1504570rdb; Sat, 20 Jan 2024 01:36:25 -0800 (PST) X-Google-Smtp-Source: AGHT+IFJ1B3Z4AGDZYc3y/hl22UKRhyUob6I0Fu6Qmi120B/ur7UAtAhg7Z5/v5aWcwH5+4T3ANo X-Received: by 2002:a17:90a:6e06:b0:28e:44e7:fad6 with SMTP id b6-20020a17090a6e0600b0028e44e7fad6mr480024pjk.90.1705743385689; Sat, 20 Jan 2024 01:36:25 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705743385; cv=pass; d=google.com; s=arc-20160816; b=m83sPjXfFJwQ2awUEHFc+qrWHQMBMLPwTt7JaaBnJ39Vln4l142hl5jFn+bHXIMg0T vasLelLcyhC0CQgNSf8GmCpR4WxXwtTDhoi7+m4hA8uuG2QhifQsilbqHaA47tWnk1cK TifSqXlSx/ZI406H3/uznlv9DMxBumDoYpjKLiV9NftA9xnShZpuq1YAdIDFHvtDzR+E m7O2RaIpYghN3J698ASn5Jy206kqTKZoN4qqxDHJvsopyyLkAGTNQEs0bBbX4Wum+lfd aGDT7grRItwSARaRCOo/LhfABiRHZXv58SLaZsIvm0rF/n/aiO6XmHC7M2kJhLzsuVpC wLTA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id:dkim-signature; bh=xyoV1t8aKEfAQW2cATwsxg7a8BFBl4JU4b4mWpoQqG4=; fh=SAoDr7NMcDuydgqiihWzBu1WRl+PWWrzQF2rBSwK2TM=; b=B8A747NKcotcg8ewKBxYjJTmc6Wfwk2Rha9VNnN8lrzStBsYHR1unsVImm/dfHa29s AzZMZEIeBQI2CV7zhYr0bFHPZ+KWTpxqY92YIO6qvdSyNcFQrushZ3zCJD7Hq3PV9QhT 84AOWwlCZRBTMwN9dqMsHTf0SmI4OkxZ6/KJ5HNiK58U63r1T/aGcX3p1+peWuQax1aK Nr6FWHemKlykX0UHrhXFlZ+tEaXLXtWpY3bLNI4/mw+Mhm7VT5cULL6sMt+nWkOqkfFg FoRQsivakyJrmTmE1EZ8Bw4u4PtL7O8O3mcDn88YEpCvEeCEapCbBIvL2zbmQAUwxdGv wGTg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@qq.com header.s=s201512 header.b="tZr/TgsK"; arc=pass (i=1 dkim=pass dkdomain=qq.com); spf=pass (google.com: domain of linux-kernel+bounces-31729-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31729-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id a2-20020a17090ad80200b0029058c97da7si2012629pjv.140.2024.01.20.01.36.25 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 20 Jan 2024 01:36:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-31729-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@qq.com header.s=s201512 header.b="tZr/TgsK"; arc=pass (i=1 dkim=pass dkdomain=qq.com); spf=pass (google.com: domain of linux-kernel+bounces-31729-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31729-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id B57EA285109 for ; Sat, 20 Jan 2024 07:10:22 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7CB532101; Sat, 20 Jan 2024 07:10:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=qq.com header.i=@qq.com header.b="tZr/TgsK" Received: from out203-205-221-191.mail.qq.com (out203-205-221-191.mail.qq.com [203.205.221.191]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F36C61FBC; Sat, 20 Jan 2024 07:10:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=203.205.221.191 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705734612; cv=none; b=ijZTHBS/hUEA+Mod9gvbaMv0eeGfgiBtKLr/wn5zEJhxgTvWu/lxxb8smINOY/0XFayy1iDnr5QgDP4ahJ7HdG7TrFH8QDdJhCASVe1agLdm+qvDw3/W1snzSwT3qqYB2dSfR49yEYPyAri3QBYEczc+K/g0JeRb6326H48Q+WM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705734612; c=relaxed/simple; bh=DwXSmU/cQfccTNeaCLP8QOcPkdv1xaPdbaF8ErxT5PA=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=rFgEq/dW89OpQmIQigyJ1kcr/3m7Fg/LclkuMTe/qr8GtCarVt/2I7Eu10LO/Er8B37pCtZLlPMypnBLagnK1aXaZUTxb0sgd9LB/HsF0ar8ldsolbS31IpQgV8k/uj4h0pUq9N2iw+70jKL2sXB1hhL8D56DxY7IAWPaiEI5+Y= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=cyyself.name; spf=none smtp.mailfrom=cyyself.name; dkim=pass (1024-bit key) header.d=qq.com header.i=@qq.com header.b=tZr/TgsK; arc=none smtp.client-ip=203.205.221.191 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=cyyself.name Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=cyyself.name DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qq.com; s=s201512; t=1705734594; bh=xyoV1t8aKEfAQW2cATwsxg7a8BFBl4JU4b4mWpoQqG4=; h=Date:Subject:To:Cc:References:From:In-Reply-To; b=tZr/TgsKCpgkOWNFSioi0h4IJG5x+EVyhhIYoMnAB2UKCPsEEMw7FPu/7vezFQBV6 QWSLOKVUphNz6jcdwbEj/l71C4K4POomEjYAG8Z6K0j/Cxv/yRKNtmeUUrykPfFOl0 ilBsPZDxq51L84SeKwaMDiAiE2+klBGGQjwi9i5Q= Received: from [IPV6:240e:379:227b:4f00:cd09:8a0a:600e:c38] ([240e:379:227b:4f00:cd09:8a0a:600e:c38]) by newxmesmtplogicsvrszb9-1.qq.com (NewEsmtp) with SMTP id 273AB8E0; Sat, 20 Jan 2024 15:09:51 +0800 X-QQ-mid: xmsmtpt1705734591tujaje9qv Message-ID: X-QQ-XMAILINFO: NiDupExshEc7/Ce0oCF0ITlo7eRWhrzB6nlMTHy+ibkGPS2vJZZqmzOkIfUhHP lqtmyLmYNWFPhG49RlXuuNrqcch58e7pYQOhLngwkBR+C3HgBzJmj7KjnS3g+Nm7wDty6i03qVYE OGwBF9AClUMjH0K3WM7BRRpI/VZBsqNygbF5/BOfAzarng5X6E9I1IVaxQ8f0atYGkeKel+vxl2Z ly4AxuWjWvYm29z5PAh1biE5dkElDjB0qxwZYrAau3q39p+vPCpQj17OEpU88UuhOIt7o1AIOm5G Ofr5DqHWB8C3n1uk/5WTmohFbGNXo7sUzW/mBOLoj4yYHFvOVl5mMLpaPptiaLYfMUVUGV/Q5uhg NrqmDPzIcbFTxUvN/Dn+l7dfk6mFJ/SrqiEsKe2HlW7V4IOrSDDuuB0tGlgqdrhFEhX73p+wGIr8 LdCocRkul0O7MkYWK40DQ4wJBRL3J8nMMbD4Ua67Vw57KudTKLfD3tvDmMqMeTGO2jDVSCfuKYg3 TWNX1VJCV4tVJtz8eIhutNPubaADxmigGZHnHFnB5cWVyuZ4TCp6d3oq/DrZECDdeGep4N2x+aJc scnVAjHSMUi5rQAESklo6olqPGg578fA9GqvDy7H58yrEDBOPdxPC/d0144mEKxpDWln3N6yiZdF pLZX01AAxkU8VwMAlXSQ1DuNG2EXyoFEi0hCJq0B872G5OkeVmGqSOs6ltp9BzJXD21HI3hJBbHS R514suSvcnwgUENql1sbp6M5y7tpHRU61YXGiDhSI4HmSCMXx1LCKFcp6bW2dl45OwULQhS+uYZl nFXaCDdvnwtk9s1f9f9k8N5hpSHhEEZu90MUbp0tWcKDvZpuixysCq/8+Rmjb1DUpV5zsPEldVPn IihwtZcYWBBLs5WIjq9fSbW8dtiL7J6Shwj44gWe0LFkLYPA7Beu+vQVGNl1QMVcgDmAHgLEg/Sl uEujmMui7BX3CXxUIwb1HdfRQzXm2kpSRpmOv6Lp1kUUQjxW+e+w== X-QQ-XMRINFO: NI4Ajvh11aEj8Xl/2s1/T8w= X-OQ-MSGID: <5b25a10f-b3b1-47ae-b8af-34689d30d908@cyyself.name> Date: Sat, 20 Jan 2024 15:09:51 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v10 0/4] RISC-V: mm: Make SV48 the default address space Content-Language: en-US To: Charlie Jenkins Cc: alexghiti@rivosinc.com, anup@brainfault.org, aou@eecs.berkeley.edu, conor@kernel.org, jrtc27@jrtc27.com, konstantin@linuxfoundation.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, mick@ics.forth.gr, palmer@rivosinc.com, paul.walmsley@sifive.com, rdunlap@infradead.org References: <20230809232218.849726-1-charlie@rivosinc.com> From: Yangyu Chen In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 1/20/24 14:49, Charlie Jenkins wrote: > On Sat, Jan 20, 2024 at 02:13:14PM +0800, Yangyu Chen wrote: >> Thanks for your reply. >> >> On 1/20/24 09:34, Charlie Jenkins wrote: >>> On Sun, Jan 14, 2024 at 01:26:57AM +0800, Yangyu Chen wrote: >>>> Hi, Charlie >>>> >>>> Although this patchset has been merged I still have some questions about >>>> this patchset. Because it breaks regular mmap if address >= 38 bits on >>>> sv48 / sv57 capable systems like qemu. For example, If a userspace program >>>> wants to mmap an anonymous page to addr=(1<<45) on an sv48 capable system, >>>> it will fail and kernel will mmaped to another sv39 address since it does >>> >>> Thank you for raising this concern. To make sure I am understanding >>> correctly, you are passing a hint address of (1<<45) and expecting mmap >>> to return 1<<45 and if it returns a different address you are describing >>> mmap as failing? If you want an address that is in the sv48 space you >>> can pass in an address that is greater than 1<<47. >>> >>>> not meet the requirement to use sv48 as you wrote: >>>> >>>>> else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \ >>>>> mmap_end = VA_USER_SV48; \ >>>>> else \ >>>>> mmap_end = VA_USER_SV39; \ >>>> >>>> Then, How can a userspace program create a mmap with a hint if the address >>>>> = (1<<38) after your patch without MAP_FIXED? The only way to do this is >>>> to pass a hint >= (1<<47) on mmap syscall then kernel will return a random >>>> address in sv48 address space but the hint address gets lost. I think this >>> >>> In order to force mmap to return the address provided you must use >>> MAP_FIXED. Otherwise, the address is a "hint" and has no guarantees. The >>> hint address on riscv is used to mean "don't give me an address that >>> uses more bits than this". This behavior is not unique to riscv, arm64 >>> and powerpc use a similar scheme. In arch/arm64/include/asm/processor.h >>> there is the following code: >>> >>> #define arch_get_mmap_base(addr, base) ((addr > DEFAULT_MAP_WINDOW) ? \ >>> base + TASK_SIZE - DEFAULT_MAP_WINDOW :\ >>> base) >>> >>> arm64/powerpc are only concerned with a single boundary so the code is simpler. >>> >> >> As you say, this code in arm64/powerpc will not meet the issue I address. >> For example, If the addr here is (1<<50) on arm64, the arch_get_mmap_base >> will return base+TASK_SIZE-DEFAULT_MAP_WINDOW which is (1<> And this behavior on arm64/powerpc/x86 does not break anything since we will >> use a larger address space if the hint address is specified on the address > >> DEFAULT_MAP_WINDOW. The corresponding behavior on RISC-V should be if the >> hint address > BIT(47) then use Sv57 address space and use Sv48 when the >> hint address > BIT(38) if we want Sv39 by default. >> >> However, your patch needs the address >= BIT(47) rather than BIT(38) to use >> Sv48 and address >= BIT(56) to use Sv57, thus breaking existing userspace >> software to create mapping on the hint address without MAP_FIXED set. > > Code that needs mmap to provide a specific address must use MAP_FIXED. > On riscv, it was decided that the address returned from mmap cannot be > greater than the hint address. This is currently implemented by using > the largest address space that can fit into the hint address. It may be > possible that this range can be extended to use all of the addresses > that are less than or equal to the hint address. > So this decision might be wrong. It requires some userspace software to modify their mmap flags to fit with this. For example, a binary translate JIT compiler already probes this platform is capable with Sv48, then want to create mapping on some address specified on the mmap hint to align with foreign binary native address but also provide a fallback path with performance overhead. Your patch here will always let userspace software use a fallback path with performance overhead until the userspace software changes its syscall to use MAP_FIXED. But it is not required in x86, arm64, powerpc. > From reading the code even on arm64 if you pass an address that is > greater than DEFAULT_MAP_WINDOW it is not guaranteed that mmap will > return an address that is greater than DEFAULT_MAP_WINDOW. It may still > be provide an address that is less than DEFAULT_MAP_WINDOW if it fails > to find an address above. This seems like this would also break your use > case. > Yeah. As I said before, this patch will always let userspace software use a fallback path and this only happens in RISC-V. Make default sv48 is right, but RISC-V implementation for this and changing the hint address behavior might be wrong. And x86, arm64, powerpc already use 48-bit address space by default but do not change the meaning of hint address on mmap. >> >>>> violate the principle of mmap syscall as kernel should take the hint and >>>> attempt to create the mapping there. >>> >>> Although the man page for mmap does say "on Linux, the kernel will pick >>> a nearby page boundary" it is still a hint address so there is no strict >>> requirement (and the precedent has already been set by arm64/powerpc). >>> >> >> Yeah. There is no strict requirement. But currently x86/arm64/powerpc works >> in this situation well. The hint address on these ISAs is not used as the >> upper bound to allocating the address. However, on RISC-V, you treat this as >> the upper bound. >> >>>> >>>> I don't think patching in this way is right. However, if we only revert >>>> this patch, some programs relying on mmap to return address with effective >>>> bits <= 48 will still be an issue and it might expand to other ISAs if >>>> they implement larger virtual address space like RISC-V sv57. A better way >>>> to solve this might be adding a MAP_48BIT flag to mmap like MAP_32BIT has >>>> been introduced for decades. >>>> >>>> Thanks, >>>> Yangyu Chen >>>> >>> >>> - Charlie >>> >>