Date: Fri, 17 May 2024 18:25:42 +0100
From: Catalin Marinas
To: Yang Shi
Cc: peterx@redhat.com, will@kernel.org, scott@os.amperecomputing.com,
    cl@gentwo.org, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions
References: <20240507223558.3039562-1-yang@os.amperecomputing.com>
    <6066e0da-f00a-40fd-a5e2-d4d78786c227@os.amperecomputing.com>
    <570c686c-6aa1-43f0-ba31-3597a329e037@os.amperecomputing.com>
In-Reply-To: <570c686c-6aa1-43f0-ba31-3597a329e037@os.amperecomputing.com>

On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
> On 5/14/24 3:39 AM, Catalin Marinas wrote:
> > It would be good to understand why openjdk is doing this instead of a
> > plain write. Is it because it may be racing with some other threads
> > already using the heap? That would be a valid pattern.
>
> Yes, you are right.
> I think I quoted the JVM justification in earlier email,
> anyway they said "permit use of memory concurrently with pretouch".

Ah, sorry, I missed that. This seems like a valid reason.

> > A point Will raised was on potential ABI changes introduced by this
> > patch. The ESR_EL1 reported to user remains the same as per the hardware
> > spec (read-only), so from a SIGSEGV we may have some slight behaviour
> > changes:
> >
> > 1. PTE invalid:
> >
> >    a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >       ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
> >       patch, the PTE is mapped as PTE_RDONLY first and a subsequent
> >       fault will report SIGSEGV with ESR_EL1.WnR == 1.
>
> I think I can do something like the below conceptually:
>
> if is_el0_atomic_instr && !is_write_abort
>     force_write = true
>
> if VM_READ && !VM_WRITE && force_write == true

Nit: write implies read, so you only need to check !write.

>     vm_flags = VM_READ
>     mm_flags ~= FAULT_FLAG_WRITE
>
> Then we just fallback to read fault. The following write fault will trigger
> SIGSEGV with consistent ABI.

I think this should work. So instead of reporting the write fault
directly in case of a read-only vma, we let the core code handle the
read fault first and then we retry the atomic instruction.

> >    b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >       ESR_EL1.WnR == 0, so no change from current behaviour, unless we
> >       fix the patch for (1.a) to fake the WnR bit which would change the
> >       current expectations.
> >
> > 2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
> >    hardware, no need to fix ESR_EL1 up.
> >
> > The patch would have to address (1) above but faking the ESR_EL1.WnR bit
> > based on the vma flags looks a bit fragile.
>
> I think we don't need to fake the ESR_EL1.WnR bit with the fallback.

I agree, with your approach above we don't need to fake WnR.

> > Similarly, we have userfaultfd that reports the fault to user.
> > I think in scenario (1) the kernel will report
> > UFFD_PAGEFAULT_FLAG_WRITE with your patch but no
> > UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are indeed two
> > faults, with the second having both UFFD_PAGEFAULT_FLAG_WP and
> > UFFD_PAGEFAULT_FLAG_WRITE set.
>
> I don't quite get what the problem is. IIUC, uffd just needs a signal from
> kernel to tell this area will be written. It seems not break the semantic.
> Added Peter Xu in this loop, who is the uffd developer. He may shed some
> light.

Not really familiar with uffd but just looking at the code, if a handler
is registered for both MODE_MISSING and MODE_WP, currently the atomic
instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
do_anonymous_page() path). If the page is mapped by the uffd handler as
the zero page, a restart of the instruction would signal
UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
path).

With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
the first attempt, just like having an STR instruction instead of
separate LDR + STR (as the atomics behave from a fault perspective).

However, I don't think that's a problem, the uffd handler should cope
with an STR anyway, so it's not some unexpected combination of flags.

-- 
Catalin
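
For illustration only, the fallback discussed above for case (1.a) could
be modelled in userspace C roughly as below. This is a sketch under
stated assumptions, not the actual patch: the SKETCH_* constants and
sketch_fault_flags() are hypothetical names with made-up values, and the
"is atomic" input stands in for whatever instruction decoding the patch
ends up doing. It folds in the nit that only the missing write
permission needs checking.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; values are illustrative. */
#define SKETCH_VM_READ          0x1u
#define SKETCH_VM_WRITE         0x2u
#define SKETCH_FAULT_FLAG_WRITE 0x1u

/*
 * Model of the flag selection for an EL0 data abort.
 *   is_write_abort:  hardware reported ESR_EL1.WnR == 1
 *   is_atomic_instr: the faulting instruction is an atomic RMW
 *   vma_flags:       SKETCH_VM_* permissions of the faulting vma
 * Returns the fault flags that would be handed to the core mm code.
 */
static unsigned int sketch_fault_flags(bool is_write_abort,
                                       bool is_atomic_instr,
                                       unsigned int vma_flags)
{
    unsigned int mm_flags = 0;

    /* A fault the hardware reports as a write stays a write fault. */
    if (is_write_abort)
        mm_flags |= SKETCH_FAULT_FLAG_WRITE;

    /*
     * Atomic RMW reported as a read: promote it to a write fault, but
     * only when the vma is writable. Otherwise fall back to a plain
     * read fault, so the retried instruction takes a genuine write
     * fault and SIGSEGV is reported with the hardware WnR value.
     */
    if (is_atomic_instr && !is_write_abort &&
        (vma_flags & SKETCH_VM_WRITE))
        mm_flags |= SKETCH_FAULT_FLAG_WRITE;

    return mm_flags;
}
```

With this model, an atomic on a read-write vma yields
SKETCH_FAULT_FLAG_WRITE on the first fault, while the same atomic on a
read-only vma yields 0 (a read fault), which matches the current
behaviour and avoids faking WnR.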