Date: Tue, 2 Jul 2019 11:46:04 +0100
From: Will Deacon
To: Vineet Gupta
Cc: Peter Zijlstra, Will Deacon, "Paul E. McKenney", arcml, lkml, "linux-arch@vger.kernel.org"
Subject: Re: single copy atomicity for double load/stores on 32-bit systems
Message-ID: <20190702104603.g3qssgfhfhvryhnu@willie-the-truck>
In-Reply-To: <73510bc7-8386-746c-ed1e-422fb5adaec5@synopsys.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 01, 2019 at 08:05:51PM +0000, Vineet Gupta wrote:
> On 5/31/19 1:21 AM, Peter Zijlstra wrote:
> > On Thu, May 30, 2019 at 11:22:42AM -0700, Vineet Gupta wrote:
> >> Had an interesting lunch time discussion with our hardware architects pertinent to
> >> "minimal guarantees expected of a CPU" section of memory-barriers.txt
> >>
> >>
> >> | (*) These guarantees apply only to properly aligned and sized scalar
> >> | variables. "Properly sized" currently means variables that are
> >> | the same size as "char", "short", "int" and "long". "Properly
> >> | aligned" means the natural alignment, thus no constraints for
> >> | "char", two-byte alignment for "short", four-byte alignment for
> >> | "int", and either four-byte or eight-byte alignment for "long",
> >> | on 32-bit and 64-bit systems, respectively.
> >>
> >>
> >> I'm not sure how to interpret "natural alignment" for the case of double
> >> load/stores on 32-bit systems where the hardware and ABI allow for 4 byte
> >> alignment (ARCv2 LDD/STD, ARM LDRD/STRD ....)
> >
> > Natural alignment: !((uintptr_t)ptr % sizeof(*ptr))
> >
> > For any u64 type, that would give 8 byte alignment. the problem
> > otherwise being that your data spans two lines/pages etc..
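Peter's one-line definition can be wrapped in a small C helper. This is a sketch only: `is_naturally_aligned`, `struct packed_rec` and `packed_seq_aligned` are hypothetical names, and `__attribute__((packed))` is a GCC/Clang extension used here to reproduce the "packed struct" case, where a u64 ends up only 4-byte aligned even inside an 8-byte aligned record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Peter's definition of natural alignment, wrapped in a hypothetical
 * helper: an object of 'size' bytes at 'ptr' is naturally aligned when
 * its address is a multiple of its size. 'size' must be non-zero.
 */
static bool is_naturally_aligned(const void *ptr, size_t size)
{
	return ((uintptr_t)ptr % size) == 0;
}

/*
 * Hypothetical record. The packed attribute (GCC/Clang extension)
 * removes padding, so 'seq' sits at offset 4: a u64 that is only
 * 4-byte aligned even when the record itself is 8-byte aligned.
 */
struct __attribute__((packed)) packed_rec {
	uint32_t flag;
	uint64_t seq;
};

static_assert(offsetof(struct packed_rec, seq) == 4,
	      "packing leaves seq only 4-byte aligned");

/*
 * Would the u64 member be naturally aligned if the record were placed
 * at 'base'? Computed via offsetof to avoid taking the address of a
 * packed member.
 */
static bool packed_seq_aligned(const void *base)
{
	return is_naturally_aligned((const unsigned char *)base +
				    offsetof(struct packed_rec, seq),
				    sizeof(uint64_t));
}
```

For any object pointer `p`, `is_naturally_aligned(p, sizeof(*p))` is the same test as `!((uintptr_t)ptr % sizeof(*ptr))`; for the packed record it reports false, which is exactly the case where an LDRD/STRD-style double access may be split into two word accesses.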
> >
> >> I presume (and the question) that lkmm doesn't expect such 8 byte load/stores to
> >> be atomic unless 8-byte aligned
> >>
> >> ARMv7 arch ref manual seems to confirm this. Quoting
> >>
> >> | LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS, VLDM, VLDR,
> >> | VSTM, and VSTR instructions are executed as a sequence of word-aligned word
> >> | accesses. Each 32-bit word access is guaranteed to be single-copy atomic. A
> >> | subsequence of two or more word accesses from the sequence might not exhibit
> >> | single-copy atomicity
> >>
> >> While it seems reasonable from hardware pov to not implement such atomicity by
> >> default it seems there's an additional burden on application writers. They could
> >> be happily using a lockless algorithm with just a shared flag between 2 threads
> >> w/o need for any explicit synchronization.
> >
> > If you're that careless with lockless code, you deserve all the pain you
> > get.
> >
> >> But upgrade to a new compiler which
> >> aggressively "packs" struct rendering long long 32-bit aligned (vs. 64-bit before)
> >> causing the code to suddenly stop working. Is the onus on them to declare such
> >> memory as c11 atomic or some such.
> >
> > When a programmer wants guarantees they already need to know wth they're
> > doing.
> >
> > And I'll stand by my earlier conviction that any architecture that has a
> > native u64 (be it a 64bit arch or a 32bit with double-width
> > instructions) but has an ABI that allows u32 alignment on them is daft.
>
> So I agree with Paul's assertion that it is strange for 8-byte type being 4-byte
> aligned on a 64-bit system, but is it totally broken even if the ISA of the said
> 64-bit arch allows LD/ST to be augmented with acq/rel respectively.
>
> Say the ISA guarantees single-copy atomicity for aligned cases (i.e. for 8-byte
> data only if it is naturally aligned) and in lack thereof programmer needs to use
> the proper acq/release

Apologies if I'm missing some context here, but it's not clear to me why
the use of acquire/release instructions has anything to do with single-copy
atomicity of unaligned accesses. The ordering they provide, relative to
other accesses, doesn't necessarily prevent an unaligned access from
tearing, although a CPU architecture could obviously provide that guarantee
if it wanted to. Generally though, I wouldn't expect the two to go
hand-in-hand like you're suggesting.

Will
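To make the distinction between ordering and tearing concrete, here is a minimal C11 sketch, assuming a hosted toolchain that provides `<stdatomic.h>`. All names (`split_u64`, `store_with_interleaved_read`, `struct shared`, `publish`, `consume`) are hypothetical. The first half is a single-threaded model of a torn 64-bit value built from two single-copy-atomic word stores, as in the quoted ARMv7 text; the second half shows that in C11 it is the `_Atomic` qualifier, not the `memory_order` argument, that asks for an indivisible access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Tearing model (single-threaded, illustration only): a 64-bit store
 * carried out as two separate 32-bit word stores. Each word store is
 * single-copy atomic, but the pair is not.
 */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static_assert(sizeof(struct split_u64) == sizeof(uint64_t),
	      "two 32-bit words make up one 64-bit value");

/*
 * Perform the two word stores and return what a reader running between
 * them would observe: the new low word combined with the old high word.
 */
static uint64_t store_with_interleaved_read(struct split_u64 *v, uint64_t newval)
{
	uint64_t seen;

	v->lo = (uint32_t)newval;			/* first word store */
	seen = ((uint64_t)v->hi << 32) | v->lo;		/* reader runs here */
	v->hi = (uint32_t)(newval >> 32);		/* second word store */
	return seen;
}

/*
 * C11 approach: _Atomic makes each access to 'seq' indivisible. The
 * memory_order argument only orders 'seq' against other accesses
 * (here, the plain 'payload').
 */
struct shared {
	_Atomic uint64_t seq;
	uint64_t payload;
};

static void shared_init(struct shared *s)
{
	atomic_init(&s->seq, 0);
	s->payload = 0;
}

/* One-shot publication: write payload, then release-store the sequence. */
static void publish(struct shared *s, uint64_t payload, uint64_t seq)
{
	s->payload = payload;
	atomic_store_explicit(&s->seq, seq, memory_order_release);
}

/* Acquire-load the sequence; if it matches, the payload is visible. */
static bool consume(struct shared *s, uint64_t want_seq, uint64_t *payload)
{
	if (atomic_load_explicit(&s->seq, memory_order_acquire) != want_seq)
		return false;
	*payload = s->payload;
	return true;
}
```

Dropping the orders to `memory_order_relaxed` would still leave each access to `seq` untorn, only unordered; conversely, acquire/release on a plain, misaligned double-word access would not by itself rule out the torn value modelled above. On some 32-bit targets a 64-bit `_Atomic` object is not lock-free and the build may need `-latomic`; `atomic_is_lock_free()` reports this at run time.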