Feedback-ID: iad51458e:Fastmail
Date:   Tue, 31 Jan 2023 14:03:32 -0800
From:   Boqun Feng <boqun.feng@gmail.com>
To:     Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Cc:     Peter Zijlstra <peterz@infradead.org>,
        Jules Maselbas <jmaselbas@kalray.eu>,
        Will Deacon <will@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Arnd Bergmann <arnd@arndb.de>, linux-arch@vger.kernel.org,
        linux-kernel@vger.kernel.org,
        Alan Stern <stern@rowland.harvard.edu>,
        Andrea Parri <parri.andrea@gmail.com>,
        Nicholas Piggin <npiggin@gmail.com>,
        David Howells <dhowells@redhat.com>,
        Jade Alglave <j.alglave@ucl.ac.uk>,
        Luc Maranget <luc.maranget@inria.fr>,
        "Paul E. McKenney" <paulmck@kernel.org>,
        Akira Yokosawa <akiyks@gmail.com>,
        Daniel Lustig <dlustig@nvidia.com>,
        Joel Fernandes <joel@joelfernandes.org>,
        Hernan Ponce de Leon <hernan.poncedeleon@huaweicloud.com>,
        Paul Heidekrüger <paul.heidekrueger@in.tum.de>,
        Marco Elver <elver@google.com>,
        Miguel Ojeda <ojeda@kernel.org>,
        Alex Gaynor <alex.gaynor@gmail.com>,
        Wedson Almeida Filho <wedsonaf@gmail.com>,
        Gary Guo <gary@garyguo.net>,
        Björn Roy Baron <bjorn3_gh@protonmail.com>
Subject: Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in
 generic atomic ops
Message-ID: <Y9mQNNhzkOF/+uuC@boqun-archlinux>
References: <20230126173354.13250-1-jmaselbas@kalray.eu>
 <Y9Oy9ZAj/DQ7O+6e@hirez.programming.kicks-ass.net>
 <20230127134946.GJ5952@tellis.lin.mbt.kalray.eu>
 <Y9Pg+aNM9f48SY5Z@hirez.programming.kicks-ass.net>
 <Y9RLpYGmzW1KPksE@boqun-archlinux>
 <2f4717b3-268f-8db3-e380-4af0a5479901@huaweicloud.com>
 <Y9gOjzGaWy2hIAmu@boqun-archlinux>
 <2f121e2d-8e4c-de99-5672-93350fbb52af@huaweicloud.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2f121e2d-8e4c-de99-5672-93350fbb52af@huaweicloud.com>
Precedence: bulk

On Tue, Jan 31, 2023 at 04:08:29PM +0100, Jonas Oberhauser wrote:
> 
> 
> On 1/30/2023 7:38 PM, Boqun Feng wrote:
> > On Mon, Jan 30, 2023 at 01:23:28PM +0100, Jonas Oberhauser wrote:
> > > 
> > > On 1/27/2023 11:09 PM, Boqun Feng wrote:
> > > > On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
> > > > > > I also noticed that GCC has some builtin/extension to do such things,
> > > > > > __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
> > > > > > can be used in the kernel.
> > > > > On a per-architecture basis only, the C/C++ memory model does not match
> > > > > the Linux Kernel memory model so using the compiler to generate the
> > > > > atomic ops is somewhat tricky and needs architecture audits.
> > > > Hijack this thread a little bit, but while we are at it, do you think it
> > > > makes sense that we have a config option that allows archs to
> > > > implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
> > > > between two memory models, but the option is only for fallback/generic
> > > > implementation so we can put extra barriers/orderings to make things
> > > > guaranteed to work.
> > > Another is that the C11 model is more about atomic locations than atomic
> > > accesses, and there are several places in the kernel where a location is
> > > accessed both atomically and non-atomically. This API mismatch is more
> > > severe than the semantic differences in my opinion, since you don't have
> > > guarantees of what the layout of atomics is going to be.
> > > 
> > True, but the same problem for our asm implemented atomics, right? My
> > plan is to do (volatile atomic_int *) casts on these locations.
> 
> Do you? I think LKMM atomic types are always exactly as big as the
> underlying types.
> With C you might get into a case where the atomic_int is actually [lock ;
> non-atomic int] and when you access the location in a mixed way, you will
> non-atomically read the lock part but the atomic accesses modify the int
> part (protected by the lock).
> 

Well, my plan is to check ATOMIC_*_LOCK_FREE and have _Static_assert() on
the size of atomic_int ;-)

> > 
> > > Perhaps you could instead rely on the compiler builtins? Note that this may
> > These are less formal/defined to me, and not much research on them I
> > assume, I'd rather not use them.
> 
> I think that it's easy enough to define a formal model of these that is a
> bit conservative, and then just add mb()s to make them safe.
> 

I actually have a similar idea about the communication between Rust and
kernel C with atomic variables: Rust can use its standard atomics (i.e.
C11/LLVM atomics) but with mb()s to make them safe when talking with C.

Of course, no problem about pure Rust code using pure LLVM atomics.

> > 
> > > invalidate some progress properties, e.g., ticket locks become unfair if the
> > > increment (for taking a ticket) is implemented with a CAS loop (because a
> > > thread can fail forever to get a ticket if the ticket counter is contended,
> > > and thus starve). There may be some linux atomics that don't map to any
> > > compiler builtins and need to implemented with such CAS loops, potentially
> > > leading to such problems.
> > > 
> > > I'm also curious whether link time optimization can resolve the inlining
> > > issue?
> > > 
> > For Rust case, cross-language LTO is needed I think, and last time I
> > tried, it didn't work.
> 
> In German we say "Was noch nicht ist kann ja noch werden", translated as
> "what isn't can yet become", I don't feel like putting too much effort into

Not too much compared to wrapping LKMM atomics with Rust using FFI,

Using FFI:

	impl Atomic {
		fn read_acquire(&self) -> i32 {
			// SAFETY:
			unsafe { atomic_read_acquire(self as _) }
		}
	}

Using standard atomics:

	impl Atomic {
		fn read_acquire(&self) -> i32 {
			// self.0 is a Rust AtomicI32
			compiler_fence(SeqCst); // Rust does not support volatile atomics yet
			self.0.load(Acquire)
		}
	}
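
To make the second variant a bit more concrete, here is a minimal
self-contained sketch. The wrapper layout (a tuple struct around
AtomicI32), the new() constructor and the set_release() counterpart are
my assumptions for illustration, not an existing kernel API, and whether
a compiler_fence() is really enough to stand in for volatile atomics is
exactly the kind of thing that still needs checking:

```rust
use std::sync::atomic::{compiler_fence, AtomicI32, Ordering};

/// Hypothetical wrapper type; the name follows the snippets above.
pub struct Atomic(AtomicI32);

impl Atomic {
    /// Hypothetical constructor, only here so the sketch is usable.
    pub const fn new(v: i32) -> Self {
        Self(AtomicI32::new(v))
    }

    /// Rough counterpart of atomic_read_acquire().
    pub fn read_acquire(&self) -> i32 {
        // Compiler-only barrier: it restricts compiler reordering and
        // emits no hardware fence. It stands in for the missing volatile
        // atomics (assumption: this is sufficient).
        compiler_fence(Ordering::SeqCst);
        self.0.load(Ordering::Acquire)
    }

    /// Rough counterpart of atomic_set_release() (assumed mapping).
    pub fn set_release(&self, v: i32) {
        self.0.store(v, Ordering::Release);
        // Same compiler-only barrier, placed after the store.
        compiler_fence(Ordering::SeqCst);
    }
}
```

Unlike the FFI variant, nothing here needs `unsafe`, and the load can be
inlined without cross-language LTO.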

Needless to say, if we really need LKMM atomics in Rust, it's kinda my
job to implement these, so not much different for me ;-) Of course, any
help is appreciated!

> something that hardly affects performance and will hopefully become obsolete
> at some point in the near future.
> 
> > 
> > > I think another big question for me is to which extent it makes sense
> > > anyways to have shared memory concurrency between the Rust code and the C
> > > code. It seems all the bad concurrency stuff from the C world would flow
> > > into the Rust world, right?
> > What do you mean by "bad" ;-) ;-) ;-)
> 
> Uh oh. Let's pretend I didn't say anything :D
> 
> > > If you can live without shared Rust & C concurrency, then perhaps you can
> > > get away without using LKMM in Rust at all, and just rely on its (C11-like)
> > > memory model internally and talk to the C code through synchronous, safer
> > > ways.
> > > 
> > First I don't think I can avoid using LKMM in Rust, besides the
> > communication from two sides, what if kernel developers just want to
> > use the memory model they learn and understand (i.e. LKMM) in a new Rust
> > driver?
> 
> I'd rather people think 10 times before relying on atomics to write Rust
> code.
> There may be cases where it can't be avoided because of performance reasons,
> but Rust has a much more convenient concurrency model to offer than atomics.
> I think a lot more people understand Rust mutexes or channels compared to
> atomics.

C also has more convenient concurrency tools in the kernel, and I'm happy
that people use them. But there are also people (including me) working
on building these tools/models, and inevitably we need to use atomics. And
when we use the atomics, the biggest question is which one to use?
Right, it seems that I'm responsible for the answer because I have
multiple hats on. Trust me, it's not something I like to think about but
I have to ;-)

> Unfortunately I haven't written much driver code so I don't have experience
> to what extent it's generally necessary to rely on atomics :(
> 
> 

[snip some good tech inputs, I will reply later]

> 
> > > But I currently don't see that this implementation would be noticeably
> > > faster than paying the overhead of lack of inline.
> > > 
> > You are not wrong, surely we will need to real benchmark to know. But my
> > rationale is 1) in theory this is faster, 2) we also get a chance to try
> > out code based on LKMM with C11 atomics to see where it hurts. Therefore
> > I asked ;-)
> 
> I think one beautiful thing about open source is that nobody can stop you
> from trying it out yourself :D

I wish this were a personal side project that I could give up whenever I
want ;-)

The key question is: when building something in Rust for Linux using
atomics, and also talking with the C side with atomics, how do we handle
the difference between the two memory models? A few options I can think of:

1.	Use Rust standard atomics and pretend different memory models
	work together (do we have model tools to handle code in
	different models communicating with each other?)

2.	Use Rust standard atomics and add extra mb()s to enforce more
	ordering guarantees.

3.	Implement LKMM atomics in Rust and use them with caution when it
	comes to implicit ordering guarantees such as ppo. In fact lots
	of implicit ordering guarantees are available since the compiler
	won't exploit the potential reordering to "optimize", we also
	kinda have tools to check:

		https://lpc.events/event/16/contributions/1174/attachments/1108/2121/Status%20Report%20-%20Broken%20Dependency%20Orderings%20in%20the%20Linux%20Kernel.pdf

	A good part of using Rust is that we may try out a few tricks
	(with proc-macros, compiler plugins, etc) to express some ordering
	expectations, e.g. control dependencies.

	Two suboptions are:

	3.1	Implement LKMM atomics in Rust with FFI
	3.2	Implement LKMM atomics in Rust with Rust standard
		atomics

I'm happy to figure out pros and cons behind each option.
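
For example, option 2 could look roughly like the sketch below. It
assumes that a Rust fence(SeqCst) is an acceptable stand-in for mb(),
which is not established and would need a per-architecture audit; the
wrapper type and method names are hypothetical and only chosen to
resemble the C side:

```rust
use std::sync::atomic::{fence, AtomicI32, Ordering};

/// Hypothetical wrapper type, not an existing kernel API.
pub struct Atomic(AtomicI32);

impl Atomic {
    /// Hypothetical constructor, only here so the sketch is usable.
    pub const fn new(v: i32) -> Self {
        Self(AtomicI32::new(v))
    }

    /// Rough counterpart of the fully ordered atomic_fetch_add():
    /// a relaxed RMW with a full fence on each side.
    pub fn fetch_add(&self, i: i32) -> i32 {
        // Assumption: fence(SeqCst) plays the role of mb() here.
        fence(Ordering::SeqCst);
        let old = self.0.fetch_add(i, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        old
    }

    /// Rough counterpart of atomic_read(): no ordering beyond atomicity.
    pub fn read(&self) -> i32 {
        self.0.load(Ordering::Relaxed)
    }
}
```

The extra fences are deliberately conservative: they trade some
performance for not having to argue about each gap between the two
models one by one.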

Regards,
Boqun

> It's currently not something I would personally put resources into though.
> You could pick a DPDK or CK algorithm and port it to a version using FFI
> instead of using atomics directly, and measure the impact on some
> microbenchmarks.
> Then consider that the end-to-end impact on Linux will probably be at least
> one or two orders of magnitude less.
> 
> Have fun, jonas
>