From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Linus Torvalds, Matt Turner, Waiman Long, Linux Kernel, Ivan Kokshaysky, Daniel J Blueman, Richard Henderson
Date: Sat, 18 Jan 2014 04:25:48 -0800
Subject: Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
Message-ID: <20140118122548.GX10038@linux.vnet.ibm.com>
In-Reply-To: <20140118113406.GY30183@twins.programming.kicks-ass.net>
Reply-To: paulmck@linux.vnet.ibm.com

On Sat, Jan 18, 2014 at 12:34:06PM +0100, Peter Zijlstra wrote:
> On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> > OK, I will bite...  Aside from fine-grained code timing, what code could
> > you write to tell the difference between a real one-byte store and an
> > RMW emulating that store?
>
> Why isn't fine-grained code timing an issue? I'm sure Alpha people will
> love it when their machine magically keels over every so often.
>
> Suppose we have two bytes in a word that get concurrent updates:
>
> union {
>         struct {
>                 u8 a;
>                 u8 b;
>         };
>         int word;
> } ponies = { .word = 0, };
>
> then two threads concurrently do:
>
>         CPU0:                           CPU1:
>
>         ponies.a = 5                    ponies.b = 10
>
>
> At which point you'd expect: a == 5 && b == 10
>
> However, with a rmw you could end up like:
>
>
>         load    r, ponies.word
>                                         load    r, ponies.word
>         and     r, ~0xFF
>         or      r, 5
>         store   ponies.word, r
>                                         and     r, ~0xFF00
>                                         or      r, 10 << 8
>                                         store   ponies.word, r
>
> which gives: a == 0 && b == 10
>
> The same can be had on a single CPU if you make the second RMW an
> interrupt.
>
>
> In fact, we recently had such a RMW issue on PPC64 although from a
> slightly different angle, but we managed to hit it quite consistently.
> See commit ba1f14fbe7096.
>
> The thing is, if we allow the above RMW 'atomic' store, we have to be
> _very_ careful that there cannot be such overlapping stores, otherwise
> things will go BOOM!
>
> However, if we already have to make sure there's no overlapping stores,
> we might as well write a wide store and not allow the narrow stores to
> begin with, to force people to think about the issue.

Ah, I was assuming atomic rmw, which for Alpha would be implemented
using the LL and SC instructions.  Yes, lots of overhead, but if the
CPU designers chose not to provide a load/store byte...

							Thanx, Paul
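A minimal C11 sketch contrasting the two byte-store emulations discussed above: the plain load/and/or/store sequence from the quoted interleaving, which loses CPU0's update, and an atomic RMW built from a compare-and-swap retry loop, used here as a portable stand-in for an Alpha LL/SC sequence. This is not the kernel's code. The helper names (merge_byte, racy_interleaving, try_store_byte, atomic_store_byte, atomic_interleaving) are hypothetical. The sketch assumes a 32-bit uint32_t word (rather than the quoted int) with `a` in byte lane 0 and `b` in lane 1, matching the ~0xFF and ~0xFF00 masks in the quoted example, which is the little-endian layout Alpha uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Replace byte lane `lane` (0 = low byte, i.e. `a` in the quoted masks)
 * of a 32-bit word with `val`: the "and r, ~mask; or r, val" step. */
static uint32_t merge_byte(uint32_t word, unsigned int lane, uint8_t val)
{
	uint32_t shift;

	assert(lane < 4);
	shift = lane * 8;
	word &= ~(UINT32_C(0xFF) << shift);
	word |= (uint32_t)val << shift;
	return word;
}

/* Replays the quoted interleaving with a plain (non-atomic) RMW. */
static uint32_t racy_interleaving(void)
{
	uint32_t word = 0;
	uint32_t r0 = word;		/* CPU0: load r, ponies.word */
	uint32_t r1 = word;		/* CPU1: load r, ponies.word */

	r0 = merge_byte(r0, 0, 5);	/* CPU0: and/or for a = 5 */
	word = r0;			/* CPU0: store ponies.word, r */
	r1 = merge_byte(r1, 1, 10);	/* CPU1: and/or for b = 10 */
	word = r1;			/* CPU1: store, clobbering a */
	return word;
}

/* One compare-and-swap attempt: succeeds only if *word still equals
 * *seen; on failure *seen is refreshed with the current value. */
static bool try_store_byte(_Atomic uint32_t *word, uint32_t *seen,
			   unsigned int lane, uint8_t val)
{
	uint32_t desired = merge_byte(*seen, lane, val);

	return atomic_compare_exchange_weak(word, seen, desired);
}

/* Atomic byte store: retry until no other store hit the word between
 * our read and our write, much as an LL/SC loop would. */
static void atomic_store_byte(_Atomic uint32_t *word, unsigned int lane,
			      uint8_t val)
{
	uint32_t seen = atomic_load(word);

	while (!try_store_byte(word, &seen, lane, val))
		;	/* retry against the refreshed value */
}

/* Same interleaving, but both CPUs use the atomic RMW. */
static uint32_t atomic_interleaving(void)
{
	_Atomic uint32_t word = 0;
	uint32_t seen1 = atomic_load(&word);	/* CPU1 reads first */

	atomic_store_byte(&word, 0, 5);		/* CPU0's whole store */
	/* CPU1's first CAS fails (word changed), then retries with a == 5 */
	while (!try_store_byte(&word, &seen1, 1, 10))
		;
	return atomic_load(&word);
}
```

Here racy_interleaving() returns 0x00000A00 (a == 0, b == 10), the lost update in the quoted example, while atomic_interleaving() returns 0x00000A05 because CPU1's compare-and-swap notices CPU0's intervening store and retries against the fresh word. The price is a retry loop on every narrow store, which is the overhead mentioned in the reply.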