Received: by 10.223.164.221 with SMTP id h29csp14704wrb; Fri, 3 Nov 2017 04:56:28 -0700 (PDT) X-Google-Smtp-Source: ABhQp+SPh2Vjt39hYeqOXSgSWBzAzgVEa5gqgsCbCT2iAMWFQEg569sT0jpsc4o59Mo7iITf4pqP X-Received: by 10.99.7.133 with SMTP id 127mr6841766pgh.147.1509710188788; Fri, 03 Nov 2017 04:56:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1509710188; cv=none; d=google.com; s=arc-20160816; b=aspvkNpO1ZItPHpqTGF0nh/2d7dnl0WPtp5NgTBN4gkI6Lc6L0kYfJE3Z6/wTj86tc UZqZ51zqBu8PdbPSZmt6Dx0wnjgOiHi/mU5Lyr0JRYzxQYEeSAxbtCj2JqRZ2tHyRaaw PWwHvsE5JsE9Syfpt13okgp1d2ekqarN3ruCl14+Pz3kW+501O9Hds1ZTPl725F9okV/ SVMKXMus0hVGOfqIiNi3Ja5Aw1k9GdWAWVgdY2gPHWyAQfONMYGN8D8yzESRkzqBvooa 0kYjnzXRxpDZ/YLkF0WfWAVUgV/t4Vdtg87Bkir1SMVLaYcPkLi5xToBoyIqmaog0Cj0 hTsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :dlp-reaction:dlp-version:dlp-product:content-language :accept-language:in-reply-to:references:message-id:date:thread-index :thread-topic:subject:cc:to:from:arc-authentication-results; bh=KgANhvxoqX3ngNFSXn4G239/3pPXOMa4IrDBHAXWgow=; b=a7daUvMY7sc/7VCGsD5I2v8Ax7iedUWUGNzpgMW3YurlLCG5lRLOv6EVoZv0v97WRJ 7ZjXjYxEfO5lOUrdertooXsZCbc6/2JxZN7Ucfaymq9JmlYWuKdO8trE5oGsn9J5hp01 1i5K8kXw+wrTQ+8rY68Nc8HeYyQaiDTtHKKYTnKhrNYMG4C31EdbeMdHvQY/NsvPwjKc 1ojPQ6uQrK3fN1iCqOfjWO/I/6/JHaUtaIBNR2ZJeqkpk5KmqBPkil0/wxbZZyrGPnfr fdfghn8FYdpgc5bgBEUm0wcUCQC5ICgc2rdMMeLGXDbSP/0UjHMNwRDZn/hzUzdQ3HVf bAzQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 143si5939557pge.666.2017.11.03.04.56.15; Fri, 03 Nov 2017 04:56:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756171AbdKCLzX convert rfc822-to-8bit (ORCPT + 97 others); Fri, 3 Nov 2017 07:55:23 -0400 Received: from mga14.intel.com ([192.55.52.115]:23719 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754765AbdKCLzV (ORCPT ); Fri, 3 Nov 2017 07:55:21 -0400 Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Nov 2017 04:55:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.44,338,1505804400"; d="scan'208";a="171071271" Received: from irsmsx101.ger.corp.intel.com ([163.33.3.153]) by fmsmga005.fm.intel.com with ESMTP; 03 Nov 2017 04:55:18 -0700 Received: from irsmsx111.ger.corp.intel.com (10.108.20.4) by IRSMSX101.ger.corp.intel.com (163.33.3.153) with Microsoft SMTP Server (TLS) id 14.3.319.2; Fri, 3 Nov 2017 11:55:17 +0000 Received: from irsmsx102.ger.corp.intel.com ([169.254.2.180]) by irsmsx111.ger.corp.intel.com ([169.254.2.30]) with mapi id 14.03.0319.002; Fri, 3 Nov 2017 11:55:17 +0000 From: "Reshetova, Elena" To: Peter Zijlstra CC: "linux-kernel@vger.kernel.org" , "gregkh@linuxfoundation.org" , "keescook@chromium.org" , "tglx@linutronix.de" , "mingo@redhat.com" , "ishkamiel@gmail.com" , Will Deacon , Paul McKenney , "stern@rowland.harvard.edu" , "parri.andrea@gmail.com" , "boqun.feng@gmail.com" , "dhowells@redhat.com" , "david@fromorbit.com" Subject: RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t Thread-Topic: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t Thread-Index: AQHTS+92qOdaqfxtzUSi29Fjx7+W86LxWH8AgAXrypCAAGnUAIAGNEwQgANK0gCAACnGAA== Date: Fri, 3 Nov 2017 11:55:17 +0000 Message-ID: <2236FBA76BA1254E88B949DDB74E612B802B92EC@IRSMSX102.ger.corp.intel.com> References: <1508756984-18980-1-git-send-email-elena.reshetova@intel.com> <20171023131224.GC3165@worktop.lehotels.local> <2236FBA76BA1254E88B949DDB74E612B802B6A2E@IRSMSX102.ger.corp.intel.com> <20171027135624.GY3165@worktop.lehotels.local> <2236FBA76BA1254E88B949DDB74E612B802B89B8@IRSMSX102.ger.corp.intel.com> <20171102135742.7o4urtltgvhr6dku@hirez.programming.kicks-ass.net> In-Reply-To: <20171102135742.7o4urtltgvhr6dku@hirez.programming.kicks-ass.net> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-ctpclassification: CTP_IC x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNTRlOWE5ZTMtMjY5NS00M2RjLWI4YjctZTk5MzkzYjg3YjVmIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjIuNS4xOCIsIlRydXN0ZWRMYWJlbEhhc2giOiJxbE5pdjNwc3NLUmRBYzVcL1E2ZE5BU0lRZVR0K1A2azNDMW1NUTZMVllrc0RtWmlvN2pIK3ZCd0lMMGRWb3MrTyJ9 x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Thu, Nov 02, 2017 at 11:04:53AM +0000, Reshetova, Elena wrote: > > > Well refcount_dec_and_test() is not the only function that has different > > memory ordering specifics. So, the full answer then for any arbitrary case > > according to your points above would be smth like: > > > > for each substituted function (atomic_* --> refcount_*) that actually > > has changes in memory ordering *** perform the following: > > - mention the difference > > - mention the actual code place where the change would affect ( > > various put and etc. functions) > > - potentially try to understand if it would make a difference for > > this code (again here is where I am not sure how to do it properly) > > > > > > *** the actual list of these functions to me looks like: > > > 1) atomic_inc_not_zero -> refcount_inc_not_zero. Change is from > > underlying atomic_cmpxchg() to atomic_try_cmpxchg_relaxed () First > > one implies SMP-conditional general memory barrier (smp_mb()) on each > > side of the actual operation (at least according to documentation, In > > reality it goes through so many changes, ifdefs and conditionals that > > one gets lost Easily in the process). > > That matches _inc(), which doesn't imply anything. So you are saying that atomic_inc_not_zero has the same guarantees (meaning no guarantees) as atomic_inc? If yes, then I am now confused here because atomic_inc_not_zero based on atomic_cmpxchg() which according to this https://elixir.free-electrons.com/linux/latest/source/Documentation/memory-barriers.txt#L2527 does imply the smp_mb().... > > The reasoning being that you must already have obtained a pointer to the > object; and that will necessarily include enough ordering to observe the > object. Therefore increasing the refcount doesn't require further > constraints. > > > Second one according to the comment implies no memory barriers > > whatsoever, BUT "provides a control dependency which will order future > > stores against the inc" So, every store (not load) operation (I guess > > we are talking here only about store operations that touch the same > > object, but I wonder how it is defined in terms of memory location? > > Memory location is irrelevant. I was just trying to understand to what "stores" does control dependency barrier applies here? You mean it applies to absolutely all stores on all objects? I guess we are talking about the same object here, just trying to understand how object is defined in terms of memory location. > > > (overlapping?) that comes inside "if refcount_inc_not_zero(){}" cause > > would only be executed if functions returns true. > > The point is that a CPU must NOT speculate on stores. So while it can > speculate a branch, any store inside the branch must not become visible > until it can commit to that branch. > > The whole point being that possible modifications to the object to which > we've _conditionally_ acquired a reference, will only happen after we're > sure to indeed have acquired this reference. > > Otherwise its similar to _inc(). Yes, now I understand this part. > > > So, practically what we might "loose" here is any updates on the > > object protected by this refcounter done by another CPU. But for this > > purpose we expect the developer to take some other lock/memory barrier > > into use, right? > > Correct, object modification had better be serialized. Refcounts cannot > (even with atomic_t) help with that. > > > We only care of incrementing the refcount atomically and make sure we > > don't do anything with object unless it is ready for us to be used. > > Just so.. > > > If yes, then I guess it might be a big change for the code that > > previously relied on atomic-provided smp_mb() barriers and now instead > > needs to take an explicit locks/barriers by itself. > > Right, however such memory ordering should be explicitly documented. > Unknown and hidden memory ordering is a straight up bug, because > modifications to the code (be they introducing refcount_t or anything > else) can easily break things. Yes, this is what has been mentioned before many times, but again reality might be different, so better be prepared here also. > > > 2) atomic_** -> refcount_add_not_zero. Fortunately these are super > > rare and need to see per each case dependent on actual atomic function > > substituted. > > See inc_not_zero. > > > 3) atomic_add() --> refcount_add(). This should not make any change > > since both do not provide memory ordering at all, but there is a > > comment in the refcount.c that says that refcount_add " provide a > > control dependency and thereby orders future stores". How is it done > > given that refcount_add is void returning function?? I am fully > > confused with this one. > > Weird, mostly comes from being implemented using add_not_zero I suppose. Yes, underlying is add_not_zero, but is it still correct to talk about any control dependencies here? How would it possibly look in the code? What is the surrounding if statement? > > > 4) atomic_dec () --> refcount_dec (). This one we have discussed > > extensively before. Again first one implies SMP-conditional general > > memory barrier (smp_mb()) on each side of the actual operation. > > No, atomic_dec() does not in fact imply anything.. > > > Second one only provides "release" ordering meaning that prior > > both loads and stores must be completed before the operation. > > So, is it correct to express the difference in this way: > > > > atomic_dec (): refcount_dec (): > > > > smp_mb(); smp_mb(); > > do_atomic_dec_operation; do_atomic_dec_operation; > > smp_mb(); /*note no any barrier here! */ > > > No, on two points: atomic_dec() does not imply _anything_ and while > refcount_dec() does the release, that is distinctly different from > smp_mb() before. Oh, yes, sorry this got confused. So, here we actually have the opposite situation: refcount variant kind of provides a bit more order here than atomic variant. > > C C-peterz-release-vs-mb > > { > } > > P0(int *a, int *b, int *c) > { > WRITE_ONCE(*a, 1); > // smp_mb(); > smp_store_release(b, 1); > WRITE_ONCE(*c, 1); > } > > P1(int *a, int *b, int *c) > { > r3 = READ_ONCE(*c); > smp_rmb(); > r2 = smp_load_acquire(b); > r1 = READ_ONCE(*a); > } > > exists (1:r1=0 /\ 1:r2=0 /\ 1:r3=1) > > That happens without the smp_mb(), once you put that back its no longer > allowed. > > The reason is that smp_store_release() does not constrain the store to > @c and that is allowed to happen before, whereas with smp_mb() that > store can not happen before the store to @a. > > smp_store_release() only affects the prior load/stores not anything > later, whereas smp_mb() imposes full order. This is a good example, thank you, helps understanding greatly. > > > 5) atomic_dec_and_test () --> refcount_dec_and_test (). Same as 4) > > but in addition we have a control flow barrier. > > No, this one was in fact a full barrier and is now a release. > > > So, is it correct to express the difference in this way: > > > > atomic_dec_and_test (): refcount_dec_and_test (): > > > > smp_mb(); smp_mb(); > > if (do_atomic_dec_and_test;) { if (do_atomic_dec_and_test;){ > > /* 1 --> 0 transition happened */ /* 1 --> 0 transition happened */ > > smp_mb(); /* in refcounter case we assume > > do_anything including free(); that we are the last user of > object > > so that no concurrent access can > happen*/ > > /* control_flow_barrier here */ <-- > ---- which one btw???? > > /* only need to guarantee that we > free after > > reaching zero so we are good > here */ > > } } > smp_mb(); > > or better: > > if ( ({ smp_mb(); ret = do_atomic_dec_and_test(); smp_mb(); ret }) ) > > The difference being that the smp_mb happens on both branches of the > condition (or in fact before the branch). > > Same note as the above; the smp_mb() on the refcount_dec_and_test() side > is strictly stronger than the release. What is the correct way to show the control flow barriers in the example? The memory-barriers.txt only uses READ/WRITE_ONCE() notation. > > > 6) atomic_sub_and_test () --> refcount_sub_and_test (). Identical to 5) > > Yes, with the same note as for add*, these really should not be used. > > > 7) atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(). > > Should be identical to 4) > > Correct; was fully ordered, is now a release. > > > Lock functions such as refcount_dec_and_lock() & > > refcount_dec_and_mutex_lock() Provide exactly the same guarantees as > > they atomic counterparts. > > Nope. The atomic_dec_and_lock() provides smp_mb() while > refcount_dec_and_lock() merely orders all prior load/store's against all > later load/store's. Yes, I confused the whole thing again. Yes, atomic_dec_and_lock() is based on atomic_add_unless() which impies smp_mb(). > > The difference is subtle and involves at least 3 CPUs. I can't seem to > write up anything simple, keeps turning into monsters :/ Will, Paul, > have you got anything simple around? > > > Is the above a correct view on this? I think that instead of putting > > this whole info in the individual commits, how about making some > > documentation subsection (in memory-barriers or atomics or on its own, > > don't know what is better) with examples like above? > > > Then in each individual commit I can point to the respective line in > > the doc and say that here is a particular memory guarantee change and > > in the case of this particular code it would apply to your > > put_pi_state() or whatever else function. Otherwise keeping this info > > only in commits do not benefit any future users of refcount. > > Sure you can write a refcount-vs-atomic.txt file to call out these > differences. Ok, I will try collecting the relevant examples from this thread as it goes and will try to send some first version on Monday. > > > > So while memory-barriers.txt is a longish document, it is readable with > > > a bit of time and effort. There are also tools being developed that can > > > help people validate their code. > > > > Could you please point to the tools? Might help my understanding also. > > https://github.com/aparri/memory-model Thank you, will try to check! > > > So, I actually went and tried to __slowly__ read the > > memory-barriers.txt which indeed makes a lot of sense when you read it > > in full (not partially like it did it before). But the issue with this > > doc is not so much the length but not always consequent way of giving > > the information to unprepared reader. > > Fair enough; that document has grown over the years as has our > understanding on the subject. I would imagine it is indeed somewhat > inconsistent. > > Note that there's work done on better documents and updates to this one. > One document that might be good to read (I have not in fact had time to > read it myself yet :-(): > > https://github.com/aparri/memory- > model/blob/master/Documentation/explanation.txt Ok, let me try this one. Anyhow it would take a while for all this info to settle properly in my head, there are a lot of details to keep in mind all the time.... > > > I have many examples that I have wondered for a while what is meant with > > certain term or notion somewhere at the beginning of the doc > > only to discover a fully specified answer somewhere after line 1800. > > > I do have many suggestions that might help to improve the readability > > of the document but it would take me at least a day to write them down > > (and it is going to be a long read for anyone), so before I go do it, > > I want to know if anyone is interested to hear :) Anyway this would be > > a separate exercise to do if people want. > > Yes, in general feedback on that document is appreciated. Cc'ed a number > of more memory ordering people. > > That said; we have to rely on maintainers to be aware of 'some' memory > ordering requirements, otherwise it will be exceedingly hard to find > them. You mean hard to find requirements or maintainers that keep all these details in their heads? :) > > One tell-tale sign is if there are otherwise unmatched memory barriers > in the code. In that case they could end up being matched by atomic or > lock ops. And at that point we need to be careful and reverse engineer > what ordering is required. So, you mean that if I see somewhere that there is let's say smp_rmb(), but then can't find any matching smp_wmb() or smp_mb() then I should check if it has been done "underneath" using one of atomic functions. I think this is a good tip for checking actually, I will start looking into these. Might be good tip to put into doc also as suggestion for people to double check their code when/if they convert. Best Regards, Elena. From 1582987625899432008@xxx Thu Nov 02 20:30:42 +0000 2017 X-GM-THRID: 1582046402032606032 X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread