Date: Wed, 26 Aug 2015 18:04:12 +0300
From: "Kirill A. Shutemov"
To: "Paul E. McKenney"
Cc: Vlastimil Babka, Andrew Morton, "Kirill A. Shutemov", Hugh Dickins, Andrea Arcangeli, Dave Hansen, Johannes Weiner, Michal Hocko, David Rientjes, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Lameter
Subject: Re: [PATCHv3 4/5] mm: make compound_head() robust
Message-ID: <20150826150412.GA16412@node.dhcp.inet.fi>
In-Reply-To: <20150825211954.GN11078@linux.vnet.ibm.com>

On Tue, Aug 25, 2015 at 02:19:54PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 25, 2015 at 10:46:44PM +0200, Vlastimil Babka wrote:
> > On 25.8.2015 22:11, Paul E. McKenney wrote:
> > > On Tue, Aug 25, 2015 at 09:33:54PM +0300, Kirill A. Shutemov wrote:
> > >> On Tue, Aug 25, 2015 at 01:44:13PM +0200, Vlastimil Babka wrote:
> > >>> On 08/21/2015 02:10 PM, Kirill A. Shutemov wrote:
> > >>>> On Thu, Aug 20, 2015 at 04:36:43PM -0700, Andrew Morton wrote:
> > >>>>> On Wed, 19 Aug 2015 12:21:45 +0300 "Kirill A. Shutemov" wrote:
> > >>>>>
> > >>>>>> The patch introduces page->compound_head into third double word block in
> > >>>>>> front of compound_dtor and compound_order. That means it shares storage
> > >>>>>> space with:
> > >>>>>>
> > >>>>>>  - page->lru.next;
> > >>>>>>  - page->next;
> > >>>>>>  - page->rcu_head.next;
> > >>>>>>  - page->pmd_huge_pte;
> > >>>>>>
> > >>>
> > >>> We should probably ask Paul about the chances that rcu_head.next would like
> > >>> to use the bit too one day?
> > >>
> > >> +Paul.
> > >
> > > The call_rcu() function does stomp that bit, but if you stop using that
> > > bit before you invoke call_rcu(), no problem.
> >
> > You mean that it sets the bit 0 of rcu_head.next during its processing?
>
> Not at the moment, though RCU will splat if given a misaligned rcu_head
> structure because of the possibility to use that bit to flag callbacks
> that do nothing but free memory. If RCU needs to do that (e.g., to
> promote energy efficiency), then that bit might well be set during
> RCU grace-period processing.

Ugh.. :-/

> > That's
> > bad news then. It's not that we would trigger that bit when the rcu_head part of
> > the union is "active". It's that pfn scanners could inspect such page at
> > arbitrary time, see the bit 0 set (due to RCU processing) and think that it's a
> > tail page of a compound page, and interpret the rest of the pointer as a pointer
> > to the head page (to test it for flags etc).
>
> On the other hand, if you avoid scanning rcu_head structures for pages
> that are currently waiting for a grace period, no problem. RCU does
> not use the rcu_head structure at all except for during the time between
> when call_rcu() is invoked on that rcu_head structure and the time that
> the callback is invoked.
>
> Is there some other page state that indicates that the page is waiting
> for a grace period? If so, you could simply avoid testing that bit in
> that case.

No, I don't think so. For compound pages, most of the state is stored in
the head page (e.g. page_count(), flags, etc.). So if we examine a random
page (the pfn scanner case), the very first thing we want to know is
whether we stepped on a tail page. PageTail() is what I wanted to encode
in the bit...

What if we change the order of fields within rcu_head and put ->func first?
Can we expect this pointer to always have bit 0 clear?

--
 Kirill A. Shutemov
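
For illustration only, here is a minimal userspace sketch of the bit-0 encoding discussed in this thread. It is not the kernel code: struct demo_page, its rcu_next field and all demo_* helpers are hypothetical names made up for this example, and the layout is reduced to a single union. It shows how a tail page could store the head pointer with bit 0 set, and why any other user of the same storage that sets bit 0 would look like a tail page to a pfn scanner.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct page; not the kernel layout. */
struct demo_page {
	unsigned long flags;
	union {
		uintptr_t compound_head;	/* tail page: head pointer with bit 0 set */
		void *rcu_next;			/* stand-in for rcu_head.next */
	};
};

/* Mark @tail as a tail page of @head by storing the head pointer with bit 0 set. */
static void demo_set_compound_head(struct demo_page *tail, struct demo_page *head)
{
	/* The encoding only works if head pointers never have bit 0 set. */
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_head = (uintptr_t)head | 1;
}

/* The PageTail()-like test: bit 0 of the shared word. */
static int demo_page_tail(const struct demo_page *page)
{
	return (int)(page->compound_head & 1);
}

/* Return the head page for a tail page, or the page itself otherwise. */
static struct demo_page *demo_compound_head(struct demo_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct demo_page *)(head - 1);
	return page;
}
```

In this sketch, a page whose rcu_next happened to hold a value with bit 0 set would pass demo_page_tail(), and demo_compound_head() would then treat the rest of the word as a head pointer. That is the pfn scanner misreading described above, and the reason the question of whether ->func can be relied on to have bit 0 clear matters.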