Date: Tue, 25 Jun 2013 12:55:56 -0400
From: Dave Jones
To: Steven Rostedt
Cc: Oleg Nesterov, "Paul E. McKenney", Linux Kernel, Linus Torvalds, "Eric W. Biederman", Andrey Vagin
Subject: Re: frequent softlockups with 3.10rc6.
Message-ID: <20130625165556.GA16170@redhat.com>
In-Reply-To: <1372093476.18733.170.camel@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 24, 2013 at 01:04:36PM -0400, Steven Rostedt wrote:
> On Mon, 2013-06-24 at 12:51 -0400, Dave Jones wrote:
> > On Mon, Jun 24, 2013 at 12:24:39PM -0400, Steven Rostedt wrote:
> >
> > > > Ah, this is the first victim of my new 'check sanity of nodes during list walks' patch.
> > > > It's doing the same prev->next next->prev checking as list_add and friends.
> > > > I'm looking at getting it into shape for a 3.12 merge after some other preparatory patches
> > > > go into 3.11
> > >
> > > OK, and you may need to make an exception for the ring buffer. To do a
> > > lockless swap out of the reader page for one of the pages in the buffer,
> > > it uses the 2 LSB as flags. Notice the "next=ffff880243288001", that "1"
> > > is a flag that states the next page is the "header" page (next to be
> > > read). We use cmpxchg to update the pages to handle races between the
> > > reader and writer.
> >
> > I just had a plumber come visit to replace my toilet.
> > I think even he would say "dude, gross" about that hack.
>
> Wow, that hack made you so sick you needed to replace your toilet?
>
> Note, the idea of using the 2 LSB bits of pointers came from -rt. Where
> we do the same with the rt_mutex owner.

While I've been spinning wheels trying to reproduce that softlockup bug,
on another machine I've been refining my list-walk debug patch.

I added an ugly "ok, the ringbuffer is playing games with lower two bits" special case.

But what the hell is going on here?

next->prev should be prev (ffff88023c6cdd18), but was 00ffff88023c6cdd. (next=ffff880243288001).

(trace comes from the same ringbuffer code)

	Dave