Message-ID: <559C3371.2030704@internode.on.net>
Date: Wed, 08 Jul 2015 05:45:45 +0930
From: Arthur Marsh
To: Mathieu Desnoyers, Peter Zijlstra
CC: linux-kernel@vger.kernel.org, Rusty Russell, rostedt, Oleg Nesterov, "Paul E. McKenney"
Subject: Re: lock-up with module: Optimize __module_address() using a latched RB-tree
References: <55997889.5020101@internode.on.net>
 <20150706100447.GX3644@twins.programming.kicks-ass.net>
 <559A545A.80508@internode.on.net>
 <20150706103246.GY3644@twins.programming.kicks-ass.net>
 <559B63A2.4030601@internode.on.net>
 <20150707072951.GM3644@twins.programming.kicks-ass.net>
 <1736781680.1883.1436286785932.JavaMail.zimbra@efficios.com>
In-Reply-To: <1736781680.1883.1436286785932.JavaMail.zimbra@efficios.com>

Mathieu Desnoyers wrote on 08/07/15 02:03:
> ----- On Jul 7, 2015, at 3:29 AM, Peter Zijlstra peterz@infradead.org wrote:
>
>> On Tue, Jul 07, 2015 at 02:59:06PM +0930, Arthur Marsh wrote:
>>> I had a single, non-reproducible case of the same lock-up happening on my
>>> other machine running the Linus git head kernel in 64-bit mode.
>>
>> Hmm, disturbing.. I've had my machines run this stuff for weeks and not
>> had anything like this :/
>>
>> Do you have a serial cable between those machines? serial console output
>> will allow capturing more complete traces than these pictures can and
>> might also aid in capturing some extra debug info.
>>
>> In any case, I'll go try and build some debug code.
>
> Arthur: can you double-check if you load any module with --force ?
> This could cause a module header layout mismatch, which can be an
> issue with the changes done by the identified commit: the module
> header layout changes there.
>
> Also, I'm attaching a small patch which serializes both updates and
> reads of the module rbtree. Can you try it out ? If the problem
> still shows with the spinlocks in place, that would mean the issue
> is *not* a race between latched rbtree updates and traversals.
>
> Thanks!
>
> Mathieu
>

I'm not aware of any modules being loaded with --force.

I've applied the patch, thanks!

The resultant kernel locked up as follows:

http://www.users.on.net/~arthur.marsh/20150708469.jpg

http://www.users.on.net/~arthur.marsh/20150708470.jpg

Sorry that the first image isn't as clear as the second - it only
appears for a few seconds.

Hopefully these will provide some clue as to what is happening.

Arthur.