Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
From: Boris Ostrovsky
To: Steven Haigh, xen-devel, linux-kernel@vger.kernel.org
Cc: gregkh@linuxfoundation.org
Date: Wed, 30 Mar 2016 09:44:49 -0400
Message-ID: <56FBD851.9030608@oracle.com>
In-Reply-To: <56FAC3AC.9050802@crc.id.au>

On 03/29/2016 02:04 PM, Steven Haigh wrote:
> Greg, please see below - this is probably more for you...
>
> On 03/29/2016 04:56 AM, Steven Haigh wrote:
>> Interestingly enough, this just happened again - but on a different
>> virtual machine. I'm starting to wonder if this may have something to do
>> with the uptime of the machine - as the system that this seems to happen
>> to is always different.
>>
>> Destroying it and monitoring it again has so far come up blank.
>>
>> I've thrown the latest lot of kernel messages here:
>> http://paste.fedoraproject.org/346802/59241532
> So I just did a bit of digging via the almighty Google.
>
> I started hunting for these lines, as they happen just before the stall:
> BUG: Bad rss-counter state mm:ffff88007b7db480 idx:2 val:-1
> BUG: Bad rss-counter state mm:ffff880079c638c0 idx:0 val:-1
> BUG: Bad rss-counter state mm:ffff880079c638c0 idx:2 val:-1
>
> I stumbled across this post on the lkml:
> http://marc.info/?l=linux-kernel&m=145141546409607
>
> The patch attached seems to reference the following change in
> unmap_mapping_range in mm/memory.c:
>> -       struct zap_details details;
>> +       struct zap_details details = { };
> When I browse the GIT tree for 4.4.6:
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/mm/memory.c?id=refs/tags/v4.4.6
>
> I see at line 2411:
>         struct zap_details details;
>
> Is this something that has been missed being merged into the 4.4 tree?
> I'll admit my kernel knowledge is not enough to understand what the code
> actually does - but the similarities here seem uncanny.

The patch that you are referring to is trying to fix a bug in a feature
that's not in the mainline yet ("mm, oom: introduce oom reaper").

-boris