Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757204AbcC2OPN (ORCPT );
        Tue, 29 Mar 2016 10:15:13 -0400
Received: from aserp1040.oracle.com ([141.146.126.69]:17909 "EHLO
        aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755953AbcC2OPL (ORCPT );
        Tue, 29 Mar 2016 10:15:11 -0400
Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
To: Steven Haigh, xen-devel, linux-kernel@vger.kernel.org
References: <56F4A816.3050505@crc.id.au> <56F52DBF.5080006@oracle.com>
        <56F545B1.8080609@crc.id.au> <56F54EE0.6030004@oracle.com>
        <56F56172.9020805@crc.id.au> <56F5653B.1090700@oracle.com>
        <56F5A87A.8000903@crc.id.au> <56FA4336.2030301@crc.id.au>
Cc: "gregkh@linuxfoundation.org"
From: Boris Ostrovsky
Message-ID: <56FA8DDD.7070406@oracle.com>
Date: Tue, 29 Mar 2016 10:14:53 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
        Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <56FA4336.2030301@crc.id.au>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3155
Lines: 68

On 03/29/2016 04:56 AM, Steven Haigh wrote:
>
> Interestingly enough, this just happened again - but on a different
> virtual machine. I'm starting to wonder if this may have something to do
> with the uptime of the machine - as the system that this seems to happen
> to is always different.
>
> Destroying it and monitoring it again has so far come up blank.
>
> I've thrown the latest lot of kernel messages here:
> http://paste.fedoraproject.org/346802/59241532

Would be good to see full console log. The one that you posted starts
with an error so I wonder what was before that.

Have you tried this on bare metal, BTW? And you said this is only
observed on 4.4, not 4.5, right?
>
> Interestingly, around the same time, /var/log/messages on the remote
> syslog server shows:
> Mar 29 17:00:01 zeus systemd: Created slice user-0.slice.
> Mar 29 17:00:01 zeus systemd: Starting user-0.slice.
> Mar 29 17:00:01 zeus systemd: Started Session 1567 of user root.
> Mar 29 17:00:01 zeus systemd: Starting Session 1567 of user root.
> Mar 29 17:00:01 zeus systemd: Removed slice user-0.slice.
> Mar 29 17:00:01 zeus systemd: Stopping user-0.slice.
> Mar 29 17:01:01 zeus systemd: Created slice user-0.slice.
> Mar 29 17:01:01 zeus systemd: Starting user-0.slice.
> Mar 29 17:01:01 zeus systemd: Started Session 1568 of user root.
> Mar 29 17:01:01 zeus systemd: Starting Session 1568 of user root.
> Mar 29 17:08:34 zeus ntpdate[18569]: adjust time server 203.56.246.94
> offset -0.002247 sec
> Mar 29 17:08:34 zeus systemd: Removed slice user-0.slice.
> Mar 29 17:08:34 zeus systemd: Stopping user-0.slice.
> Mar 29 17:10:01 zeus systemd: Created slice user-0.slice.
> Mar 29 17:10:01 zeus systemd: Starting user-0.slice.
> Mar 29 17:10:01 zeus systemd: Started Session 1569 of user root.
> Mar 29 17:10:01 zeus systemd: Starting Session 1569 of user root.
> Mar 29 17:10:01 zeus systemd: Removed slice user-0.slice.
> Mar 29 17:10:01 zeus systemd: Stopping user-0.slice.
> Mar 29 17:20:01 zeus systemd: Created slice user-0.slice.
> Mar 29 17:20:01 zeus systemd: Starting user-0.slice.
> Mar 29 17:20:01 zeus systemd: Started Session 1570 of user root.
> Mar 29 17:20:01 zeus systemd: Starting Session 1570 of user root.
> Mar 29 17:20:01 zeus systemd: Removed slice user-0.slice.
> Mar 29 17:20:01 zeus systemd: Stopping user-0.slice.
> Mar 29 17:30:55 zeus systemd: systemd-logind.service watchdog timeout
> (limit 1min)!
> Mar 29 17:32:25 zeus systemd: systemd-logind.service stop-sigabrt timed
> out. Terminating.
> Mar 29 17:33:56 zeus systemd: systemd-logind.service stop-sigterm timed
> out. Killing.
> Mar 29 17:35:26 zeus systemd: systemd-logind.service still around after
> SIGKILL. Ignoring.
> Mar 29 17:36:56 zeus systemd: systemd-logind.service stop-final-sigterm
> timed out. Killing.
> Mar 29 17:38:26 zeus systemd: systemd-logind.service still around after
> final SIGKILL. Entering failed mode.
> Mar 29 17:38:26 zeus systemd: Unit systemd-logind.service entered failed
> state.
> Mar 29 17:38:26 zeus systemd: systemd-logind.service failed.

These may be a result of your system not feeling well, which is not
surprising.

-boris