Subject: Re: [next-20180517][ppc] watchdog: CPU 88 self-detected hard LOCKUP @ update_cfs_group+0x30/0x150
From: Abdul Haleem
To: Nicholas Piggin
Cc: sachinp, Stephen Rothwell, linux-kernel, linux-next, linuxppc-dev
Date: Tue, 29 May 2018 18:39:40 +0530

On Mon, 2018-05-21 at 16:50 +1000, Nicholas Piggin wrote:
> Ah, it's POWER8.
>
> I'm betting we have a bug with nohz timer offloading somewhere.
>
> I *think* we may have seen similar on P9 as well, but that may be
> related to problems with stop states.
>
> Can you reproduce it easily? I'm thinking maybe adding some
> tracepoints that track decrementer settings and interrupts, and
> nohz offload activity might show something up.

Yes, the problem reproduces consistently on our CI setup, and today it
triggered on 4.17.0-rc6 (mainline) too.

-- 
Regards,
Abdul Haleem
IBM Linux Technology Centre
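As a starting point for the tracing Nicholas suggests, here is a minimal sketch that enables existing timer-related tracepoints through tracefs before reproducing the lockup. It assumes tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing) and that the script runs as root. The event list is an assumption about which existing tracepoints are relevant. None of them record decrementer programming or nohz offload activity directly, so new tracepoints would still be needed for that. The names TRACEFS, TIMER_EVENTS and enable_events are hypothetical helpers, not part of any kernel tool.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: tracefs mount point; older setups use /sys/kernel/debug/tracing.
TRACEFS = Path("/sys/kernel/tracing")

# Hypothetical selection of existing tracepoints for timer/decrementer activity.
# Nothing here traces decrementer programming or nohz offload directly.
TIMER_EVENTS = [
    "powerpc/timer_interrupt_entry",  # decrementer interrupt handler entered
    "powerpc/timer_interrupt_exit",   # decrementer interrupt handler left
    "timer/hrtimer_start",            # an hrtimer was (re)armed
    "timer/hrtimer_expire_entry",     # an hrtimer callback is about to run
    "timer/tick_stop",                # nohz attempt to stop the periodic tick
]


def enable_events(tracefs: Path, events: Iterable[str]) -> List[str]:
    """Enable each event by writing "1" to events/<system>/<name>/enable.

    Events whose control file is missing (for example, not built into this
    kernel) are skipped; the list of events actually enabled is returned.
    """
    enabled = []
    for event in events:
        control = tracefs / "events" / event / "enable"
        if control.is_file():
            control.write_text("1")
            enabled.append(event)
    return enabled
```

Usage, as root: call enable_events(TRACEFS, TIMER_EVENTS), check which events came back as enabled, then read trace_pipe under the same directory while the CI job reproduces the lockup. Skipping missing events rather than failing keeps the script usable across configs that lack CONFIG_NO_HZ_COMMON or the powerpc tracepoints.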