References: <20190423223914.3882-1-bauerman@linux.ibm.com> <877ebbsb8u.fsf@linux.ibm.com>
User-agent: mu4e 1.0; emacs 26.1
From: Thiago Jung Bauermann
To: Nathan Lynch
Cc: linuxppc-dev@lists.ozlabs.org, Gautham R Shenoy, linux-kernel@vger.kernel.org, Nicholas Piggin, Michael Bringmann, Tyrel Datwyler, Vaidyanathan Srinivasan
Subject: Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
In-reply-to: <877ebbsb8u.fsf@linux.ibm.com>
Date: Tue, 30 Apr 2019 16:59:18 -0300
Message-ID: <87v9yve02x.fsf@morokweng.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Nathan,

Thanks for reviewing the patch!

Nathan Lynch writes:

> Thiago Jung Bauermann writes:
>> This can be a problem because if the busy loop finishes too early, then the
>> kernel may offline another CPU before the previous one finished dying,
>> which would lead to two concurrent calls to rtas-stop-self, which is
>> prohibited by the PAPR.
>>
>> Since the hotplug machinery already assumes that cpu_die() is going to
>> work, we can simply loop until the CPU stops.
>>
>> Also change the loop to wait 100 µs between each call to
>> smp_query_cpu_stopped() to avoid querying RTAS too often.
>
> [...]
>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..d75cee60644c 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,17 @@ static void pseries_cpu_die(unsigned int cpu)
>>  			msleep(1);
>>  		}
>>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> -
>> -		for (tries = 0; tries < 25; tries++) {
>> +		/*
>> +		 * rtas_stop_self() panics if the CPU fails to stop and our
>> +		 * callers already assume that we are going to succeed, so we
>> +		 * can just loop until the CPU stops.
>> +		 */
>> +		while (true) {
>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>  			if (cpu_status == QCSS_STOPPED ||
>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>  				break;
>> -			cpu_relax();
>> +			udelay(100);
>>  		}
>>  	}
>
> I agree with looping indefinitely but doesn't it need a cond_resched()
> or similar check?

If there's no kernel or hypervisor bug, it shouldn't take more than a
few tens of ms for this loop to complete (Gautham measured a maximum of
10 ms on a POWER9 with an earlier version of this patch).

In case of bugs related to CPU hotplug (either in the kernel or the
hypervisor), I was hoping that the resulting lockup warnings would be a
good indicator that something is wrong.
:-)

Though perhaps adding a cond_resched() every 10 ms or so, with a
WARN_ON() if it loops for more than 50 ms would be better. I'll send an
alternative patch.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
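
For illustration only, here is a rough userspace sketch of the control flow
described above: poll every 100 µs, yield roughly every 10 ms, and warn once
if the wait exceeds 50 ms. It is not the follow-up patch. The names
fake_cpu, fake_query_cpu_stopped(), wait_stats and wait_for_cpu_stop() are
hypothetical, the enum values are made up, and time is only counted rather
than actually spent, so the kernel calls (udelay(), cond_resched(),
WARN_ON()) appear only in comments.

```c
#include <stdbool.h>
#include <stdio.h>

/* Names follow the quoted diff; the numeric values are invented here. */
enum qcss { QCSS_NOT_STOPPED, QCSS_STOPPED, QCSS_HARDWARE_ERROR };

#define POLL_INTERVAL_US 100UL   /* the patch uses udelay(100) */
#define RESCHED_EVERY_US 10000UL /* "every 10 ms or so" */
#define WARN_AFTER_US    50000UL /* "more than 50 ms" */

/* Hypothetical stand-in for a dying CPU: stops after a number of polls. */
struct fake_cpu {
	unsigned long polls_until_stopped;
};

/* Hypothetical stand-in for smp_query_cpu_stopped(). */
static enum qcss fake_query_cpu_stopped(struct fake_cpu *cpu)
{
	if (cpu->polls_until_stopped == 0)
		return QCSS_STOPPED;
	cpu->polls_until_stopped--;
	return QCSS_NOT_STOPPED;
}

/* What the loop did, so that callers can inspect it. */
struct wait_stats {
	unsigned long waited_us;     /* simulated time spent waiting */
	unsigned int resched_calls;  /* where cond_resched() would run */
	bool warned;                 /* where WARN_ON() would fire */
};

static struct wait_stats wait_for_cpu_stop(struct fake_cpu *cpu)
{
	struct wait_stats st = { 0, 0, false };

	for (;;) {
		enum qcss status = fake_query_cpu_stopped(cpu);

		if (status == QCSS_STOPPED || status == QCSS_HARDWARE_ERROR)
			break;

		/* Kernel code would call udelay(100) here; we only count. */
		st.waited_us += POLL_INTERVAL_US;

		/* Kernel code would call cond_resched() here. */
		if (st.waited_us % RESCHED_EVERY_US == 0)
			st.resched_calls++;

		/* Kernel code would use WARN_ON() or similar; warn once. */
		if (st.waited_us > WARN_AFTER_US && !st.warned) {
			st.warned = true;
			fprintf(stderr, "CPU still not stopped after %lu us\n",
				st.waited_us);
		}
	}
	return st;
}
```

To try it, initialize a struct fake_cpu with the number of polls the
simulated CPU should take to stop, pass it to wait_for_cpu_stop(), and
inspect the returned counters. The 10 ms and 50 ms thresholds come from the
paragraph above; everything else is an assumption.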