Subject: Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention
From: Zhenzhong Duan
Reply-To: zhenzhong.duan@oracle.com
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Daniel Lezcano, Waiman Long, Srinivas Eeda, kin.cho@oracle.com, linux_lkml_grp@oracle.com
Organization: Oracle
Date: Thu, 28 Feb 2019 11:33:10 +0800
Message-ID: <0a0d304a-3612-879a-435e-27336e1b67ce@oracle.com>
In-Reply-To: <8ffea578-5eb6-f479-7bd4-668df84d930f@oracle.com>
References: <1548141807-25825-1-git-send-email-zhenzhong.duan@oracle.com> <019e583c-7bcb-c234-200c-fcdb6c49fbb0@oracle.com> <853e8cf6-aba9-0200-8e39-e362848399ba@oracle.com> <8ffea578-5eb6-f479-7bd4-668df84d930f@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/2/18 11:48, Zhenzhong Duan wrote:
> On 2019/2/11 5:08, Thomas Gleixner wrote:
>> On Sat, 2 Feb 2019, Zhenzhong Duan wrote:
>>> On 2019/1/31 22:26, Thomas Gleixner wrote:
>>>>>> I'm not against the change per se, but I really want to understand
>>>>>> why we need all the complexity for something which should never be
>>>>>> used in a real world deployment.
>>>>>
>>>>> Hmm, "never be used" is a strong statement. Customers may happen to
>>>>> use nohpet (as a sanity test?) and report a bug to us. Sometimes they
>>>>> do report a bug that reproduces with their customized config. There
>>>>> may also be BIOS settings that disable HPET.
>>>>
>>>> And because the customer MAY do completely nonsensical things (and
>>>> there are a lot more than the HPET) the kernel has to handle all of
>>>> them?
>>>
>>> OK, then. I don't have any more arguments to convince you.
>>
>> You give up too fast :)
>
> Ah, because I thought of a simple fix.
>
>> The point is, that we really want proper justifications for changes like
>> this. Some 'may, could and more handwaving' simply does not cut it.
>>
>> So if you can just describe a realistic scenario, which does not involve
>> thoughtless flipping of BIOS options, then this becomes way more
>> palatable.
>
> I indeed don't see a realistic scenario in a production environment that
> needs nohpet. My only justification is that, now that we have nohpet as a
> kernel parameter, we should fix the softlockup on large machines for
> enterprise use.
>
>>> I just thought of a simple fix, shown below. I think it will work for
>>> both hpet and pmtmr. We will test it when the env is available.
>>
>>> --- a/kernel/time/timekeeping.c
>>> +++ b/kernel/time/timekeeping.c
>>> @@ -1353,6 +1353,7 @@ static int change_clocksource(void *data)
>>>
>>>          write_seqcount_end(&tk_core.seq);
>>>          raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
>>> +       tick_clock_notify();
>>>
>>>          return 0;
>>>   }
>>> @@ -1371,7 +1372,6 @@ int timekeeping_notify(struct clocksource *clock)
>>>          if (tk->tkr_mono.clock == clock)
>>>                  return 0;
>>>          stop_machine(change_clocksource, clock, NULL);
>>> -       tick_clock_notify();
>>>          return tk->tkr_mono.clock == clock ? 0 : -1;
>>>   }
>>
>> This won't resolve the concurrency issues of HPET or PMTIMER in any
>> way.
>
> I just got a chance to test, and Kin confirmed that it fixes the
> softlockup of PMTMR (with nohpet) and HPET (without nohpet, with the
> previous hpet commit reverted) at the bootup stage.
>
> My understanding is that at the bootup stage the tick device is first
> initialized in periodic mode and then switched to one-shot mode. In
> periodic mode the clock event interrupt is triggered every 1ms
> (HZ=1000); the contention in HPET or PMTIMER exceeds 1ms and delays the
> clock interrupt. The CPUs then continue to process interrupts one by one
> without a break, tick_clock_notify() has no chance to be called, and we
> never switch to one-shot mode.
>
> In one-shot mode the contention is still there, but the next event is
> always set to a future value. We may miss some ticks, but the timer code
> is smart enough to pick up those missed ticks.
>
> By moving tick_clock_notify() into the stop_machine() callback, the
> kernel changes to one-shot mode early, before the contention accumulates
> and locks up the system.
>
>> Instead it breaks the carefully orchestrated mechanism of clocksource
>> change.
>
> Sorry, I don't see how it breaks that; tick_clock_notify() is a simple
> function that sets a bitmask in a percpu variable. Could you explain a
> bit?

Hi Thomas,

May I have your further comments?
I think applying a simple patch to fix both the hpet and pmtmr
softlockups is better?

Thanks
Zhenzhong
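
To illustrate the periodic vs. one-shot argument above, here is a minimal user-space sketch. It is only a toy model under stated assumptions, not kernel code: the names tick_stats, simulate_periodic() and simulate_oneshot() are hypothetical, time is counted in microseconds, and the contended clocksource read is reduced to a fixed handler cost. It only shows why a handler that overruns the period leaves no gap in periodic mode, while one-shot mode always leaves one.

```c
#include <assert.h>

/* Hypothetical bookkeeping for the toy model; not a kernel structure. */
struct tick_stats {
	long handled;	/* tick handlers that actually ran */
	long missed;	/* period boundaries skipped while a handler ran */
	long idle_us;	/* time spent outside the tick handler */
};

/*
 * Periodic mode (assumed model): the next event is the previous expiry
 * plus one period. If the handler overruns the period, the next event is
 * already in the past and the handler is re-entered immediately.
 */
struct tick_stats simulate_periodic(long period_us, long handler_us,
				    long duration_us)
{
	struct tick_stats s = { 0, 0, 0 };
	long now = 0, next_event = 0;

	assert(period_us > 0 && handler_us > 0);
	for (;;) {
		long fire_at = next_event > now ? next_event : now;

		if (fire_at >= duration_us)
			break;
		s.idle_us += fire_at - now;	/* zero when we are late */
		now = fire_at + handler_us;
		s.handled++;
		next_event += period_us;	/* relative to old expiry */
	}
	return s;
}

/*
 * One-shot mode (assumed model): the next event is always programmed on
 * the first period boundary strictly after the current time, so there is
 * always a gap; skipped boundaries are counted as missed ticks.
 */
struct tick_stats simulate_oneshot(long period_us, long handler_us,
				   long duration_us)
{
	struct tick_stats s = { 0, 0, 0 };
	long now = 0, next_event = 0;

	assert(period_us > 0 && handler_us > 0);
	for (;;) {
		long fire_at = next_event > now ? next_event : now;

		if (fire_at >= duration_us)
			break;
		s.idle_us += fire_at - now;
		now = fire_at + handler_us;
		s.handled++;
		s.missed += now / period_us - fire_at / period_us;
		next_event = (now / period_us + 1) * period_us;
	}
	return s;
}
```

With a 1000us period and a 1500us handler, simulate_periodic() reports no idle time at all, which corresponds to the "no break" situation described above, while simulate_oneshot() reports idle time plus a count of missed ticks. With a 100us handler both modes leave plenty of idle time.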