Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754318AbcDKUGk (ORCPT ); Mon, 11 Apr 2016 16:06:40 -0400 Received: from mail-by2on0129.outbound.protection.outlook.com ([207.46.100.129]:35568 "EHLO na01-by2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753998AbcDKUGh (ORCPT ); Mon, 11 Apr 2016 16:06:37 -0400 Authentication-Results: linutronix.de; dkim=none (message not signed) header.d=none;linutronix.de; dmarc=none action=none header.from=hpe.com; Message-ID: <570C03AC.7080404@hpe.com> Date: Mon, 11 Apr 2016 16:06:04 -0400 From: Waiman Long User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12 MIME-Version: 1.0 To: Thomas Gleixner CC: Ingo Molnar , "H. Peter Anvin" , Jonathan Corbet , LKML , , , Borislav Petkov , Andy Lutomirski , Scott J Norton , Douglas Hatch , Randy Wright Subject: Re: [PATCH v2] x86/hpet: Reduce HPET counter read contention References: <1460146305-14666-1-git-send-email-Waiman.Long@hpe.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [72.71.243.134] X-ClientProxiedBy: CY1PR13CA0039.namprd13.prod.outlook.com (10.162.30.177) To DF4PR84MB0316.NAMPRD84.PROD.OUTLOOK.COM (10.162.193.30) X-MS-Office365-Filtering-Correlation-Id: 22c67b6f-a17b-4e66-fb1a-08d36244bdb5 X-Microsoft-Exchange-Diagnostics: 1;DF4PR84MB0316;2:u3jkHRtgG0Xro6RSttL03dNfeU3jEWOz/MTs4zvZcZmvOhi6Jns2tMpscd5tM2rq5EaOWnJJ0mt8YGxb3/xz9zpDP8yVAGjmfUN46/oDtLSOSIbJcO3qT6GY1zQ/4QCDzVVPPU/B3fQu0Yn7Hza4SPLu/xIJL6T+dkmPOW+UoQAX6XURPq4z5AAiHoRGSm01;3:ZrqwdTLMYQPKUuEYSNKfI0FtUE51Yo17tweIbKMFp9qA1DSn36IPfwVoImxxtRJKyUniETWCGiX95iq65bVJLQU4sHZn6Sh/HkKkVg3YU2cYag/WOMRJ0ikENlb7UXVq X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DF4PR84MB0316; X-Microsoft-Exchange-Diagnostics: 1;DF4PR84MB0316;25:ERvrCrZvtPy8p9GA8RxNUqxdzt90DKVVBYT077oSuc5ef4qbsZ9xPDoIXCrLe91j+gzzSmysr459uK8h1yV6M3Ii9MytuDnUweckHre/zefxmDsnRrvHfOTtyUp4vhD+d8dnbPPnnsdQfbzTAJZaH4uH6XlIW3gLAacbjUnyKODxujUKpIi0PJtNwjKYsLo3B8TGWZnGXsaXNi0LYJzL6xSeLRA4/gFegBF0zFQWYHiSzIESTgZ0TeAyvEormHLR26PcrLvevZCjN9GIaEkJt2Uxe8hOe1NDpeDLxpThM9B4GXIFaDIjlZWSeaindldVO7+o8D9d6V7cXsNvHefDEbmEw8mndleTzmZOMfR8lEi1BdD5po7GVAAB8ZN7HPLu3f9iSr8DNTpwIs8fkW5sbVYXQ6cqu81FSHyA9CzLn5ukvQUZSbKVp05T+8fiHUrbja/qOFWUJQZHU0vFyp5I8JFOeB7zjp6GA+hXLAQPzzx/IPANUrxO+LdCfHRnYvF4iIAvieGDUW8XnVRwOLITgdtK95Cya2IXIN1ybJJnEtbBmZIsyy0Bs5sxRwiNCXMNBs8jOk45GGdMsdngtA5n5y0/sj8VIGn4P914X0k7Zw9jkntjP9WEscj4v4jroB8CCnAkBmj4mxnvyrgC15M6tw== X-LD-Processed: 105b2061-b669-4b31-92ac-24d304d195dc,ExtAddr X-Microsoft-Exchange-Diagnostics: 1;DF4PR84MB0316;20:axWFbCe5tDRP5Sis3oJyPj9JVO9l63KoibwFM+56EaviG41lbfnj/Lqx5Lsb2njh9XZFAwE7VJGa5onssfzmbSuHkSCWHWN5jKcpCB8QjwzPtOfWAQM3zVNhqQ2RssCwKIAn8fI+BjmzBPi2Mo+pUDsSOOP3gLzfLAVSSHSo4PVyFHEBlp2UFKH07fwJ04zjbTXVedD5Lfbq0NAKIO8YCE5DLKLiKwnE1T6hS30svSoxx36j1wdFfBh0qqX2ELRcBurNr5R1G04bnXizhTFsdcIZUePLBvzAIbL2CHkdmg3dQiKB0wTSNn00KYLGOHLhJgaOx8TpiVefVfAOZ03ebQ==;4:zUWe+VuqlSAghFjRXCAQZ4XFU3dPsovQ34Q9vCFO/EtKzQeubCTVGqXjOaavAz8R6ifSEdtMStaWQ7dGKCcVmT35wpE88mCdSo5qvG8uSXq5xFv/IwY6/24bOVmNshgVJZYdm8tzJa2pp5WxrsOCGFOctpO/EQxPGtQY5SFzgKCH2C5MQHl7BZnMzFx69WdpTids6cDOMdz4TQecspIof0tFTROPlmyYb2OgBbqJO/DOmpWlsgKfcbT+0CDxxOtneAPYLjvOp6dHYgI7CgxNHGF74e8fBDp/45KCWuRe5EfjI5HEWEA84aFSdYn1g5c0ZxhynE/5/RSUAYK8ms/+0CNxhh3Sg9LU7LoYfXWKmmErBm2IpYg2BfXXN4+BCLo1H0fDGvVtEYMws8xYHJEcNg== X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(5005006)(8121501046)(10201501046)(3002001)(6055026);SRVR:DF4PR84MB0316;BCL:0;PCL:0;RULEID:;SRVR:DF4PR84MB0316; X-Forefront-PRVS: 09090B6B69 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(4630300001)(6049001)(6009001)(51914003)(24454002)(377454003)(4326007)(4001350100001)(5008740100001)(64126003)(86362001)(50466002)(6116002)(65806001)(65816999)(3846002)(87266999)(76176999)(54356999)(50986999)(586003)(1096002)(36756003)(164054004)(80316001)(83506001)(117156001)(77096005)(47776003)(59896002)(92566002)(33656002)(81166005)(2950100001)(110136002)(65956001)(66066001)(2906002)(230700001)(189998001)(19580395003)(5004730100002)(42186005);DIR:OUT;SFP:1102;SCL:1;SRVR:DF4PR84MB0316;H:[192.168.142.154];FPR:;SPF:None;MLV:sfv;LANG:en; X-Microsoft-Exchange-Diagnostics: =?iso-8859-1?Q?1;DF4PR84MB0316;23:9X2uF99v3ZNpWUFJ3N0IxCLl8PlyImYWqZD1lhv?= =?iso-8859-1?Q?Xs1mJmXMyqS7jq2LtCeMig+TXI0WmOk19k7yDz+LbwoUKdI/hf5UbhOZD6?= =?iso-8859-1?Q?/8XRREGeNjLGCps7gMtF0o2KFurtEurzVJWEPMSr43wKl4H+P/cTTD79e4?= =?iso-8859-1?Q?NlSMTHzqknSO0YXEVnwI1bF4S+Z1wWkqDpmgVwlWirCzMXUiV/frtfWQsB?= =?iso-8859-1?Q?hm6sjOU1eGKcYzf3joQ9x7hDbRgAdGtXFVHMQnURzZ+mqa+rpA9Tckv/ts?= =?iso-8859-1?Q?IEZhf2dVSU9eSHlV2u16lIo3ma4gtNoqWbV+j8GQ6q3JLcfkCyFIFAI4gQ?= =?iso-8859-1?Q?opMk8hrBJhO4XGntBCWfOJvii7Odo2GRdOj54Of9wjxSIZCRDP3bLtSUPL?= =?iso-8859-1?Q?bQOCgYwVfV6bmFYAN8WO7ZolvXpZSISbOZhlNlUB/+uK9iXTC+wHWSFON4?= =?iso-8859-1?Q?ZU/JyZUne6m1e0UT021Uckte7qITjS5VP1or6FPQkCcO4ofq5xn0/NdnPE?= =?iso-8859-1?Q?gNJ33L9tbQzBtKl0Gk+fQenfZGhU21voAvj5BqMoBihozH4DlI9ya6VzaI?= =?iso-8859-1?Q?3f0s5XKEGajzU/maVRc4qRlBkALmrwCFKkPIFyjj+GF+aQWwLzY5Ah27Kf?= =?iso-8859-1?Q?stQ4MLF9rO8gTxnlJan9mDMzXRJJZ7SLyGKipsvf18RF7drfeF9iVsp6ue?= =?iso-8859-1?Q?ygdxFHo7KQ6P2PZHAnsMEDf0Fqdb3+qLn50va3N6YK4KlTu5IPfCKqJp59?= =?iso-8859-1?Q?9DGu7tgWSKSu0qFiIDyymACem1Dov+wV67Ihl05zpE74lmPPA8Pug92wjX?= =?iso-8859-1?Q?kQJ9tZJ5vGeBwndCwZ4VzSEkYx3iCY/WepON047VvGo5gQV6IWNG23AqxA?= =?iso-8859-1?Q?ibLgVvIpZ86WqplquZc+zziwAQDmHKZNGdx+MKWWA8IXmJ3sSSg5j1KC4G?= =?iso-8859-1?Q?E++Az41F9sU/cRG7qK6FAIDH3OanHdjcgJay5bEs3xxxlrzU+GHvCzVNN/?= =?iso-8859-1?Q?sf2gshSatuEjM8gdKhfLtThJTeqXGL0ZeXNNmBkabEAYKVbE3LlCJy9IOA?= =?iso-8859-1?Q?q7Bdsf/qv5FPVfU5yl3Eeqq6JFSo3vGD5EESO/5rHY995R/XvAS2cUsz8K?= =?iso-8859-1?Q?4HtMEprc9psCVV+TyhHd1Tfo8xyR/FM46eO62lqz3HaFG+9Q=3D?= X-Microsoft-Exchange-Diagnostics: 1;DF4PR84MB0316;5:FdyeYEhHLobqLsxIike5QiAPToONjn0lzw8NuGjX2pexFqwEbp9RsnBNCM5mn/ze/yDK+7atLA+PV812VduCF03JE+CnzRQuxxJcH9gM/+TgDIllGHveaR9PdRyu20rz42FWq885aSdtrBJrUc2PPA==;24:w6+LxCH8NHYZ5iPMjR95zLXvxsjtwf4OX9EEzNm5z3LqqaKmmFOxz/BXpkA7p8LmE13CtKst1cDEelZb3R2rDnzyIcbQol/D7Y8TuEVPn9U= SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: hpe.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Apr 2016 20:06:15.0114 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: DF4PR84MB0316 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4236 Lines: 130 On 04/09/2016 08:19 PM, Thomas Gleixner wrote: > On Fri, 8 Apr 2016, Waiman Long wrote: >> This patch attempts to reduce HPET read contention by using the fact >> that if more than one task are trying to access HPET at the same time, >> it will be more efficient if one task in the group reads the HPET >> counter and shares it with the rest of the group instead of each >> group member reads the HPET counter individually. > That has nothing to do with tasks. clocksource reads can happen from almost > any context. The problem is concurrent access on multiple cpus. You are right. I should have used CPU instead. >> This optimization is enabled on systems with more than 32 CPUs. It can >> also be explicitly enabled or disabled by using the new opt_read_hpet >> kernel parameter. > Please not. What's wrong with enabling it unconditionally? > That is nothing wrong to enable it unconditionally. I am just not sure if that is the right thing to do. Since both you and Andy said we should enable it unconditionally, I will do so in the next version of the patch. >> +/* >> * Clock source related code >> */ >> static cycle_t read_hpet(struct clocksource *cs) >> { >> - return (cycle_t)hpet_readl(HPET_COUNTER); >> + int seq, cnt = 0; >> + u32 time; >> + >> + if (opt_read_hpet<= 0) >> + return (cycle_t)hpet_readl(HPET_COUNTER); > This wants to be conditional on CONFIG_SMP. No point in having all that muck > around for an UP kernel. Will do so. >> + seq = READ_ONCE(hpet_save.seq); >> + if (!HPET_SEQ_LOCKED(seq)) { >> + int old, new = seq + 1; >> + unsigned long flags; >> + >> + local_irq_save(flags); >> + /* >> + * Set the lock bit (lsb) to get the right to read HPET >> + * counter directly. If successful, read the counter, save >> + * its value, and increment the sequence number. Otherwise, >> + * increment the sequnce number to the expected locked value >> + * for comparison later on. >> + */ >> + old = cmpxchg(&hpet_save.seq, seq, new); >> + if (old == seq) { >> + time = hpet_readl(HPET_COUNTER); >> + WRITE_ONCE(hpet_save.hpet, time); >> + >> + /* Unlock */ >> + smp_store_release(&hpet_save.seq, new + 1); >> + local_irq_restore(flags); >> + return (cycle_t)time; >> + } >> + local_irq_restore(flags); >> + seq = new; >> + } >> + >> + /* >> + * Wait until the locked sequence number changes which indicates >> + * that the saved HPET value is up-to-date. >> + */ >> + while (READ_ONCE(hpet_save.seq) == seq) { >> + /* >> + * Since reading the HPET is much slower than a single >> + * cpu_relax() instruction, we use two here in an attempt >> + * to reduce the amount of cacheline contention in the >> + * hpet_save.seq cacheline. >> + */ >> + cpu_relax(); >> + cpu_relax(); >> + >> + if (likely(++cnt<= HPET_RESET_THRESHOLD)) >> + continue; >> + >> + /* >> + * In the unlikely event that it takes too long for the lock >> + * holder to read the HPET, we do it ourselves and try to >> + * reset the lock. This will also break a deadlock if it >> + * happens, for example, when the process context lock holder >> + * gets killed in the middle of reading the HPET counter. >> + */ >> + time = hpet_readl(HPET_COUNTER); >> + WRITE_ONCE(hpet_save.hpet, time); >> + if (READ_ONCE(hpet_save.seq) == seq) { >> + if (cmpxchg(&hpet_save.seq, seq, seq + 1) == seq) >> + pr_warn("read_hpet: reset hpet seq to 0x%x\n", >> + seq + 1); > This is voodoo programming and actively dangerous. > > CPU0 CPU1 CPU2 > lock_hpet() > T1=read_hpet() wait_for_unlock() > store_hpet(T1) > .... > T2 = read_hpet() > unlock_hpet() > lock_hpet() > T3 = read_hpet() > store_hpet(T3) > unlock_hpet() > return T3 > lock_hpet() > T4 = read_hpet() wait_for_unlock() > store_hpet(T4) > store_hpet(T2) > unlock_hpet() return T2 > > CPU2 will observe time going backwards. > > Thanks, > > tglx That part is leftover code from my testing and debugging effort. I think using local_irq_save() should allow the critical section to be executed without interruption. In this case, I should be able to remove the threshold checking code without harm. Thanks for the review. Cheers, Longman