Received: by 10.223.164.202 with SMTP id h10csp1598672wrb; Mon, 27 Nov 2017 05:03:47 -0800 (PST) X-Google-Smtp-Source: AGs4zMbJIh8jbVPhat1KVhJ7rJ6IOGym5inb5NqvM/M6yaH2+Qe2j9i8dUb9Ghj79qHPrShrDt0H X-Received: by 10.98.138.149 with SMTP id o21mr36777056pfk.225.1511787827708; Mon, 27 Nov 2017 05:03:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1511787827; cv=none; d=google.com; s=arc-20160816; b=TUS1z5XWUTGd83BxabezR3i42xwtOzrAoqHgKJOWefdDLkJ/wU6jLoxOLl7SILQioc PREtLIQ5Z9KjGTpUvYkkq1xmEjJL41tOYqAZDZBb5HLwep6O+wUctWA1A/yB60PUAr92 v3u9Au8cRZunDDfLtwA/lDUhfoAMsiabihWvjOM3ybDRmohKZBaxxKzL5EoTVJJY408+ +ahkxu8tghAdDUFrqfHqriojzX87cq31d1DGRDLTXEWdP/pTRFR0O7yF0rEFBdmhVZns tLgOXgnxEv4x0duembapqIYq2buKakgyDf7PF1YtW5stoKi8W7xZnPSKy7zYV47d8Qru cQRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:content-transfer-encoding :mime-version:references:in-reply-to:subject:cc:to:from:date :arc-authentication-results; bh=VyIIfvyiEVGfpmTNy45Ac6zs4e1k/JC0iaCyLeEyxXE=; b=FchFJyb79fiub4OUiisnQQyX3+jew3TB428Ois7eDKOfOSQ9vCiN1WOq3HQBsOggrc jym9++Ooe19usHqGyYS4s3IqZvZNPW8bOrCHGyPaiiNT11RYRfjUPJCbOg+DEXlN+P+W IyqzAMiloMPHByHsED+QZNzgPDqrq1bDWA3FKFNTB+NaCRu38MCdMzqjast1RZHLg4YS jnW2YcNIvN2gSX3aHC8iUOAOkQu1TNoGXMOu8yT7CHdHExVF/Un3nGgRs85MRPip8dVS +azCy9nlHbHxX8vCDAEbFY21zly2INXUIpjijXmAZDjOdnaEkUVF+Sv0YDjMEE9do9do nITA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t80si25364958pfg.139.2017.11.27.05.03.35; Mon, 27 Nov 2017 05:03:47 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752829AbdK0NAi (ORCPT + 77 others); Mon, 27 Nov 2017 08:00:38 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:57110 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752754AbdK0NAg (ORCPT ); Mon, 27 Nov 2017 08:00:36 -0500 Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id vARCx2AV055836 for ; Mon, 27 Nov 2017 08:00:36 -0500 Received: from e06smtp14.uk.ibm.com (e06smtp14.uk.ibm.com [195.75.94.110]) by mx0a-001b2d01.pphosted.com with ESMTP id 2egj56uurf-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Mon, 27 Nov 2017 08:00:35 -0500 Received: from localhost by e06smtp14.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 27 Nov 2017 13:00:33 -0000 Received: from b06cxnps3074.portsmouth.uk.ibm.com (9.149.109.194) by e06smtp14.uk.ibm.com (192.168.101.144) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 27 Nov 2017 13:00:31 -0000 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id vARD0VYa30212172; Mon, 27 Nov 2017 13:00:31 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 63D524C052; Mon, 27 Nov 2017 12:55:30 +0000 (GMT) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 268DA4C06A; Mon, 27 Nov 2017 12:55:30 +0000 (GMT) Received: from mschwideX1 (unknown [9.152.212.220]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 27 Nov 2017 12:55:30 +0000 (GMT) Date: Mon, 27 Nov 2017 14:00:28 +0100 From: Martin Schwidefsky To: Will Deacon Cc: Peter Zijlstra , Sebastian Ott , Ingo Molnar , Heiko Carstens , linux-kernel@vger.kernel.org Subject: Re: [bisected] system hang after boot In-Reply-To: <20171127125455.GC30679@arm.com> References: <20171122182659.GA22648@arm.com> <20171122202217.GO3326@worktop> <20171127114947.GA30679@arm.com> <20171127134918.4da71a78@mschwideX1> <20171127125455.GC30679@arm.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 x-cbid: 17112713-0016-0000-0000-00000505F2FB X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17112713-0017-0000-0000-00002841D287 Message-Id: <20171127140028.77cfb60a@mschwideX1> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-11-27_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1711270180 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 27 Nov 2017 12:54:56 +0000 Will Deacon wrote: > Hi Martin, > > On Mon, Nov 27, 2017 at 01:49:18PM +0100, Martin Schwidefsky wrote: > > On Mon, 27 Nov 2017 11:49:48 +0000 > > Will Deacon wrote: > > > On Wed, Nov 22, 2017 at 09:22:17PM +0100, Peter Zijlstra wrote: > > > > On Wed, Nov 22, 2017 at 06:26:59PM +0000, Will Deacon wrote: > > > > > > > > > Now, I can't see what the break_lock is doing here other than causing > > > > > problems. Is there a good reason for it, or can you just try removing it > > > > > altogether? Patch below. > > > > > > > > The main use is spin_is_contended(), which in turn ends up used in > > > > __cond_resched_lock() through spin_needbreak(). > > > > > > > > This allows better lock wait times for PREEMPT kernels on platforms > > > > where the lock implementation itself cannot provide 'contended' state. > > > > > > > > In that capacity the write-write race shouldn't be a problem though. > > > > > > I'm not sure why it isn't a problem: given that the break_lock variable > > > can read as 1 for a lock that is no longer contended and 0 for a lock that > > > is currently contended, then the __cond_resched_lock is likely to see a > > > value of 0 (i.e. spin_needbreak always return false) more often than no > > > since it's checked by the lock holder. > > > > Grepping for 'break_lock' the two locking blueprints are the only places > > where the field is written to. Unless I am blind, the associated unlock > > functions do *not* reset 'break_lock'. > > > > Without the raw_##op##_can_lock(lock) check the first of the blueprints > > now looks like this: > > > > void __lockfunc __raw_##op##_lock(locktype##_t *lock) \ > > { \ > > for (;;) { \ > > preempt_disable(); \ > > if (likely(do_raw_##op##_trylock(lock))) \ > > break; \ > > preempt_enable(); \ > > \ > > if (!(lock)->break_lock) \ > > (lock)->break_lock = 1; \ > > while ((lock)->break_lock) \ > > arch_##op##_relax(&lock->raw_lock); \ > > } \ > > (lock)->break_lock = 0; \ > > } \ > > > > All it takes to create an endless loop is two CPUs, the first acquired the > > lock and the second tries to get the lock. After the unsuccessful trylock > > of the second CPU, the first CPU releases the lock and never tries to take > > it again. The second CPU will be stuck in an endless loop. > > Yes, it basically relies on the lock holder never winning that race. > However, Peter's use-case just needs the lock-holder to be able to detect > contention (which is always best-effort anyway), so I think we can make that > "work" by removing the while loop above (see my subsequent diff sent to > Sebastian). Well, what race? The lock hold just has to hold the lock while another CPU tries to get it. There is no particular bad timing involved, just a little bit of contention is enough. And yes, I think removing the while loop on break_lock will work. > It's still questionable, because on a machine with store-buffers you really > want to order writes to break_lock against something else, but it might > happen to fall out depending on the details of the trylock() implementation. Even more, if the compiler "proves" that nobody writes to break_lock it can convert that to "while (1)" loop. > > I guess my best course of action is to remove GENERIC_LOCKBREAK from > > arch/s390/Kconfig to avoid this construct altogether. Let us see what > > breaks if I do that .. > > We could just consider ripping out GENERIC_LOCKBREAK entirely, but I was > hoping we could get a simpler fix in for now. I would opt for removing it entirely. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. From 1585224432431023938@xxx Mon Nov 27 13:03:46 +0000 2017 X-GM-THRID: 1584789385193417838 X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread