Message-ID: <9ca45a07-00ba-9afd-2e25-7bab6cefab0e@arm.com>
Date: Wed, 9 Nov 2022 14:52:46 +0100
Subject: Re: Crash with PREEMPT_RT on aarch64 machine
From: Pierre Gondois
To: Jan Kara, Mark Rutland
Cc: Waiman Long, Sebastian Andrzej Siewior, LKML, Thomas Gleixner,
 Steven Rostedt, Mel Gorman, Peter Zijlstra, Ingo Molnar, Will Deacon,
 Catalin Marinas
References: <20221103115444.m2rjglbkubydidts@quack3>
 <20221107135636.biouna36osqc4rik@quack3>
 <359cc93a-fce0-5af2-0fd5-81999fad186b@redhat.com>
 <20221108174529.pp4qqi2mhpzww77p@quack3>
 <20221109110133.txft66ukwfw2ifkj@quack3>
In-Reply-To: <20221109110133.txft66ukwfw2ifkj@quack3>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/9/22 12:01, Jan Kara wrote:
> On Wed 09-11-22 09:55:07, Mark Rutland wrote:
>> On Tue, Nov 08, 2022 at 06:45:29PM +0100, Jan Kara wrote:
>>> On Tue 08-11-22 10:53:40, Mark Rutland wrote:
>>>> On Mon, Nov 07, 2022 at 11:49:01AM -0500, Waiman Long wrote:
>>>>> On 11/7/22 10:10, Sebastian Andrzej Siewior wrote:
>>>>>> + locking, arm64
>>>>>>
>>>>>> On 2022-11-07 14:56:36 [+0100], Jan Kara wrote:
>>>>>>>> spinlock_t and raw_spinlock_t differ slightly in terms of locking.
>>>>>>>> rt_spin_lock() has the fast path via try_cmpxchg_acquire(). If you
>>>>>>>> enable CONFIG_DEBUG_RT_MUTEXES then you would force the slow path which
>>>>>>>> always acquires the rt_mutex_base::wait_lock (which is a raw_spinlock_t)
>>>>>>>> while the actual lock is modified via cmpxchg.
>>>>>>> So I've tried enabling CONFIG_DEBUG_RT_MUTEXES and indeed the corruption
>>>>>>> stops happening as well. So do you suspect some bug in the CPU itself?
>>>>>> If it is only enabling CONFIG_DEBUG_RT_MUTEXES (and not whole lockdep)
>>>>>> then it looks very suspicious.
>>>>>> CONFIG_DEBUG_RT_MUTEXES enables a few additional checks but the main
>>>>>> part is that rt_mutex_cmpxchg_acquire() + rt_mutex_cmpxchg_release()
>>>>>> always fail (and so the slowpath under a raw_spinlock_t is done).
>>>>>>
>>>>>> So if it is really the fast path (rt_mutex_cmpxchg_acquire()) then it
>>>>>> somehow smells like the CPU is misbehaving.
>>>>>>
>>>>>> Could someone from the locking/arm64 department check if the locking in
>>>>>> RT-mutex (rtlock_lock()) is correct?
>>>>>>
>>>>>> rtmutex locking uses try_cmpxchg_acquire(, ptr, ptr) for the fastpath
>>>>>> (and try_cmpxchg_release(, ptr, ptr) for unlock).
>>>>>> Now looking at it again, I don't see much difference compared to what
>>>>>> queued_spin_trylock() does, except the latter always operates on a 32-bit
>>>>>> value instead of a pointer.
>>>>>
>>>>> Both the fast path of the queued spinlock and rt_spin_lock are using
>>>>> try_cmpxchg_acquire(); the only difference I saw is the size of the data
>>>>> to be cmpxchg'ed. qspinlock uses a 32-bit integer whereas rt_spin_lock
>>>>> uses a 64-bit pointer. So I believe it is more about how arm64 does
>>>>> cmpxchg. I believe there are two different ways of doing it depending on
>>>>> whether LSE atomics are available on the platform. So exactly what arm64
>>>>> system is being used here, and what hardware capabilities does it have?
>>>>
>>>> From the /proc/cpuinfo output earlier, this is a Neoverse N1 system, with the
>>>> LSE atomics. Assuming the kernel was built with support for atomics in-kernel
>>>> (which is selected by default), it'll be using the LSE version.
>>>
>>> So I was able to reproduce the corruption both with LSE atomics enabled &
>>> disabled in the kernel. It seems the problem takes considerably longer to
>>> reproduce with LSE atomics enabled, but it still does happen.
>>>
>>> BTW, I've tried to reproduce the problem on another aarch64 machine with a
>>> CPU from a different vendor:
>>>
>>> processor       : 0
>>> BogoMIPS        : 200.00
>>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
>>> CPU implementer : 0x48
>>> CPU architecture: 8
>>> CPU variant     : 0x1
>>> CPU part        : 0xd01
>>> CPU revision    : 0
>>>
>>> And there the problem does not reproduce. So might it be a genuine bug in
>>> the CPU implementation?
>>
>> Perhaps, though I suspect it's more likely that we have an ordering bug in the
>> kernel code, and it shows up on CPUs with legitimate but more relaxed ordering.
>> We've had a couple of those show up on Apple M1, so it might be worth trying on
>> one of those.
>>
>> How easy is this to reproduce? What's necessary?
>
> As Pierre writes, running the dbench benchmark on an XFS filesystem on an
> Ampere Altra machine triggers this relatively easily (it takes about 10
> minutes to trigger without atomics and about 30 minutes to trigger with the
> atomics enabled).
>
> Running the benchmark on XFS somehow seems to be important; we didn't see
> the crash happen on ext4 (which may just mean it is less frequent on ext4
> and didn't trigger in our initial testing, after which we started to
> investigate crashes with XFS).
>
> Honza

It was possible to reproduce on an Ampere eMAG. It takes < 1 min to reproduce
once dbench is launched, and it seems more likely to trigger with the previous
diff applied. It even sometimes triggers without launching dbench on the Altra.

/proc/cpuinfo for the eMAG:
processor       : 0
BogoMIPS        : 80.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x50
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0x000
CPU revision    : 2