Date: Fri, 21 Sep 2018 14:02:26 +0200
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: Daniel Wagner, Peter Zijlstra, Will Deacon, x86@kernel.org,
    Linus Torvalds, "H. Peter Anvin", Boqun Feng, "Paul E. McKenney"
Subject: [Problem] Cache line starvation
Message-ID: <20180921120226.6xjgr4oiho22ex75@linutronix.de>

We reproducibly observe cache line starvation on a Core2Duo E6850 (2
cores), an i5-6400 SKL (4 cores) and on an NXP LS2044A ARM Cortex-A72 (4
cores).

The problem can be triggered with a v4.9-RT kernel by starting

  cyclictest -S -p98 -m -i2000 -b 200

and, as "load",

  stress-ng --ptrace 4

The reported maximal latency is usually less than 60us. If the problem
triggers, values around 400us, 800us or even more are reported. The
upper limit is the -i parameter.

Reproduction with 4.9-RT is almost immediate on the Core2Duo, ARM64 and
SKL, but it took 7.5 hours to trigger on v4.14-RT on the Core2Duo.

The instrumentation always shows the same picture:

CPU0                                          CPU1
=> do_syscall_64                              => do_syscall_64
=> SyS_ptrace                                   => syscall_slow_exit_work
=> ptrace_check_attach                          => ptrace_do_notify / rt_read_unlock
=> wait_task_inactive                              rt_spin_lock_slowunlock()
   -> while task_running()                         __rt_mutex_unlock_common()
 /    check_task_state()                           mark_wakeup_next_waiter()
|      raw_spin_lock_irq(&p->pi_lock);               raw_spin_lock(&current->pi_lock);
|      .                                             .
|      raw_spin_unlock_irq(&p->pi_lock);             .
 \    cpu_relax()                                    .
  -                                                  .
*IRQ*                                                .

In the error case we observe that the while() loop is repeated more than
5000 times, which indicates that CPU0 is able to acquire the pi_lock on
every iteration. CPU1 on the other side does not make progress waiting
for the same lock with interrupts disabled.

This continues until an IRQ hits CPU0. Once CPU0 starts processing the
IRQ, the other CPU is able to acquire pi_lock and the situation relaxes.
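To make the access pattern more concrete, here is a minimal userspace
analogue of what the trace shows: one thread takes and drops a spinlock
back to back with only a pause in between, while a second thread tries to
acquire the same lock once and measures how long that takes. This is only
an illustration of the pattern, not a reproducer for the kernel issue, and
whether it shows a measurable stall depends on the CPU and on the spinlock
implementation (build with something like gcc -O2 -pthread):

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_spinlock_t lock;
static atomic_int stop;

static inline void cpu_relax_user(void)
{
#if defined(__x86_64__) || defined(__i386__)
        __builtin_ia32_pause();   /* userspace stand-in for rep_nop()/cpu_relax() */
#else
        __asm__ __volatile__("" ::: "memory");
#endif
}

/* "CPU0" side: lock/unlock as fast as possible, the shape of the
 * while task_running() loop in the trace above. */
static void *hammer(void *arg)
{
        (void)arg;
        while (!atomic_load(&stop)) {
                pthread_spin_lock(&lock);
                /* very short critical section, like check_task_state() */
                pthread_spin_unlock(&lock);
                cpu_relax_user();
        }
        return NULL;
}

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
        pthread_t t;
        uint64_t t0, t1;

        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        pthread_create(&t, NULL, hammer, NULL);
        usleep(100 * 1000);       /* let the hammering thread get going */

        /* "CPU1" side: a single acquisition of the contended lock */
        t0 = now_ns();
        pthread_spin_lock(&lock);
        t1 = now_ns();
        pthread_spin_unlock(&lock);

        printf("single acquisition of the contended lock: %llu ns\n",
               (unsigned long long)(t1 - t0));

        atomic_store(&stop, 1);
        pthread_join(t, NULL);
        return 0;
}

On most machines the reported time will be small; the point of the trace
above is that on the affected CPUs the kernel equivalent of the hammer
loop can keep the cache line exclusive long enough that the other CPU,
spinning with interrupts disabled, stalls for hundreds of microseconds.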
This matches Daniel Wagner's observations which he described in [0] on
v4.4-RT.

It took us weeks to hunt this down as it really shows true Heisenbug
behaviour: instrumentation at the wrong place makes it vanish
immediately.

Other data points:

Replacing the single rep_nop() in cpu_relax() with two consecutive
rep_nop() calls makes it harder to trigger on the Core2Duo machine.
Adding a third one seems to resolve it completely, though no long-term
runs have been done yet. Also, our attempt to instrument the
lock()/unlock() sequence more deeply with rdtsc() timestamps made it go
away. On SKL the identical instrumentation hides it.

Peter suggested doing a clwb(&p->pi_lock); before the cpu_relax() in
wait_task_inactive(), which on both the Core2Duo and the SKL gets runtime
patched to clflush(). That hides it as well.

On ARM64 we suspected something like this without having real proof a
week ago and added a delay loop to cpu_relax(). That "resolved" the
issue. By now we have traces on ARM64 which prove it to be the same
issue.

Daniel reported that disabling ticket locks on 4.4 makes the problem go
away, but he hasn't run a long-term test yet, and as we saw with 4.14 it
can take quite a while.

While it is pretty reliably reproducible with an RT kernel, this is not
an RT kernel issue at all. Mainline has similar or even the same code
patterns, but it's harder to observe as there are no latency guarantees
for an observer like cyclictest. The fact that an interrupt makes it go
away makes it even harder, because the rare stalls just go unnoticed:
they are canceled early.

[0] https://lkml.kernel.org/r/4d878d3e-bd1b-3d6e-bdde-4ce840191093@monom.org

Sebastian