Date: Mon, 16 Nov 2020 09:10:54 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra, Will Deacon
Cc: Davidlohr Bueso, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Loadavg accounting error on arm64
Message-ID: <20201116091054.GL3371@techsingularity.net>

Hi,

I got cc'd on an internal bug report filed against 5.8 and 5.9 kernels reporting that loadavg was "exploding" on arm64 on machines acting as build servers. It happened on at least two different arm64 variants.
That setup is complex to replicate but fortunately the problem can be reproduced by running hackbench-process-pipes while heavily overcommitting a machine with 96 logical CPUs and then checking whether loadavg drops afterwards. With an MMTests clone, I reproduced it as follows:

  ./run-mmtests.sh --config configs/config-workload-hackbench-process-pipes \
	--no-monitor testrun; \
  for i in `seq 1 60`; do cat /proc/loadavg; sleep 60; done

Load should drop to 10 after about 10 minutes and it does on x86-64 but remained at around 200+ on arm64. The reproduction case simply hammers the case where a task can be descheduling while also being woken by another task at the same time. It takes a long time to run but it makes the problem very obvious. The expectation is that load drops back to normal once hackbench stops, even after it has been running and saturating the machine for a long time.

Commit dbfb089d360b ("sched: Fix loadavg accounting race") fixed a loadavg accounting race in the generic case. Later it was documented why the ordering of when p->sched_contributes_to_load is read/updated relative to p->on_cpu matters. This is critical when a task is descheduling at the same time it is being activated on another CPU. While the loads/stores happen under the RQ lock, the RQ lock on its own does not give any guarantees on the task state.

Over the weekend I convinced myself that it must be because the arm64 implementations of smp_load_acquire and smp_store_release do not appear to implement acquire/release semantics, because I didn't find anything arm64-specific that was playing with p->state behind the scheduler's back (I could have missed it if it was in an assembly portion as I can't reliably read arm64 assembly). Similarly, it's not clear why the arm64 implementation does not call smp_acquire__after_ctrl_dep in the smp_load_acquire implementation. Even when it was introduced, the arm64 implementation differed significantly from the arm implementation in terms of what barriers it used, for non-obvious reasons.
Unfortunately, making that work similarly to the arch-independent version did not help, and it doesn't help that I know nothing about the arm64 memory model. I'll be looking again today to see whether I can find a mistake in the ordering of how sched_contributes_to_load is handled but again, the lack of knowledge of the arm64 memory model means I'm a bit stuck and a second set of eyes would be nice :(

-- 
Mel Gorman
SUSE Labs