Date: Fri, 23 Jul 2021 09:38:44 -0400
From: Phil Auld
To: Peter Zijlstra
Cc: linux-kernel, Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot, stable@vger.kernel.org
Subject: Re: [PATCH] sched: Fix nr_uninterruptible race causing increasing load average

On Fri, Jul 09, 2021 at 01:38:20PM +0200 Peter Zijlstra wrote:
> On Thu, Jul 08, 2021 at 09:25:45AM -0400, Phil Auld wrote:
> > Hi Peter,
> >
> > On Thu, Jul 08, 2021 at 09:26:26AM +0200 Peter Zijlstra wrote:
> > > On Wed, Jul 07, 2021 at 03:04:57PM -0400, Phil Auld wrote:
> > > > On systems with weaker memory ordering (e.g. POWER) commit dbfb089d360b
> > > > ("sched: Fix loadavg accounting race") causes increasing values of load
> > > > average (via rq->calc_load_active and calc_load_tasks) due to the wakeup
> > > > CPU not always seeing the write to task->sched_contributes_to_load in
> > > > __schedule(). Missing that, we fail to decrement nr_uninterruptible when
> > > > waking up a task which incremented nr_uninterruptible when it slept.
> > > >
> > > > The rq->lock serialization is insufficient across different rq->locks.
> > > >
> > > > Add smp_wmb() to __schedule() and smp_rmb() before the read in
> > > > ttwu_do_activate().
> > > >
> > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > index 4ca80df205ce..ced7074716eb 100644
> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -2992,6 +2992,8 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
> > > >
> > > >  	lockdep_assert_held(&rq->lock);
> > > >
> > > > +	/* Pairs with smp_wmb in __schedule() */
> > > > +	smp_rmb();
> > > >  	if (p->sched_contributes_to_load)
> > > >  		rq->nr_uninterruptible--;
> > > >
> > >
> > > Is this really needed?! (This question is a big fat clue that the comment is
> > > insufficient.) AFAICT try_to_wake_up() has a LOAD-ACQUIRE on p->on_rq,
> > > and hence the p->sched_contributes_to_load read must already happen after it.
> > >
> >
> > Yes, it is needed. We've got idle POWER systems with a load average of 530.21.
> > calc_load_tasks is 530, and the sum of both nr_uninterruptible and
> > calc_load_active across all the runqueues is 530. Basically a monotonically
> > non-decreasing load average. With the patch this no longer happens.
>
> Have you tried without the rmb here? Do you really need both barriers?
>

You're right here. (I see now that you were asking about the rmb
specifically in the first question.) The rmb is not needed.

I was unable to reproduce the issue with the upstream kernel. I still
think it is a problem, though, since the code in question is all the
same. The recent changes to unbound workqueues, which make it more
likely that work runs on the submitting CPU, may be masking the problem,
since it obviously requires multiple CPUs to hit. If I can isolate those
changes I can try reverting them upstream and see if I can reproduce it
there.

I suppose pulling those changes back could get us past this, but I'm not
a fan of just hiding the race by making it harder to hit.

I've not gotten to the disable-TTWU_QUEUE test; that's next...

Cheers,
Phil

--
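To make the ordering argument in the thread concrete, here is a minimal
userspace sketch of the acquire/release pairing Peter points to. It uses
C11 atomics and pthreads rather than the kernel's smp_* primitives, and
the variable names are hypothetical stand-ins for the scheduler fields
(p->on_rq, p->sched_contributes_to_load, rq->nr_uninterruptible). It
illustrates why an acquire load of on_rq is enough to order the later
read of the load-contribution flag; it is not the kernel implementation.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int sched_contributes_to_load;  /* plain field, like the task flag */
static _Atomic int on_rq = 1;          /* stand-in for p->on_rq */
static int nr_uninterruptible;         /* stand-in for rq->nr_uninterruptible */

static void *sleeper_side(void *arg)
{
	(void)arg;

	/* Models __schedule(): mark the task as contributing to load... */
	sched_contributes_to_load = 1;
	nr_uninterruptible++;

	/*
	 * ...then publish the dequeue. The release store orders the plain
	 * writes above before it, so any CPU whose acquire load observes
	 * on_rq == 0 also observes sched_contributes_to_load == 1.
	 */
	atomic_store_explicit(&on_rq, 0, memory_order_release);
	return NULL;
}

static void *waker_side(void *arg)
{
	(void)arg;

	/*
	 * Models try_to_wake_up(): spin until the task is off the runqueue.
	 * The acquire load pairs with the release store above.
	 */
	while (atomic_load_explicit(&on_rq, memory_order_acquire) != 0)
		;

	/* Ordered after the acquire, so this cannot read a stale 0. */
	if (sched_contributes_to_load)
		nr_uninterruptible--;
	return NULL;
}

int main(void)
{
	pthread_t s, w;

	pthread_create(&s, NULL, sleeper_side, NULL);
	pthread_create(&w, NULL, waker_side, NULL);
	pthread_join(s, NULL);
	pthread_join(w, NULL);

	/* With correct pairing the increment and decrement always balance. */
	printf("nr_uninterruptible = %d\n", nr_uninterruptible); /* prints 0 */
	return 0;
}

If the release/acquire pair is weakened to relaxed operations, a weakly
ordered machine (e.g. POWER) may let the waker observe on_rq == 0 while
still reading a stale sched_contributes_to_load == 0. That is exactly
the missed decrement described above: nr_uninterruptible stays elevated,
calc_load_active accumulates it, and the load average ratchets upward.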