Subject: Re: [RFC PATCH 3/3] sched,rt: break out of load balancing if an RT task appears
From: Scott Wood
To: Valentin Schneider
Cc: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Rik van Riel, Mel Gorman, linux-kernel@vger.kernel.org, linux-rt-users
Date: Tue, 28 Apr 2020 17:52:13 -0500
References: <20200428050242.17717-1-swood@redhat.com> <20200428050242.17717-4-swood@redhat.com>
Organization: Red Hat
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2020-04-28 at 17:33 -0500, Scott Wood wrote:
> On Tue, 2020-04-28 at 22:56 +0100, Valentin Schneider wrote:
> > On 28/04/20 06:02, Scott Wood wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index dfde7f0ce3db..e7437e4e40b4 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -9394,6 +9400,10 @@ static int should_we_balance(struct lb_env *env)
> > >  	struct sched_group *sg = env->sd->groups;
> > >  	int cpu, balance_cpu = -1;
> > >
> > > +	/* Run the realtime task now; load balance later. */
> > > +	if (rq_has_runnable_rt_task(env->dst_rq))
> > > +		return 0;
> > > +
> >
> > I have a feeling this isn't very nice to CFS tasks, since we would now
> > "waste" load-balance attempts if they happen to coincide with an RT task
> > being runnable.
> >
> > On your 72 CPUs machine, the system-wide balance happens (at best) every
> > 72ms if you have idle time, every ~2300ms otherwise (every balance
> > CPU gets to try to balance however, so it's not as horrible as I'm
> > making it sound). This is totally worst-case scenario territory, and
> > you'd hope newidle_balance() could help here and there (as it isn't
> > gated by any balance interval).
> >
> > Still, even for a single rq, postponing a system-wide balance for a
> > full balance interval (i.e. ~2 secs worst case here) just because we had
> > a single RT task running when we tried to balance seems a bit much.
> >
> > It may be possible to hack something to detect those cases and reset the
> > interval to "now" when e.g. dequeuing the last RT task (& after having
> > previously aborted a load-balance due to RT/DL/foobar).
>
> Yeah, some way to retry at an appropriate time after aborting a rebalance
> would be good.

Another option is to limit the bailing out to newidle balancing (as the
patchset currently stands, it isn't checking the right rq for global
balancing anyway). On RT the softirq runs from thread context, so enabling
interrupts and (on RT) preemption should suffice to avoid latency problems
in the global rebalance.

-Scott
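
To illustrate the newidle-only option, here is a rough, untested userspace
model of the check, not a kernel patch. The struct layouts, the
nr_rt_runnable field, and should_bail_for_rt() are hypothetical stand-ins
made up for illustration, and the body of rq_has_runnable_rt_task() here is
only a placeholder for whatever the helper in the series actually does.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

struct rq {
	unsigned int nr_rt_runnable;	/* hypothetical: runnable RT tasks on this rq */
};

struct lb_env {
	struct rq *dst_rq;
	enum cpu_idle_type idle;
};

/* Placeholder body; the real helper is defined elsewhere in the series. */
static bool rq_has_runnable_rt_task(const struct rq *rq)
{
	return rq->nr_rt_runnable > 0;
}

/*
 * Hypothetical helper: abort only a newidle balance when an RT task is
 * runnable on the destination rq. Periodic (global) balancing is never
 * skipped, so it cannot be postponed for a whole balance interval.
 */
static bool should_bail_for_rt(const struct lb_env *env)
{
	return env->idle == CPU_NEWLY_IDLE &&
	       rq_has_runnable_rt_task(env->dst_rq);
}
```

With this shape, a periodic balance (idle == CPU_NOT_IDLE or CPU_IDLE) goes
ahead even when an RT task is runnable, and relies on interrupts and (on RT)
preemption being enabled to keep latency down.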