Date: Thu, 18 Oct 2018 12:10:08 +0200
From: Juri Lelli
To: Peter Zijlstra
Cc: Thomas Gleixner, Juri Lelli, syzbot, Borislav Petkov, "H. Peter Anvin", LKML, mingo@redhat.com, nstange@suse.de, syzkaller-bugs@googlegroups.com, Luca Abeni, henrik@austad.us, Tommaso Cucinotta, Claudio Scordino, Daniel Bristot de Oliveira
Subject: Re: INFO: rcu detected stall in do_idle
Message-ID: <20181018101008.GB21611@localhost.localdomain>
In-Reply-To: <20181018094850.GW3121@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 18/10/18 11:48, Peter Zijlstra wrote:
> On Thu, Oct 18, 2018 at 10:28:38AM +0200, Juri Lelli wrote:
>
> > Another side problem seems also to be that with such tiny parameters we
> > spend lot of time in the while (dl_se->runtime <= 0) loop of
> > replenish_dl_entity() (actually uselessly, as deadline is most probably
> > going to still be in the past when eventually runtime becomes positive
> > again), as delta_exec is huge w.r.t. runtime and runtime has to keep up
> > with tiny increments of dl_runtime. I guess we could ameliorate things
> > here by limiting the number of time we execute the loop before bailing
> > out.
>
> That's the "DL replenish lagged too much" case, right?
> Yeah, there is only so much we can recover from.

Right.

> Funny that GCC actually emits that loop; sometimes we've had to fight
> GCC not to turn that into a division.
>
> But yes, I suppose we can put a limit on how many periods we can lag
> before just giving up.

OK.

> > So, I tend to think that we might want to play safe and put some higher
> > minimum value for dl_runtime (it's currently at 1ULL << DL_SCALE).
> > Guess the problem is to pick a reasonable value, though. Maybe link it
> > someway to HZ? Then we might add a sysctl (or similar) thing with which
> > knowledgeable users can do whatever they think their platform/config can
> > support?
>
> Yes, a HZ related limit sounds like something we'd want. But if we're
> going to do a minimum sysctl, we should also consider adding a maximum,
> if you set a massive period/deadline, you can, even with a relatively
> low u, incur significant delays.
>
> And do we want to put the limit on runtime or on period ?
>
> That is, something like:
>
>   TICK_NSEC/2 < period < 10*TICK_NSEC
>
> and/or
>
>   TICK_NSEC/2 < runtime < 10*TICK_NSEC
>
> Hmm, for HZ=1000 that ends up with a max period of 10ms, that's far too
> low, 24Hz needs ~41ms. We can of course also limit the runtime by
> capping u for users (as we should anyway).

I also thought of TICK_NSEC/2 as a reasonably safe lower limit, which will
implicitly limit period as well, since

  runtime <= deadline <= period

Not sure about the upper limit, though. The lower limit is related to the
inherent granularity of the platform/config; the upper limit has more to
do with highest prio stuff with a huge period delaying everything else,
so it doesn't seem to be related to HZ?

Maybe we could just pick something that seems reasonably big to handle
SCHED_DEADLINE users' needs and not too big to jeopardize everyone else,
say 0.5s?
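
For reference, the "limit how many periods we can lag" idea from the quoted part could look roughly like the user-space sketch below. This is not the kernel's replenish_dl_entity(); struct dl_entity, replenish_bounded() and max_periods are hypothetical names used only for illustration, and the reset on bail-out (fresh deadline and full runtime from "now") is an assumption about what "giving up" would mean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the scheduler's per-entity state. */
struct dl_entity {
	int64_t  runtime;     /* remaining runtime, may be negative (ns) */
	uint64_t deadline;    /* absolute deadline (ns) */
	uint64_t dl_runtime;  /* runtime granted per period (ns) */
	uint64_t dl_deadline; /* relative deadline (ns) */
	uint64_t dl_period;   /* period (ns) */
};

/*
 * Replenish one period at a time, but give up after max_periods
 * iterations instead of spinning when delta_exec was huge compared
 * to dl_runtime. Returns true if the entity caught up normally,
 * false if it lagged too much and was reset relative to 'now'.
 */
static bool replenish_bounded(struct dl_entity *se, uint64_t now,
			      unsigned int max_periods)
{
	unsigned int lagged = 0;

	while (se->runtime <= 0) {
		if (lagged++ == max_periods) {
			/* Assumed recovery: start over from the current time. */
			se->deadline = now + se->dl_deadline;
			se->runtime = (int64_t)se->dl_runtime;
			return false;
		}
		se->deadline += se->dl_period;
		se->runtime += (int64_t)se->dl_runtime;
	}
	return true;
}
```

With dl_runtime much smaller than the overrun, the uncapped loop would need overrun/dl_runtime iterations; the cap turns that into a bounded cost plus a reset.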
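
The limits floated above (a floor of TICK_NSEC/2 on runtime and a ceiling of about 0.5s on period) can be sketched as a parameter check. This is a user-space illustration under stated assumptions, not a proposed patch: tick_nsec() is a simplified NSEC_PER_SEC/HZ without the kernel's rounding, and dl_runtime_min(), DL_PERIOD_MAX and dl_params_ok() are hypothetical names. Whether the floor should be strict or inclusive is left open in the thread; the sketch treats it as inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Assumed upper bound on period: the 0.5s suggested in the thread. */
#define DL_PERIOD_MAX (NSEC_PER_SEC / 2)

/* Simplified tick length in ns for a given HZ (no rounding). */
static uint64_t tick_nsec(unsigned int hz)
{
	return NSEC_PER_SEC / hz;
}

/* Assumed lower bound on runtime: half a tick. */
static uint64_t dl_runtime_min(unsigned int hz)
{
	return tick_nsec(hz) / 2;
}

/*
 * Accept parameters only if runtime is at least half a tick,
 * runtime <= deadline <= period holds, and period does not exceed
 * the fixed, HZ-independent maximum. All values are in ns.
 */
static bool dl_params_ok(unsigned int hz, uint64_t runtime,
			 uint64_t deadline, uint64_t period)
{
	if (runtime < dl_runtime_min(hz))
		return false;
	if (runtime > deadline || deadline > period)
		return false;
	return period <= DL_PERIOD_MAX;
}
```

Under these assumptions, with HZ=1000 the runtime floor is 500us, so dl_params_ok(1000, 10000, 100000, 100000) rejects a 10us runtime, while a 24Hz task with a 5ms runtime and a ~41.67ms period (dl_params_ok(1000, 5000000, 41666666, 41666666)) is accepted, which a 10*TICK_NSEC = 10ms period cap would not allow.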