From: Paul Turner
Date: Thu, 5 Mar 2020 14:11:49 -0800
Subject: Re: [PATCH] sched: watchdog: Touch kernel watchdog in sched code
To: Peter Zijlstra
Cc: Xi Wang, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Josh Don, LKML, linux-fsdevel@vger.kernel.org
In-Reply-To: <20200305075742.GR2596@hirez.programming.kicks-ass.net>
References: <20200304213941.112303-1-xii@google.com> <20200305075742.GR2596@hirez.programming.kicks-ass.net>

On Wed, Mar 4, 2020 at 11:57 PM Peter Zijlstra wrote:
>
> On Wed, Mar 04, 2020 at 01:39:41PM -0800, Xi Wang wrote:
> > The main purpose of kernel watchdog is to test whether scheduler can
> > still schedule tasks on a cpu. In order to reduce latency from
> > periodically invoking watchdog reset in thread context, we can simply
> > touch watchdog from pick_next_task in scheduler. Compared to actually
> > resetting watchdog from cpu stop / migration threads, we lose coverage
> > on: a migration thread actually get picked and we actually context
> > switch to the migration thread. Both steps are heavily protected by
> > kernel locks and unlikely to silently fail. Thus the change would
> > provide the same level of protection with less overhead.
> >
> > The new way vs the old way to touch the watchdogs is configurable
> > from:
> >
> > /proc/sys/kernel/watchdog_touch_in_thread_interval
> >
> > The value means:
> > 0: Always touch watchdog from pick_next_task
> > 1: Always touch watchdog from migration thread
> > N (N>0): Touch watchdog from migration thread once in every N
> > invocations, and touch watchdog from pick_next_task for
> > other invocations.
> >
>
> This is configurable madness. What are we really trying to do here?

See reply to Thomas, no config is actually required here.

Focusing on the intended outcome:

The goal is to reduce jitter, since we're periodically preempting other
classes to run the watchdog. Even on a single CPU this is measurable as
jitter in the us range. What increases the motivation is that this
disruption has recently been magnified by CPU "gifts", which require
evicting the whole core when one of the siblings schedules one of these
watchdog threads.

The main outcome being asserted here is that we could actually exercise
pick_next_task if required. There are other potential things this will
catch, but they are much more braindead generally speaking (e.g. a bug
in pick_next_task itself).
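
For readers following the thread, the interval semantics in the quoted
patch description (0, 1, N) can be sketched in plain C as below. This is
only an illustration and not code from the patch: the helper name
touch_from_thread() and the idea of a per-CPU invocation counter are
assumptions made for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from the patch.
 *
 * interval:   value of watchdog_touch_in_thread_interval as described
 *             in the quoted text.
 * invocation: assumed count of watchdog periods seen so far on this
 *             CPU, starting at 0.
 *
 * Returns true when this period should touch the watchdog from the
 * migration thread, false when it should be touched from
 * pick_next_task instead.
 */
static bool touch_from_thread(unsigned int interval, unsigned long invocation)
{
	if (interval == 0)
		return false;	/* 0: always touch from pick_next_task */

	/*
	 * 1: every invocation matches, so always the migration thread.
	 * N: one invocation in every N goes to the migration thread.
	 */
	return invocation % interval == 0;
}
```

With interval 4, invocations 0, 4, 8, ... would go through the migration
thread and the rest through pick_next_task. Per the reply above, the
knob itself is not actually required.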