Subject: Re: [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter
To: Guenter Roeck
CC: Wim Van Sebroeck, Jonathan Corbet, Esben Haabendal
References:
 <1511865350-20665-1-git-send-email-rasmus.villemoes@prevas.dk>
 <1511865350-20665-2-git-send-email-rasmus.villemoes@prevas.dk>
 <20171128221445.GG10144@roeck-us.net>
From: Rasmus Villemoes
Date: Wed, 29 Nov 2017 11:56:57 +0100
In-Reply-To: <20171128221445.GG10144@roeck-us.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017-11-28 23:14, Guenter Roeck wrote:
> On Tue, Nov 28, 2017 at 11:35:49AM +0100, Rasmus Villemoes wrote:
>>
>> The unit is milliseconds rather than seconds because that covers more
>> use cases. For example, one can effectively disable the kernel handling
>> by setting the open_timeout to 1 ms. There are also customers with very
>> strict requirements that may want to set the open_timeout to something
>> like 4500 ms, which combined with a hardware watchdog that must be
>> pinged every 250 ms ensures userspace is up no more than 5 seconds after
>> the bootloader hands control to the kernel (250 ms until the driver gets
>> registered and kernel handling starts, 4500 ms of kernel handling, and
>> then up to 250 ms from the last ping until userspace takes over).
>
> This is quite vague, especially since it doesn't count the time from
> boot to starting the watchdog driver,

My example is bad, and I now realize one cannot really get such precise
guarantees. But the example _did_ actually account for the time from
boot to device registration - it allowed 250 ms for the kernel to get
that far.

> which can vary even across boots.
> Why not make it specific, for example by adjusting the open timeout with
> ktime_get_boot_ns() ?

If by "boot" we mean the moment the bootloader hands control to the
kernel, ktime_get_boot_ns() doesn't give that either - at best, it gives
an approximation of the time since timekeeping_init(), but it's not very
accurate that early (I simply injected printks of ktime_get_boot_ns() at
various places in init/main.c and timestamped the output lines). If it
overshoots, we'd be subtracting more of the allowance than we should,
and I don't think we have any way of knowing when that happens or of
correcting for it. So I'd rather keep the code simple and let it count
from the time the watchdog framework knows about the device, which is
also around the time when the kernel's timekeeping is reasonably
accurate.

> I would actually make it even more specific and calculate the open
> timeout such that the system would reboot after open_timeout, not
> after . Any reason for not doing that ?
> The upside would be more accuracy, and I don't really see a downside.

I don't think it would be that much more accurate - we schedule the
pings at an interval of half the max_hw_heartbeat_ms==$x, so with the
current code we'd get rebooted somewhere in [open_deadline + $x/2,
open_deadline + $x], and after subtracting $x from the open_timeout that
would become [open_deadline - $x/2, open_deadline]. I'd rather not have
the reboot happen before the open_deadline. Sure, we could subtract $x/2
instead. Then there's the case where ->max_hw_heartbeat_ms is not set,
so we have to use ->timeout for $x, and then there's the case of $x (or
$x/2) being greater than $open_timeout. I'd really like to keep the code
simple.

If it helps, I'd be happy to document the exact semantics of the
open_timeout with a nice ascii art timeline.

Rasmus
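
For illustration only, the interval arithmetic above can be written out as a minimal sketch in C. It assumes the kernel pings every $x/2 until the open deadline and that the hardware fires $x after the last ping. The names struct reboot_window and compute_reboot_window() are hypothetical, invented for this example, and are not part of the watchdog framework. All times are in milliseconds, counted from device registration.

```c
#include <stdint.h>

/* Hypothetical illustration type, not a kernel structure. */
struct reboot_window {
	int64_t earliest_ms;	/* earliest possible hardware reset */
	int64_t latest_ms;	/* latest possible hardware reset */
};

/*
 * Assumption: the kernel stops pinging at (open_timeout_ms - adjust_ms).
 * With pings every hw_heartbeat_ms / 2, the last ping happened at most
 * hw_heartbeat_ms / 2 before that point, and the hardware resets
 * hw_heartbeat_ms after the last ping.
 */
static struct reboot_window
compute_reboot_window(int64_t open_timeout_ms, int64_t hw_heartbeat_ms,
		      int64_t adjust_ms)
{
	int64_t stop_ms = open_timeout_ms - adjust_ms;
	struct reboot_window w = {
		.earliest_ms = stop_ms + hw_heartbeat_ms / 2,
		.latest_ms = stop_ms + hw_heartbeat_ms,
	};

	return w;
}
```

With open_timeout = 4500 and $x = 250, adjust_ms = 0 (the current code) gives [4625, 4750]; adding the 250 ms allowed for device registration to the upper bound gives the 5 seconds of the original example. adjust_ms = 250 gives [4375, 4500], i.e. possibly before the deadline, and adjust_ms = 125 gives [4500, 4625].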