From: Chris Wilson
To: Tetsuo Handa, linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, akpm@linux-foundation.org, ak@linux.intel.com, jack@suse.cz, aryabinin@virtuozzo.com, dvyukov@google.com
Subject: Re: [PATCH] khungtaskd: Kick stuck processes
Date: Thu, 08 Feb 2018 23:15:20 +0000
Message-ID: <151813172079.28809.12438916989037864311@mail.alporthouse.com>
In-Reply-To: <201802090810.DBF09356.OFMQFVFJtOHOLS@I-love.SAKURA.ne.jp>
References: <20180208190753.17690-1-chris@chris-wilson.co.uk> <201802090810.DBF09356.OFMQFVFJtOHOLS@I-love.SAKURA.ne.jp>

Quoting Tetsuo Handa (2018-02-08 23:10:43)
> Chris Wilson wrote:
> > After spotting a stuck process, and having decided not to panic, give
> > the task a kick to see if that helps it to recover (e.g. to paper over a
> > missed wake up).
>
> Yes, we are seeing hangs at io_schedule(), but doesn't optionally allowing
> io_schedule() be replaced with timeout version (e.g. dump_page() upon timeout
> if io_schedule() was called for e.g. wait_on_page_bit()) give us more clue?

Yes, this isn't for debugging who left the page locked (or the exact root
cause), this is just trying to allow the system to limp along afterwards :)

From personal experience, I know how easy it is to lose a wakeup and the
only thing to notice is khungtaskd shouting every 120s.
-Chris