Date: Tue, 13 Feb 2018 13:29:40 +0100
From: Michal Hocko
To: Chris Wilson
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Tetsuo Handa, Andrew Morton
Subject: Re: [PATCH] khungtaskd: Kick stuck processes
Message-ID: <20180213122940.GS3443@dhcp22.suse.cz>
In-Reply-To: <151852369261.8633.4809735220536862770@mail.alporthouse.com>

On Tue 13-02-18 12:08:12, Chris Wilson wrote:
> Quoting Michal Hocko (2018-02-13 11:56:42)
> > On Thu 08-02-18 19:07:53, Chris Wilson wrote:
> > > After spotting a stuck process, and having decided not to panic, give
> > > the task a kick to see if that helps it to recover (e.g. to paper over a
> > > missed wake up).
> >
> > huh, this is just no-no. watchdog is there to report problems not
> > interfere. You can never know whether the sleeper is prepared for
> > spurious wakeups. Do not paper over bugs...
>
> Aside from khungtaskd being a debug feature, we want to identify the bug
> by kicking the stuck process and seeing what squeals. Being told that
> khugepaged is stuck over and over again doesn't help resolve who is
> holding onto that lock_page, or if it was just a missed wakeup as all
> other processes are asleep.

And how exactly does kicking help here? If the waiter uses lock_page
then it would go to sleep again because of PG_locked. If the page is not
locked and this is a missed wake up then either unlock_page is wrong
(which doesn't seem to be the case AFAICS) or somebody messes up with
the page locking and this patch doesn't achieve anything.

> We are trying to paper over other bugs so that we can fix ours.

But you do not want to break existing code which might be sensitive to
spurious wakeups. You could argue that such code is broken already and
I would tend to agree, but an artificial wake up is just a no-go.
-- 
Michal Hocko
SUSE Labs
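
The two cases argued above can be sketched in userspace. This is a toy
Python model, not kernel code: `ToyPage`, `wait_rechecking`, `wait_once`,
`kick` and `kick_then_unlock` are hypothetical names invented for the
illustration, and a `threading.Condition` merely stands in for a page wait
queue. The assumption is that a lock_page-style waiter re-tests its
condition after every wakeup, while a fragile waiter treats any wakeup as
proof that the condition holds.

```python
import threading
import time


class ToyPage:
    """Toy stand-in for a page with a PG_locked-like flag and a wait queue."""

    def __init__(self):
        self.cond = threading.Condition()
        self.locked = True   # somebody else holds the page
        self.sleepers = 0    # waiters currently blocked in cond.wait()
        self.wakeups = 0     # wakeups delivered to waiters so far

    def _sleep(self):
        # Caller holds self.cond; blocks until any wakeup arrives.
        self.sleepers += 1
        self.cond.wait()
        self.sleepers -= 1
        self.wakeups += 1

    def wait_rechecking(self, log):
        # Re-tests the flag after every wakeup, so a kick changes nothing.
        with self.cond:
            while self.locked:
                self._sleep()
            log.append("page unlocked")

    def wait_once(self, log):
        # Fragile: assumes that being woken means the page was unlocked.
        with self.cond:
            if self.locked:
                self._sleep()
            log.append("page unlocked" if not self.locked else "page STILL locked")

    def kick(self):
        # Artificial wakeup: wakes sleepers without changing any state.
        with self.cond:
            self.cond.notify_all()

    def unlock(self):
        # Genuine wakeup: clear the flag, then wake the sleepers.
        with self.cond:
            self.locked = False
            self.cond.notify_all()

    def wait_until(self, predicate):
        # Poll under the lock until predicate(self) holds.
        while True:
            with self.cond:
                if predicate(self):
                    return
            time.sleep(0.001)


def kick_then_unlock(waiter_name):
    """Return (log right after the kick, log after the real unlock)."""
    page, log = ToyPage(), []
    waiter = threading.Thread(target=getattr(page, waiter_name), args=(log,))
    waiter.start()
    page.wait_until(lambda p: p.sleepers == 1)  # waiter is asleep
    page.kick()
    page.wait_until(lambda p: p.wakeups >= 1)   # kick has been handled
    after_kick = list(log)
    page.unlock()
    waiter.join()
    return after_kick, list(log)


if __name__ == "__main__":
    print("re-checking waiter:", kick_then_unlock("wait_rechecking"))
    print("fragile waiter:    ", kick_then_unlock("wait_once"))
```

In this model the re-checking waiter logs nothing after the kick and simply
goes back to sleep, so the kick reveals nothing about who holds the page.
The fragile waiter proceeds while the page is still locked, which is the
kind of breakage an artificial wakeup can cause in code that is not
prepared for spurious wakeups.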