Date: Thu, 29 Mar 2018 23:41:20 +0200
From: Radim Krčmář
To: Joey Pabalinas
Cc: x86@kernel.org, Ingo Molnar, Thomas Gleixner, "H. Peter Anvin",
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/kvm: replace TASK_UNINTERRUPTIBLE with TASK_KILLABLE
Message-ID: <20180329214116.GO26753@flask>
In-Reply-To: <20180316105729.ykomeftdvlzqm4p6@gmail.com>

2018-03-16 00:57-1000, Joey Pabalinas:
> There doesn't seem to be any advantage to having a *completely*
> uninterruptible task here. For most users, allowing a task to respond
> to the SIGKILL interrupt signal (all other signals are ignored just like
> TASK_UNINTERRUPTIBLE) will not impact them at all.
>
> However, for the rare edge cases where a task becomes stuck, maybe due to
> snapshot corruption or some other similarly unrecoverable error, it is
> *much* more convenient for a user to have the additional option of
> nuking that task with SIGKILL, rather than annoying them by forcing
> them to reboot in order to remove the immortal process.
>
> Signed-off-by: Joey Pabalinas
>
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index bc1a27280c4bf77899..7d4faee962e0c2e3c1 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -154,8 +154,8 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
>
>  	for (;;) {
>  		if (!n.halted)
> -			prepare_to_swait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
> -		if (hlist_unhashed(&n.link))
> +			prepare_to_swait(&n.wq, &wait, TASK_KILLABLE);

Makes sense, but it's not as easy:

> +		if (hlist_unhashed(&n.link) || fatal_signal_pending(current))

Removing the waiter from the list is currently the waker's job, but the
entry is stack-allocated by the waiter; simply breaking out there would
cause a use-after-free on the waker's side.

We could take the lock again near the end of kvm_async_pf_task_wait to
remove the entry and use a variable (instead of hlist_unhashed) to
signal that the wakeup happened.
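Roughly like this, say -- a completely untested sketch; n.woken would be
a new field in struct kvm_task_sleep_node that apf_task_wake_one() sets
under b->lock (instead of unhashing the node) before it wakes us:

	for (;;) {
		if (!n.halted)
			prepare_to_swait(&n.wq, &wait, TASK_KILLABLE);
		if (n.woken || fatal_signal_pending(current))
			break;
		/* the existing rcu_irq_exit()/schedule()/halt path stays as is */
	}
	if (!n.halted)
		finish_swait(&n.wq, &wait);

	/*
	 * The waiter always unhashes itself, so a waker arriving after a
	 * SIGKILL can never touch this stack-allocated node again.
	 */
	raw_spin_lock(&b->lock);
	if (!hlist_unhashed(&n.link))
		hlist_del(&n.link);
	raw_spin_unlock(&b->lock);

That keeps the unhashing within the waiter's stack lifetime and turns
the wakeup into a pure flag-and-wake on the waker's side.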
Peter Anvin" , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] x86/kvm: replace TASK_UNINTERRUPTIBLE with TASK_KILLABLE Message-ID: <20180329214116.GO26753@flask> References: <20180316105729.ykomeftdvlzqm4p6@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180316105729.ykomeftdvlzqm4p6@gmail.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 29 Mar 2018 21:41:23 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 29 Mar 2018 21:41:23 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'rkrcmar@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2018-03-16 00:57-1000, Joey Pabalinas: > There doesn't seem to be any advantage to having a *completely* > uninterruptible task here. For most users, allowing a task to respond > to the SIGKILL interrupt signal (all other signals are ignored just like > TASK_UNINTERRUPTIBLE) will not impact them at all. > > However, for the rare edge-cases where a task becomes stuck, maybe due to > snapshot corruption or some other similarly unrecoverable error, it > is *much* more convenient for a user to be able to have the additional > option of nuking that task with SIGKILL, rather than annoying them by > forcing them to reboot in order to remove the immortal process. > > Signed-off-by: Joey Pabalinas > > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > index bc1a27280c4bf77899..7d4faee962e0c2e3c1 100644 > --- a/arch/x86/kernel/kvm.c > +++ b/arch/x86/kernel/kvm.c > @@ -154,8 +154,8 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel) > > for (;;) { > if (!n.halted) > - prepare_to_swait(&n.wq, &wait, TASK_UNINTERRUPTIBLE); > - if (hlist_unhashed(&n.link)) > + prepare_to_swait(&n.wq, &wait, TASK_KILLABLE); Makes sense, but it's not as easy: > + if (hlist_unhashed(&n.link) || fatal_signal_pending(current)) Removing the waiter from the list is currently waker's job, but the entry is stack-allocated by waiter; simply breaking there would cause use after free on the waker's side. We could take the lock again near the end of kvm_async_pf_task_wait to remove the entry and use a variable (instead of hlist_unhashed) to signal that the wakeup happened. And there is another problem as well: If the waiting task is killed before the wakeup arrives, the waker allocates a dummy entry that is not going to be consumed by a future waiter, leading to a leak that could potentially deplete the whole memory. A solution to that could use a list of waiters that were killed before the wakeup, so the normal working case wouldn't regress. Doing that to handle snapshot corruption is definitely not worth it -- restoring from a snapshot should do apf_task_wake_all, so the corruption would have to be very precise. I remember we had a bug where tasks were getting stuck when running nested and maybe there are going to be other cases to excuse the change. I'm slightly against changing the behavior as it's pretty hard to test, but I can be easily convinced with a well reasoned patch, thanks!