From: "Eric W. Biederman"
To: Linus Torvalds
Cc: Oleg Nesterov, Mike Christie, linux@leemhuis.info,
    nicolas.dichtel@6wind.com, axboe@kernel.dk,
    linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org, mst@redhat.com,
    sgarzare@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
    brauner@kernel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
Date: Sat, 27 May 2023 04:49:19 -0500
In-Reply-To: (Linus Torvalds's message of "Thu, 25 May 2023 09:20:19 -0700")
Message-ID: <87cz2mrtnk.fsf@email.froward.int.ebiederm.org>

Linus Torvalds writes:

> So I'd really like to finish this. Even if we end up with a hack or
> two in signal handling that we can hopefully fix up later by having
> vhost fix up some of its current assumptions.

The real sticky widget for me is how to handle one of these processes
coredumping.  It really looks like it will result in a reliable hang.

Limiting ourselves to changes that will only affect vhost, all I can
see would be allowing the vhost_worker thread to exit as soon as
get_signal reports the process is exiting.  Then vhost_dev_flush
would need to process the pending work.

Something like this:

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..fb5ebc50c553 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -234,14 +234,31 @@ EXPORT_SYMBOL_GPL(vhost_poll_stop);
 void vhost_dev_flush(struct vhost_dev *dev)
 {
 	struct vhost_flush_struct flush;
+	struct vhost_worker *worker = dev->worker;
+	struct llist_node *node, *head;
+
+	if (!worker)
+		return;
+
+	init_completion(&flush.wait_event);
+	vhost_work_init(&flush.work, vhost_flush_work);
 
-	if (dev->worker) {
-		init_completion(&flush.wait_event);
-		vhost_work_init(&flush.work, vhost_flush_work);
+	vhost_work_queue(dev, &flush.work);
 
-		vhost_work_queue(dev, &flush.work);
-		wait_for_completion(&flush.wait_event);
+	/* Either vhost_worker runs the pending work or we do */
+	node = llist_del_all(&worker->work_list);
+	if (node) {
+		node = llist_reverse_order(node);
+		/* make sure flag is seen after deletion */
+		smp_wmb();
+		llist_for_each_entry_safe(work, work_next, node, node) {
+			clear_bit(VHOST_WORK_QUEUED, &work->flags);
+			work->fn(work);
+			cond_resched();
+		}
 	}
+
+	wait_for_completion(&flush.wait_event);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_flush);
 
@@ -338,6 +355,7 @@ static int vhost_worker(void *data)
 	struct vhost_worker *worker = data;
 	struct vhost_work *work, *work_next;
 	struct llist_node *node;
+	struct ksignal ksig;
 
 	for (;;) {
 		/* mb paired w/ kthread_stop */
@@ -348,6 +366,9 @@ static int vhost_worker(void *data)
 			break;
 		}
 
+		if (get_signal(&ksig))
+			break;
+
 		node = llist_del_all(&worker->work_list);
 		if (!node)
 			schedule();
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index b7cbd66f889e..613d52f01c07 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -47,6 +47,7 @@ void vhost_task_stop(struct vhost_task *vtsk)
 	 * not exiting then reap the task.
 	 */
 	kernel_wait4(pid, NULL, __WCLONE, NULL);
+	put_task_struct(vtsk->task);
 	kfree(vtsk);
 }
 EXPORT_SYMBOL_GPL(vhost_task_stop);
@@ -101,7 +102,7 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), void *arg,
 		return NULL;
 	}
 
-	vtsk->task = tsk;
+	vtsk->task = get_task_struct(tsk);
 	return vtsk;
 }
 EXPORT_SYMBOL_GPL(vhost_task_create);

Eric