Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp354304rwd; Tue, 16 May 2023 01:53:17 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7M6MPtPwMime59BWQTRx6CPdm3/u5/0wzDIB9ROlC579j7VfSMJVdoO0NAMi3xzuoXL0Fk X-Received: by 2002:a17:90a:bc8d:b0:250:bc78:9828 with SMTP id x13-20020a17090abc8d00b00250bc789828mr23813911pjr.4.1684227197515; Tue, 16 May 2023 01:53:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684227197; cv=none; d=google.com; s=arc-20160816; b=GMjLnnPoexUzw6sPPPjYBVRqgi2sYD35jVfQmqi1e1Cpc1Zo0rkNsQzwnyMzGtEqrJ pwCeNdxx6tbM1E9eGWLJYDC8i/tHTih2A9O06m5pOcMTCt9l7FndOzlkqdqhLN19rsVX gxZCGEOOyqyEZHsTmHpY3ZT0Vj5MqMSHmCser3Ob9XFxJiuGHS0czCDJ+eubZC0a/7eD a324/7pk4fZfq/oGMl2gdaJ0SITWBxdD64pLvl21gyzQJYNy58vHtreCviDbNo8Q5zdE VxfSEh0RPEbejsGESAcXVAHTRI+PVDPHXPDE6dOnvsCN1LbccYlQGLv4C5vDOw3GdGeT LOUw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=bx0GPwArxPfZCwOnAINdaeCSqRFS7+r6V/5K/PR+dF8=; b=LZEbEbaMAyWT7qaPXU/XEdc64+x7TbJihNIntSndE8x6xaoELFLxUxSsVU/mD2S+Hn w95BHYbOX97FV6Gonx80DNlS+irkgSsl8OU1rR1QR7W841IgYHgRkRNxN7UjZvFxeif3 xM8Q01XNFABjNWHJ3ZobATzxF7XUf0G0/C00N+rOTK9MlOsUdDNbvbAgtjDaiIBCQv2H uPKpGlszTjGvW/g2KktKjMYX0QSRiZH00Bi2+UTN1DeKtENXEujl8yhVpqcK57Exn/8E gpTxUbuU2AdS0TrBexNNy+QYLYgyRU1Ir7+cJg3R2f0twtC+C+8h/FAr957eTy2zSfL6 OpPQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HllN8HjV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y6-20020a17090a600600b0025323913ae0si803815pji.179.2023.05.16.01.53.03; Tue, 16 May 2023 01:53:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HllN8HjV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231626AbjEPIkK (ORCPT + 99 others); Tue, 16 May 2023 04:40:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231124AbjEPIkH (ORCPT ); Tue, 16 May 2023 04:40:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38AE21BF6 for ; Tue, 16 May 2023 01:40:02 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id B39C062FFE for ; Tue, 16 May 2023 08:40:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9B049C433D2; Tue, 16 May 2023 08:39:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1684226401; bh=L//2mJqpJe73Cmk74amA5WtIVkNTK2EJMElKgY4Oifw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=HllN8HjVjfHFg0iKJPiSZ+Ym2m9sd35ggzSOD6myB8PVaHbOVyxF9vNF6JCWqpRTd GSJ/qoF84U8qBkM6Ikskkgi5Wi7fwb6evjn/mrZFaEFSsjVf0hT6Mme9PThb3IbOc+ 5GgWXkL74/h9xy4Pn4WSCkjUysVIF4HnnjQgck9UN4x5uXA/qcAeZ/JeqYf+NHD241 cuPjBVe8somRlu6CjRrMttWHt5XrMWYNQoz29dG21zk0PBGc1nbdF4r+wtNFSUo98C tVOxOBb40VoktDmsHMrLQksJZOBPtiZ3MKXTlG7Iq6IjIFJ2/nWFLZl7vxXmcIWa+z RXHLm6mijvu7g== Date: Tue, 16 May 2023 10:39:53 +0200 From: Christian Brauner To: Mike Christie , Linus Torvalds Cc: Thorsten Leemhuis , nicolas.dichtel@6wind.com, Linux kernel regressions list , hch@infradead.org, stefanha@redhat.com, jasowang@redhat.com, mst@redhat.com, sgarzare@redhat.com, virtualization@lists.linux-foundation.org, ebiederm@xmission.com, konrad.wilk@oracle.com, linux-kernel@vger.kernel.org, Jens Axboe Subject: Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads Message-ID: <20230516-weltmeere-backofen-27f12ae2c9e0@brauner> References: <20230202232517.8695-1-michael.christie@oracle.com> <20230202232517.8695-9-michael.christie@oracle.com> <78c5e150-26cf-7724-74ee-4a0b16b944b1@oracle.com> <48842e92-835e-bc3f-7118-48b8f415f532@leemhuis.info> <20230515-vollrausch-liebgeworden-2765f3ca3540@brauner> <122b597e-a5fa-daf7-27bb-6f04fa98d496@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <122b597e-a5fa-daf7-27bb-6f04fa98d496@oracle.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 15, 2023 at 05:23:12PM -0500, Mike Christie wrote: > On 5/15/23 10:44 AM, Linus Torvalds wrote: > > On Mon, May 15, 2023 at 7:23 AM Christian Brauner wrote: > >> > >> So I think we will be able to address (1) and (2) by making vhost tasks > >> proper threads and blocking every signal except for SIGKILL and SIGSTOP > >> and then having vhost handle get_signal() - as you mentioned - the same > >> way io uring already does. We should also remove the ingore_signals > >> thing completely imho. I don't think we ever want to do this with user > >> workers. > > > > Right. That's what IO_URING does: > > > > if (args->io_thread) { > > /* > > * Mark us an IO worker, and block any signal that isn't > > * fatal or STOP > > */ > > p->flags |= PF_IO_WORKER; > > siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)); > > } > > > > and I really think that vhost should basically do exactly what io_uring does. > > > > Not because io_uring fundamentally got this right - but simply because > > io_uring had almost all the same bugs (and then some), and what the > > io_uring worker threads ended up doing was to basically zoom in on > > "this works". > > > > And it zoomed in on it largely by just going for "make it look as much > > as possible as a real user thread", because every time the kernel > > thread did something different, it just caused problems. > > > > So I think the patch should just look something like the attached. > > Mike, can you test this on whatever vhost test-suite? > > I tried that approach already and it doesn't work because io_uring and vhost > differ in that vhost drivers implement a device where each device has a vhost_task > and the drivers have a file_operations for the device. When the vhost_task's > parent gets signal like SIGKILL, then it will exit and call into the vhost > driver's file_operations->release function. At this time, we need to do cleanup But that's no reason why the vhost worker couldn't just be allowed to exit on SIGKILL cleanly similar to io_uring. That's just describing the current architecture which isn't a necessity afaict. And the helper thread could e.g., crash. > like flush the device which uses the vhost_task. There is also the case where if > the vhost_task gets a SIGKILL, we can just exit from under the vhost layer. In a way I really don't like the patch below. Because this should be solvable by adapting vhost workers. Right now, vhost is coming from a kthread model and we ported it to a user worker model and the whole point of this excercise has been that the workers behave more like regular userspace processes. So my tendency is to not massage kernel signal handling to now also include a special case for user workers in addition to kthreads. That's just the wrong way around and then vhost could've just stuck with kthreads in the first place. So I'm fine with skipping over the freezing case for now but SIGKILL should be handled imho. Only init and kthreads should get the luxury of ignoring SIGKILL. So, I'm afraid I'm asking some work here of you but how feasible would a model be where vhost_worker() similar to io_wq_worker() gracefully handles SIGKILL. Yes, I see there's net.c: .release = vhost_net_release scsi.c: .release = vhost_scsi_release test.c: .release = vhost_test_release vdpa.c: .release = vhost_vdpa_release vsock.c: .release = virtio_transport_release vsock.c: .release = vhost_vsock_dev_release but that means you have all the basic logic in place and all of those drivers also support the VHOST_RESET_OWNER ioctl which also stops the vhost worker. I'm confident that a lof this can be leveraged to just cleanup on SIGKILL. So it feels like this should be achievable by adding a callback to struct vhost_worker that get's called when vhost_worker() gets SIGKILL and that all the users of vhost workers are forced to implement. Yes, it is more work but I think that's the right thing to do and not to complicate our signal handling. Worst case if this can't be done fast enough we'll have to revert the vhost parts. I think the user worker parts are mostly sane and are useful. Thoughts?