Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp109429iog; Wed, 29 Jun 2022 19:21:02 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vqmz++IVCf7mbupyhkCReFiUqDJw8rlsUSbSi5/tILygLZ7YbQHNQbdEv1tKMMSwXXyF3n X-Received: by 2002:aa7:86c9:0:b0:525:3d39:8d0f with SMTP id h9-20020aa786c9000000b005253d398d0fmr13349562pfo.54.1656555661794; Wed, 29 Jun 2022 19:21:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1656555661; cv=none; d=google.com; s=arc-20160816; b=q7OiGgNZXmN84rVtbVqJBS+wPwYSqmODpJEa6fNy9tGY5p8rDYg0qnAYfmU0Wk9U8O mBpIxjdkeRk+aaD45ItZJ4xuCQUk0jzrluoSGTRH3TWIRru7z4R7uxlzGlhTIHvDvaBb rDEF7zGVNp3IPh9Aj/g925fHHyNLknf6WTtjwjwQarNuirwG3uyU/tuGlHbsWoaH9jz4 BmV6AEdgawXo3M7QgzflJ8Ng9YPedCW/1EIdI/h5v42Aa68n1Cs3KNJKIV24KwHucpoR EE9NJEI+i1LZnEFpHGe9Q6x2WdL/HU62z0y/fcPo1tUBLGNVnf4mSq/CxSpiyPFMdqkC 6G8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=siz7WrYvb8LkyacUeeNLloNPDn/nazktAAvEBoYIVXM=; b=YCGaD1OqYFytO4GzsS53U7VUJOyxylplbX6baBY7YXZSZBKnAPwgNdnMwPy1kHGk26 UaNt1rhCsIleLQsLwGH7LqORnsVIJAdomKY/r2hErF879Wvd5YygcU7V4ofIxNDsBdA+ vTFeq8Dd7HoiF76C0yJU8tPWG1TQLeyCMsjlki8XBMpM0XVaxgQAC/zpjqojvN3a1GdG JHyx+1NhsozHlsgBjpJy4XvGiH71Nr/WA9lev+2SrY0afrchRDY9aSeH9R92F9ZJ4J6l r6rSHumQy7JlE8PSCSk3u3Nbi8we/prucu43mmgvNaMebzSHQnJDO0ba7rjxjZ2DgyvX 4ROw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XbIA9+eL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ja1-20020a170902efc100b0016a613012a7si22655031plb.414.2022.06.29.19.20.28; Wed, 29 Jun 2022 19:21:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XbIA9+eL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230438AbiF3CBg (ORCPT + 99 others); Wed, 29 Jun 2022 22:01:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229687AbiF3CBe (ORCPT ); Wed, 29 Jun 2022 22:01:34 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1841B3A73B for ; Wed, 29 Jun 2022 19:01:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656554491; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=siz7WrYvb8LkyacUeeNLloNPDn/nazktAAvEBoYIVXM=; b=XbIA9+eLT13Yt3zhdnsv0Vhao29dy6KbfX/J3dGShuyQcaGpm8Rom6w1ueEiY7xpwWowl5 eRkHaeMe0aBuW711OHc7frb2A+mBjMRWsZ7/QC0FaPdqtmJAEQqbe+uIKaWfPzynOOKsyi BS0s0BWu+2HnzDMpmp7UGg7Laxi2MPs= Received: from mail-lj1-f197.google.com (mail-lj1-f197.google.com [209.85.208.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-325-genx9lS4NZa_vde1ln_1IA-1; Wed, 29 Jun 2022 22:01:29 -0400 X-MC-Unique: genx9lS4NZa_vde1ln_1IA-1 Received: by mail-lj1-f197.google.com with SMTP id y16-20020a2e95d0000000b0025a70d22a16so2691611ljh.7 for ; Wed, 29 Jun 2022 19:01:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=siz7WrYvb8LkyacUeeNLloNPDn/nazktAAvEBoYIVXM=; b=VTItlZb6aqyhzMSQ3xY85QwQstsy4j/bGuRTQUOCxASqzmGwaGWW5k27Lu2pewZzab tfpz8hj3sGP/TtQzJN1opScM9gLCxyxB2dLQkH09XmvvMwUVe+vTf2i7EXODGgKmP0RL Uv7ZizJQGh6pYorN5xBfGsNpl8KXRuDM9NQ+AIEIZA7vFqBshwojMPs3SN5PXdGgNAlQ HK8fA5htvTF8oj+ig2PCL6gRvpxIi46GVejYKPv1nfya38wFINy8m1ZehGKrW+JofcxG nCNS5ZwgM+9JtbAL+VKtaQAXivpEuEwMJBAenistisIYJ3w52y7XXEyEhufIvb9qrqrZ FguA== X-Gm-Message-State: AJIora+Hxq+PF/4r12zenGijFllq2xdNDmJNNw0trnP8SQhHxJb4bmae bk+JeUsfajuHyOj8HlgB9Oj+kPyqBFtGCJDNMbbMdJK3dzbeqZidUsldAHB6gfw0Dp7xgOKdpji /JvqiKFrNE2/9IXmnSNFzNKYoyUSvKN78DSKUOYmT X-Received: by 2002:a05:6512:158d:b0:47f:718c:28b5 with SMTP id bp13-20020a056512158d00b0047f718c28b5mr4175583lfb.397.1656554487922; Wed, 29 Jun 2022 19:01:27 -0700 (PDT) X-Received: by 2002:a05:6512:158d:b0:47f:718c:28b5 with SMTP id bp13-20020a056512158d00b0047f718c28b5mr4175562lfb.397.1656554487568; Wed, 29 Jun 2022 19:01:27 -0700 (PDT) MIME-Version: 1.0 References: <20220627053820-mutt-send-email-mst@kernel.org> <20220628004614-mutt-send-email-mst@kernel.org> <20220629022223-mutt-send-email-mst@kernel.org> <20220629030600-mutt-send-email-mst@kernel.org> <20220629044514-mutt-send-email-mst@kernel.org> In-Reply-To: <20220629044514-mutt-send-email-mst@kernel.org> From: Jason Wang Date: Thu, 30 Jun 2022 10:01:16 +0800 Message-ID: Subject: Re: [PATCH V3] virtio: disable notification hardening by default To: "Michael S. Tsirkin" Cc: Cornelia Huck , Halil Pasic , Heiko Carstens , Vasily Gorbik , Christian Borntraeger , Alexander Gordeev , linux-s390@vger.kernel.org, virtualization , kvm , linux-kernel , Ben Hutchings , David Hildenbrand Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 29, 2022 at 4:52 PM Michael S. Tsirkin wrote: > > On Wed, Jun 29, 2022 at 04:34:36PM +0800, Jason Wang wrote: > > On Wed, Jun 29, 2022 at 3:15 PM Michael S. Tsirkin wrote: > > > > > > On Wed, Jun 29, 2022 at 03:02:21PM +0800, Jason Wang wrote: > > > > On Wed, Jun 29, 2022 at 2:31 PM Michael S. Tsirkin wrote: > > > > > > > > > > On Wed, Jun 29, 2022 at 12:07:11PM +0800, Jason Wang wrote: > > > > > > On Tue, Jun 28, 2022 at 2:17 PM Jason Wang wrote: > > > > > > > > > > > > > > On Tue, Jun 28, 2022 at 1:00 PM Michael S. Tsirkin wrote: > > > > > > > > > > > > > > > > On Tue, Jun 28, 2022 at 11:49:12AM +0800, Jason Wang wrote: > > > > > > > > > > Heh. Yea sure. But things work fine for people. What is the chance > > > > > > > > > > your review found and fixed all driver bugs? > > > > > > > > > > > > > > > > > > I don't/can't audit all bugs but the race between open/close against > > > > > > > > > ready/reset. It looks to me a good chance to fix them all but if you > > > > > > > > > think differently, let me know > > > > > > > > > > > > > > > > > > > After two attempts > > > > > > > > > > I don't feel like hoping audit will fix all bugs. > > > > > > > > > > > > > > > > > > I've started the auditing and have 15+ patches in the queue. (only > > > > > > > > > covers bluetooth, console, pmem, virtio-net and caif). Spotting the > > > > > > > > > issue is not hard but the testing, It would take at least the time of > > > > > > > > > one release to finalize I guess. > > > > > > > > > > > > > > > > Absolutely. So I am looking for a way to implement hardening that does > > > > > > > > not break existing drivers. > > > > > > > > > > > > > > I totally agree with you to seek a way without bothering the drivers. > > > > > > > Just wonder if this is possbile. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The reason config was kind of easy is that config interrupt is rarely > > > > > > > > > > > > vital for device function so arbitrarily deferring that does not lead to > > > > > > > > > > > > deadlocks - what you are trying to do with VQ interrupts is > > > > > > > > > > > > fundamentally different. Things are especially bad if we just drop > > > > > > > > > > > > an interrupt but deferring can lead to problems too. > > > > > > > > > > > > > > > > > > > > > > I'm not sure I see the difference, disable_irq() stuffs also delay the > > > > > > > > > > > interrupt processing until enable_irq(). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Absolutely. I am not at all sure disable_irq fixes all problems. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Consider as an example > > > > > > > > > > > > virtio-net: fix race between ndo_open() and virtio_device_ready() > > > > > > > > > > > > if you just defer vq interrupts you get deadlocks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't see a deadlock here, maybe you can show more detail on this? > > > > > > > > > > > > > > > > > > > > What I mean is this: if we revert the above commit, things still > > > > > > > > > > work (out of spec, but still). If we revert and defer interrupts until > > > > > > > > > > device ready then ndo_open that triggers before device ready deadlocks. > > > > > > > > > > > > > > > > > > Ok, I guess you meant on a hypervisor that is strictly written with spec. > > > > > > > > > > > > > > > > I mean on hypervisor that starts processing queues after getting a kick > > > > > > > > even without DRIVER_OK. > > > > > > > > > > > > > > Oh right. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So, thinking about all this, how about a simple per vq flag meaning > > > > > > > > > > > > "this vq was kicked since reset"? > > > > > > > > > > > > > > > > > > > > > > And ignore the notification if vq is not kicked? It sounds like the > > > > > > > > > > > callback needs to be synchronized with the kick. > > > > > > > > > > > > > > > > > > > > Note we only need to synchronize it when it changes, which is > > > > > > > > > > only during initialization and reset. > > > > > > > > > > > > > > > > > > Yes. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If driver does not kick then it's not ready to get callbacks, right? > > > > > > > > > > > > > > > > > > > > > > > > Sounds quite clean, but we need to think through memory ordering > > > > > > > > > > > > concerns - I guess it's only when we change the value so > > > > > > > > > > > > if (!vq->kicked) { > > > > > > > > > > > > vq->kicked = true; > > > > > > > > > > > > mb(); > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > will do the trick, right? > > > > > > > > > > > > > > > > > > > > > > There's no much difference with the existing approach: > > > > > > > > > > > > > > > > > > > > > > 1) your proposal implicitly makes callbacks ready in virtqueue_kick() > > > > > > > > > > > 2) my proposal explicitly makes callbacks ready via virtio_device_ready() > > > > > > > > > > > > > > > > > > > > > > Both require careful auditing of all the existing drivers to make sure > > > > > > > > > > > no kick before DRIVER_OK. > > > > > > > > > > > > > > > > > > > > Jason, kick before DRIVER_OK is out of spec, sure. But it is unrelated > > > > > > > > > > to hardening > > > > > > > > > > > > > > > > > > Yes but with your proposal, it seems to couple kick with DRIVER_OK somehow. > > > > > > > > > > > > > > > > I don't see how - my proposal ignores DRIVER_OK issues. > > > > > > > > > > > > > > Yes, what I meant is, in your proposal, the first kick after rest is a > > > > > > > hint that the driver is ok (but actually it could not). > > > > > > > > > > > > > > > > > > > > > > > > > and in absence of config interrupts is generally easily > > > > > > > > > > fixed just by sticking virtio_device_ready early in initialization. > > > > > > > > > > > > > > > > > > So if the kick is done before the subsystem registration, there's > > > > > > > > > still a window in the middle (assuming we stick virtio_device_ready() > > > > > > > > > early): > > > > > > > > > > > > > > > > > > virtio_device_ready() > > > > > > > > > virtqueue_kick() > > > > > > > > > /* the window */ > > > > > > > > > subsystem_registration() > > > > > > > > > > > > > > > > Absolutely, however, I do not think we really have many such drivers > > > > > > > > since this has been known as a wrong thing to do since the beginning. > > > > > > > > Want to try to find any? > > > > > > > > > > > > > > Yes, let me try and update. > > > > > > > > > > > > This is basically the device that have an RX queue, so I've found the > > > > > > following drivers: > > > > > > > > > > > > scmi, mac80211_hwsim, vsock, bt, balloon. > > > > > > > > > > Looked and I don't see it yet. Let's consider > > > > > ./net/vmw_vsock/virtio_transport.c for example. Assuming we block > > > > > callbacks until the first kick, what is the issue with probe exactly? > > > > > > > > We need to make sure the callback can survive when it runs before sub > > > > system registration. > > > > > > With my proposal no - only if we also kick before registration. > > > So I do not see the issue yet. > > > > > > Consider ./net/vmw_vsock/virtio_transport.c > > > > > > kicks: virtio_transport_send_pkt_work, > > > virtio_vsock_rx_fill, virtio_vsock_event_fill > > > > > > which of these triggers before we are ready to > > > handle callbacks? > > > > So: > > > > virtio_vsock_vqs_init() > > virtio_device_ready() > > virtio_vsock_rx_fill() /* kick there */ > > rcu_assign_pointer(the_virtio_vsock, vsock) > > > > It means at least virtio_vsock_rx_done()/virtio_vsock_workqueue needs > > to survive. I don't say it has a bug but we do need to audit the code > > in this case. The implication is: the virtqueue callback should be > > written with no assumption that the driver has registered in the > > subsystem. We don't or can't assume all drivers are written in this > > way. > > > I thought you said you audited code and found bugs. > > My claim is that simply because qemu starts processing > packets immediately upon kick, if bugs like this > existed we would have noticed by now. This is true for a well behaved hypervisor. But what we want to deal with is the buggy/malicious hypervisors. > > In this case the_virtio_vsock is used for xmit things, > callbacks do not seem to use it at all. So the hypervisor can trigger the notification just after the kick and the work function seems to be safe. One another example for this is in virtcons_probe(): spin_lock_init(&portdev->ports_lock); INIT_LIST_HEAD(&portdev->ports); INIT_LIST_HEAD(&portdev->list); virtio_device_ready(portdev->vdev); INIT_WORK(&portdev->config_work, &config_work_handler); INIT_WORK(&portdev->control_work, &control_work_handler); in control_intr() we had: static void control_intr(struct virtqueue *vq) { struct ports_device *portdev; portdev = vq->vdev->priv; schedule_work(&portdev->control_work); } So we might crash if the notification is raised just after virtio_device_ready(). This is not an exact example of when a callback is not ready after kick, but it demonstrates that the callback could have assumed that all setup has been done when it is called. Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >I couldn't ... except maybe bluetooth > > > > > > > > but that's just maintainer nacking fixes saying he'll fix it > > > > > > > > his way ... > > > > > > > > > > > > > > > > > And during remove(), we get another window: > > > > > > > > > > > > > > > > > > subsysrem_unregistration() > > > > > > > > > /* the window */ > > > > > > > > > virtio_device_reset() > > > > > > > > > > > > > > > > Same here. > > > > > > > > > > > > Basically for the drivers that set driver_ok before registration, > > > > > > > > > > I don't see what does driver_ok have to do with it. > > > > > > > > I meant for those driver, in probe they do() > > > > > > > > virtio_device_ready() > > > > subsystem_register() > > > > > > > > In remove() they do > > > > > > > > subsystem_unregister() > > > > virtio_device_reset() > > > > > > > > for symmetry > > > > > > Let's leave remove alone for now. I am close to 100% sure we have *lots* > > > of issues around it, but while probe is unavoidable remove can be > > > avoided by blocking hotplug. > > > > Unbind can trigger this path as well. > > > > > > > > > > > > > > > > > > > so > > > > > > we have a lot: > > > > > > > > > > > > blk, net, mac80211_hwsim, scsi, vsock, bt, crypto, gpio, gpu, i2c, > > > > > > iommu, caif, pmem, input, mem > > > > > > > > > > > > So I think there's no easy way to harden the notification without > > > > > > auditing the driver one by one (especially considering the driver may > > > > > > use bh or workqueue). The problem is the notification hardening > > > > > > depends on a correct or race-free probe/remove. So we need to fix the > > > > > > issues in probe/remove then do the hardening on the notification. > > > > > > > > > > > > Thanks > > > > > > > > > > So if drivers kick but are not ready to get callbacks then let's fix > > > > > that first of all, these are racy with existing qemu even ignoring > > > > > spec compliance. > > > > > > > > Yes, (the patches I've posted so far exist even with a well-behaved device). > > > > > > > > Thanks > > > > > > patches you posted deal with DRIVER_OK spec compliance. > > > I do not see patches for kicks before callbacks are ready to run. > > > > Yes. > > > > Thanks > > > > > > > > > > > > > > > > > > > > -- > > > > > MST > > > > > > > > >