Date: Wed, 10 Aug 2022 14:01:45 +0100
Message-ID: <87r11ouu9y.wl-maz@kernel.org>
From: Marc Zyngier
To: eric.auger@redhat.com
Cc: Dmytro Maluka, "Dong, Eddie", "Christopherson,, Sean", Paolo Bonzini,
 "kvm@vger.kernel.org", Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, "x86@kernel.org", "H. Peter Anvin",
 "linux-kernel@vger.kernel.org", Alex Williamson, "Liu, Rong L",
 Zhenyu Wang, Tomasz Nowicki, Grzegorz Jaszczyk, "upstream@semihalf.com",
 Dmitry Torokhov
Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
In-Reply-To: <8ff76b5e-ae28-70c8-2ec5-01662874fb15@redhat.com>
References: <20220805193919.1470653-1-dmy@semihalf.com>
 <87o7wsbngz.wl-maz@kernel.org>
 <8ff76b5e-ae28-70c8-2ec5-01662874fb15@redhat.com>

On Wed, 10 Aug 2022 09:12:18 +0100,
Eric Auger wrote:
>
> Hi Marc,
>
> On 8/10/22 08:51, Marc Zyngier wrote:
> > On Wed, 10 Aug 2022 00:30:29 +0100,
> > Dmytro Maluka wrote:
> >> On 8/9/22 10:01 PM, Dong, Eddie wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Dmytro Maluka
> >>>> Sent: Tuesday, August 9, 2022 12:24 AM
> >>>> To: Dong, Eddie; Christopherson,, Sean; Paolo Bonzini;
> >>>> kvm@vger.kernel.org
> >>>> Cc: Thomas Gleixner; Ingo Molnar; Borislav Petkov; Dave Hansen;
> >>>> x86@kernel.org; H. Peter Anvin; linux-kernel@vger.kernel.org;
> >>>> Eric Auger; Alex Williamson; Liu, Rong L; Zhenyu Wang;
> >>>> Tomasz Nowicki; Grzegorz Jaszczyk; upstream@semihalf.com;
> >>>> Dmitry Torokhov
> >>>> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
> >>>>
> >>>> On 8/9/22 1:26 AM, Dong, Eddie wrote:
> >>>>>> The existing KVM mechanism for forwarding of level-triggered
> >>>>>> interrupts using resample eventfd doesn't work quite correctly in the
> >>>>>> case of interrupts that are handled in a Linux guest as oneshot
> >>>>>> interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device
> >>>>>> in its threaded irq handler, i.e. later than it is acked to the
> >>>>>> interrupt controller (EOI at the end of hardirq), not earlier. The
> >>>>>> existing KVM code doesn't take that into account, which results in
> >>>>>> erroneous extra interrupts in the guest caused by premature re-assert of an
> >>>> unacknowledged IRQ by the host.
> >>>>> Interesting...
> >>>>> How does it behave on the native side?
> >>>> In native it behaves correctly, since Linux masks such a oneshot interrupt at the
> >>>> beginning of hardirq, so that the EOI at the end of hardirq doesn't result in its
> >>>> immediate re-assert, and then unmasks it later, after its threaded irq handler
> >>>> completes.
> >>>>
> >>>> In handle_fasteoi_irq():
> >>>>
> >>>>     if (desc->istate & IRQS_ONESHOT)
> >>>>         mask_irq(desc);
> >>>>
> >>>>     handle_irq_event(desc);
> >>>>
> >>>>     cond_unmask_eoi_irq(desc, chip);
> >>>>
> >>>>
> >>>> and later in unmask_threaded_irq():
> >>>>
> >>>>     unmask_irq(desc);
> >>>>
> >>>> I also mentioned that in patch #3 description:
> >>>> "Linux keeps such an interrupt masked until its threaded handler finishes, to
> >>>> prevent the EOI from re-asserting an unacknowledged interrupt.
> >>> That makes sense. Can you include the full story in the cover letter too?
> >> Ok, I will.
> >>
> >>>
> >>>> However, with KVM + vfio (or whatever is listening on the resamplefd) we don't
> >>>> check that the interrupt is still masked in the guest at the moment of EOI.
> >>>> Resamplefd is notified regardless, so vfio prematurely unmasks the host
> >>>> physical IRQ, thus a new (unwanted) physical interrupt is generated in the host
> >>>> and queued for injection to the guest."
> > Sorry to barge in pretty late in the conversation (just been Cc'd on
> > this), but why shouldn't the resamplefd be notified? If there has been
> yeah sorry to get you involved here ;-)

No problem!

> > an EOI, a new level must be made visible to the guest interrupt
> > controller, no matter what the state of the interrupt masking is.
> >
> > Whether this new level is actually *presented* to a vCPU is another
> > matter entirely, and is arguably a problem for the interrupt
> > controller emulation.
>
> FWIU on guest EOI the physical line is still asserted so the pIRQ is
> immediately re-sampled by the interrupt controller (because the
> resamplefd unmasked the physical IRQ) and recorded as a guest IRQ
> (although it is masked at guest level). When the guest actually unmasks
> the vIRQ we do not get a chance to re-evaluate the physical line level.

Indeed, and maybe this is what should be fixed instead of moving the
resampling point around (I was suggesting something along these lines
in [1]). We already do this on arm64 for the timer, and it should be
easy enough to generalise to any interrupt backed by the GIC (there is
an in-kernel API to sample the pending state). No idea how that
translates for other architectures though.

	M.

[1] https://lore.kernel.org/r/87mtccbie4.wl-maz@kernel.org

-- 
Without deviation from the norm, progress is not possible.
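
For illustration only, the sequence discussed above can be played out
in a small userspace toy model. This is a sketch under stated
assumptions, not KVM, vfio or GIC code: struct irq_model and every
function name below are hypothetical, and the "resample on unmask"
branch is just one possible reading of the idea of re-evaluating the
physical line level when the guest unmasks the vIRQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, none is a kernel API. */
struct irq_model {
	bool line_asserted;      /* level of the physical device line */
	bool host_masked;        /* host masks the pIRQ once it is forwarded */
	bool guest_masked;       /* guest masks the vIRQ for a oneshot handler */
	bool guest_pending;      /* level latched by the emulated controller */
	bool resample_on_unmask; /* re-read the line when the guest unmasks */
	int delivered;           /* interrupts presented to the vCPU */
};

/* Present the latched interrupt to the vCPU if the guest allows it. */
static void try_deliver(struct irq_model *m)
{
	if (m->guest_pending && !m->guest_masked) {
		m->guest_pending = false;
		m->delivered++;
	}
}

/* Host side: forward an asserted, unmasked line to the guest and mask it. */
static void host_sample(struct irq_model *m)
{
	if (m->line_asserted && !m->host_masked) {
		m->host_masked = true;
		m->guest_pending = true;
		try_deliver(m);
	}
}

/* Guest EOI: the resample path unmasks the host IRQ unconditionally. */
static void guest_eoi(struct irq_model *m)
{
	m->host_masked = false;
	host_sample(m);
}

/* Guest unmask: optionally re-evaluate the physical level first. */
static void guest_unmask(struct irq_model *m)
{
	m->guest_masked = false;
	if (m->resample_on_unmask && m->guest_pending && !m->line_asserted) {
		m->guest_pending = false; /* stale level: drop it */
		m->host_masked = false;   /* let the next real interrupt through */
	}
	try_deliver(m);
}

/* One device interrupt handled as oneshot; returns interrupts delivered. */
int run_oneshot_cycle(bool resample_on_unmask)
{
	struct irq_model m = { .resample_on_unmask = resample_on_unmask };

	m.line_asserted = true;   /* device raises the line */
	host_sample(&m);          /* first, legitimate delivery */
	assert(m.delivered == 1);
	m.guest_masked = true;    /* hardirq: mask the oneshot interrupt */
	guest_eoi(&m);            /* EOI at the end of hardirq */
	m.line_asserted = false;  /* threaded handler acks the device */
	guest_unmask(&m);         /* thread done: unmask the vIRQ */
	return m.delivered;
}
```

In this model run_oneshot_cycle(false) counts two deliveries for a
single device interrupt (the second one being the stale level latched
at EOI time), while run_oneshot_cycle(true) counts one.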