Subject: Re: Interrupt for port 19, but apparently not enabled; per-user 000000004af23acc
To: Juergen Gross
Cc: xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org, mheyne@amazon.de
References: <6552fc66-ba19-2c77-7928-b0272d3e1622@xen.org>
 <4d8a7ba7-a9f6-2999-8750-bfe2b85f064e@suse.com>
 <9a08bbf2-ba6a-6e49-3bcb-bfe2beb32b99@xen.org>
 <5d88a82e-d237-7803-7b50-897e857f2fbd@suse.com>
From: Julien Grall
Message-ID: <3d029164-43b7-d65f-4a8a-3ddef5e743e5@xen.org>
Date: Wed, 23 Jun 2021 14:25:33 +0200
In-Reply-To: <5d88a82e-d237-7803-7b50-897e857f2fbd@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Juergen,

On 22/06/2021 17:14, Juergen Gross wrote:
> On 22.06.21 14:21, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 22/06/2021 13:04, Juergen Gross wrote:
>>> On 22.06.21 12:24, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> As discussed on IRC yesterday, we noticed a couple of splats in
>>>> 5.13-rc6 (and stable 5.4) in the evtchn driver:
>>>>
>>>> [    7.581000] ------------[ cut here ]------------
>>>> [    7.581899] Interrupt for port 19, but apparently not enabled;
>>>> per-user 000000004af23acc
>>>> [    7.583401] WARNING: CPU: 0 PID: 467 at
>>>> /home/ANT.AMAZON.COM/jgrall/works/oss/linux/drivers/xen/evtchn.c:169
>>>> evtchn_interrupt+0xd5/0x100
>>>> [    7.585583] Modules linked in:
>>>> [    7.586188] CPU: 0 PID: 467 Comm: xenstore-read Not tainted
>>>> 5.13.0-rc6 #240
>>>> [    7.587462] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
>>>> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>>> [    7.589462] RIP: e030:evtchn_interrupt+0xd5/0x100
>>>> [    7.590361] Code: 48 8d bb d8 01 00 00 ba 01 00 00 00 be 1d 00 00 00
>>>> e8 5f 72 c4 ff eb b2 8b 75 20 48 89 da 48 c7 c7 a8 03 5f 82 e8 6b 2d 96
>>>> ff <0f> 0b e9 4d ff ff ff 41 0f b6 f4 48 c7 c7 80 da a2 82 e8 f0
>>>> [    7.593662] RSP: e02b:ffffc90040003e60 EFLAGS: 00010082
>>>> [    7.594636] RAX: 0000000000000000 RBX: ffff888102328c00 RCX:
>>>> 0000000000000027
>>>> [    7.595924] RDX: 0000000000000000 RSI: ffff88817fe18ad0 RDI:
>>>> ffff88817fe18ad8
>>>> [    7.597216] RBP: ffff888108ef8140 R08: 0000000000000000 R09:
>>>> 0000000000000001
>>>> [    7.598522] R10: 0000000000000000 R11: 7075727265746e49 R12:
>>>> 0000000000000000
>>>> [    7.599810] R13: ffffc90040003ec4 R14: ffff8881001b8000 R15:
>>>> ffff888109b36f80
>>>> [    7.601113] FS:  0000000000000000(0000) GS:ffff88817fe00000(0000)
>>>> knlGS:0000000000000000
>>>> [    7.602570] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [    7.603700] CR2: 00007f15b390e368 CR3: 000000010bb04000 CR4:
>>>> 0000000000050660
>>>> [    7.604993] Call Trace:
>>>> [    7.605501]  <IRQ>
>>>> [    7.605929]  __handle_irq_event_percpu+0x4c/0x330
>>>> [    7.606817]  handle_irq_event_percpu+0x32/0xa0
>>>> [    7.607670]  handle_irq_event+0x3a/0x60
>>>> [    7.608416]  handle_edge_irq+0x9b/0x1f0
>>>> [    7.609154]  generic_handle_irq+0x4f/0x60
>>>> [    7.609918]  __evtchn_fifo_handle_events+0x195/0x3a0
>>>> [    7.610864]  __xen_evtchn_do_upcall+0x66/0xb0
>>>> [    7.611693]  __xen_pv_evtchn_do_upcall+0x1d/0x30
>>>> [    7.612582]  xen_pv_evtchn_do_upcall+0x9d/0xc0
>>>> [    7.613439]  </IRQ>
>>>> [    7.613882]  exc_xen_hypervisor_callback+0x8/0x10
>>>>
>>>> This is quite similar to the problem I reported a few months ago (see
>>>> [1]) but this time it is happening with fifo rather than 2L.
>>>>
>>>> I haven't been able to reproduce it reliably so far. But looking at
>>>> the code, I think I have found another potential race after commit
>>>>
>>>> commit b6622798bc50b625a1e62f82c7190df40c1f5b21
>>>> Author: Juergen Gross
>>>> Date:   Sat Mar 6 17:18:33 2021 +0100
>>>>
>>>>     xen/events: avoid handling the same event on two cpus at the
>>>>     same time
>>>>
>>>>     When changing the cpu affinity of an event it can happen today that
>>>>     (with some unlucky timing) the same event will be handled on the old
>>>>     and the new cpu at the same time.
>>>>
>>>>     Avoid that by adding an "event active" flag to the per-event data
>>>>     and call the handler only if this flag isn't set.
>>>>
>>>>     Cc: stable@vger.kernel.org
>>>>     Reported-by: Julien Grall
>>>>     Signed-off-by: Juergen Gross
>>>>     Reviewed-by: Julien Grall
>>>>     Link: https://lore.kernel.org/r/20210306161833.4552-4-jgross@suse.com
>>>>     Signed-off-by: Boris Ostrovsky
>>>>
>>>> The evtchn driver will use the lateeoi handlers. So the code to ack
>>>> looks like:
>>>>
>>>> do_mask(..., EVT_MASK_REASON_EOI_PENDING);
>>>> smp_store_release(&info->is_active, 0);
>>>> clear_evtchn(info->evtchn);
>>>>
>>>> The code to handle an interrupt looks like:
>>>>
>>>> clear_link(...);
>>>> if (evtchn_fifo_is_pending(port) && !evtchn_fifo_is_masked(port)) {
>>>>     if (xchg_acquire(&info->is_active, 1))
>>>>         return;
>>>>     generic_handle_irq(irq);
>>>> }
>>>>
>>>> After changing the affinity, an interrupt may be received once on the
>>>> previous vCPU. So, I think the following can happen:
>>>>
>>>> vCPU0                             | vCPU1
>>>>                                   |
>>>>   Receive event                   |
>>>>                                   | change affinity to vCPU1
>>>>   clear_link()                    |
>>>>                                   |
>>>>                 /* The interrupt is re-raised */
>>>>                                   |
>>>>                                   | receive event
>>>>                                   |
>>>>                                   | /* The interrupt is not masked */
>>>>   info->is_active = 1             |
>>>>   do_mask(...)                    |
>>>>   info->is_active = 0             |
>>>>                                   | info->is_active = 1
>>>>   clear_evtchn(...)               |
>>>>                                   | do_mask(...)
>>>>                                   | info->is_active = 0
>>>>                                   | clear_evtchn(...)
>>>>
>>>> Does this look plausible to you?
>>>
>>> Yes, it does.
>>>
>>> Thanks for the analysis.
>>>
>>> So I guess for lateeoi events we need to clear is_active only in
>>> xen_irq_lateeoi()?
>>> At a first glance this should fix the issue.
>>
>> It should work and would be quite neat. But, I believe clear_evtchn()
>> would have to stay in the ack helper to avoid losing interrupts.
>
> Could you try the attached patch, please? Only compile tested.

Thanks for the patch! I have also found a reproducer on Linux 5.13, so it
was easier to confirm that the patch works. The reproducer continuously
changes the affinity of the interrupt under high interrupt load. After a
few seconds, I can see dozens of WARN splats.

Regarding the patch itself, a few suggestions:

1) It is not entirely obvious from the code why ack_mask_dynirq() is not
modified. My understanding is that we are assuming
xen_irq_lateeoi_locked() will not be called in this case. I would
suggest spelling that out in the commit message.

2) I would suggest adding a comment in the code explaining why
event_handler_exit() is not used. It is probably worth also adding one
in event_handler_exit(), so one knows that it doesn't cover all the
paths.

Cheers,

-- 
Julien Grall