Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966312AbbD2Q45 (ORCPT ); Wed, 29 Apr 2015 12:56:57 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:29541 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753725AbbD2Q4y (ORCPT ); Wed, 29 Apr 2015 12:56:54 -0400 Message-ID: <55410CD6.2000708@oracle.com> Date: Wed, 29 Apr 2015 12:54:46 -0400 From: Boris Ostrovsky User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: David Vrabel , konrad.wilk@oracle.com CC: xen-devel@lists.xenproject.org, annie.li@oracle.com, linux-kernel@vger.kernel.org Subject: Re: [Xen-devel] [PATCH 1/4] xen/events: Clear cpu_evtchn_mask before resuming References: <1430236333-11905-1-git-send-email-boris.ostrovsky@oracle.com> <1430236333-11905-2-git-send-email-boris.ostrovsky@oracle.com> <553FB540.4040707@citrix.com> <553FD1A5.60409@oracle.com> <554107A6.50604@citrix.com> In-Reply-To: <554107A6.50604@citrix.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Source-IP: userv0021.oracle.com [156.151.31.71] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5848 Lines: 132 On 04/29/2015 12:32 PM, David Vrabel wrote: > On 28/04/15 19:29, Boris Ostrovsky wrote: >> On 04/28/2015 12:28 PM, David Vrabel wrote: >>> On 28/04/15 16:52, Boris Ostrovsky wrote: >>>> When a guest is resumed, the hypervisor may change event channel >>>> assignments. If this happens and the guest uses 2-level events it >>>> is possible for the interrupt to be claimed by wrong VCPU since >>>> cpu_evtchn_mask bits may be stale. This can happen even though >>>> evtchn_2l_bind_to_cpu() attempts to clear old bits: irq_info that >>>> is passed in is not necessarily the original one (from pre-migration >>>> times) but instead is freshly allocated during resume and so any >>>> information about which CPU the channel was bound to is lost. >>>> >>>> Thus we should clear the mask during resume. >>>> >>>> We also need to make sure that bits for xenstore and console channels >>>> are set when these two subsystems are resumed. While rebind_evtchn_irq() >>>> (which is invoked for both of them on a resume) calls >>>> irq_set_affinity(), >>>> the latter will in fact postpone setting affinity until handling the >>>> interrupt. But because cpu_evtchn_mask will have bits for these two >>>> cleared >>>> we won't be able to take the interrupt. >>>> >>>> Setting IRQ_MOVE_PCNTXT flag for the two irqs avoids this problem by >>>> allowing to set affinity immediately, which is safe for >>>> event-channel-based >>>> interrupts. >>> [...] >>>> --- a/drivers/tty/hvc/hvc_xen.c >>>> +++ b/drivers/tty/hvc/hvc_xen.c >>>> @@ -533,6 +533,7 @@ static int __init xen_hvc_init(void) >>>> info = vtermno_to_xencons(HVC_COOKIE); >>>> info->irq = bind_evtchn_to_irq(info->evtchn); >>>> + irq_set_status_flags(info->irq, IRQ_MOVE_PCNTXT); >>>> } >>>> if (info->irq < 0) >>>> info->irq = 0; /* NO_IRQ */ >>>> diff --git a/drivers/xen/events/events_2l.c >>>> b/drivers/xen/events/events_2l.c >>>> index 5db43fc..7dd4631 100644 >>>> --- a/drivers/xen/events/events_2l.c >>>> +++ b/drivers/xen/events/events_2l.c >>>> @@ -345,6 +345,15 @@ irqreturn_t xen_debug_interrupt(int irq, void >>>> *dev_id) >>>> return IRQ_HANDLED; >>>> } >>>> +static void evtchn_2l_resume(void) >>>> +{ >>>> + int i; >>>> + >>>> + for_each_online_cpu(i) >>>> + memset(per_cpu(cpu_evtchn_mask, i), 0, sizeof(xen_ulong_t) * >>>> + EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD); >>>> +} >>>> + >>>> static const struct evtchn_ops evtchn_ops_2l = { >>>> .max_channels = evtchn_2l_max_channels, >>>> .nr_channels = evtchn_2l_max_channels, >>>> @@ -356,6 +365,7 @@ static const struct evtchn_ops evtchn_ops_2l = { >>>> .mask = evtchn_2l_mask, >>>> .unmask = evtchn_2l_unmask, >>>> .handle_events = evtchn_2l_handle_events, >>>> + .resume = evtchn_2l_resume, >>>> }; >>>> void __init xen_evtchn_2l_init(void) >>>> diff --git a/drivers/xen/xenbus/xenbus_comms.c >>>> b/drivers/xen/xenbus/xenbus_comms.c >>>> index fdb0f33..30203d1 100644 >>>> --- a/drivers/xen/xenbus/xenbus_comms.c >>>> +++ b/drivers/xen/xenbus/xenbus_comms.c >>>> @@ -231,6 +231,7 @@ int xb_init_comms(void) >>>> } >>>> xenbus_irq = err; >>>> + irq_set_status_flags(xenbus_irq, IRQ_MOVE_PCNTXT); >>> IRQ_MOVE_PCNTXT means "Interrupt can be migrated from process context" >>> which doesn't really sound relevant to me here? >>> >>> Thomas Glexnier is really not happy with mis-use of IRQ APIs. >> Yes, this is somewhat expanding definition of IRQ_MOVE_PCNTXT but then >> there are other devices that do the same (HPET, for one). > Ok. But why set it on only these two event channels and not every one? On resume in xen_irq_resume() we do for_each_possible_cpu(cpu) { restore_cpu_virqs(cpu); restore_cpu_ipis(cpu); } restore_pirqs(); And console and xenstore are neither of the those three types. rebind_irq_to_cpu() essentially plays the role of these restore_*() routines, except it does not make the hypercall to let the hypervisor assign notify_vcpu_id and it doesn't do explicit bind_evtchn_to_cpu(). Which makes me think that perhaps we should modify rebind_irq_to_cpu() to in fact do these two things. We may still need to keep irq_set_affinity() so that generic affinity data is properly set up but this will be deferred until the first interrupt. > >>> From the commit log the evtchn_2l_resume() fucntion that's added sounds >>> like it fixes the problem on its own? >> It in fact makes this problem worse since now that cpu_evtchn_mask is >> cleared during resume we cannot process the interrupt anymore in >> evtchn_2l_handle_events(): irqs have to be bound to a cpu in order for >> an interrupt to be processed. > Perhaps evtchn_2l_resume() should set the local cpu mask for any bound > event channels? And then you wouldn't need IRQ_MOVE_PCNTX. But then (at least in 2-level case) more than one VCPUs may pick the same interrupt, won't they? Because the local cpu mask is what tells a VCPU that it is allowed to claim an interrupt. -boris > >> We already have this issue even without clearing the mask if, say, >> xenstore channel changes or if it gets bound to cpu other than 0 before >> suspend. It's just that we rarely (if ever) see this happen. >> >> We can explicitly call events_base.c:set_affinity_irq() from >> rebind_evtchn_irq() after irq_set_affinity() if we want to avoid using >> IRQ_MOVE_PCNTXT flag but that will also looks kind of strange. > David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/