Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752753AbdI1KRL (ORCPT ); Thu, 28 Sep 2017 06:17:11 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:25554 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752681AbdI1KRJ (ORCPT ); Thu, 28 Sep 2017 06:17:09 -0400 Subject: Re: [PATCH v4 2/3] x86/xen/time: setup vcpu 0 time info page To: Boris Ostrovsky References: <20170927134623.3147-1-joao.m.martins@oracle.com> <20170927134623.3147-3-joao.m.martins@oracle.com> <0f58594c-ee42-7813-a0c2-1cbbc1e2a576@oracle.com> <2fb49000-5f07-8cb9-f16d-1701c36b6c49@oracle.com> <5bb6cc3b-f9de-35b9-2546-3ef97b6c0b71@oracle.com> Cc: linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Juergen Gross , Andy Lutomirski From: Joao Martins Message-ID: <0a318172-0d88-37cb-9ef7-5a4089874660@oracle.com> Date: Thu, 28 Sep 2017 11:16:40 +0100 MIME-Version: 1.0 In-Reply-To: <5bb6cc3b-f9de-35b9-2546-3ef97b6c0b71@oracle.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: aserv0021.oracle.com [141.146.126.233] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4839 Lines: 111 On 09/28/2017 12:46 AM, Joao Martins wrote: > On 09/27/2017 11:44 PM, Boris Ostrovsky wrote: >> On 09/27/2017 04:57 PM, Joao Martins wrote: >>> On 09/27/2017 09:22 PM, Boris Ostrovsky wrote: >>>> On 09/27/2017 11:26 AM, Joao Martins wrote: >>>>> On 09/27/2017 03:40 PM, Boris Ostrovsky wrote: >>>>>>> +static void xen_setup_vsyscall_time_info(void) >>>>>>> +{ >>>>>>> + struct vcpu_register_time_memory_area t; >>>>>>> + struct pvclock_vsyscall_time_info *ti; >>>>>>> + struct pvclock_vcpu_time_info *pvti; >>>>>>> + int ret; >>>>>>> + >>>>>>> + pvti = &__this_cpu_read(xen_vcpu)->time; >>>>>>> + >>>>>>> + /* >>>>>>> + * We check ahead on the primary time info if this >>>>>>> + * bit is supported hence speeding up Xen clocksource. >>>>>>> + */ >>>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) >>>>>>> + return; >>>>>>> + >>>>>>> + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); >>>>>> Is it OK to have this flag set if anything below fails? >>>>>> >>>>> Yes - if anything below fails it will only affect userspace mapped page. >>>> Then should it be set somewhere else, like in xen_time_init()? >>>> >>> Hm, I could move it if you think it's better - but given the importance of the >>> bit we are checking and its direct correlation to whether or not we can setup >>> VCLOCK_PVCLOCK then I find it cleaner to have it here in the same routine. One >>> thing I failed to mention before is that checking ahead like above, let us also >>> avoid allocating a page plus an hypercall to register the pvti just to check the >>> one bit of info we need for using VCLOCK_PVCLOCK. >>> >>> It is very unlikely with current Xen code that 1) the secondary copy register >>> below fails, or 2) master and secondary don't have the same bits set. So in case >>> you're reconsidering the "shortcut" check above I can move it like we had in v1 >>> and have pvclock_set_flags right before pvclock_set_pvti_cpu0_va(). >> >> I think it would be more logical to move it to the end like in v1. >> >> But can you explain again why this flag should not be set in >> xen_time_init()? > > I didn't say we shouldn't have this flag there - I was just pointing out a > matter of taste on whether to put on xen_time_init() or in > xen_setup_vsyscall_time_info() (which is called from xen_time_init btw) so > there's no functional change. > To be clear, in this paragraph when I say on xen_setup_vsyscall_time_info() I mean like it is described in this patch i.e. in the beginning of the routine. >> It seems to me that it would be useful not just for >> vDSO but for xen_clocksource_read()->pvclock_clocksource_read() as well. > > Right - That's what I mentioned by "allowing xen clocksource to use/check that > bit (consequently speeding up sched_clock)". The above chunk is really focused > on enabling the flag on pvclock_clocksource_read(). > >>> >>>>> What I >>>>> do above is just allowing xen clocksource to use/check that bit (consequently >>>>> speeding up sched_clock) given the necessary support is there in the master >>>>> copy. The secondary copy (i.e. what's being set up below, mapped/used in vdso) >>>>> has the same data from the master copy, just separate memory regions. The checks >>>>> below are just for the unlikely cases of failing to register the secondary copy >>>>> or if its content were to differ from master copy in future releases - and >>>>> therefore we handle those more gracefully. >>>>> >>>>>> (I can see in the changelog that apparently at some point I've asked >>>>>> about this at v1 but I can't remember/find what exactly it was) >>>>>> >>>>>>> + >>>>>>> + ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL); >>>>>>> + if (!ti) >>>>>>> + return; >>>>>>> + >>>>>>> + t.addr.v = &ti->pvti; >>>>>>> + >>>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t); >>>>>>> + if (ret) { >>>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret); >>>>>>> + free_page((unsigned long)ti); >>>>>>> + return; >>>>>>> + } >>>>>>> + >>>>>>> + /* >>>>>>> + * If the check above succedded this one should too since it's the >>>>>>> + * same data on both primary and secondary time infos just different >>>>>>> + * memory regions. But we still check it in case hypervisor is buggy. >>>>>>> + */ >>>>>>> + pvti = &ti->pvti; >>>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) { >>>>>>> + t.addr.v = NULL; >>>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, >>>>>>> + 0, &t); >>>>>>> + if (!ret) >>>>>>> + free_page((unsigned long)ti); >>>>>>> + >>>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n"); >>>>>>> + return; >>>>>>> + } >>>>>>> + >>>>>>> + xen_clock = ti; >>>>>>> + pvclock_set_pvti_cpu0_va(xen_clock); >>>>>>> + >>>>>>> + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK; >>>>>>> +} >>>>>>> + >>