Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752470AbdI0Xqu (ORCPT ); Wed, 27 Sep 2017 19:46:50 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:25738 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752357AbdI0Xqt (ORCPT ); Wed, 27 Sep 2017 19:46:49 -0400 Subject: Re: [PATCH v4 2/3] x86/xen/time: setup vcpu 0 time info page To: Boris Ostrovsky References: <20170927134623.3147-1-joao.m.martins@oracle.com> <20170927134623.3147-3-joao.m.martins@oracle.com> <0f58594c-ee42-7813-a0c2-1cbbc1e2a576@oracle.com> <2fb49000-5f07-8cb9-f16d-1701c36b6c49@oracle.com> Cc: linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Juergen Gross , Andy Lutomirski From: Joao Martins Message-ID: <5bb6cc3b-f9de-35b9-2546-3ef97b6c0b71@oracle.com> Date: Thu, 28 Sep 2017 00:46:21 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: userv0022.oracle.com [156.151.31.74] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4522 Lines: 107 On 09/27/2017 11:44 PM, Boris Ostrovsky wrote: > On 09/27/2017 04:57 PM, Joao Martins wrote: >> On 09/27/2017 09:22 PM, Boris Ostrovsky wrote: >>> On 09/27/2017 11:26 AM, Joao Martins wrote: >>>> On 09/27/2017 03:40 PM, Boris Ostrovsky wrote: >>>>>> +static void xen_setup_vsyscall_time_info(void) >>>>>> +{ >>>>>> + struct vcpu_register_time_memory_area t; >>>>>> + struct pvclock_vsyscall_time_info *ti; >>>>>> + struct pvclock_vcpu_time_info *pvti; >>>>>> + int ret; >>>>>> + >>>>>> + pvti = &__this_cpu_read(xen_vcpu)->time; >>>>>> + >>>>>> + /* >>>>>> + * We check ahead on the primary time info if this >>>>>> + * bit is supported hence speeding up Xen clocksource. >>>>>> + */ >>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) >>>>>> + return; >>>>>> + >>>>>> + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); >>>>> Is it OK to have this flag set if anything below fails? >>>>> >>>> Yes - if anything below fails it will only affect userspace mapped page. >>> Then should it be set somewhere else, like in xen_time_init()? >>> >> Hm, I could move it if you think it's better - but given the importance of the >> bit we are checking and its direct correlation to whether or not we can setup >> VCLOCK_PVCLOCK then I find it cleaner to have it here in the same routine. One >> thing I failed to mention before is that checking ahead like above, let us also >> avoid allocating a page plus an hypercall to register the pvti just to check the >> one bit of info we need for using VCLOCK_PVCLOCK. >> >> It is very unlikely with current Xen code that 1) the secondary copy register >> below fails, or 2) master and secondary don't have the same bits set. So in case >> you're reconsidering the "shortcut" check above I can move it like we had in v1 >> and have pvclock_set_flags right before pvclock_set_pvti_cpu0_va(). > > I think it would be more logical to move it to the end like in v1. > > But can you explain again why this flag should not be set in > xen_time_init()? I didn't say we shouldn't have this flag there - I was just pointing out a matter of taste on whether to put on xen_time_init() or in xen_setup_vsyscall_time_info() (which is called from xen_time_init btw) so there's no functional change. > It seems to me that it would be useful not just for > vDSO but for xen_clocksource_read()->pvclock_clocksource_read() as well. Right - That's what I mentioned by "allowing xen clocksource to use/check that bit (consequently speeding up sched_clock)". The above chunk is really focused on enabling the flag on pvclock_clocksource_read(). >> >>>> What I >>>> do above is just allowing xen clocksource to use/check that bit (consequently >>>> speeding up sched_clock) given the necessary support is there in the master >>>> copy. The secondary copy (i.e. what's being set up below, mapped/used in vdso) >>>> has the same data from the master copy, just separate memory regions. The checks >>>> below are just for the unlikely cases of failing to register the secondary copy >>>> or if its content were to differ from master copy in future releases - and >>>> therefore we handle those more gracefully. >>>> >>>>> (I can see in the changelog that apparently at some point I've asked >>>>> about this at v1 but I can't remember/find what exactly it was) >>>>> >>>>>> + >>>>>> + ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL); >>>>>> + if (!ti) >>>>>> + return; >>>>>> + >>>>>> + t.addr.v = &ti->pvti; >>>>>> + >>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t); >>>>>> + if (ret) { >>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret); >>>>>> + free_page((unsigned long)ti); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + /* >>>>>> + * If the check above succedded this one should too since it's the >>>>>> + * same data on both primary and secondary time infos just different >>>>>> + * memory regions. But we still check it in case hypervisor is buggy. >>>>>> + */ >>>>>> + pvti = &ti->pvti; >>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) { >>>>>> + t.addr.v = NULL; >>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, >>>>>> + 0, &t); >>>>>> + if (!ret) >>>>>> + free_page((unsigned long)ti); >>>>>> + >>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n"); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + xen_clock = ti; >>>>>> + pvclock_set_pvti_cpu0_va(xen_clock); >>>>>> + >>>>>> + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK; >>>>>> +} >>>>>> + >