Date: Wed, 20 May 2020 13:09:34 -0400
From: Rich Felker
To: Szabolcs Nagy
Cc: Arnd Bergmann, Adhemerval Zanella, Vincenzo Frascino, Russell King - ARM Linux, Will Deacon, Jack Schmidt, Linux ARM, Linux Kernel Mailing List, Thomas Gleixner, Stephen Boyd, nd@arm.com
Subject: Re: clock_gettime64 vdso bug on 32-bit arm, rpi-4
Message-ID: <20200520170932.GO1079@brightrain.aerifal.cx>
References: <0c2abcd1-7da8-2559-1e93-4c3bdd38dec1@linaro.org> <20200520154128.GA24483@arm.com> <20200520160810.GM1079@brightrain.aerifal.cx>
In-Reply-To: <20200520160810.GM1079@brightrain.aerifal.cx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 20, 2020 at 12:08:10PM -0400, Rich Felker wrote:
> On Wed, May 20, 2020 at 04:41:29PM +0100, Szabolcs Nagy wrote:
> > The 05/19/2020 22:31, Arnd Bergmann wrote:
> > > On Tue, May 19, 2020 at 10:24 PM Adhemerval Zanella
> > > wrote:
> > > > On 19/05/2020 16:54, Arnd Bergmann wrote:
> > > > > Jack Schmidt reported a bug for the arm32 clock_gettimeofday64 vdso call last
> > > > > month: https://github.com/richfelker/musl-cross-make/issues/96 and
> > > > > https://github.com/raspberrypi/linux/issues/3579
> > > > >
> > > > > As Will Deacon pointed out, this was never reported on the mailing list,
> > > > > so I'll try to summarize what we know, so this can hopefully be resolved soon.
> > > > >
> > > > > - This happened reproducibly on Linux-5.6 on a 32-bit Raspberry Pi patched
> > > > > kernel running on a 64-bit Raspberry Pi 4b (bcm2711) when calling
> > > > > clock_gettime64(CLOCK_REALTIME)
> > > >
> > > > Does it happen with other clocks as well?
> > >
> > > Unclear.
> > >
> > > > > - The kernel tree is at https://github.com/raspberrypi/linux/, but I could
> > > > > see no relevant changes compared to a mainline kernel.
> > > >
> > > > Is this bug reproducible with mainline kernel or mainline kernel can't be
> > > > booted on bcm2711?
> > >
> > > Mainline linux-5.6 should boot on that machine but might not have
> > > all the other features, so I think users tend to use the raspberry pi
> > > kernel sources for now.
> > >
> > > > > - From the report, I see that the returned time value is larger than the
> > > > > expected time, by 3.4 to 14.5 million seconds in four samples, my
> > > > > guess is that a random number gets added in at some point.
> > > >
> > > > What kind code are you using to reproduce it? It is threaded or issue
> > > > clock_gettime from signal handlers?
> > >
> > > The reproducer is very simple without threads or signals,
> > > see the start of https://github.com/richfelker/musl-cross-make/issues/96
> > >
> > > It does rely on calling into the musl wrapper, not the direct vdso
> > > call.
> > >
> > > > > - From other sources, I found that the Raspberry Pi clocksource runs
> > > > > at 54 MHz, with a mask value of 0xffffffffffffff. From these numbers
> > > > > I would expect that reading a completely random hardware register
> > > > > value would result in an offset up to 1.33 billion seconds, which is
> > > > > around factor 100 more than the error we see, though similar.
> > > > >
> > > > > - The test case calls the musl clock_gettime() function, which falls back to
> > > > > the clock_gettime64() syscall on kernels prior to 5.5, or to the 32-bit
> > > > > clock_gettime() prior to Linux-5.1. As reported in the bug, Linux-4.19 does
> > > > > not show the bug.
> > > > >
> > > > > - The behavior was not reproduced on the same user space in qemu,
> > > > > though I cannot tell whether the exact same kernel binary was used.
> > > > >
> > > > > - glibc-2.31 calls the same clock_gettime64() vdso function on arm to
> > > > > implement clock_gettime(), but earlier versions did not. I have not
> > > > > seen any reports of this bug, which could be explained by users
> > > > > generally being on older versions.
> > > > >
> > > > > - As far as I can tell, there are no reports of this bug from other users,
> > > > > and so far nobody could reproduce it.
> >
> > note: i could not reproduce it in qemu-system with these configs:
> >
> > qemu-system-aarch64 + arm64 kernel + compat vdso
> > qemu-system-aarch64 + kvm accel (on cortex-a72) + 32bit arm kernel
> > qemu-system-arm + cpu max + 32bit arm kernel
> >
> > so i think it's something specific to that user's setup
> > (maybe rpi hw bug or gcc miscompiled the vdso or something
> > with that particular linux, i built my own linux 5.6 because
> > i did not know the exact kernel version where the bug was seen)
> >
> > i don't have access to rpi (or other cortex-a53 where i
> > can install my own kernel) so this is as far as i got.
>
> If we have a binary of the kernel that's known to be failing on the
> hardware, it would be useful to dump its vdso and examine the
> disassembly to see if it was miscompiled.

OK, OP posted it and I think we've solved this.
See https://github.com/richfelker/musl-cross-make/issues/96#issuecomment-631604410

And my analysis:

<@dalias> see what i just found on the tracker
<@dalias> patch_vdso/vdso_nullpatch_one in arch/arm/kernel/vdso.c patches out the time32 functions in this case
<@dalias> but not the time64 one
<@dalias> this looks like a real kernel bug that's not hw-specific except breaking on all hardware where the patching-out is needed
<@dalias> we could possibly work around it by refusing to use the time64 vdso unless the time32 one is also present
<@dalias> yep
<@dalias> so i think we've solved this. the kernel thought it wasnt using vdso anymore because it patched it out
<@dalias> but it forgot to patch out the time64 one
<@dalias> so it stopped updating the data needed for vdso to work
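
For illustration only, the userspace workaround floated in the log above (refuse the time64 vdso entry unless the time32 one is also present) could look roughly like the sketch below. This is not the actual musl or kernel code. The lookup callback type, the enum, and both simulated lookup functions are hypothetical names made up for this example. The symbol and version strings are what the 32-bit arm vdso is assumed to export and should be checked against the kernel in question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns the address of a vdso symbol, or NULL
 * if the kernel never exported it or patched it out. */
typedef void *(*vdso_lookup_fn)(const char *version, const char *name);

/* Assumed names for the 32-bit arm vdso; verify against the target kernel. */
#define VDSO_VERSION   "LINUX_2.6"
#define VDSO_CGT32_SYM "__vdso_clock_gettime"
#define VDSO_CGT64_SYM "__vdso_clock_gettime64"

enum cgt_path { CGT_SYSCALL, CGT_VDSO_TIME32, CGT_VDSO_TIME64 };

/* Decide which clock_gettime path to use. A missing time32 symbol is
 * taken as a sign that the kernel disabled vdso timekeeping, so a
 * leftover time64 symbol is not trusted and the syscall is used. */
enum cgt_path choose_cgt_path(vdso_lookup_fn lookup)
{
	assert(lookup != NULL);
	void *cgt32 = lookup(VDSO_VERSION, VDSO_CGT32_SYM);
	void *cgt64 = lookup(VDSO_VERSION, VDSO_CGT64_SYM);

	if (!cgt32)
		return CGT_SYSCALL;
	return cgt64 ? CGT_VDSO_TIME64 : CGT_VDSO_TIME32;
}

/* Stand-in for a real symbol address in the simulated lookups below. */
static int dummy_symbol;

/* Simulates the affected kernel: time32 patched out, time64 left behind. */
void *lookup_affected(const char *version, const char *name)
{
	(void)version;
	return strcmp(name, VDSO_CGT64_SYM) == 0 ? (void *)&dummy_symbol : NULL;
}

/* Simulates a kernel where both entry points are usable. */
void *lookup_healthy(const char *version, const char *name)
{
	(void)version;
	if (strcmp(name, VDSO_CGT32_SYM) == 0 || strcmp(name, VDSO_CGT64_SYM) == 0)
		return &dummy_symbol;
	return NULL;
}
```

With these simulated lookups, choose_cgt_path(lookup_affected) selects the syscall path and choose_cgt_path(lookup_healthy) selects the time64 vdso path. The cost of this approach is that a kernel exporting only the time64 symbol on purpose would also be pushed to the syscall path.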