From: Marc Gonzalez
Subject: [RFC] Improving udelay/ndelay on platforms where that is possible
To: Linus Torvalds
CC: LKML, Linux ARM, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
    Peter Zijlstra, John Stultz, Douglas Anderson, Nicolas Pitre,
    Mark Rutland, Will Deacon, Jonathan Austin, Arnd Bergmann,
    Kevin Hilman, Russell King, Michael Turquette, Stephen Boyd, Mason
Date: Tue, 31 Oct 2017 17:15:34 +0100

Hello BDFL,

I am writing to you directly because Russell has bluntly stated
(paraphrased) "Send your patches to Linus; it's his kernel, and he
has the final say." which is his diplomatic way of telling me to
fsck off.

Basically, I want to improve the accuracy of clock-based delays,
in order to improve the boot-time delay in the NAND framework.
I will send a patch, but I wanted to have a discussion about the
design first.

The key points under discussion below are

a) guarantee of no under-delays
b) availability of ndelay for arm32

Below is the long-winded rationale.


1) Delays and sleeps (clocks, timers, alarms, etc)

Every operating system kernel (that I know of) offers some way for
a kernel thread to wait for a specified amount of time (expressed in
seconds, milliseconds, microseconds, or even nanoseconds).

For relatively small amounts of time, such a primitive is typically
implemented as a busy-wait spin-loop. Let us call it delay().

Such a primitive cannot guarantee that the calling thread will be
delayed *exactly* the amount of time specified, because there are
many sources of inaccuracy:

* the timer's period may not divide the requested amount
* sampling the timer value may have variable latency
* the thread might be preempted for whatever reason
* the timer itself may have varying frequencies
* etc

Therefore, users are accustomed to having delays be longer (within
a reasonable margin). However, very few users would expect delays
to be *shorter* than requested.

As you have stated yourself, the vast majority of code in Linux is
driver code. Typical driver writers (of which I am one) are not
experts on Linux internals, and may sometimes need some guidance,
either from another programmer or a maintainer (whose time is a
rare resource), or from clear APIs (either well-documented, or
just "natural" and "obvious" interfaces).

A typical driver writer has some HW spec in front of them, which
e.g. states:

* poke register A
* wait 1 microsecond for the dust to settle
* poke register B

which most programmers would translate to:

    poke(reg_A, val_A);
    udelay(1);
    poke(reg_B, val_B);

Given a similar example, Russell has stated:

> And if a data sheet says "needs a delay of 2us" and you put in the
> driver udelay(2) then you're doing it wrong, and you need to read
> Linus' mails on this subject, such as the one I've provided a link
> to... that udelay() must be _at least_ udelay(3), if not 4.

I see two key points in this reply.

i) There seems to be an implicit agreement that it is BAD for the
*actual* delay to be less than what the data sheet specifies,
leading to...

ii) It is the responsibility of the *driver writer* to figure out
how much of a cushion they need, in order to guarantee a minimal
delay.

Obviously, I agree with the first point.

The second point is troubling. It means driver writers are required
to be aware of the quirkiness of Linux internals.

And because drivers are supposed to work on all platforms, are we
saying that driver writers should be aware of the quirkiness for
ALL platforms?

For example, assume that for short delays, such as 1 µs:

* amd64 has 5% relative error, 10 ns absolute error
* arm32 has 10% relative error, 1 µs absolute error
* alpha has 3% relative error, 3 µs absolute error

Would the driver writer then need to write udelay(4), i.e. the
requested 1 µs plus the worst absolute error in the list (3 µs)?

In my opinion, it makes more sense to hide the complexity and
quirkiness of udelay inside each platform's implementation.
Which brings me to...


2) Different implementations of udelay and ndelay

On arm32, it is possible to set up udelay() to be clock-based.

In that case, udelay simply polls a constant-frequency tick-counter.
For example, on my puny little platform, there is a 27 MHz crystal
oscillator (37 ns period) which is wired to a tick counter mapped
on the system bus.
The latency to sample the register is within [10-200] ns.

This implies two things:

i) it is possible to guarantee a minimum delay (in quanta of 37 ns)
ii) the sampling error is limited to ~250 ns

[NB: some platforms are even "better", and move the tick counter
inside the CPU block (e.g. ARM architected timer) to achieve a
lower sampling error.]

There is one minor trap when handling tick counter sampling:

Assume we are waiting for one period to elapse. Consider the
timeline below, where x marks the cycle when the tick counter
register is incremented.

-------x----------x----------x----------x----------x-----> time
                 ^ ^          ^
                 A B          C

If the execution flow leads to sampling at times A and B, then
B-A equals 1, yet only a tiny amount of time has elapsed.

To guarantee that at least one period has elapsed, we must wait
until the arithmetic difference is 2 (i.e. one more than required).
In the example, until time C.

In other words, if we need to delay for N cycles, the correct code
should be:

    t0 = readl(tick_address);
    while (readl(tick_address) - t0 <= N)
        /* spin */ ;

There is another source of "error" (in the sense that udelay might
spin too little) caused by the conversion from µs to cycles -- which
rounds down.

Consider arch/arm/include/asm/delay.h

    loops = (loops_per_jiffy * delay_us * UDELAY_MULT) >> 31

    where UDELAY_MULT = 2147 * HZ + 483648 * HZ / 1000000

Consider a platform with a 27 MHz clock, and HZ=300.
The proper conversion is trivially

    loops = delay_us * 27

Thus, for a 1 microsecond delay, loops equals 27.

But when using UDELAY_MULT = 644245 (rounded down from 644245.0944),
loops equals 26.

This "off-by-one" error is systematic over the entire range of
allowed delay_us input (1 to 2000), so it is easy to fix, by adding
1 to the result.


3) Why does all this even matter?

At boot, the NAND framework scans the NAND chips for bad blocks;
this operation generates approximately 10^5 calls to ndelay(100),
which cause a 100 ms delay, because ndelay is implemented as a
call to the nearest udelay (rounded up).

My current NAND chips are tiny (2 x 512 MB) but with larger chips,
the number of calls to ndelay would climb to 10^6 and the delay
increase to 1 second, which is starting to be a problem.

One solution is to implement ndelay, but ndelay is more prone to
under-delays, and thus a prerequisite is fixing under-delays.

Regards.