Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753935AbcDJAUw (ORCPT ); Sat, 9 Apr 2016 20:20:52 -0400 Received: from www.linutronix.de ([62.245.132.108]:55038 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751916AbcDJAUu (ORCPT ); Sat, 9 Apr 2016 20:20:50 -0400 Date: Sat, 9 Apr 2016 17:19:00 -0700 (PDT) From: Thomas Gleixner To: Waiman Long cc: Ingo Molnar , "H. Peter Anvin" , Jonathan Corbet , LKML , linux-doc@kernel.org, x86@kernel.org, Borislav Petkov , Andy Lutomirski , Scott J Norton , Douglas Hatch , Randy Wright Subject: Re: [PATCH v2] x86/hpet: Reduce HPET counter read contention In-Reply-To: <1460146305-14666-1-git-send-email-Waiman.Long@hpe.com> Message-ID: References: <1460146305-14666-1-git-send-email-Waiman.Long@hpe.com> User-Agent: Alpine 2.11 (DEB 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3476 Lines: 114 On Fri, 8 Apr 2016, Waiman Long wrote: > This patch attempts to reduce HPET read contention by using the fact > that if more than one task are trying to access HPET at the same time, > it will be more efficient if one task in the group reads the HPET > counter and shares it with the rest of the group instead of each > group member reads the HPET counter individually. That has nothing to do with tasks. clocksource reads can happen from almost any context. The problem is concurrent access on multiple cpus. > This optimization is enabled on systems with more than 32 CPUs. It can > also be explicitly enabled or disabled by using the new opt_read_hpet > kernel parameter. Please not. What's wrong with enabling it unconditionally? > +/* > * Clock source related code > */ > static cycle_t read_hpet(struct clocksource *cs) > { > - return (cycle_t)hpet_readl(HPET_COUNTER); > + int seq, cnt = 0; > + u32 time; > + > + if (opt_read_hpet <= 0) > + return (cycle_t)hpet_readl(HPET_COUNTER); This wants to be conditional on CONFIG_SMP. No point in having all that muck around for an UP kernel. > + seq = READ_ONCE(hpet_save.seq); > + if (!HPET_SEQ_LOCKED(seq)) { > + int old, new = seq + 1; > + unsigned long flags; > + > + local_irq_save(flags); > + /* > + * Set the lock bit (lsb) to get the right to read HPET > + * counter directly. If successful, read the counter, save > + * its value, and increment the sequence number. Otherwise, > + * increment the sequnce number to the expected locked value > + * for comparison later on. > + */ > + old = cmpxchg(&hpet_save.seq, seq, new); > + if (old == seq) { > + time = hpet_readl(HPET_COUNTER); > + WRITE_ONCE(hpet_save.hpet, time); > + > + /* Unlock */ > + smp_store_release(&hpet_save.seq, new + 1); > + local_irq_restore(flags); > + return (cycle_t)time; > + } > + local_irq_restore(flags); > + seq = new; > + } > + > + /* > + * Wait until the locked sequence number changes which indicates > + * that the saved HPET value is up-to-date. > + */ > + while (READ_ONCE(hpet_save.seq) == seq) { > + /* > + * Since reading the HPET is much slower than a single > + * cpu_relax() instruction, we use two here in an attempt > + * to reduce the amount of cacheline contention in the > + * hpet_save.seq cacheline. > + */ > + cpu_relax(); > + cpu_relax(); > + > + if (likely(++cnt <= HPET_RESET_THRESHOLD)) > + continue; > + > + /* > + * In the unlikely event that it takes too long for the lock > + * holder to read the HPET, we do it ourselves and try to > + * reset the lock. This will also break a deadlock if it > + * happens, for example, when the process context lock holder > + * gets killed in the middle of reading the HPET counter. > + */ > + time = hpet_readl(HPET_COUNTER); > + WRITE_ONCE(hpet_save.hpet, time); > + if (READ_ONCE(hpet_save.seq) == seq) { > + if (cmpxchg(&hpet_save.seq, seq, seq + 1) == seq) > + pr_warn("read_hpet: reset hpet seq to 0x%x\n", > + seq + 1); This is voodoo programming and actively dangerous. CPU0 CPU1 CPU2 lock_hpet() T1=read_hpet() wait_for_unlock() store_hpet(T1) .... T2 = read_hpet() unlock_hpet() lock_hpet() T3 = read_hpet() store_hpet(T3) unlock_hpet() return T3 lock_hpet() T4 = read_hpet() wait_for_unlock() store_hpet(T4) store_hpet(T2) unlock_hpet() return T2 CPU2 will observe time going backwards. Thanks, tglx