Date: Fri, 22 Jun 2018 12:49:00 +0200 (CEST)
From: Thomas Gleixner
To: Fenghua Yu
cc: Ingo Molnar, H. Peter Anvin, Ashok Raj, Dave Hansen, Rafael Wysocki,
    Tony Luck, Alan Cox, Ravi V Shankar, Arjan van de Ven,
    linux-kernel, x86
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for
 split lock in kernel mode
In-Reply-To: <1527435965-202085-3-git-send-email-fenghua.yu@intel.com>
References: <1527435965-202085-1-git-send-email-fenghua.yu@intel.com>
 <1527435965-202085-3-git-send-email-fenghua.yu@intel.com>

On Sun, 27 May 2018, Fenghua Yu wrote:

> +static void wait_for_reexecution(void)
> +{
> +	while (time_before(jiffies, disable_split_lock_jiffies +
> +			   reenable_split_lock_delay))
> +		cpu_relax();
> +}
> +
> +/*
> + * TEST_CTL MSR is shared among threads on the same core. To simplify
> + * situation, disable_split_lock_jiffies is global instead of per core.

Simplify?

> + * Multiple threads may generate #AC for split lock at the same time.
> + * disable_split_lock_jiffies is updated by those threads. This may
> + * postpone re-enabling split lock on this thread. But that's OK
> + * because we need to make sure all threads on the same core re-execute
> + * their faulting instructions before re-enabling split lock on the core.

This does not ensure anything. It's a bandaid. If the task gets scheduled
out _before_ reexecuting then the timer will reenable AC and then when
scheduled back in it will trip it again.

> + * We want to avoid the situation when split lock is disabled on one
> + * thread (thus on the whole core), then split lock is re-enabled on
> + * another thread (thus on the whole core), and the faulting instruction
> + * generates another #AC on the first thread.
> + *
> + * Before re-enabling split lock, wait until there is no re-executed
> + * split lock instruction which may only exist before
> + * disable_split_lock_jiffies + reenable_split_lock_delay.
> + */
> +static void delayed_reenable_split_lock(struct work_struct *w)
> +{
> +	mutex_lock(&reexecute_split_lock_mutex);
> +	wait_for_reexecution();

And because the delayed work got scheduled after a tick and the other
thread just managed to make the timeout another tick longer, this thing
busy loops for a full tick with the mutex held. Groan.

> +	_setup_split_lock(ENABLE_SPLIT_LOCK_AC);
> +	mutex_unlock(&reexecute_split_lock_mutex);
> +}
> +
> +/* Will the faulting instruction be re-executed? */
> +static bool re_execute(struct pt_regs *regs)
> +{
> +	/*
> +	 * The only reason for generating #AC from kernel is because of
> +	 * split lock. The kernel faulting instruction will be re-executed.

Will be reexecuted? By what?

> +	 */
> +	if (!user_mode(regs))
> +		return true;
> +
> +	return false;
> +}

The extra value of this wrapper around !user_mode(regs) is that it is
more obfuscated than just having a sensible comment and a simple
'if (user_mode(regs))' at the call site. Or is there anything I'm missing
here?
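IOW, all it takes at the call site is something like the below. Entirely
untested, and handle_split_lock() is a made up name for whatever the
kernel mode handling ends up being:

	/*
	 * #AC from the kernel can only be caused by a split lock. #AC
	 * from user space is a regular alignment check exception which
	 * is handled by the caller.
	 */
	if (user_mode(regs))
		return false;

	handle_split_lock(regs, error_code);
	return true;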
> +static void disable_split_lock(void *unused)
> +{
> +	_setup_split_lock(DISABLE_SPLIT_LOCK_AC);
> +}
> +
> +/*
> + * #AC handler for split lock is called by generic #AC handler.
> + *
> + * Disable #AC for split lock on the CPU that the current task runs on
> + * in order for the faulting instruction to get executed. The #AC for split
> + * lock is re-enabled later.
> + */
> +bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)
> +{
> +	static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 5);
> +	char str[] = "alignment check for split lock";
> +	struct task_struct *tsk = current;
> +	int cpu = task_cpu(tsk);

What the heck is wrong with smp_processor_id()? task_cpu(current) is the
CPU on which current is running and that's nothing else than
smp_processor_id(). But sure, it would be too obvious and not make a
reviewer scratch his head while trying to figure out what the magic cpu
selection means.

> +
> +	if (!re_execute(regs))
> +		return false;

So user space just returns here. The explanatory comment for that is
where? It's not even in this hideous wrapper which inverts the condition.

> +
> +	/* Pace logging with jiffies. */
> +	if (__ratelimit(&ratelimit)) {
> +		pr_info("%s[%d] %s ip:%lx sp:%lx error:%lx",
> +			tsk->comm, tsk->pid, str,
> +			regs->ip, regs->sp, error_code);
> +		print_vma_addr(KERN_CONT " in ", regs->ip);

Why the heck would print_vma_addr() of a kernel IP provide any useful
information? Kernel IP is in the kernel not in some random VMA. What's
wrong with a simple WARN() which is prominent and immediately pointing to
the right place?

> +
> +	mutex_lock(&reexecute_split_lock_mutex);

How is that supposed to work when the exception happens in a preempt or
interrupt disabled region? And what guarantees that there is no #AC
inside a reexecute_split_lock_mutex held region? It's just forbidden to
have one there, right? If there is one, then you still have the reset
button.

> +	smp_call_function_single(cpu, disable_split_lock, NULL, 1);

When #AC happens in an interrupt disabled region then this will trigger
the irq disabled warning in smp_call_function_single(). Cannot happen
because it's forbidden to trip #AC in an interrupt disabled region,
right?

Aside of that the value of using smp_call_function_single() instead of
invoking the function directly is? It makes sure that it's executed on
task_cpu(current), which is obviously the same as smp_processor_id(),
i.e. the current CPU. Makes a lot of sense especially as it has the
added value that it makes sure by setting 'wait = 1' that the function
actually was called.

> +	/*
> +	 * Mark the time when split lock is disabled for re-executing the
> +	 * faulting instruction.
> +	 */
> +	disable_split_lock_jiffies = jiffies;
> +	mutex_unlock(&reexecute_split_lock_mutex);
> +
> +	/* The faulting instruction will be re-executed when
> +	 * split lock is re-enabled 1 HZ later.

Ah, the reenablement of that stuff will re-execute the instruction. Now
it becomes more clear - NOT! These re-execution comments are just
misleading and wrong.

Aside of that Linus will love this comment style:

  https://lkml.kernel.org/r/CA+55aFyQYJerovMsSoSKS7PessZBr4vNp-3QUUwhqk4A4_jcbg@mail.gmail.com

Why don't you use it everywhere?
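For reference, the canonical multi-line comment style from
Documentation/process/coding-style.rst, applied to the comment quoted
above:

	/*
	 * The faulting instruction will be re-executed when split lock
	 * is re-enabled 1 HZ later.
	 */

The variant with text on the opening /* line is only tolerated in net/
and drivers/net/; everywhere else the opening /* stands alone.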
> +	 */
> +	schedule_delayed_work_on(cpu, &per_cpu(reenable_delayed_work, cpu),
> +				 reenable_split_lock_delay);

That does work really well during early boot because #AC can only happen
after scheduler and workqueues are initialized and after late init calls
have been done which initialized the per cpu delayed work. Heck no. #AC
wants to be on right from the beginning.

This patch surely earns extra points in the trainwreck engineering
contest, but that's not taking place on LKML.

The whole thing is simply:

	handle_ac()
	{
		if (user_mode(regs)) {
			do_trap(AC, SIGBUS, ...);
		} else {
			disable_ac_on_local_cpu();
			WARN_ONCE(1);
		}
	}

That wants #AC enabled as early as possible so the kernel gets as much
coverage as it can. If it trips in the kernel it's a bug and needs to be
fixed, and then we can fix them ONE by ONE.

There is absolutely no requirement for all this so called "re-execution"
hackery. If there were a true benefit to keeping #AC on after it tripped
in the kernel, then you'd need to do:

	disable_ac()
	emulate_faulting_insn()
	enable_ac()

Obviously including refcounted synchronization of the disable/enable
pair with the CPUs which share the AC mechanics. Total overkill.

Thanks,

	tglx
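P.S.: Just to make the point, a completely untested sketch of the above.
The MSR and bit are the ones this series describes (TEST_CTL, MSR 0x33,
bit 29); the define names and disable_ac_on_local_cpu() are made up
here, and this would replace the DO_ERROR() instance of X86_TRAP_AC in
arch/x86/kernel/traps.c:

	/* From patch 1 of this series; names made up for this sketch */
	#define MSR_TEST_CTL			0x00000033
	#define TEST_CTL_ENABLE_SPLIT_LOCK_AC	BIT(29)

	static void disable_ac_on_local_cpu(void)
	{
		u64 val;

		/*
		 * TEST_CTL is core scoped, so this disables split lock
		 * detection for all siblings of this core.
		 */
		rdmsrl(MSR_TEST_CTL, val);
		wrmsrl(MSR_TEST_CTL, val & ~TEST_CTL_ENABLE_SPLIT_LOCK_AC);
	}

	dotraplinkage void do_alignment_check(struct pt_regs *regs,
					      long error_code)
	{
		/*
		 * #AC from user space is a regular alignment check
		 * exception. Handle it the same way as before.
		 */
		if (user_mode(regs)) {
			do_trap(X86_TRAP_AC, SIGBUS, "alignment check",
				regs, error_code, NULL);
			return;
		}

		/*
		 * #AC from kernel mode is a split lock bug. Disable
		 * detection on this core to keep the machine usable and
		 * warn prominently so the bug gets fixed.
		 */
		disable_ac_on_local_cpu();
		WARN_ONCE(1, "split lock detected in kernel mode\n");
	}

No mutexes, no IPIs, no delayed work, and it works from early boot on.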