Date: Wed, 16 May 2018 16:50:07 +0200 (CEST)
From: Thomas Gleixner
To: Rick Warner
cc: Linux Kernel Mailing List
Subject: Re: [bisected] rcu_sched detected stalls - 4.15 or newer kernel with some Xeon skylake CPUs and extended APIC

Rick,

On Tue, 15 May 2018, Thomas Gleixner wrote:
> I can't spot an immediate fail with that commit, but I'll have a look
> tomorrow for instrumenting this with tracepoints
> which can be dumped from the stall detector.

can you please give the patch below a try? I assume you have a serial
console, otherwise this is going to be tedious.

Please add the following to the kernel command line:

      ftrace_dump_on_oops

(see the note after the patch for one way to make this persistent)

Note, the box will panic after the first stall and spill out the trace
buffer over serial, which might take a while.

Thanks,

	tglx

8<--------------------

--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -52,20 +52,28 @@ static void
 	if (apic_dest != APIC_DEST_ALLINC)
 		cpumask_clear_cpu(smp_processor_id(), tmpmsk);
 
+	trace_printk("To: %*pbl\n", cpumask_pr_args(tmpmsk));
+
 	/* Collapse cpus in a cluster so a single IPI per cluster is sent */
 	for_each_cpu(cpu, tmpmsk) {
 		struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);
 
 		dest = 0;
+		trace_printk("CPU: %u cluster: %*pbl\n", cpu,
+			     cpumask_pr_args(&cmsk->mask));
 		for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
 			dest |= per_cpu(x86_cpu_to_logical_apicid, clustercpu);
 
-		if (!dest)
+		if (!dest) {
+			trace_printk("dest = 0!?\n");
 			continue;
+		}
 
 		__x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
 		/* Remove cluster CPUs from tmpmask */
 		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
+		trace_printk("dest %08x --> tmpmsk %*pbl\n", dest,
+			     cpumask_pr_args(tmpmsk));
 	}
 
 	local_irq_restore(flags);

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -124,7 +124,7 @@ int rcu_num_lvls __read_mostly = RCU_NUM
 int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
 int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
 /* panic() on RCU Stall sysctl. */
-int sysctl_panic_on_rcu_stall __read_mostly;
+int sysctl_panic_on_rcu_stall __read_mostly = 1;
 
 /*
  * The rcu_scheduler_active variable is initialized to the value

--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -681,6 +681,7 @@ void on_each_cpu_cond(bool (*cond_func)(
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info))
 				cpumask_set_cpu(cpu, cpus);
+		trace_printk("%*pbl\n", cpumask_pr_args(cpus));
 		on_each_cpu_mask(cpus, func, info, wait);
 		preempt_enable();
 		free_cpumask_var(cpus);
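For reference, a minimal sketch of making that command line addition
persistent, assuming a GRUB-based distro (the file location and the
config regeneration command are distro-specific):

      # /etc/default/grub
      GRUB_CMDLINE_LINUX="... ftrace_dump_on_oops"
      # then regenerate the config, e.g.:
      #   grub2-mkconfig -o /boot/grub2/grub.cfg

The same knob is also writable at runtime via
/proc/sys/kernel/ftrace_dump_on_oops; the command line variant just
guarantees it is active from early boot. Likewise, the tree.c hunk above
merely hardwires the default of what /proc/sys/kernel/panic_on_rcu_stall
would set at runtime.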
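To make the resulting trace easier to follow, here is a rough userspace
model of the loop the x2apic hunk instruments -- a sketch of the
cluster-collapse idea only, with made-up CPU counts and masks, not the
kernel implementation:

/*
 * Toy model of the cluster collapse in __x2apic_send_IPI_mask(): walk
 * the pending target CPUs, OR together the logical APIC IDs of all
 * targets in the same cluster, "send" one IPI per cluster, then drop
 * that cluster's CPUs from the pending mask.
 */
#include <stdio.h>
#include <stdint.h>

#define NR_CPUS          8
#define CPUS_PER_CLUSTER 4	/* real x2APIC clusters hold 16 logical IDs */

/* x2APIC logical ID: cluster number in bits 31:16, one-hot bit in 15:0 */
static uint32_t logical_apicid(int cpu)
{
	return ((uint32_t)(cpu / CPUS_PER_CLUSTER) << 16) |
	       (1u << (cpu % CPUS_PER_CLUSTER));
}

static void send_ipi_mask(uint32_t tmpmsk)
{
	while (tmpmsk) {
		int cpu = __builtin_ctz(tmpmsk);	/* lowest pending CPU */
		int cluster = cpu / CPUS_PER_CLUSTER;
		uint32_t dest = 0, clustermsk = 0;

		/* Collapse all pending CPUs of this cluster into one dest */
		for (int c = cluster * CPUS_PER_CLUSTER;
		     c < (cluster + 1) * CPUS_PER_CLUSTER && c < NR_CPUS; c++) {
			clustermsk |= 1u << c;
			if (tmpmsk & (1u << c))
				dest |= logical_apicid(c);
		}
		/*
		 * dest == 0 despite pending CPUs would mean a target without
		 * a logical ID -- the condition the "dest = 0!?" trace
		 * catches in the patch.
		 */
		printf("IPI dest %08x for cluster %d\n", dest, cluster);

		/* Remove the whole cluster from the pending mask */
		tmpmsk &= ~clustermsk;
	}
}

int main(void)
{
	send_ipi_mask(0xb6);	/* CPUs 1,2,4,5,7 -> two IPIs */
	return 0;
}

In the real code a nonempty cluster intersection ending up with
dest == 0 should be impossible, which is presumably what the
"dest = 0!?" trace_printk is meant to flag.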