Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp3849735ybi; Mon, 29 Jul 2019 13:53:43 -0700 (PDT) X-Google-Smtp-Source: APXvYqxpZ8s7+9utgw0LktpAyQ2PXPZN7cMt1vgnMLvFy9RjH0j1GTuzVFJkRCMu0y2ogI+LJkHm X-Received: by 2002:a17:902:110b:: with SMTP id d11mr116740657pla.213.1564433623766; Mon, 29 Jul 2019 13:53:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564433623; cv=none; d=google.com; s=arc-20160816; b=CPyneBeBtXQ+Pqhdj3g3FDyR9AAbT1CbBqosGMDXZtmPj8qhURcEGo2G85xPDc3hKm 3DVxYW+DBfV6VdAvSENYu1BgKhizFWARjetHEFUZnIuVsYs1xNlWDvwRpoSR1lASX/VS xtFUd9ntN+GVKpeWLpuqtm3k4o75pKRRWu2Y3e2Dg4GtNMpde1bUaILDZIvrpEBGROIl jxF2EklAsN8Vo3niEw51ArialFSQ6LWY1QMt+zr1G2lzd/+LB6+N1LZ/58HzbZXM521S zwHkIH07+XkRGbLOSPCr+Fp0Ur0xkkQNdfG0pIzrxe9bYEUDfzsCqg7v5c6EuAkpn/3Y xu7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=SoKqHgmiv4tAja7Vbhan3qjXLAkeJjmJfY6Jshn/jTM=; b=T1N57hJ58901hp+x3PgXoB/hWKSJBHalxD68+vCTmo+KkptJJpXV4YTo6HRDwdNtOQ ndozZJokmNvfw4wNM4fy7n+UVocVi28nQ2Mbz/EoGHT16E9Du2dsb4dRyDIGa+EwdpiF WJtishREhUnaLXKEbBpGQ3e2GS4V7yRx2UrfS1APzmISGIoxB/gFh7dz3lvWnP6qdqVB X76VcdKEiQzlIpP8iengCEhFZXT12iONWXDJPU9fmZlvTaxrJIVJi5xMaRODRPv+NIRh 0WOTeaszNCTY87BANsBLunTYKT5p2QGR79AlNyKdKyUUDr+gTPISSlG/2TZnCPf3csE6 88OQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=m134jeAx; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o4si26381476pgv.157.2019.07.29.13.53.28; Mon, 29 Jul 2019 13:53:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=m134jeAx; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388284AbfG2UvC (ORCPT + 99 others); Mon, 29 Jul 2019 16:51:02 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:46993 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387869AbfG2UvB (ORCPT ); Mon, 29 Jul 2019 16:51:01 -0400 Received: by mail-pg1-f193.google.com with SMTP id k189so9798236pgk.13 for ; Mon, 29 Jul 2019 13:51:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=SoKqHgmiv4tAja7Vbhan3qjXLAkeJjmJfY6Jshn/jTM=; b=m134jeAxviNJzkVWe3qEuTeSrF7LFMPnpAZSch1GP8VEq9yXdMn/0Y8XR4vv+QpgDu 6+Q3m9vJLudQDxv1iqtiBxQ5dmEeLOThurVOqpsQayUmGpqPt+zOYD8N7XW3Q0soFGSU pcgUJ9wBJbIeExJjdwDMyilH5jBBY+DyEb1jDJkOQVL0/u5vdy3zkZ/g3HstfOlgh1B7 p4YFHAMU5zu7CIQpMulyUsGvOdE/GtwkAjFsCrwhF7jtckVzxQW6mBMmWRy5ZkW0/u9l S8+jT+Ngq8n9CuHXWbK3dZy9X9bJkXUmqqzjyUfCPx2Wfb4mYI44fXkj2PO24v8zN4Cv +Cfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=SoKqHgmiv4tAja7Vbhan3qjXLAkeJjmJfY6Jshn/jTM=; b=BTJ7iqlAvYuBm/spm+Kf3972SaAC03wMEIFzUga7kR7d4wBEMFBlOctJ4P51t3vLKC LaDgqP/He3MRQchJZq9WZlKPPZjOErdABC0EzMZ5WrHHTpPNtT3NvqCWYsRSfWSVyRyR vRGSG7Ocvtzijod/7pfSKfvqJBawEpVdfiz2V7x63RGU/o5ciVrcU6SOfu59swCyAZpr dXKhDzOz+IRGUKSiF0kloARI/ir5VCSK2YdT9+ndsQPGxh35HRak9zySrUOCAM1e3+Qf 0EOKWnjjFgWIBRc1ufycl9U01KpxIRPRt8CY3vA18qI1rlEMcv7oGVKjgxHchF8bkHVP RaAw== X-Gm-Message-State: APjAAAVo44YKCu8GYU3fqOWNe91U2ZMzh1C27JHhO29y3aIIAz14efvp xSis7eVCOrkQUc298QTpOm0= X-Received: by 2002:a63:724f:: with SMTP id c15mr108423404pgn.257.1564433461271; Mon, 29 Jul 2019 13:51:01 -0700 (PDT) Received: from localhost ([2600:1700:e321:62f0:329c:23ff:fee3:9d7c]) by smtp.gmail.com with ESMTPSA id v126sm8585234pgb.23.2019.07.29.13.50.59 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Jul 2019 13:51:00 -0700 (PDT) Date: Mon, 29 Jul 2019 13:50:59 -0700 From: Guenter Roeck To: Peter Zijlstra Cc: Thomas Gleixner , x86@kernel.org, Ingo Molnar , linux-kernel@vger.kernel.org, Borislav Petkov Subject: Re: sched: Unexpected reschedule of offline CPU#2! Message-ID: <20190729205059.GA1127@roeck-us.net> References: <20190727164450.GA11726@roeck-us.net> <20190729093545.GV31381@hirez.programming.kicks-ass.net> <20190729101349.GX31381@hirez.programming.kicks-ass.net> <20190729104745.GA31398@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190729104745.GA31398@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 29, 2019 at 12:47:45PM +0200, Peter Zijlstra wrote: > On Mon, Jul 29, 2019 at 12:38:30PM +0200, Thomas Gleixner wrote: > > On Mon, 29 Jul 2019, Peter Zijlstra wrote: > > > On Mon, Jul 29, 2019 at 11:58:24AM +0200, Thomas Gleixner wrote: > > > > On Mon, 29 Jul 2019, Peter Zijlstra wrote: > > > > > On Sat, Jul 27, 2019 at 09:44:50AM -0700, Guenter Roeck wrote: > > > > > > [ 61.348866] Call Trace: > > > > > > [ 61.349392] kick_ilb+0x90/0xa0 > > > > > > [ 61.349629] trigger_load_balance+0xf0/0x5c0 > > > > > > [ 61.349859] ? check_preempt_wakeup+0x1b0/0x1b0 > > > > > > [ 61.350057] scheduler_tick+0xa7/0xd0 > > > > > > > > > > kick_ilb() iterates nohz.idle_cpus_mask to find itself an idle_cpu(). > > > > > > > > > > idle_cpus_mask() is set from nohz_balance_enter_idle() and cleared from > > > > > nohz_balance_exit_idle(). nohz_balance_enter_idle() is called from > > > > > __tick_nohz_idle_stop_tick() when entering nohz idle, this includes the > > > > > cpu_is_offline() clause of the idle loop. > > > > > > > > > > However, when offline, cpu_active() should also be false, and this > > > > > function should no-op. > > > > > > > > Ha. That reboot mess is not clearing cpu active as it's not going through > > > > the regular cpu hotplug path. It's using reboot IPI which 'stops' the cpus > > > > dead in their tracks after clearing cpu online.... > > > > > > $string-of-cock-compliant-curses > > > > > > What a trainwreck... > > > > > > So if it doesn't play by the normal rules; how does it expect to work? > > > > > > So what do we do? 'Fix' reboot or extend the rules? > > > > Reboot has two modes: > > > > - Regular reboot initiated from user space > > > > - Panic reboot > > > > For the regular reboot we can make it go through proper hotplug, > > That seems sensible. > > > for the panic case not so much. > > It's panic, shit has already hit fan, one or two more pieces shouldn't > something anybody cares about. > Some more digging shows that this happens a lot with Google GCE intances, typically after a panic. The problem with that, if I understand correctly, is that it may prevent coredumps from being written. So, while of course the panic is what needs to be fixed, it is still quite annoying, and it would help if this can be fixed for panic handling as well. How about the patch suggested by Hillf Danton ? Would that help for the panic case ? Thanks, Guenter