From: "H.J. Lu"
Date: Wed, 11 Jul 2018 08:40:56 -0700
Subject: Re: Kernel 4.17.4 lockup
To: Dave Hansen
Cc: "H. Peter Anvin", LKML, Andy Lutomirski, Mel Gorman, Andrew Morton, Rik van Riel, Minchan Kim
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 11, 2018 at 8:13 AM, Dave Hansen wrote:
> On 07/11/2018 07:56 AM, H.J. Lu wrote:
>> On Mon, Jul 9, 2018 at 8:47 PM, Dave Hansen wrote:
>>> On 07/09/2018 07:14 PM, H.J. Lu wrote:
>>>>> I'd really want to see this reproduced without KASLR to make the oops
>>>>> easier to read. It would also be handy to try your workload with all
>>>>> the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
>>>>> and see if it still triggers.
>>>> How can I turn them on at boot time?
>>> The only thing you can add at boot time is slab debugging, and it's
>>> probably the most useless of the three that I listed since you're not
>>> actually seeing any slab corruption.
>>>
>>> The rest are compile-time options.
>> I enabled KASAN, slab debugging, DEBUG_PAGE_ALLOC and disabled
>> KASLR. Machine locked up. Here is the last kernel message before locking
>> up.
>
> KASAN looks to have caught it, although it scrolled off the screen. I
> can certainly imagine the oops you saw earlier being caused by stack
> corruption.
>
> Sounds like we need to reproduce this in an environment that can
> actually capture a real oops. Can you share more about your workload?
> I'll see if I can get it to reproduce in a VM.

This is a quad-core machine with HT and 6 GB of RAM. The workload is
an x32 GCC build and test with "make -j8". The bug is triggered
during the GCC test run after a couple of hours. I have a script to
set up my workload:

https://github.com/hjl-tools/gcc-regression

with a cron job:

# It takes about 3 hours to bootstrap x86-64 GCC and 3 hours to run tests.
TIMEOUT=480
# Run it every hour.
30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32 --with-pic > /dev/null 2>&1

--
H.J.
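For anyone trying to reproduce this, the debug settings discussed in the thread are split between build-time Kconfig options and boot-time kernel parameters. The fragment below is a minimal sketch, assuming an x86-64 4.17-era kernel built with the SLUB allocator. The symbols and parameters are standard kernel ones, but this particular combination is an illustration, not the configuration H.J. actually used.

```
# Kernel .config fragment (build time)
CONFIG_KASAN=y
CONFIG_SLUB_DEBUG=y
CONFIG_DEBUG_PAGEALLOC=y
# Disables KASLR at build time; "nokaslr" below does the same at boot
# CONFIG_RANDOMIZE_BASE is not set

# Kernel command line (boot time)
#   nokaslr            - disable KASLR even if CONFIG_RANDOMIZE_BASE=y
#   slub_debug=FZPU    - sanity checks, red zoning, poisoning, user tracking
#   debug_pagealloc=on - activate the compiled-in CONFIG_DEBUG_PAGEALLOC
nokaslr slub_debug=FZPU debug_pagealloc=on
```

This is consistent with Dave's point: slab debugging can be turned on from the command line (CONFIG_SLUB_DEBUG is usually built in already), while KASAN and DEBUG_PAGEALLOC must be compiled in first. On kernels of this era, DEBUG_PAGEALLOC is assumed to stay inactive unless debug_pagealloc=on is also passed at boot.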