From: "Yu, Fenghua"
To: Thomas Gleixner, Joseph Salisbury
CC: "Shankar, Ravi V", "vikas.shivappa@linux.intel.com", "stable@vger.kernel.org", "linux-kernel@vger.kernel.org", "Luck, Tony", "peterz@infradead.org", "eranian@google.com", "ak@linux.intel.com", "davidcc@google.com", "mingo@redhat.com", "hpa@zytor.com", "x86@kernel.org", "1733662@bugs.launchpad.net" <1733662@bugs.launchpad.net>, "Roderick W. Smith"
Subject: RE: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing
Date: Tue, 16 Jan 2018 18:34:10 +0000
Message-ID: <3E5A0FA7E9CA944F9D5414FEC6C7122075908855@FMSMSX153.amr.corp.intel.com>

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> On Tue, 16 Jan 2018, Joseph Salisbury wrote:
> > On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> > > Vikas on vacation until end of the month. Fenghua will look into
> > > this issue.
> > >
> > > On Jan 16, 2018, at 5:09 AM, Thomas Gleixner
> > > wrote:
> > >
> > >>
> > >> Vikas, Fenghua can you please look at that ASAP?
> > >>
> > >> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
> > >>
> > >>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
> > >>>
> > >>>> Hi Vikas,
> > >>>>
> > >>>> A kernel bug report was opened against Ubuntu [0].  After a
> > >>>> kernel bisect, it was found that reverting the following commit
> > >>>> resolved this bug:
> > >>>>
> > >>>> commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
> > >>>> Author: Vikas Shivappa
> > >>>> Date:   Tue Aug 15 18:00:43 2017 -0700
> > >>>>
> > >>>>     x86/intel_rdt/cqm: Improve limbo list processing
> > >>>>
> > >>>>
> > >>>> The regression was introduced as of v4.14-rc1 and still exists
> > >>>> with current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> > >>>>
> > >>>> I was hoping to get your feedback, since you are the patch
> > >>>> author.  Do you think gathering any additional data will help
> > >>>> diagnose this issue, or would it be best to submit a revert request?
> > >>>
> > >>> That stinks like a use after free. Can you run with KASAN enabled?
> > >>>
> > >>> Thanks,
> > >>>
> > >>>     tglx
> >
> >
> > Here is some data with KASAN enabled:
> > https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
> >
> > Are there any specific logs you would like to see, or specific actions
> > executed?
>
> No, the KASAN output is pretty clear where the issue is.
>
> Thanks,
>
> 	tglx

Is this a Haswell-specific issue?

I ran the following test in an endless loop without any issue on Broadwell
with 4.15.0-rc6 and rdt mounted:

for ((;;))
do
	# Take cpu1..cpu87 offline.
	for ((i=1;i<88;i++))
	do
		echo 0 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo |wc

	# Bring cpu1..cpu87 back online.
	for ((i=1;i<88;i++))
	do
		echo 1 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo|wc
done

I'm looking for a Haswell machine to reproduce the issue.

Thanks.

-Fenghua
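
For anyone repeating the test on a machine with a different CPU count, the hard-coded 88 in the loop above could be avoided roughly as follows. This is only a sketch and not part of the test that was actually run: the function name `set_hotpluggable_cpus` and the `CPU_SYSFS` variable are made up for illustration, and it assumes that every CPU directory exposing an `online` file may be toggled (cpu0 often has none and is then skipped).

```shell
#!/bin/bash
# Hypothetical helper, not from the original test: write 0 or 1 to the
# "online" file of every CPU that has one, instead of looping over a
# hard-coded cpu1..cpu87 range.
# CPU_SYSFS is an illustrative override so the function can be pointed at a
# scratch directory; by default it is the real sysfs CPU directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

set_hotpluggable_cpus() {
    local state="$1" f
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for f in "$CPU_SYSFS"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue    # glob matched nothing
        echo "$state" > "$f"
    done
}
```

Inside the endless loop, `set_hotpluggable_cpus 0` and `set_hotpluggable_cpus 1` would then stand in for the two inner `for` loops. Writing to the real sysfs files still needs root.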