Subject: Re: [PATCH 4.4 00/37] 4.4.110-stable review
To: Hugh Dickins
Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton, Guenter Roeck, Shuah Khan, patches@kernelci.org, Ben Hutchings, lkft-triage@lists.linaro.org, stable
From: Pavel Tatashin
Message-ID: <270e7e43-d3f9-0305-4764-7e23b2d515a2@oracle.com>
Date: Fri, 5 Jan 2018 20:16:09 -0500

Hi Hugh,

Thank you very much for your very thoughtful input. I am quite positive this problem is a PTI regression, because I see exactly the same problem with kernel 4.1, to which I back-ported all the necessary PTI patches from 4.4.110. I will provide this thread with more information as I collect it.
I will also try to root-cause the problem. The bug behaves like memory corruption, but with both the 4.1 and 4.4 kernels the problem goes away when I boot with the noefi parameter. So, EFI + PTI is the culprit for this memory corruption.

Thank you,
Pavel

On 01/05/2018 06:15 PM, Hugh Dickins wrote:
> On Fri, Jan 5, 2018 at 1:03 PM, Pavel Tatashin wrote:
>> The hardware works :) I meant that before the patch linked in
>> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But
>> with that patch applied, I was able to boot it at least once, but it could
>> be accidental. The hang/panic does not happen at the same time on every
>> boot.
>
> I get the feeling that it was accidental: it seems to me that you have
> a memory corruption problem, that gets shifted around by the different
> patches (or "noefi" or "nopti").
>
> Because yesterday your boots were able to get way beyond the "EFI
> Variables Facility" message, and I can't imagine why the EFI issue
> would not have been equally debilitating on yesterday's 110-rc, if it
> were in play.
>
> I did intend to ask you to send your System.map, for us to scan
> through: maybe some variable is marked __init and should not be, then
> the "Freeing unused kernel memory" frees it for random reuse.
>
> But today you didn't get anywhere near the "Freeing unused kernel
> memory", so that can't be it - or do you sometimes get that far today?
>
> You mention that the hang/panic does not happen at the same time on
> every boot: I think all I can ask is for you to keep supplying us with
> different examples (console messages) of where it occurs, in the hope
> that one of them will point us in the right direction.
>
> And it even seems possible that this has nothing to do with the
> 4.4.110 changes - that 4.4.109 plus some other random patches would
> unleash similar corruption. Though on balance that does seem unlikely.
>
> Hugh