Date: Mon, 20 May 2019 19:59:22 +0530
From: Bharata B Rao
To: Nicholas Piggin
Cc: aneesh.kumar@linux.ibm.com, bharata@linux.vnet.ibm.com, linux-kernel@vger.kernel.org, linux-next@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Michael Ellerman, srikanth
Subject: Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
Message-Id: <20190520142922.GE22939@in.ibm.com>
In-Reply-To: <20190520082035.GD22939@in.ibm.com>

On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> > Bharata B Rao's on May 20, 2019 3:56 pm:
> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> > >> >> > git bisect points to
> > >> >> >
> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> > >> >> > Author: Nicholas Piggin
> > >> >> > Date: Fri Jul 27 21:48:17 2018 +1000
> > >> >> >
> > >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
> > >> >> >
> > >> >> > The page table fragment allocator uses the main page refcount racily
> > >> >> > with respect to speculative references. A customer observed a BUG due
> > >> >> > to page table page refcount underflow in the fragment allocator. This
> > >> >> > can be caused by the fragment allocator set_page_count stomping on a
> > >> >> > speculative reference, and then the speculative failure handler
> > >> >> > decrements the new reference, and the underflow eventually pops when
> > >> >> > the page tables are freed.
> > >> >> >
> > >> >> > Fix this by using a dedicated field in the struct page for the page
> > >> >> > table fragment allocator.
> > >> >> >
> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> > >> >> > Cc: stable@vger.kernel.org # v3.10+
> > >> >>
> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> > >> >> see the crash.
> > >> >
> > >> > Right, but the commit says it fixes page table page refcount underflow by
> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> > >> > for this pt_frag_refcount.
> > >>
> > >> The fixed underflow is caused by a bug (race on page count) that got
> > >> fixed by that patch. You are hitting a different underflow here. It's
> > >> not certain my patch caused it, I'm just trying to reproduce now.
> > >
> > > Ok.
> >
> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> > 4GB guest (via host adding / removing memory device), and it just works.
>
> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
>
> >
> > It's likely to be an edge case like an off by one or rounding error
> > that just happens to trigger in your config. Might be easiest if you
> > could test with a debug patch.
>
> Sure, I will continue debugging.

When the guest is rebooted after hotplug, the entire memory (including
the hotplugged memory) gets mapped afresh. However, at that point no slab
is available yet, so pt_frag_refcount never gets initialized, because we
never call pte_fragment_alloc() for these mappings. So it looks like we
hit the underflow right away, during the very first unplug.

I will check how this can be fixed.

>
> Regards,
> Bharata.
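The failure mode described above can be illustrated with a small, single-threaded C model. Everything in it is hypothetical: struct model_pt_page and the model_* functions are illustrative stand-ins, not kernel APIs, and the counter is a plain int rather than the atomic field in struct page. It assumes that a counter the fragment allocator never set reads as 0, and uses a check in the spirit of the BUG_ON() mentioned in the thread. It only shows why a free path that checks and decrements such a counter trips on the very first free.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of a page-table page. Only the dedicated fragment
 * counter from the discussion is modeled (plain int, single-threaded).
 */
struct model_pt_page {
	int pt_frag_refcount;
	bool freed;
};

/* Stand-in for BUG_ON(): report the condition instead of oopsing. */
static bool model_bug_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "BUG: %s\n", what);
	return cond;
}

/*
 * Analogue of a page handed out by the fragment allocator (the
 * pte_fragment_alloc() path): the counter is set to the number of
 * fragments carved out of the page.
 */
void model_fragment_alloc(struct model_pt_page *p, int nr_frags)
{
	p->pt_frag_refcount = nr_frags;
	p->freed = false;
}

/*
 * Analogue of the mapping created early after reboot, before slab is
 * available: the fragment allocator is bypassed, so pt_frag_refcount
 * is left untouched (assumed 0, from a zero-initialized struct).
 */
void model_early_boot_map(struct model_pt_page *p)
{
	p->freed = false;
}

/*
 * Analogue of the free path on hot-unplug: check the counter, then drop
 * one reference and free the page when the last one goes.
 * Returns false if the underflow check fires.
 */
bool model_fragment_free(struct model_pt_page *p)
{
	if (model_bug_on(p->pt_frag_refcount <= 0,
			 "pt_frag_refcount underflow"))
		return false;
	if (--p->pt_frag_refcount == 0)
		p->freed = true;
	return true;
}
```

A page set up with model_fragment_alloc(&p, 1) frees cleanly on the first model_fragment_free(&p), while a zero-initialized page set up with model_early_boot_map(&p) makes the first model_fragment_free(&p) report the underflow and return false, which mirrors the "boot, add 8G, reboot, remove 8G" sequence.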