Date: Mon, 20 May 2019 17:00:21 +1000
From: Nicholas Piggin
Subject: Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
To: bharata@linux.ibm.com
Cc: aneesh.kumar@linux.ibm.com, bharata@linux.vnet.ibm.com,
    linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, Michael Ellerman, srikanth
References: <16a7a635-c592-27e2-75b4-d02071833278@linux.vnet.ibm.com>
    <20190518141434.GA22939@in.ibm.com>
    <878sv1993k.fsf@concordia.ellerman.id.au>
    <20190520042533.GB22939@in.ibm.com>
    <1558327521.633yjtl8ki.astroid@bobo.none>
    <20190520055622.GC22939@in.ibm.com>
In-Reply-To: <20190520055622.GC22939@in.ibm.com>
Message-Id: <1558335484.9inx69a7ea.astroid@bobo.none>
X-Mailing-List: linux-kernel@vger.kernel.org

Bharata B Rao's on May 20, 2019 3:56 pm:
> On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> >> > git bisect points to
>> >> >
>> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> >> > Author: Nicholas Piggin
>> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> >> >
>> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
>> >> >
>> >> >     The page table fragment allocator uses the main page refcount racily
>> >> >     with respect to speculative references. A customer observed a BUG due
>> >> >     to page table page refcount underflow in the fragment allocator. This
>> >> >     can be caused by the fragment allocator set_page_count stomping on a
>> >> >     speculative reference, and then the speculative failure handler
>> >> >     decrements the new reference, and the underflow eventually pops when
>> >> >     the page tables are freed.
>> >> >
>> >> >     Fix this by using a dedicated field in the struct page for the page
>> >> >     table fragment allocator.
>> >> >
>> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> >> >     Cc: stable@vger.kernel.org # v3.10+
>> >>
>> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> >> see the crash.
>> >
>> > Right, but the commit says it fixes page table page refcount underflow by
>> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
>> > for this pt_frag_refcount.
>>
>> The fixed underflow is caused by a bug (race on page count) that got
>> fixed by that patch. You are hitting a different underflow here. It's
>> not certain my patch caused it, I'm just trying to reproduce now.
>
> Ok.

Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
4GB guest (via host adding / removing memory device), and it just works.

It's likely to be an edge case like an off by one or rounding error
that just happens to trigger in your config. Might be easiest if you
could test with a debug patch.

Thanks,
Nick
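
For readers following the thread, here is a rough user-space sketch of the kind of underflow being discussed: a page split into fragments, tracked by a dedicated counter kept apart from the main page refcount. This is not the kernel code and not the debug patch mentioned above. struct model_page, model_page_init() and model_frag_put() are hypothetical names made up for illustration, and where the kernel would hit the BUG_ON() the model just returns an error value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of struct page that matters here. */
struct model_page {
    atomic_int frag_refcount; /* dedicated fragment counter (assumed model) */
    bool freed;               /* set when the backing page would be released */
};

enum frag_put_result { FRAG_PUT_OK, FRAG_PUT_LAST, FRAG_PUT_UNDERFLOW };

/* Set up a page that backs nr_frags fragments, one reference per fragment. */
void model_page_init(struct model_page *page, int nr_frags)
{
    assert(nr_frags > 0);
    atomic_init(&page->frag_refcount, nr_frags);
    page->freed = false;
}

/* Drop one fragment reference and report what happened. */
enum frag_put_result model_frag_put(struct model_page *page)
{
    /* A counter that is already zero means one free too many. */
    if (atomic_load(&page->frag_refcount) <= 0)
        return FRAG_PUT_UNDERFLOW;

    /* atomic_fetch_sub returns the old value; 1 means the last reference. */
    if (atomic_fetch_sub(&page->frag_refcount, 1) == 1) {
        page->freed = true;
        return FRAG_PUT_LAST;
    }
    return FRAG_PUT_OK;
}
```

With a page initialised for two fragments, the first two calls to model_frag_put() succeed and the third reports the underflow. An off-by-one in how many fragments get freed would show up the same way.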