Date: Wed, 11 Jul 2018 21:12:25 +0800
From: Baoquan He
To: Michael Ellerman
Cc: akpm@linux-foundation.org, broonie@kernel.org, mhocko@suse.cz, sfr@canb.auug.org.au, linux-next@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, mm-commits@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, pasha.tatashin@oracle.com, "Aneesh Kumar K.V", Anshuman Khandual
Subject: Re: Boot failures with "mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER" on powerpc (was Re: mmotm 2018-07-10-16-50 uploaded)
Message-ID: <20180711131225.GI1969@MiWiFi-R3L-srv>
References:
 <20180710235044.vjlRV%akpm@linux-foundation.org> <87lgai9bt5.fsf@concordia.ellerman.id.au>

Hi Michael,

On 07/11/18 at 10:49pm, Michael Ellerman wrote:
> akpm@linux-foundation.org writes:
> > The mm-of-the-moment snapshot 2018-07-10-16-50 has been uploaded to
> >
> >    http://www.ozlabs.org/~akpm/mmotm/
> ...
>
> > * mm-sparse-add-a-static-variable-nr_present_sections.patch
> > * mm-sparsemem-defer-the-ms-section_mem_map-clearing.patch
> > * mm-sparsemem-defer-the-ms-section_mem_map-clearing-fix.patch
> > * mm-sparse-add-a-new-parameter-data_unit_size-for-alloc_usemap_and_memmap.patch
> > * mm-sparse-optimize-memmap-allocation-during-sparse_init.patch
> > * mm-sparse-optimize-memmap-allocation-during-sparse_init-checkpatch-fixes.patch
>
> > * mm-sparse-remove-config_sparsemem_alloc_mem_map_together.patch
>
> This seems to be breaking my powerpc pseries qemu boots.
>
> The boot log with some extra debug shows eg:
>
> $ make pseries_le_defconfig
> $ qemu-system-ppc64 -nographic -vga none -M pseries -m 2G -kernel vmlinux
> vmemmap_populate f000000000000000..f000000000024000, node 0
>  * f000000000000000..f000000001000000 allocated at c000000076000000
> hash__vmemmap_create_mapping: start 0xf000000000000000 size 0x1000000 phys 0x76000000
> hash__vmemmap_create_mapping: failed -1
>
> Then there's lots of other warnings about bad page states and eventually
> a NULL deref and we panic().
>
> The problem seems to be that we're calling down into
> hash__vmemmap_create_mapping() for every call to vmemmap_populate(),
> whereas previously we would only call hash__vmemmap_create_mapping()
> once because our vmemmap_populated() would return true.
>
> There's actually a comment in sparse_init() that says:
>
>  * powerpc need to call sparse_init_one_section right after each
>  * sparse_early_mem_map_alloc, so allocate usemap_map at first.
>
> So changing that behaviour does seem to be the problem.
>
> I assume that comment is talking about the fact that we use pfn_valid()
> in vmemmap_populated().
>
> I'm not clear on how to fix it though.

Have you tried reverting that patch and building the kernel to test again?
Does it work?