Received: by 2002:ac0:a581:0:0:0:0:0 with SMTP id m1-v6csp3993463imm; Mon, 25 Jun 2018 08:01:41 -0700 (PDT) X-Google-Smtp-Source: ADUXVKIFTPeLwNfzji/Pr6IFGZxv0dUGnFV6pqKLQ+BGPCx218m0jF4zhmNCw8ogOcKKVLu+YOhW X-Received: by 2002:a63:9f1a:: with SMTP id g26-v6mr10741321pge.178.1529938901657; Mon, 25 Jun 2018 08:01:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529938901; cv=none; d=google.com; s=arc-20160816; b=WYkRj2Tn6hJXjdcYdqge0ZlcRAO+wLURWSh6Wod+t+foU40EsKnNXC6uaj+00NEWUZ 9onl7qBgf8PXPi4gcE3GV2MGi6+rwqgi+YjWBXsyKSSPibubOS838BqvbvihxFL5VK0B RjiCikMV09noKv8Fz4FppBtar2LGfD1xeQFi5N6S+GoIUyhg0RtZQgmKbisJOGJ3Eewo eVVYCtdGvpN9BKH1rFQn6W2oTJhggLJvz8lh/9aipwnyYywPr59kJRf1aBWDXm1YkXJD jdf71HWa0HQ0FsFJw75nKUcBwNtWDvD+/nlsvLIu55ZP+08BOMd4LqrCMwRu6RVCVZz9 FK/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:arc-authentication-results; bh=z6O0ISdK3m2j53R8xvBKVVz+RSUDTuxZtPNbqmUAhWE=; b=fvCbXPiB7J02M3gr05Is/xdBYj2X14srX03ciGBmjrppKRJalxsq3gyMTaD75hyTgS hy8e9mcGXwm5WnNny06CRr6414LZlhU7NymGAXidjpIuBc4c6C53tbl2PRI+1t/5VN3p 59zLW3VDQZl0LbL1R/g14QqRIBsRTHEat3rOIfwJxcmMB0UcMGfbYSbijyCoZPXwHt5F zT/e1h7eMJn6sgDD0sDqHKh/C9/EUy18Y0rr3Pb/rU2eUkCH5W2iM411LMsZerB+jOBL TgXBXs5t6bsXEdL0kWToAxr4e5Fiw+7iYc8XSwV7BDZ/KvpBQjtpXcrcjtyhcTcSpaCi Qfgw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id bg2-v6si14306857plb.243.2018.06.25.08.01.21; Mon, 25 Jun 2018 08:01:41 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934623AbeFYPAn (ORCPT + 99 others); Mon, 25 Jun 2018 11:00:43 -0400 Received: from mail-ot0-f196.google.com ([74.125.82.196]:46971 "EHLO mail-ot0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934575AbeFYPAj (ORCPT ); Mon, 25 Jun 2018 11:00:39 -0400 Received: by mail-ot0-f196.google.com with SMTP id v24-v6so15363485otk.13 for ; Mon, 25 Jun 2018 08:00:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=z6O0ISdK3m2j53R8xvBKVVz+RSUDTuxZtPNbqmUAhWE=; b=tsiIOeNNWCcBMdngaFr1FhQtjbsfjOVvjW1+9eRf6lfagqC0zmsmMTXKjAJQC8F405 qOp/wDc1KdOlnHesd7jyw4RzrhPUBlzij1YgTiGxKQ+aa22mBOxOqBPGM6FtrxE3N4VF pOTVzCY76nV848yHRH4/ohfjnIxzgWxNyxFGQ6KpD6joSJnTuTjUOi18nfQpCdbik9e2 4d94FczxQmbexBHhpBdy9D2xM4DKruFe1FZ94r/MVCES41Lie8uPNBE6mWWxQ08FIg2D ivFwAXmVIpeoEo1bBYrIrm+7VPqwl3ZgxteZLyzfHzxmllUzeJk5wqkwNQuI391q758P p3NQ== X-Gm-Message-State: APt69E3Y2e/Dzu4xp++/o2PSZ+mAMKXEsFBttxyVtxd+2TF7ZNrYp0ca c7a7dmQyCevwnmMHMK/7SBt2vUnGvpvJ0KD8zH3U/A== X-Received: by 2002:a9d:624:: with SMTP id 33-v6mr4533543otn.327.1529938838133; Mon, 25 Jun 2018 08:00:38 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a9d:2b2c:0:0:0:0:0 with HTTP; Mon, 25 Jun 2018 08:00:37 -0700 (PDT) In-Reply-To: <20180621190721.bdr42xmp6f2uba7x@pburton-laptop> References: <20180316143855.29838-1-neelx@redhat.com> <20180621190721.bdr42xmp6f2uba7x@pburton-laptop> From: Daniel Vacek Date: Mon, 25 Jun 2018 17:00:37 +0200 Message-ID: Subject: Re: [PATCH] Revert "mm: page_alloc: skip over regions of invalid pfns where possible" To: Paul Burton Cc: open list , Linux-MM , Ard Biesheuvel , Andrew Morton , Michal Hocko , Vlastimil Babka , Mel Gorman , Pavel Tatashin , stable Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 21, 2018 at 9:07 PM, Paul Burton wrote: > Hi Daniel, > > Hmm... I only just noticed this because you CC'd an email address that > is no longer functional. I presume you're not using .mailmap, which > would have given you my current email address. Hi Paul, I do not remember exactly but I guess I used either get_maintainers script or the email from your commit. I'm sorry for the inconvenience. > On Fri, Mar 16, 2018 at 03:38:55PM +0100, Daniel Vacek wrote: >> This reverts commit b92df1de5d289c0b5d653e72414bf0850b8511e0. The commit >> is meant to be a boot init speed up skipping the loop in memmap_init_zone() >> for invalid pfns. But given some specific memory mapping on x86_64 (or more >> generally theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) > > My patch definitely wasn't ARM-specific & I have never tested it on ARM. > It was motivated by a MIPS platform with an extremely sparse memory map. > Could you explain why you think it depends on ARM or > CONFIG_HAVE_ARCH_PFN_VALID? Hopefully explained further below. >> the implementation also skips valid pfns which is plain wrong and causes >> 'kernel BUG at mm/page_alloc.c:1389!' > > Which VM_BUG_ON is that? I don't see one on line 1389 as of commit > b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where > possible") or any mainline final release since. The report was from RHEL kernel actually. But it still applied to upstream tree. It was this one https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?id=274a1ff0704bc8fef76dbe2d6fb197ddbc23f380#n1913 later changed with commit 3e04040df6d4 which I believe does not really change or improve much anything, unfortunately... >> crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1 >> kernel BUG at mm/page_alloc.c:1389! >> invalid opcode: 0000 [#1] SMP >> -- >> RIP: 0010:[] [] move_freepages+0x15e/0x160 >> RSP: 0018:ffff88054d727688 EFLAGS: 00010087 >> -- >> Call Trace: >> [] move_freepages_block+0x73/0x80 >> [] __rmqueue+0x263/0x460 >> [] get_page_from_freelist+0x7e1/0x9e0 >> [] __alloc_pages_nodemask+0x176/0x420 >> -- >> RIP [] move_freepages+0x15e/0x160 >> RSP >> >> crash> page_init_bug -v | grep RAM >> 1000 - 9bfff System RAM (620.00 KiB) >> 100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB) >> 4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB) >> 4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB) >> 7b788000 - 7b7fffff System RAM (480.00 KiB) >> 100000000 - 67fffffff System RAM ( 22.00 GiB) >> >> crash> page_init_bug | head -6 >> 7b788000 - 7b7fffff System RAM (480.00 KiB) >> 1fffff00000000 0 1 DMA32 4096 1048575 >> 505736 505344 505855 >> 0 0 0 DMA 1 4095 >> 1fffff00000400 0 1 DMA32 4096 1048575 >> BUG, zones differ! >> >> crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000 >> PAGE PHYSICAL MAPPING INDEX CNT FLAGS >> ffffea0001e00000 78000000 0 0 0 0 >> ffffea0001ed7fc0 7b5ff000 0 0 0 0 >> ffffea0001ed8000 7b600000 0 0 0 0 <<<< >> ffffea0001ede1c0 7b787000 0 0 0 0 >> ffffea0001ede200 7b788000 0 0 1 1fffff00000000 > > I'm not really sure what I'm looking at here. I presume you're saying > that memmap_init_zone() didn't initialize the struct page for > phys=0x7b788000? Quite the opposite. It's the first one which gets correctly initialized as it is the start of next usable range as returned by memblock_next_valid_pfn(). Though early_pfn_valid() returns true for all frames in this section starting with 0x78000 (at least on x86 where it is based on the memsection implementation) so the next valid pfn should correctly be frame 0x78000 instead of 0x7b788. The crash was caused by accessing page 0xffffea0001ed8000 (covering phys 0x7b600000) as move_freepages_block() aligns the start_pfn to pageblock_nr_pages before calling move_freepages(). The arm implementation of early_pfn_valid() is actually based on memblock and returns false for frames 0x78000 through 0x7b787 hence I thought you based the memblock_next_valid_pfn() implementation on this ARM semantics enabled by CONFIG_HAVE_ARCH_PFN_VALID instead of the generic early_pfn_valid() version based on memory sections implementation. When I am thinking about it now, instead of reverting it could also have been #ifdefed on CONFIG_HAVE_ARCH_PFN_VALID. That way ARM could still use the advantage but not MIPS I believe. > Could you describe the memblock region list, and what ranges > memmap_init_zone() skipped over? I guess that's already explained above. The memblock regions matched the usable 'System RAM' ranges as dumped from iomem resources in my commit message, IIRC. Let me dump the data if I can still find it. crash> memblock.memory.cnt,memory.regions memblock memory.cnt = 0x7, memory.regions = 0xffffffff81af1140 crash> memblock_region.base,size,flags,nid memblock_memory_init_regions 7 | sed 's/^ /\t/' | paste - - - - - | column -ts' ' base = 0x1000 size = 0x9b000 flags = 0x0 nid = 0x0 base = 0x100000 size = 0x42fc0000 flags = 0x0 nid = 0x0 base = 0x4b0c8000 size = 0xed5000 flags = 0x0 nid = 0x0 base = 0x4bfac000 size = 0x18706000 flags = 0x0 nid = 0x0 base = 0x7b788000 size = 0x78000 flags = 0x0 nid = 0x0 base = 0x100000000 size = 0x380000000 flags = 0x0 nid = 0x0 base = 0x480000000 size = 0x200000000 flags = 0x0 nid = 0x1 Yeah, so it matches with the node break added. > Thanks, > Paul Thank you for looking into it! If you have any further questions, just drop me an email. And have a nice day. --nX