From: Linus Torvalds
Date: Mon, 15 Jan 2018 18:14:49 -0800
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.
To: Tetsuo Handa, Dave Hansen
Cc: Linux Kernel Mailing List, linux-mm, "the arch/x86 maintainers", linux-fsdevel, Michal Hocko

On Mon, Jan 15, 2018 at 5:15 PM, Tetsuo Handa wrote:
>
> I can't reproduce this with CONFIG_FLATMEM=y . But I'm not sure whether
> we are hitting a bug in CONFIG_SPARSEMEM=y code, for the bug is highly
> timing dependent.

Hmm. Maybe. But sparsemem really also generates *much* more complex code
particularly for the pfn_to_page() case. It also has much less testing.

For example, on x86-64 we do use sparsemem, but we use the VMEMMAP
version of sparsemem: the version that does *not* play really odd and
complex games with that whole pfn_to_page().

I've always felt like sparsemem was really damn complicated. The whole
"section_mem_map" encoding is really subtle and odd.

And considering that we're getting what appears to be an invalid page, in
one of the more complicated sequences that very much does that whole
pfn_to_page(), I really wonder.

I wonder if somebody could add some VM_BUG_ON() checks to the
non-vmemmap case of sparsemem in include/asm-generic/memory_model.h.

Because this:

  #define __pfn_to_page(pfn)                                      \
  ({      unsigned long __pfn = (pfn);                            \
          struct mem_section *__sec = __pfn_to_section(__pfn);    \
          __section_mem_map_addr(__sec) + __pfn;                  \
  })

is really subtle, and if we have some case where we pass in an
out-of-range pfn, or some case where we get the section wrong (because
the pfn is between sections or whatever due to some subtle setup bug),
things will really go sideways.

The reason I was hoping you could do this for FLATMEM is that it's
much easier to verify the pfn range in that case. The sparsemem cases
really make it much nastier.

That said, all of that code is really old. Most of it goes back to
-05/06 or so. But since you seem to be able to reproduce at least back
to 4.8, I guess this bug goes back years too.

But I'm adding Dave Hansen explicitly to the cc, in case he has any
ideas. Not because I blame him, but he's touched the sparsemem code
fairly recently, so maybe he'd have some idea on adding sanity
checking to the sparsemem version of pfn_to_page().

> I dont know why but selecting CONFIG_FLATMEM=y seems to avoid a different bug
> where bootup of qemu randomly fails at

Hmm. That looks very different indeed. But if CONFIG_SPARSEMEM
(presumably together with HIGHMEM) has some odd off-by-one corner case
or similar, who knows *what* issues it could trigger.

                Linus
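
The kind of sanity checking asked for above can be sketched with a small user-space model of the non-vmemmap encoding. This is only an illustration under stated assumptions, not the kernel code: the toy sizes, the `sections` array, `section_init()` and `pfn_to_page_checked()` are hypothetical names, and the lookup returns NULL where a kernel version would presumably use VM_BUG_ON(). The model stores the section's mem_map address minus the section's starting pfn offset, with flag bits in the low bits, so that adding the full pfn lands on the right page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy geometry (assumed for illustration only). */
#define SECTION_SHIFT      4u                     /* 16 pages per section */
#define PAGES_PER_SECTION  (1ul << SECTION_SHIFT)
#define NR_SECTIONS        8ul
#define SECTION_PRESENT    ((uintptr_t)1)         /* flag kept in a low bit */
#define SECTION_FLAG_MASK  ((uintptr_t)3)

struct page { unsigned long flags; };
struct mem_section { uintptr_t section_mem_map; };

static struct mem_section sections[NR_SECTIONS];

/* Encode: store (mem_map - start_pfn) so a later "+ pfn" indexes the map. */
static void section_init(unsigned long nr, struct page *mem_map)
{
    uintptr_t start_pfn = (uintptr_t)nr << SECTION_SHIFT;

    assert(nr < NR_SECTIONS);
    assert(((uintptr_t)mem_map & SECTION_FLAG_MASK) == 0);
    sections[nr].section_mem_map =
        ((uintptr_t)mem_map - start_pfn * sizeof(struct page)) | SECTION_PRESENT;
}

/* Decode with checks: NULL for an out-of-range pfn or an absent section. */
static struct page *pfn_to_page_checked(unsigned long pfn)
{
    unsigned long nr = pfn >> SECTION_SHIFT;
    uintptr_t encoded, base;

    if (nr >= NR_SECTIONS)
        return NULL;                      /* pfn beyond the last section */
    encoded = sections[nr].section_mem_map;
    if (!(encoded & SECTION_PRESENT))
        return NULL;                      /* section was never set up */
    base = encoded & ~SECTION_FLAG_MASK;  /* strip the flag bits */
    return (struct page *)(base + (uintptr_t)pfn * sizeof(struct page));
}
```

With section 3 initialised from a 16-entry array, pfn 3 * 16 + 5 resolves to entry 5 of that array, while a pfn in an uninitialised section or past the last section yields NULL instead of a bogus pointer.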