Date: Mon, 10 Sep 2018 15:59:59 +0200
From: Michal Hocko
To: Pasha Tatashin
Cc: Mikhail Zaslonko, "akpm@linux-foundation.org",
    "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
    "osalvador@suse.de", "gerald.schaefer@de.ibm.com"
Subject: Re: [PATCH] memory_hotplug: fix the panic when memory end is not on the section boundary
Message-ID: <20180910135959.GI10951@dhcp22.suse.cz>
References: <20180910123527.71209-1-zaslonko@linux.ibm.com>
 <20180910131754.GG10951@dhcp22.suse.cz>
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 10-09-18 13:46:45, Pavel Tatashin wrote:
>
>
> On 9/10/18 9:17 AM, Michal Hocko wrote:
> > [Cc Pavel]
> >
> > On Mon 10-09-18 14:35:27, Mikhail Zaslonko wrote:
> >> If the memory end is not aligned with the Linux memory section
> >> boundary, such a section is only partly initialized. This may lead to
> >> a VM_BUG_ON due to uninitialized struct page access from the
> >> is_mem_section_removable() or test_pages_in_a_zone() functions.
> >>
> >> Here is one of the panic examples:
> >> CONFIG_DEBUG_VM_PGFLAGS=y
> >> kernel parameter mem=3075M
> >
> > OK, so the last memory section is not full and we have a partial memory
> > block, right?
> >
> >> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> >
> > OK, this means that the struct page is not fully initialized. Do you
> > have a specific place which has triggered this assert?
> >
> >> ------------[ cut here ]------------
> >> Call Trace:
> >> ([<000000000039b8a4>] is_mem_section_removable+0xcc/0x1c0)
> >>  [<00000000009558ba>] show_mem_removable+0xda/0xe0
> >>  [<00000000009325fc>] dev_attr_show+0x3c/0x80
> >>  [<000000000047e7ea>] sysfs_kf_seq_show+0xda/0x160
> >>  [<00000000003fc4e0>] seq_read+0x208/0x4c8
> >>  [<00000000003cb80e>] __vfs_read+0x46/0x180
> >>  [<00000000003cb9ce>] vfs_read+0x86/0x148
> >>  [<00000000003cc06a>] ksys_read+0x62/0xc0
> >>  [<0000000000c001c0>] system_call+0xdc/0x2d8
> >>
> >> This fix checks whether the page lies within the zone boundaries
> >> before accessing the struct page data. The check is added to both
> >> functions. A similar check is already present in the
> >> is_pageblock_removable_nolock() function, but only after the struct
> >> page has been accessed.
> >>
> >
> > Well, I am afraid this is not the proper solution. We rely on a full
> > pageblock worth of initialized struct pages in many other places. We
> > used to get away with that because we initialized the full section,
> > but this has changed recently. Pavel, do you have any idea how to deal
> > with these partial memory sections now?
>
> We have:
>
> remove_memory()
>   BUG_ON(check_hotplug_memory_range(start, size))
>
> That is supposed to safely check for this condition: if [start, start +
> size) is not block-size aligned (and we know the block size is section
> aligned), hot remove is not allowed. The problem is that this check
> happens late, and only after an invalid range has already passed
> through the previous checks.
>
> We could add check_hotplug_memory_range() to is_mem_section_removable():
>
> is_mem_section_removable(start_pfn, nr_pages)
>  if (check_hotplug_memory_range(PFN_PHYS(start_pfn), PFN_PHYS(nr_pages)))
>      return false;
>
> I think it should work.

I do not think we want to sprinkle these tests over all the pfn walkers.
Can we simply initialize those uninitialized holes as well and make them
reserved, without handing them over to the page allocator? That would be
a much more robust approach IMHO.
-- 
Michal Hocko
SUSE Labs
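
For reference, the two approaches discussed in the thread can be
sketched roughly as follows. Both snippets are illustrative only, not
hunks from the posted patch or from any later fix; the function names
are made up for the sketch, and the code assumes a SPARSEMEM
configuration with the usual helpers (pfn_valid(), zone_spans_pfn(),
PAGES_PER_SECTION). The second sketch additionally assumes it lives in
mm/page_alloc.c, where the internal __init_single_page() helper is in
scope.

The check the patch adds to is_mem_section_removable() and
test_pages_in_a_zone() boils down to refusing to dereference a struct
page whose pfn falls outside the zone span, mirroring the test that
is_pageblock_removable_nolock() already performs:

/*
 * Sketch: guard a pfn walker against the partially initialized tail
 * of the last memory section by checking the zone span first.
 */
static bool pfn_backed_by_initialized_page(struct zone *zone,
					   unsigned long pfn)
{
	if (!pfn_valid(pfn))
		return false;
	/* zone_spans_pfn(): zone_start_pfn <= pfn < zone_end_pfn */
	return zone_spans_pfn(zone, pfn);
}

Michal's counter-proposal is to remove the problem at the source:
initialize the leftover struct pages of the last section at boot and
mark them reserved, so that pfn walkers and the page allocator only
ever see fully initialized pages. A rough sketch of that idea:

/*
 * Sketch: pad struct page initialization up to the section boundary.
 * end_pfn is the real end of memory; zone_idx/nid identify the zone
 * the tail pages are attributed to.
 */
static void __init init_section_tail_pages(int nid, unsigned long zone_idx,
					   unsigned long end_pfn)
{
	unsigned long aligned_end = ALIGN(end_pfn, PAGES_PER_SECTION);
	unsigned long pfn;

	for (pfn = end_pfn; pfn < aligned_end; pfn++) {
		struct page *page = pfn_to_page(pfn);

		/* internal helper in mm/page_alloc.c */
		__init_single_page(page, pfn, zone_idx, nid);
		/* keep the tail away from the page allocator */
		SetPageReserved(page);
	}
}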