Date: Mon, 1 Oct 2018 22:27:24 +0200
From: Michal Hocko
To: Michael Bringmann
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Michael Ellerman, Nathan Fontenot,
    Nicholas Piggin, Kees Cook, Thiago Jung Bauermann, Russell Currey,
    Mauricio Faria de Oliveira, Christophe Leroy, Andrew Morton,
    Pavel Tatashin, Dan Williams, Oscar Salvador, YASUAKI ISHIMATSU,
    Mathieu Malaterre, Juliet Kim, Tyrel Datwyler, Thomas Falcon
Subject: Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node
Message-ID: <20181001202724.GL18290@dhcp22.suse.cz>
References: <20181001185616.11427.35521.stgit@ltcalpine2-lp9.aus.stglabs.ibm.com>
In-Reply-To: <20181001185616.11427.35521.stgit@ltcalpine2-lp9.aus.stglabs.ibm.com>

On Mon 01-10-18 13:56:25, Michael Bringmann wrote:
> In some LPAR migration scenarios, device-tree modifications are
> made to the affinity of the memory in the system.  For instance,
> it may occur that memory is installed to nodes 0,3 on a source
> system, and to nodes 0,2 on a target system.  Node 2 may not
> have been initialized/allocated on the target system.
>
> After migration, if a RTAS PRRN memory remove is made to a
> memory block that was in node 3 on the source system, then
> try_offline_node tries to remove it from node 2 on the target.
> The NODE_DATA(2) block would not be initialized on the target,
> and there is no validation check in the current code to prevent
> the use of a NULL pointer.

I am not familiar with ppc and the above doesn't really help me much.
Sorry about that. But from the above it is not clear to me whether it is
the caller which does something unexpected or the hotplug code not being
robust enough. From your changelog I would suggest the latter, but why
don't we see the same problem for other archs? Is this a problem of
unrolling a partial failure?

dlpar_remove_lmb does the following:

	nid = memory_add_physaddr_to_nid(lmb->base_addr);

	remove_memory(nid, lmb->base_addr, block_sz);

	/* Update memory regions for memory remove */
	memblock_remove(lmb->base_addr, block_sz);

	dlpar_remove_device_tree_lmb(lmb);

Is the whole operation correct when remove_memory simply backs off
silently? Why don't we have to care about the memblock resp.
dlpar_remove_device_tree_lmb parts? In other words, how come the physical
memory range is valid while the node association is not?
--
Michal Hocko
SUSE Labs