Date: Tue, 12 Mar 2019 15:58:13 +0100
From: Michal Hocko
To: Laurent Dufour
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton
Subject: Re: [PATCH] mm/slab: protect cache_reap() against CPU and memory hot plug operations
Message-ID: <20190312145813.GS5721@dhcp22.suse.cz>
In-Reply-To: <20190311191701.24325-1-ldufour@linux.ibm.com>

On Mon 11-03-19 20:17:01, Laurent Dufour wrote:
> The commit 95402b382901 ("cpu-hotplug: replace per-subsystem mutexes with
> get_online_cpus()")
> removed the CPU_LOCK_ACQUIRE operation which was used to
> grab the cache_chain_mutex lock which was protecting cache_reap() against
> CPU hot plug operations.
>
> Later the commit 18004c5d4084 ("mm, sl[aou]b: Use a common mutex
> definition") changed cache_chain_mutex to slab_mutex but this didn't help
> fix the missing cache_reap() protection against CPU hot plug
> operations.
>
> Here we are stopping the per cpu worker while holding the slab_mutex to
> ensure that cache_reap() is not running behind our back and will not be
> triggered anymore for this cpu.
>
> This patch fixes that race leading to SLAB's data corruption when CPU
> hotplug operations are triggered. We hit it while doing partition migration on PowerVM
> leading to CPU reconfiguration through the CPU hotplug mechanism.

What is the actual race? slab_offline_cpu() calls cancel_delayed_work_sync(),
so it removes a pending item and waits for the item to finish if it is
running concurrently. So why do we need an additional lock?

> This fix covers kernels containing commit 6731d4f12315 ("slab:
> Convert to hotplug state machine"), i.e. 4.9.1; earlier kernels need a
> slightly different patch.
>
> Cc: stable@vger.kernel.org
> Cc: Christoph Lameter
> Cc: Pekka Enberg
> Cc: David Rientjes
> Cc: Joonsoo Kim
> Cc: Andrew Morton
> Signed-off-by: Laurent Dufour
> ---
>  mm/slab.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 28652e4218e0..ba499d90f27f 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1103,6 +1103,7 @@ static int slab_online_cpu(unsigned int cpu)
>
>  static int slab_offline_cpu(unsigned int cpu)
>  {
> +	mutex_lock(&slab_mutex);
>  	/*
>  	 * Shutdown cache reaper. Note that the slab_mutex is held so
>  	 * that if cache_reap() is invoked it cannot do anything
> @@ -1112,6 +1113,7 @@ static int slab_offline_cpu(unsigned int cpu)
>  	cancel_delayed_work_sync(&per_cpu(slab_reap_work, cpu));
>  	/* Now the cache_reaper is guaranteed to be not running. */
>  	per_cpu(slab_reap_work, cpu).work.func = NULL;
> +	mutex_unlock(&slab_mutex);
>  	return 0;
>  }
>
> -- 
> 2.21.0

-- 
Michal Hocko
SUSE Labs
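The question about the race rests on the guarantee documented for cancel_delayed_work_sync() (via cancel_work_sync()): on return, the work item is neither pending nor executing, even if its handler re-queues itself, as cache_reap() does at the end of each run. The following is a minimal userspace sketch of that guarantee with POSIX threads, assuming a single worker thread. It is not the kernel workqueue implementation, and model_work, model_queue(), reap_model(), model_cancel_sync() and run_offline_model() are hypothetical names. It only illustrates why the cancel by itself stops the reaper; it does not rule out some other race in slab_offline_cpu().

```c
/* Hypothetical userspace model, not kernel code: a self-re-arming work item
 * and a cancel that drops a pending run and waits for an in-flight one. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct model_work {
	pthread_mutex_t lock;
	pthread_cond_t kick, done;  /* "work was queued" / "a run finished" */
	bool pending, running, canceling, stop;
	int runs;                   /* completed runs */
	void (*func)(struct model_work *w);
};

/* Queue the work; refused while a cancel is in progress. */
static void model_queue(struct model_work *w)
{
	pthread_mutex_lock(&w->lock);
	if (!w->canceling) {
		w->pending = true;
		pthread_cond_signal(&w->kick);
	}
	pthread_mutex_unlock(&w->lock);
}

/* Stand-in for cache_reap(), which re-arms itself at the end of each run. */
static void reap_model(struct model_work *w)
{
	model_queue(w);
}

static void *worker(void *arg)
{
	struct model_work *w = arg;
	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (!w->pending && !w->stop)
			pthread_cond_wait(&w->kick, &w->lock);
		if (w->stop)
			break;
		void (*fn)(struct model_work *) = w->func;
		w->pending = false;
		w->running = true;
		pthread_mutex_unlock(&w->lock);
		assert(fn != NULL);  /* fires if a run slipped past the cancel */
		fn(w);               /* the handler runs without the lock held */
		pthread_mutex_lock(&w->lock);
		w->running = false;
		w->runs++;
		pthread_cond_broadcast(&w->done);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Drop a pending run and wait for an in-flight one to finish. */
static void model_cancel_sync(struct model_work *w)
{
	pthread_mutex_lock(&w->lock);
	w->canceling = true;     /* self re-queue attempts are now refused */
	w->pending = false;
	while (w->running)
		pthread_cond_wait(&w->done, &w->lock);
	w->canceling = false;
	pthread_mutex_unlock(&w->lock);
}

/* Returns the number of runs seen after the cancel returned (expect 0). */
static int run_offline_model(int *runs_before_cancel)
{
	struct model_work w = { .func = reap_model };
	pthread_t t;
	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.kick, NULL);
	pthread_cond_init(&w.done, NULL);
	if (pthread_create(&t, NULL, worker, &w) != 0)
		return -1;

	model_queue(&w);
	pthread_mutex_lock(&w.lock);
	while (w.runs < 3)  /* let the item re-arm itself a few times */
		pthread_cond_wait(&w.done, &w.lock);
	pthread_mutex_unlock(&w.lock);

	model_cancel_sync(&w);         /* cf. cancel_delayed_work_sync() */
	*runs_before_cancel = w.runs;  /* worker is idle: no extra lock needed */
	w.func = NULL;                 /* cf. clearing slab_reap_work's func */
	nanosleep(&(struct timespec){ .tv_nsec = 10000000 }, NULL);
	pthread_mutex_lock(&w.lock);
	int extra = w.runs - *runs_before_cancel;
	w.stop = true;
	pthread_cond_signal(&w.kick);
	pthread_mutex_unlock(&w.lock);
	pthread_join(t, NULL);
	return extra;
}
```

Calling run_offline_model(&before) should return 0 with before >= 3: the item keeps re-arming itself until the cancel, and nothing runs afterwards, so in this model clearing func needs no additional lock. Build with -pthread.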