Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18;
Date:   Fri, 19 Mar 2021 19:09:24 +0000
From:   Luis Chamberlain <mcgrof@kernel.org>
To:     Minchan Kim <minchan@kernel.org>
Cc:     gregkh@linuxfoundation.org, ngupta@vflare.org,
        sergey.senozhatsky.work@gmail.com, axboe@kernel.dk,
        mbenes@suse.com, linux-block@vger.kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug
 multistate
Message-ID: <20210319190924.GK4332@42.do-not-panic.com>
References: <20210306022035.11266-1-mcgrof@kernel.org>
 <20210306022035.11266-2-mcgrof@kernel.org>
 <YEbjom8FIclEgRYv@google.com>
 <20210310212128.GR4332@42.do-not-panic.com>
 <YErOkGrvtQODXtB0@google.com>
 <20210312183238.GW4332@42.do-not-panic.com>
 <YEvA1dzDsFOuKdZ/@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <YEvA1dzDsFOuKdZ/@google.com>
Precedence: bulk

On Fri, Mar 12, 2021 at 11:28:21AM -0800, Minchan Kim wrote:
> On Fri, Mar 12, 2021 at 06:32:38PM +0000, Luis Chamberlain wrote:
> > On Thu, Mar 11, 2021 at 06:14:40PM -0800, Minchan Kim wrote:
> > > On Wed, Mar 10, 2021 at 09:21:28PM +0000, Luis Chamberlain wrote:
> > > > On Mon, Mar 08, 2021 at 06:55:30PM -0800, Minchan Kim wrote:
> > > > > If I understand correctly, bugs you found were related to module
> > > > > unloading race while the zram are still working.
> > > > 
> > > > No, that is a simplifcation of the issue. The issue consists of
> > > > two separate issues:
> > > > 
> > > >  a) race against module unloading in light of incorrect racty use of
> > > >     cpu hotplug multistate support
> > > 
> > > 
> > > Could you add some pusedo code sequence to show the race cleary?
> > 
> > Let us deal with each issue one at time. First, let's address
> > understanding the kernel warning can be reproduced easily by
> > triggering zram02.sh from LTP twice:
> > 
> > kernel: ------------[ cut here ]------------
> > kernel: Error: Removing state 63 which has instances left.
> > kernel: WARNING: CPU: 7 PID: 70457 at kernel/cpu.c:2069 __cpuhp_remove_state_cpuslocked+0xf9/0x100
> > kernel: Modules linked in: zram(E-) zsmalloc(E) <etc>
> > 
> > The first patch prevents this race. This race is possible because on
> > module init we associate callbacks for CPU hotplug add / remove:
> > 
> > static int __init zram_init(void)                                               
> > {
> > 	...
> > 	ret = cpuhp_setup_state_multi(CPUHP_ZCOMP_PREPARE, "block/zram:prepare",
> > 	                              zcomp_cpu_up_prepare, zcomp_cpu_dead); 
> > 	...
> > }
> > 
> > The zcomp_cpu_dead() accesses the zcom->comp, and if zcomp->comp is
> > removed and this function is called, clearly we'll be accessing some
> > random data here and can easily crash afterwards:
> 
> I am trying to understand this crash. You mentioned the warning
> above but here mention crash(I understand sysfs race but this is
> different topic). What's the relation with above warning and
> crash here? You are talking the warning could change to the
> crash sometimes? 

Indeed one issue is a consequence of the other but a bit better
description can be put together for both separately.

The warning comes up when cpu hotplug detects that the callback
is being removed, but yet "instances" for multistate are left behind,
ie, not removed. CPU hotplug multistate allows us to dedicate a callback
for zram so that it gets called every time a CPU hot plugs or unplugs.
In the zram driver's case we want to trigger the callback per each
struct zcomp, one is used per supported and used supported compression
algorithm. The zram driver allocates a struct zcomp for an compression
algorithm on a need basis. The default compressor is CONFIG_ZRAM_DEF_COMP
which today is CONFIG_ZRAM_DEF_COMP_LZORLE ("lzo-rle"). Although we may
have different zram devices, zram devices which use the same compression
algorithm share the same struct zcomp. The "multi" in CPU hotplug multstate
refers to these different struct zcomp instances, each of these will
have the callback routine registered run for it. The kernel's CPU
hotplug multistate keeps a linked list of these different structures
so that it will iterate over them on CPU transitions. By default at
driver initialization we will create just one zram device (num_devices=1)
and a zcomp structure then set for the lzo-rle comrpession algorithm.
At driver removal we first remove each zram device, and so we destroy
the struct zcomp. But since we expose sysfs attributes to create new
devices or reset / initialize existing ones, we can easily end up
re-initializing a struct zcomp before the exit routine of the module
removes the cpu hotplug callback. When this happens the kernel's
CPU hotplug will detect that at least one instance (struct zcomp for
us) exists.

And so we can have:

static void destroy_devices(void)
{
	class_unregister(&zram_control_class);
	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
	zram_debugfs_destroy();
	idr_destroy(&zram_index_idr);
	unregister_blkdev(zram_major, "zram");
	cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
}

While destroy_devices() runs we can have this race:


CPU 1                            CPU 2

class_unregister(...);
idr_for_each(...);
zram_debugfs_destroy();
                                disksize_store(...)
idr_destroy(...);
unregister_blkdev(...);
cpuhp_remove_multi_state(...);


The warning comes up on cpuhp_remove_multi_state() when it sees
that the state for CPUHP_ZCOMP_PREPARE does not have an empty
instance linked list.

After the idr_destory() its also easy to run into a crash. The
above predicament also leads to memory leaks.

And so we need to have a state machine for when we're up and when
we're going down generically.

Let me know if it makes sense now, if so I can amend the commit log
accordingly.

  Luis