Subject: [ANNOUNCE] 4.8-rt1

Dear RT folks!

I'm pleased to announce the v4.8-rt1 patch set.

Changes since v4.6.7-rt14:

- rebased to v4.8

Known issues
- CPU hotplug got a little better but can deadlock.

You can get this release via the git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v4.8-rt1

The RT patch against 4.8 can be found here:

https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.8/older/patch-4.8-rt1.patch.xz

The split quilt queue is available at:

https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.8/older/patches-4.8-rt1.tar.xz

Sebastian


2016-10-16 03:08:36

by Mike Galbraith

Subject: [patch] ftrace: Fix latency trace header alignment

Line up helper arrows to the right column.

Signed-off-by: Mike Galbraith <[email protected]>
---
kernel/trace/trace.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2896,17 +2896,17 @@ get_total_entries(struct trace_buffer *b

static void print_lat_help_header(struct seq_file *m)
{
- seq_puts(m, "# _--------=> CPU# \n"
- "# / _-------=> irqs-off \n"
- "# | / _------=> need-resched \n"
- "# || / _-----=> need-resched_lazy \n"
- "# ||| / _----=> hardirq/softirq \n"
- "# |||| / _---=> preempt-depth \n"
- "# ||||| / _--=> preempt-lazy-depth\n"
- "# |||||| / _-=> migrate-disable \n"
- "# ||||||| / delay \n"
- "# cmd pid |||||||| time | caller \n"
- "# \\ / |||||||| \\ | / \n");
+ seq_puts(m, "# _--------=> CPU# \n"
+ "# / _-------=> irqs-off \n"
+ "# | / _------=> need-resched \n"
+ "# || / _-----=> need-resched_lazy \n"
+ "# ||| / _----=> hardirq/softirq \n"
+ "# |||| / _---=> preempt-depth \n"
+ "# ||||| / _--=> preempt-lazy-depth\n"
+ "# |||||| / _-=> migrate-disable \n"
+ "# ||||||| / delay \n"
+ "# cmd pid |||||||| time | caller \n"
+ "# \\ / |||||||| \\ | / \n");
}

static void print_event_info(struct trace_buffer *buf, struct seq_file *m)

2016-10-16 03:12:07

by Mike Galbraith

Subject: [patch] drivers,connector: Protect send_msg() with a local lock for RT


[ 6496.323071] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
[ 6496.323072] in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
[ 6496.323077] Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
[ 6496.323077]
[ 6496.323080] CPU: 4 PID: 31807 Comm: sleep Tainted: G W E 4.8.0-rt11-rt #106
[ 6496.323081] Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010
[ 6496.323084] 0000000000000000 ffff8801051d3d08 ffffffff813436cd 0000000000000000
[ 6496.323086] ffff880167ccab80 ffff8801051d3d28 ffffffff8109c425 ffffffff81ce91c0
[ 6496.323088] 0000000000000000 ffff8801051d3d40 ffffffff816406b0 ffffffff81ce91c0
[ 6496.323089] Call Trace:
[ 6496.323092] [<ffffffff813436cd>] dump_stack+0x65/0x88
[ 6496.323094] [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
[ 6496.323097] [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
[ 6496.323100] [<ffffffff81640978>] rt_read_lock+0x28/0x30
[ 6496.323103] [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
[ 6496.323106] [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
[ 6496.323109] [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
[ 6496.323111] [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
[ 6496.323114] [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
[ 6496.323116] [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
[ 6496.323119] [<ffffffff81077f71>] do_exit+0x5d1/0xba0
[ 6496.323122] [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
[ 6496.323125] [<ffffffff81078654>] SyS_exit_group+0x14/0x20
[ 6496.323127] [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Signed-off-by: Mike Galbraith <[email protected]>
---
drivers/connector/cn_proc.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -32,6 +32,7 @@
#include <linux/pid_namespace.h>

#include <linux/cn_proc.h>
+#include <linux/locallock.h>

/*
* Size of a cn_msg followed by a proc_event structure. Since the
@@ -54,10 +55,12 @@ static struct cb_id cn_proc_event_id = {

/* proc_event_counts is used as the sequence number of the netlink message */
static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 };
+static DEFINE_LOCAL_IRQ_LOCK(send_msg_lock);

static inline void send_msg(struct cn_msg *msg)
{
- preempt_disable();
+ /* RT ordering protection, maps to preempt_disable() for !RT */
+ local_lock(send_msg_lock);

msg->seq = __this_cpu_inc_return(proc_event_counts) - 1;
((struct proc_event *)msg->data)->cpu = smp_processor_id();
@@ -70,7 +73,7 @@ static inline void send_msg(struct cn_ms
*/
cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT);

- preempt_enable();
+ local_unlock(send_msg_lock);
}

void proc_fork_connector(struct task_struct *task)

2016-10-16 03:14:38

by Mike Galbraith

Subject: [patch] drivers/zram: Don't disable preemption in zcomp_stream_get/put()


In v4.7, the driver switched to percpu compression streams, disabling
preemption (get/put_cpu_ptr()). Use get/put_cpu_light() instead.

Signed-off-by: Mike Galbraith <[email protected]>
---
drivers/block/zram/zcomp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -118,12 +118,12 @@ ssize_t zcomp_available_show(const char

struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
- return *get_cpu_ptr(comp->stream);
+ return *per_cpu_ptr(comp->stream, get_cpu_light());
}

void zcomp_stream_put(struct zcomp *comp)
{
- put_cpu_ptr(comp->stream);
+ put_cpu_light();
}

int zcomp_compress(struct zcomp_strm *zstrm,
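
For context, get_cpu_light()/put_cpu_light() are RT-patch primitives, not
mainline ones. A rough sketch of what they boil down to (approximate, from
the RT series, not quoted verbatim):

#ifndef CONFIG_PREEMPT_RT_FULL
# define get_cpu_light()        get_cpu()       /* disables preemption */
# define put_cpu_light()        put_cpu()
#else
/* On RT, only pin the task to this CPU; the section stays preemptible. */
# define get_cpu_light()        ({ migrate_disable(); smp_processor_id(); })
# define put_cpu_light()        migrate_enable()
#endif

So on RT the section remains preemptible, which is what the reentrancy
concern raised later in the thread is about.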

2016-10-16 03:18:17

by Mike Galbraith

Subject: [patch] mm/zs_malloc: Fix bit spinlock replacement


Do not alter HANDLE_SIZE, memory corruption ensues. The handle is
a pointer, allocate space for the struct it points to and align it
ZS_ALIGN. Also, when accessing the struct, mask HANDLE_PIN_BIT.

Signed-off-by: Mike Galbraith <[email protected]>
---
mm/zsmalloc.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -71,6 +71,8 @@
#define ZS_MAX_ZSPAGE_ORDER 2
#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)

+#define ZS_HANDLE_SIZE (sizeof(unsigned long))
+
#ifdef CONFIG_PREEMPT_RT_BASE

struct zsmalloc_handle {
@@ -78,11 +80,11 @@ struct zsmalloc_handle {
struct mutex lock;
};

-#define ZS_HANDLE_SIZE (sizeof(struct zsmalloc_handle))
+#define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle))

#else

-#define ZS_HANDLE_SIZE (sizeof(unsigned long))
+#define ZS_HANDLE_ALLOC_SIZE ZS_HANDLE_SIZE
#endif

/*
@@ -339,8 +341,9 @@ static void SetZsPageMovable(struct zs_p

static int create_cache(struct zs_pool *pool)
{
- pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
- 0, 0, NULL);
+ pool->handle_cachep = kmem_cache_create("zs_handle",
+ ZS_HANDLE_ALLOC_SIZE,
+ ZS_ALIGN, 0, NULL);
if (!pool->handle_cachep)
return 1;

@@ -380,7 +383,7 @@ static unsigned long cache_alloc_handle(
#ifdef CONFIG_PREEMPT_RT_BASE
static struct zsmalloc_handle *zs_get_pure_handle(unsigned long handle)
{
- return (void *)(handle &~((1 << OBJ_TAG_BITS) - 1));
+ return (void *)(handle & ~BIT(HANDLE_PIN_BIT));
}
#endif


Subject: Re: [patch] ftrace: Fix latency trace header alignment

On 2016-10-16 05:08:30 [+0200], Mike Galbraith wrote:
> Line up helper arrows to the right column.

Thanks. And while at it, I fixed the function tracer header as well.

> Signed-off-by: Mike Galbraith <[email protected]>

Sebastian

Subject: Re: [patch] drivers,connector: Protect send_msg() with a local lock for RT

On 2016-10-16 05:11:54 [+0200], Mike Galbraith wrote:
>

Applied with minor changes. This is v4.8-RT only.

Sebastian

Subject: Re: [patch] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On 2016-10-16 05:14:22 [+0200], Mike Galbraith wrote:
>
> In v4.7, the driver switched to percpu compression streams, disabling
> preemption (get/put_cpu_ptr()). Use get/put_cpu_light() instead.

I am not convinced that this will work. As far as I can tell from
browsing the code, nothing prevents zram_bvec_write() from being
reentered on the same CPU, and since it uses zstrm->buffer for
compression, that can go wrong. Also, I don't know if crypto's tfm
handle can be used in parallel for any ops (it usually does not work
for crypto).

I suggest a local lock, or a good reason why this patch works.

Sebastian

Subject: Re: [patch] mm/zs_malloc: Fix bit spinlock replacement

On 2016-10-16 05:18:03 [+0200], Mike Galbraith wrote:
>
> Do not alter HANDLE_SIZE, memory corruption ensues. The handle is
> a pointer, allocate space for the struct it points to and align it
> ZS_ALIGN. Also, when accessing the struct, mask HANDLE_PIN_BIT.

So this is to be merged / folded into "mm/zsmalloc: Use get/put_cpu_light
in zs_map_object()/zs_unmap_object()" which I re-did for v4.8?
How was this tested?
I have:
CONFIG_FRONTSWAP=y
# CONFIG_CMA is not set
CONFIG_ZSWAP=y
CONFIG_ZPOOL=y
CONFIG_ZBUD=m
CONFIG_Z3FOLD=m
CONFIG_ZSMALLOC=m
# CONFIG_PGTABLE_MAPPING is not set
CONFIG_ZSMALLOC_STAT=y

and

# cat /sys/module/zswap/parameters/enabled
Y
cat /sys/module/zswap/parameters/zpool
zbud
# cat /sys/module/zswap/parameters/compressor
lzo
# cat /sys/module/zswap/parameters/max_pool_percent
20

and I do have 1GiB of swap on /dev/vdc. While swap does get used, I
see no fireworks. Is there something wrong with my setup? I would assume
so, given the lack of fireworks on my side…

> Signed-off-by: Mike Galbraith <[email protected]>

Sebastian

2016-10-17 16:12:42

by Mike Galbraith

Subject: Re: [patch] mm/zs_malloc: Fix bit spinlock replacement

On Mon, 2016-10-17 at 17:15 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-10-16 05:18:03 [+0200], Mike Galbraith wrote:
> >
> > Do not alter HANDLE_SIZE, memory corruption ensues. The handle is
> > a pointer, allocate space for the struct it points to and align it
> > ZS_ALIGN. Also, when accessing the struct, mask HANDLE_PIN_BIT.
>
> So this is to be merged / folded into "mm/zsmalloc: Use get/put_cpu_light
> in zs_map_object()/zs_unmap_object()" which I re-did for v4.8?

Yeah.

> How was this tested?

Latest LTP. You need the latest version, else it'll abort early.

> I have:
> CONFIG_FRONTSWAP=y
> # CONFIG_CMA is not set
> CONFIG_ZSWAP=y
> CONFIG_ZPOOL=y
> CONFIG_ZBUD=m
> CONFIG_Z3FOLD=m
> CONFIG_ZSMALLOC=m
> # CONFIG_PGTABLE_MAPPING is not set
> CONFIG_ZSMALLOC_STAT=y
>
> and
>
> # cat /sys/module/zswap/parameters/enabled
> Y
> cat /sys/module/zswap/parameters/zpool
> zbud
> # cat /sys/module/zswap/parameters/compressor
> lzo
> # cat /sys/module/zswap/parameters/max_pool_percent
> 20
>
> and I do have 1GiB of swap on /dev/vdc. While swap does get used, I
> see no fireworks. Is there something wrong with my setup? I would assume
> so, given the lack of fireworks on my side…

Run the LTP testcase, and you'll meet the below every time. It'll
write 23 times, then explode.

[ 117.527727] zram: Added device: zram0
[ 132.913046] SFW2-INext-DROP-DEFLT IN=br0 OUT= MAC= SRC=fe80:0000:0000:0000:d63d:7eff:fefc:4f09 DST=ff02:0000:0000:0000:0000:0000:0000:00fb LEN=138 TC=0 HOPLIMIT=255 FLOWLBL=240223 PROTO=UDP SPT=5353 DPT=5353 LEN=98
[ 145.205893] loop: module loaded
[ 145.388652] zram0: detected capacity change from 0 to 536870912
[ 146.096042] BUG: unable to handle kernel paging request at ffff880389fa0000
[ 146.096045] IP: [<ffffffff813aa516>] memcpy_erms+0x6/0x10
[ 146.096046] PGD 2ded067 PUD 3f8f52063 PMD 38befc063 PTE 8000000389fa0161
[ 146.096048] Oops: 0003 [#1] PREEMPT SMP
[ 146.096050] Dumping ftrace buffer:
[ 146.096053] (ftrace buffer empty)
[ 146.096064] Modules linked in: loop(E) zram(E) ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) intel_rapl(E) nls_cp437(E) intel_powerclamp(E) coretemp(E) vfat(E) fat(E) kvm_intel(E) kvm(E) pl2303(E) usbserial(E) dm_mod(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E)
[ 146.096077] snd_hda_intel(E) snd_hda_codec(E) irqbypass(E) sr_mod(E) cdrom(E) joydev(E) iTCO_wdt(E) crct10dif_pclmul(E) iTCO_vendor_support(E) crc32_pclmul(E) lpc_ich(E) mfd_core(E) ghash_clmulni_intel(E) aesni_intel(E) snd_hda_core(E) aes_x86_64(E) lrw(E) mei_me(E) mei(E) i2c_i801(E) gf128mul(E) i2c_smbus(E) pcspkr(E) shpchp(E) serio_raw(E) intel_smartconnect(E) tpm_infineon(E) battery(E) snd_hwdep(E) glue_helper(E) ablk_helper(E) snd_pcm(E) snd_timer(E) thermal(E) snd(E) nfsd(E) cryptd(E) fan(E) soundcore(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) efivarfs(E) hid_logitech_hidpp(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_logitech_dj(E) sd_mod(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) crc32c_intel(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
[ 146.096081] sysimgblt(E) ahci(E) ehci_pci(E) fb_sys_fops(E) libahci(E) xhci_pci(E) r8169(E) ehci_hcd(E) mii(E) ttm(E) xhci_hcd(E) libata(E) drm(E) usbcore(E) usb_common(E) fjes(E) video(E) button(E) sg(E) scsi_mod(E) autofs4(E)
[ 146.096083] CPU: 1 PID: 4168 Comm: zram01 Tainted: G E 4.8.1-rt1-virgin_debug #6
[ 146.096083] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 146.096084] task: ffff88038e763200 task.stack: ffff8803f7e4c000
[ 146.096085] RIP: 0010:[<ffffffff813aa516>] [<ffffffff813aa516>] memcpy_erms+0x6/0x10
[ 146.096085] RSP: 0018:ffff8803f7e4f820 EFLAGS: 00010286
[ 146.096086] RAX: ffff880386d1a050 RBX: ffff880377d42b80 RCX: fffffffffcd7a000
[ 146.096086] RDX: ffffffffffffffb0 RSI: ffff880400551030 RDI: ffff880389fa0000
[ 146.096086] RBP: ffff8803f7e4f870 R08: ffff88038e763200 R09: 0000000000000000
[ 146.096087] R10: 0000000000000004 R11: 0000000000000001 R12: ffff880375767000
[ 146.096087] R13: ffffea000df02d00 R14: 0000000000000080 R15: ffffffffffffffb0
[ 146.096088] FS: 00007f8313fd4700(0000) GS:ffff88041ec40000(0000) knlGS:0000000000000000
[ 146.096088] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 146.096089] CR2: ffff880389fa0000 CR3: 000000037c627000 CR4: 00000000001406e0
[ 146.096089] Stack:
[ 146.096090] ffffffff8124bb53 00000fd077d42b80 ffff88038e763200 000000000e1b4640
[ 146.096091] ffff8803fd2cb080 ffff8803d32c6400 0000000000000000 ffff880377d42b80
[ 146.096092] ffff88038e763200 ffff8803f7e4f940 ffff8803f7e4f8f8 ffffffffa0a23571
[ 146.096092] Call Trace:
[ 146.096095] [<ffffffff8124bb53>] ? zs_unmap_object+0x153/0x2a0
[ 146.096098] [<ffffffffa0a23571>] zram_bvec_rw+0x3d1/0x850 [zram]
[ 146.096100] [<ffffffffa0a23c9d>] zram_make_request+0x19d/0x3b6 [zram]
[ 146.096101] [<ffffffff81366c18>] ? blk_queue_enter+0x38/0x2c0
[ 146.096102] [<ffffffff81366fae>] generic_make_request+0x10e/0x2e0
[ 146.096103] [<ffffffff813671ed>] submit_bio+0x6d/0x150
[ 146.096105] [<ffffffff8135d8e8>] ? bio_alloc_bioset+0x168/0x2a0
[ 146.096107] [<ffffffff8129508c>] submit_bh_wbc+0x15c/0x1a0
[ 146.096109] [<ffffffff812951fc>] __block_write_full_page+0x12c/0x3b0
[ 146.096110] [<ffffffff81297a90>] ? I_BDEV+0x20/0x20
[ 146.096111] [<ffffffff81297a90>] ? I_BDEV+0x20/0x20
[ 146.096112] [<ffffffff8129569f>] block_write_full_page+0xff/0x130
[ 146.096113] [<ffffffff812984c8>] blkdev_writepage+0x18/0x20
[ 146.096116] [<ffffffff811cea26>] __writepage+0x16/0x50
[ 146.096117] [<ffffffff811d055f>] write_cache_pages+0x2af/0x690
[ 146.096118] [<ffffffff811c8bc3>] ? free_pcppages_bulk+0x33/0x560
[ 146.096119] [<ffffffff811cea10>] ? compound_head+0x20/0x20
[ 146.096121] [<ffffffff811d0986>] generic_writepages+0x46/0x60
[ 146.096122] [<ffffffff8129847f>] blkdev_writepages+0x2f/0x40
[ 146.096123] [<ffffffff811d2541>] do_writepages+0x21/0x40
[ 146.096124] [<ffffffff811c374a>] __filemap_fdatawrite_range+0xaa/0xf0
[ 146.096125] [<ffffffff811c3800>] filemap_write_and_wait+0x40/0x80
[ 146.096126] [<ffffffff8129904f>] __sync_blockdev+0x1f/0x40
[ 146.096126] [<ffffffff812993a8>] __blkdev_put+0x78/0x3a0
[ 146.096127] [<ffffffff8129971e>] blkdev_put+0x4e/0x150
[ 146.096128] [<ffffffff81299848>] blkdev_close+0x28/0x30
[ 146.096130] [<ffffffff8125610b>] __fput+0xfb/0x230
[ 146.096131] [<ffffffff8125627e>] ____fput+0xe/0x10
[ 146.096132] [<ffffffff8109f393>] task_work_run+0x83/0xc0
[ 146.096134] [<ffffffff81072672>] exit_to_usermode_loop+0xb4/0xee
[ 146.096135] [<ffffffff81002afb>] syscall_return_slowpath+0xbb/0x130
[ 146.096137] [<ffffffff816de118>] entry_SYSCALL_64_fastpath+0xbb/0xbd
[ 146.096146] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[ 146.096147] RIP [<ffffffff813aa516>] memcpy_erms+0x6/0x10
[ 146.096147] RSP <ffff8803f7e4f820>
[ 146.096148] CR2: ffff880389fa0000

2016-10-17 16:19:14

by Mike Galbraith

Subject: Re: [patch] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On Mon, 2016-10-17 at 16:24 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-10-16 05:14:22 [+0200], Mike Galbraith wrote:
> >
> > In v4.7, the driver switched to percpu compression streams, disabling
> > preemption (get/put_cpu_ptr()). Use get/put_cpu_light() instead.
>
> I am not convinced that this will work. As far as I can tell from
> browsing the code, nothing prevents zram_bvec_write() from being
> reentered on the same CPU, and since it uses zstrm->buffer for
> compression, that can go wrong. Also, I don't know if crypto's tfm
> handle can be used in parallel for any ops (it usually does not work
> for crypto).
>
> I suggest a local lock, or a good reason why this patch works.

I used a local lock first, but lockdep was unhappy with it. Ok, back
to the drawing board. Seems to work, but...

-Mike

Subject: Re: [patch] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On 2016-10-17 18:19:00 [+0200], Mike Galbraith wrote:
> I used a local lock first, but lockdep was unhappy with it. Ok, back
> to the drawing board. Seems to work, but...

A locallock can be taken recursively, so unless preemption was already
disabled, lockdep shouldn't complain. Then again, judging from the small
context here, it should not be taken recursively anyway.
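
Roughly, the acquire/release paths look like this (an approximation of
the -rt locallock implementation, field names from memory, not a
verbatim quote of the tree):

struct local_irq_lock {
        spinlock_t              lock;           /* a sleeping lock on RT */
        struct task_struct      *owner;
        int                     nestcnt;
};

static inline void __local_lock(struct local_irq_lock *lv)
{
        if (lv->owner != current) {
                spin_lock(&lv->lock);
                lv->owner = current;
        }
        /* Reacquisition by the owning task just nests; lockdep stays quiet. */
        lv->nestcnt++;
}

static inline void __local_unlock(struct local_irq_lock *lv)
{
        if (--lv->nestcnt)
                return;                         /* still nested */
        lv->owner = NULL;
        spin_unlock(&lv->lock);
}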

> -Mike

Sebastian

2016-10-17 17:18:19

by Mike Galbraith

Subject: Re: [patch] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On Mon, 2016-10-17 at 18:29 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-10-17 18:19:00 [+0200], Mike Galbraith wrote:
> > I used a local lock first, but lockdep was unhappy with it. Ok,
> > back
> > to the drawing board. Seems to work, but...
>
> A locallock can be taken recursively, so unless preemption was already
> disabled, lockdep shouldn't complain. Then again, judging from the small
> context here, it should not be taken recursively anyway.

FWIW here's the lockdep gripe.

BTW, 4.8 either needs the btrfs deadlock fix (0ccd05285e7f), or the LTP
testcase has to be hacked to not test btrfs. It also fails the first
time it's run in 4.8/4.8-rt, but doesn't do that in master/tip-rt.

[ 130.090247] zram: Added device: zram0
[ 130.163407] zram0: detected capacity change from 0 to 536870912
[ 131.760327] zram: 4188 (zram01) Attribute compr_data_size (and others) will be removed. See zram documentation.

[ 131.760923] ======================================================
[ 131.760923] [ INFO: possible circular locking dependency detected ]
[ 131.760924] 4.8.2-rt1-virgin_debug #20 Tainted: G E
[ 131.760924] -------------------------------------------------------
[ 131.760924] zram01/4188 is trying to acquire lock:
[ 131.760928] ((null)){+.+...}, at: [<ffffffffa0a28384>] zcomp_stream_get+0x44/0xd0 [zram]
[ 131.760929] but task is already holding lock:
[ 131.760932] (&zspage->lock){+.+...}, at: [<ffffffff8124b7ab>] zs_map_object+0x8b/0x2e0
[ 131.760932] which lock already depends on the new lock.
[ 131.760932] the existing dependency chain (in reverse order) is:
[ 131.760933] -> #2 (&zspage->lock){+.+...}:
[ 131.760936] [<ffffffff810d9f5d>] lock_acquire+0xbd/0x260
[ 131.760939] [<ffffffff816dee57>] rt_read_lock+0x47/0x60
[ 131.760940] [<ffffffff8124b7ab>] zs_map_object+0x8b/0x2e0
[ 131.760941] [<ffffffffa0a2a523>] zram_bvec_rw+0x383/0x850 [zram]
[ 131.760942] [<ffffffffa0a2ac9d>] zram_make_request+0x19d/0x3b6 [zram]
[ 131.760944] [<ffffffff8136707e>] generic_make_request+0x10e/0x2e0
[ 131.760944] [<ffffffff813672bd>] submit_bio+0x6d/0x150
[ 131.760947] [<ffffffff812950bc>] submit_bh_wbc+0x15c/0x1a0
[ 131.760948] [<ffffffff8129522c>] __block_write_full_page+0x12c/0x3b0
[ 131.760949] [<ffffffff812956cf>] block_write_full_page+0xff/0x130
[ 131.760951] [<ffffffff812984f8>] blkdev_writepage+0x18/0x20
[ 131.760953] [<ffffffff811cea66>] __writepage+0x16/0x50
[ 131.760954] [<ffffffff811d059f>] write_cache_pages+0x2af/0x690
[ 131.760955] [<ffffffff811d09c6>] generic_writepages+0x46/0x60
[ 131.760957] [<ffffffff812984af>] blkdev_writepages+0x2f/0x40
[ 131.760958] [<ffffffff811d2581>] do_writepages+0x21/0x40
[ 131.760959] [<ffffffff811c378a>] __filemap_fdatawrite_range+0xaa/0xf0
[ 131.760960] [<ffffffff811c3840>] filemap_write_and_wait+0x40/0x80
[ 131.760961] [<ffffffff8129907f>] __sync_blockdev+0x1f/0x40
[ 131.760961] [<ffffffff812993d8>] __blkdev_put+0x78/0x3a0
[ 131.760962] [<ffffffff8129974e>] blkdev_put+0x4e/0x150
[ 131.760963] [<ffffffff81299878>] blkdev_close+0x28/0x30
[ 131.760964] [<ffffffff8125613b>] __fput+0xfb/0x230
[ 131.760965] [<ffffffff812562ae>] ____fput+0xe/0x10
[ 131.760967] [<ffffffff8109f393>] task_work_run+0x83/0xc0
[ 131.760968] [<ffffffff81072672>] exit_to_usermode_loop+0xb4/0xee
[ 131.760970] [<ffffffff81002afb>] syscall_return_slowpath+0xbb/0x130
[ 131.760971] [<ffffffff816df118>] entry_SYSCALL_64_fastpath+0xbb/0xbd
[ 131.760971] -> #1 (&zh->lock){+.+...}:
[ 131.760973] [<ffffffff810d9f5d>] lock_acquire+0xbd/0x260
[ 131.760974] [<ffffffff816deac1>] _mutex_lock+0x31/0x40
[ 131.760975] [<ffffffff8124b768>] zs_map_object+0x48/0x2e0
[ 131.760976] [<ffffffffa0a2a523>] zram_bvec_rw+0x383/0x850 [zram]
[ 131.760977] [<ffffffffa0a2ac9d>] zram_make_request+0x19d/0x3b6 [zram]
[ 131.760978] [<ffffffff8136707e>] generic_make_request+0x10e/0x2e0
[ 131.760978] [<ffffffff813672bd>] submit_bio+0x6d/0x150
[ 131.760979] [<ffffffff812950bc>] submit_bh_wbc+0x15c/0x1a0
[ 131.760980] [<ffffffff8129522c>] __block_write_full_page+0x12c/0x3b0
[ 131.760982] [<ffffffff812956cf>] block_write_full_page+0xff/0x130
[ 131.760983] [<ffffffff812984f8>] blkdev_writepage+0x18/0x20
[ 131.760984] [<ffffffff811cea66>] __writepage+0x16/0x50
[ 131.760985] [<ffffffff811d059f>] write_cache_pages+0x2af/0x690
[ 131.760986] [<ffffffff811d09c6>] generic_writepages+0x46/0x60
[ 131.760987] [<ffffffff812984af>] blkdev_writepages+0x2f/0x40
[ 131.760988] [<ffffffff811d2581>] do_writepages+0x21/0x40
[ 131.760989] [<ffffffff811c378a>] __filemap_fdatawrite_range+0xaa/0xf0
[ 131.760990] [<ffffffff811c3840>] filemap_write_and_wait+0x40/0x80
[ 131.760990] [<ffffffff8129907f>] __sync_blockdev+0x1f/0x40
[ 131.760991] [<ffffffff812993d8>] __blkdev_put+0x78/0x3a0
[ 131.760992] [<ffffffff8129974e>] blkdev_put+0x4e/0x150
[ 131.760992] [<ffffffff81299878>] blkdev_close+0x28/0x30
[ 131.760993] [<ffffffff8125613b>] __fput+0xfb/0x230
[ 131.760994] [<ffffffff812562ae>] ____fput+0xe/0x10
[ 131.760995] [<ffffffff8109f393>] task_work_run+0x83/0xc0
[ 131.760996] [<ffffffff81072672>] exit_to_usermode_loop+0xb4/0xee
[ 131.760996] [<ffffffff81002afb>] syscall_return_slowpath+0xbb/0x130
[ 131.760997] [<ffffffff816df118>] entry_SYSCALL_64_fastpath+0xbb/0xbd
[ 131.760998] -> #0 ((null)){+.+...}:
[ 131.760999] [<ffffffff810d9b1c>] __lock_acquire+0x162c/0x1660
[ 131.761000] [<ffffffff810d9f5d>] lock_acquire+0xbd/0x260
[ 131.761001] [<ffffffff816de92a>] rt_spin_lock__no_mg+0x5a/0x70
[ 131.761002] [<ffffffffa0a28384>] zcomp_stream_get+0x44/0xd0 [zram]
[ 131.761003] [<ffffffffa0a29204>] zram_decompress_page.isra.17+0xc4/0x150 [zram]
[ 131.761004] [<ffffffffa0a2a694>] zram_bvec_rw+0x4f4/0x850 [zram]
[ 131.761005] [<ffffffffa0a2aa9c>] zram_rw_page+0xac/0x110 [zram]
[ 131.761007] [<ffffffff81297d24>] bdev_read_page+0x84/0xb0
[ 131.761007] [<ffffffff8129eb2f>] do_mpage_readpage+0x53f/0x780
[ 131.761008] [<ffffffff8129eeb4>] mpage_readpages+0x144/0x1b0
[ 131.761009] [<ffffffff8129847d>] blkdev_readpages+0x1d/0x20
[ 131.761011] [<ffffffff811d3046>] __do_page_cache_readahead+0x286/0x360
[ 131.761011] [<ffffffff811c4d1a>] filemap_fault+0x44a/0x6a0
[ 131.761013] [<ffffffff811fb033>] __do_fault+0x73/0xf0
[ 131.761014] [<ffffffff81200b3c>] handle_mm_fault+0xc7c/0x10a0
[ 131.761017] [<ffffffff8105c6ef>] __do_page_fault+0x1bf/0x5a0
[ 131.761018] [<ffffffff8105cb00>] do_page_fault+0x30/0x80
[ 131.761019] [<ffffffff816e0338>] page_fault+0x28/0x30
[ 131.761019] other info that might help us debug this:
[ 131.761020] Chain exists of: (null) --> &zh->lock --> &zspage->lock
[ 131.761020] Possible unsafe locking scenario:
[ 131.761020] CPU0 CPU1
[ 131.761020] ---- ----
[ 131.761021] lock(&zspage->lock);
[ 131.761021] lock(&zh->lock);
[ 131.761022] lock(&zspage->lock);
[ 131.761022] lock((null));
[ 131.761022] *** DEADLOCK ***
[ 131.761023] 4 locks held by zram01/4188:
[ 131.761024] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8105c65a>] __do_page_fault+0x12a/0x5a0
[ 131.761026] #1: (lock#5){+.+...}, at: [<ffffffffa0a2917c>] zram_decompress_page.isra.17+0x3c/0x150 [zram]
[ 131.761027] #2: (&zh->lock){+.+...}, at: [<ffffffff8124b768>] zs_map_object+0x48/0x2e0
[ 131.761029] #3: (&zspage->lock){+.+...}, at: [<ffffffff8124b7ab>] zs_map_object+0x8b/0x2e0
[ 131.761029] stack backtrace:
[ 131.761030] CPU: 2 PID: 4188 Comm: zram01 Tainted: G E 4.8.2-rt1-virgin_debug #20
[ 131.761030] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 131.761032] 0000000000000000 ffff88038fbbb668 ffffffff8139b9fd ffffffff826ffa90
[ 131.761033] ffffffff826ffdf0 ffff88038fbbb6a8 ffffffff811be11f ffff88038fbbb6e0
[ 131.761034] ffff880392212300 0000000000000003 0000000000000004 ffff880392211900
[ 131.761034] Call Trace:
[ 131.761035] [<ffffffff8139b9fd>] dump_stack+0x85/0xc8
[ 131.761037] [<ffffffff811be11f>] print_circular_bug+0x1f9/0x207
[ 131.761038] [<ffffffff810d9b1c>] __lock_acquire+0x162c/0x1660
[ 131.761039] [<ffffffff810d9f5d>] lock_acquire+0xbd/0x260
[ 131.761041] [<ffffffffa0a28384>] ? zcomp_stream_get+0x44/0xd0 [zram]
[ 131.761042] [<ffffffff816de92a>] rt_spin_lock__no_mg+0x5a/0x70
[ 131.761043] [<ffffffffa0a28384>] ? zcomp_stream_get+0x44/0xd0 [zram]
[ 131.761044] [<ffffffffa0a28384>] zcomp_stream_get+0x44/0xd0 [zram]
[ 131.761045] [<ffffffffa0a29204>] zram_decompress_page.isra.17+0xc4/0x150 [zram]
[ 131.761046] [<ffffffffa0a2a694>] zram_bvec_rw+0x4f4/0x850 [zram]
[ 131.761048] [<ffffffffa0a2aa9c>] zram_rw_page+0xac/0x110 [zram]
[ 131.761049] [<ffffffff81297d24>] bdev_read_page+0x84/0xb0
[ 131.761050] [<ffffffff8129eb2f>] do_mpage_readpage+0x53f/0x780
[ 131.761051] [<ffffffff811d607e>] ? lru_cache_add+0xe/0x10
[ 131.761052] [<ffffffff8129eeb4>] mpage_readpages+0x144/0x1b0
[ 131.761053] [<ffffffff81297ac0>] ? I_BDEV+0x20/0x20
[ 131.761054] [<ffffffff81297ac0>] ? I_BDEV+0x20/0x20
[ 131.761055] [<ffffffff813bc1f7>] ? debug_smp_processor_id+0x17/0x20
[ 131.761056] [<ffffffff811cbe1a>] ? get_page_from_freelist+0x39a/0xd90
[ 131.761057] [<ffffffff810d67c9>] ? __lock_is_held+0x49/0x70
[ 131.761058] [<ffffffff810d67c9>] ? __lock_is_held+0x49/0x70
[ 131.761060] [<ffffffff810f7b73>] ? rcu_read_lock_sched_held+0x93/0xa0
[ 131.761061] [<ffffffff811cce62>] ? __alloc_pages_nodemask+0x392/0x480
[ 131.761062] [<ffffffff81223347>] ? alloc_pages_current+0x97/0x1b0
[ 131.761063] [<ffffffff811c0c8f>] ? __page_cache_alloc+0x12f/0x160
[ 131.761065] [<ffffffff8129847d>] blkdev_readpages+0x1d/0x20
[ 131.761066] [<ffffffff811d3046>] __do_page_cache_readahead+0x286/0x360
[ 131.761067] [<ffffffff811d2f30>] ? __do_page_cache_readahead+0x170/0x360
[ 131.761068] [<ffffffff811c4d1a>] filemap_fault+0x44a/0x6a0
[ 131.761069] [<ffffffff813bc1f7>] ? debug_smp_processor_id+0x17/0x20
[ 131.761070] [<ffffffff811fb033>] __do_fault+0x73/0xf0
[ 131.761071] [<ffffffff81200b3c>] handle_mm_fault+0xc7c/0x10a0
[ 131.761072] [<ffffffff810d67c9>] ? __lock_is_held+0x49/0x70
[ 131.761073] [<ffffffff8105c6ef>] __do_page_fault+0x1bf/0x5a0
[ 131.761074] [<ffffffff8105cb00>] do_page_fault+0x30/0x80
[ 131.761075] [<ffffffff816e0338>] page_fault+0x28/0x30
[ 132.696315] zram0: detected capacity change from 536870912 to 0
[ 132.702019] zram: Removed device: zram0
[ 132.801476] zram: Added device: zram0
[ 132.802011] zram: Added device: zram1
[ 132.803332] zram: Added device: zram2
[ 132.804999] zram: Added device: zram3
[ 132.830229] zram0: detected capacity change from 0 to 26214400
[ 132.831491] zram1: detected capacity change from 0 to 26214400
[ 132.832725] zram2: detected capacity change from 0 to 26214400
[ 132.834931] zram3: detected capacity change from 0 to 41943040
[ 133.003077] raid6: sse2x1 gen() 12140 MB/s
[ 133.020078] raid6: sse2x1 xor() 9453 MB/s
[ 133.037077] raid6: sse2x2 gen() 15566 MB/s
[ 133.054079] raid6: sse2x2 xor() 10304 MB/s
[ 133.071084] raid6: sse2x4 gen() 17945 MB/s
[ 133.088084] raid6: sse2x4 xor() 12447 MB/s
[ 133.105087] raid6: avx2x1 gen() 23656 MB/s
[ 133.122089] raid6: avx2x2 gen() 28191 MB/s
[ 133.139090] raid6: avx2x4 gen() 32050 MB/s
[ 133.139091] raid6: using algorithm avx2x4 gen() 32050 MB/s
[ 133.139092] raid6: using avx2x2 recovery algorithm
[ 133.153651] xor: automatically using best checksumming function:
[ 133.163098] avx : 36704.000 MB/sec
[ 133.372902] Btrfs loaded, crc32c=crc32c-intel, assert=on
[ 133.373255] BTRFS: device fsid e04952e8-f9fa-4145-8bd9-43b23dfd995f devid 1 transid 3 /dev/zram3
[ 133.396333] EXT4-fs (zram0): mounting ext3 file system using the ext4 subsystem
[ 133.396684] EXT4-fs (zram0): mounted filesystem with ordered data mode. Opts: (null)
[ 133.402146] EXT4-fs (zram1): mounted filesystem with ordered data mode. Opts: (null)
[ 133.716775] SGI XFS with ACLs, security attributes, realtime, no debug enabled
[ 133.718729] XFS (zram2): Mounting V4 Filesystem
[ 133.720285] XFS (zram2): Ending clean mount
[ 133.725864] BTRFS info (device zram3): disk space caching is enabled
[ 133.725869] BTRFS info (device zram3): has skinny extents
[ 133.726570] BTRFS info (device zram3): detected SSD devices, enabling SSD mode
[ 133.726633] BTRFS info (device zram3): creating UUID tree
[ 151.080729] SFW2-INext-DROP-DEFLT IN=br0 OUT= MAC= SRC=fe80:0000:0000:0000:d63d:7eff:fefc:4f09 DST=ff02:0000:0000:0000:0000:0000:0000:00fb LEN=138 TC=0 HOPLIMIT=255 FLOWLBL=855088 PROTO=UDP SPT=5353 DPT=5353 LEN=98
[ 181.364952] XFS (zram2): Unmounting Filesystem
[ 181.408367] zram0: detected capacity change from 26214400 to 0
[ 181.408578] zram1: detected capacity change from 26214400 to 0
[ 181.408969] zram2: detected capacity change from 26214400 to 0
[ 181.409262] zram3: detected capacity change from 41943040 to 0
[ 181.409978] zram: Removed device: zram0
[ 181.419062] zram: Removed device: zram1
[ 181.433933] zram: Removed device: zram2
[ 181.451062] zram: Removed device: zram3
[ 181.510512] zram: Added device: zram0
[ 185.667788] zram0: detected capacity change from 0 to 107374182400
[ 185.692536] Adding 104857596k swap on /dev/zram0. Priority:-1 extents:1 across:104857596k SSFS
[ 186.069786] zram0: detected capacity change from 107374182400 to 0
[ 186.070654] zram: Removed device: zram0
[ 186.131458] zram: Added device: zram0
[ 186.155595] zram0: detected capacity change from 0 to 536870912
[ 187.064553] zram: 18984 (zram03) Attribute compr_data_size (and others) will be removed. See zram documentation.
[ 188.006358] zram0: detected capacity change from 536870912 to 0
[ 188.008209] zram: Removed device: zram0

2016-10-17 17:47:01

by Mike Galbraith

Subject: Re: [patch] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On Mon, 2016-10-17 at 19:18 +0200, Mike Galbraith wrote:

> BTW, 4.8 either needs the btrfs deadlock fix (0ccd05285e7f), or the LTP
> testcase has to be hacked to not test btrfs. It also fails the first
> time it's run in 4.8/4.8-rt, but doesn't do that in master/tip-rt.

Belay that, the first run failure is there in master/tip too, it's just
intermittent. From there on, it's solid. Hohum.

-Mike

2016-10-19 15:50:58

by Mike Galbraith

Subject: [patch v2] mm/zs_malloc: Fix bit spinlock replacement


Do not alter HANDLE_SIZE, memory corruption ensues. The handle is
a pointer, allocate space for the struct it points to and align it
ZS_ALIGN. Also, when accessing the struct, mask HANDLE_PIN_BIT.

v2: the mutex is only needed for PREEMPT_RT_FULL; with PREEMPT_RT_RTB,
preemption is disabled when we take it...

Signed-off-by: Mike Galbraith <[email protected]>
---
mm/zsmalloc.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -71,18 +71,20 @@
#define ZS_MAX_ZSPAGE_ORDER 2
#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)

-#ifdef CONFIG_PREEMPT_RT_BASE
+#define ZS_HANDLE_SIZE (sizeof(unsigned long))
+
+#ifdef CONFIG_PREEMPT_RT_FULL

struct zsmalloc_handle {
unsigned long addr;
struct mutex lock;
};

-#define ZS_HANDLE_SIZE (sizeof(struct zsmalloc_handle))
+#define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle))

#else

-#define ZS_HANDLE_SIZE (sizeof(unsigned long))
+#define ZS_HANDLE_ALLOC_SIZE ZS_HANDLE_SIZE
#endif

/*
@@ -339,8 +341,9 @@ static void SetZsPageMovable(struct zs_p

static int create_cache(struct zs_pool *pool)
{
- pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
- 0, 0, NULL);
+ pool->handle_cachep = kmem_cache_create("zs_handle",
+ ZS_HANDLE_ALLOC_SIZE,
+ ZS_ALIGN, 0, NULL);
if (!pool->handle_cachep)
return 1;

@@ -367,7 +370,7 @@ static unsigned long cache_alloc_handle(

p = kmem_cache_alloc(pool->handle_cachep,
gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
if (p) {
struct zsmalloc_handle *zh = p;

@@ -377,10 +380,10 @@ static unsigned long cache_alloc_handle(
return (unsigned long)p;
}

-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
static struct zsmalloc_handle *zs_get_pure_handle(unsigned long handle)
{
- return (void *)(handle &~((1 << OBJ_TAG_BITS) - 1));
+ return (void *)(handle & ~BIT(HANDLE_PIN_BIT));
}
#endif

@@ -402,7 +405,7 @@ static void cache_free_zspage(struct zs_

static void record_obj(unsigned long handle, unsigned long obj)
{
-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);

WRITE_ONCE(zh->addr, obj);
@@ -937,7 +940,7 @@ static unsigned long location_to_obj(str

static unsigned long handle_to_obj(unsigned long handle)
{
-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);

return zh->addr;
@@ -957,7 +960,7 @@ static unsigned long obj_to_head(struct

static inline int testpin_tag(unsigned long handle)
{
-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);

return mutex_is_locked(&zh->lock);
@@ -968,7 +971,7 @@ static inline int testpin_tag(unsigned l

static inline int trypin_tag(unsigned long handle)
{
-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);

return mutex_trylock(&zh->lock);
@@ -979,7 +982,7 @@ static inline int trypin_tag(unsigned lo

static void pin_tag(unsigned long handle)
{
-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);

return mutex_lock(&zh->lock);
@@ -990,7 +993,7 @@ static void pin_tag(unsigned long handle

static void unpin_tag(unsigned long handle)
{
-#ifdef CONFIG_PREEMPT_RT_BASE
+#ifdef CONFIG_PREEMPT_RT_FULL
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);

return mutex_unlock(&zh->lock);

2016-10-19 15:56:38

by Mike Galbraith

Subject: [patch v2] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On Mon, 2016-10-17 at 16:24 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-10-16 05:14:22 [+0200], Mike Galbraith wrote:
> >
> > In v4.7, the driver switched to percpu compression streams, disabling
> > preemption (get/put_cpu_ptr()). Use get/put_cpu_light() instead.
>
> I am not convinced that this will work. Nothing prevents
> zram_bvec_write() to be reentrant on the same CPU what I can tell from
> browsing over the code and since it uses zstrm->buffer for compression
> it can go wrong. Also I don't know if crypto's tfm handler can be used
> in parallel for any ops (it usually does not work for crypto).
>
> I suggest a local lock or a good reason why the this patch works.

(taking a break from hotplug squabble to pick on something easier:)

drivers/zram: Don't disable preemption in zcomp_stream_get/put()

In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a local lock instead for RT.
We also have to fix an RT lock order issue in zram_decompress_page()
such that zs_map_object() nests inside of zcomp_stream_put() as it
does in zram_bvec_write().

Signed-off-by: Mike Galbraith <[email protected]>
---
drivers/block/zram/zcomp.c | 7 +++++--
drivers/block/zram/zram_drv.c | 11 +++++------
2 files changed, 10 insertions(+), 8 deletions(-)

--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -15,6 +15,7 @@
#include <linux/sched.h>
#include <linux/cpu.h>
#include <linux/crypto.h>
+#include <linux/locallock.h>

#include "zcomp.h"

@@ -116,14 +117,16 @@ ssize_t zcomp_available_show(const char
return sz;
}

+DEFINE_LOCAL_IRQ_LOCK(zram_stream_lock);
+
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
- return *get_cpu_ptr(comp->stream);
+ return get_locked_var(zram_stream_lock, *comp->stream);
}

void zcomp_stream_put(struct zcomp *comp)
{
- put_cpu_ptr(comp->stream);
+ put_locked_var(zram_stream_lock, *comp->stream);
}

int zcomp_compress(struct zcomp_strm *zstrm,
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -566,6 +566,7 @@ static int zram_decompress_page(struct z
int ret = 0;
unsigned char *cmem;
struct zram_meta *meta = zram->meta;
+ struct zcomp_strm *zstrm;
unsigned long handle;
unsigned int size;

@@ -579,16 +580,14 @@ static int zram_decompress_page(struct z
return 0;
}

+ zstrm = zcomp_stream_get(zram->comp);
cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
- if (size == PAGE_SIZE) {
+ if (size == PAGE_SIZE)
copy_page(mem, cmem);
- } else {
- struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
-
+ else
ret = zcomp_decompress(zstrm, cmem, size, mem);
- zcomp_stream_put(zram->comp);
- }
zs_unmap_object(meta->mem_pool, handle);
+ zcomp_stream_put(zram->comp);
zram_unlock_table(&meta->table[index]);

/* Should NEVER happen. Return bio error if it does. */


Subject: Re: [patch v2] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On 2016-10-19 17:56:30 [+0200], Mike Galbraith wrote:
> In v4.7, the driver switched to percpu compression streams, disabling
> preemption via get/put_cpu_ptr(). Use a local lock instead for RT.
> We also have to fix an RT lock order issue in zram_decompress_page()
> such that zs_map_object() nests inside of zcomp_stream_put() as it
> does in zram_bvec_write().

Good. I almost had it myself, so let me take that one. And your previous
one (the spinlock replacement) also looks reasonable. With this hunk

@@ -313,6 +314,7 @@ struct mapping_area {
#endif
char *vm_addr; /* address of kmap_atomic()'ed pages */
enum zs_mapmode vm_mm; /* mapping mode */
+ spinlock_t ma_lock;
};

#ifdef CONFIG_COMPACTION
@@ -1489,6 +1491,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
off = (class->size * obj_idx) & ~PAGE_MASK;

area = per_cpu_ptr(&zs_map_area, get_cpu_light());
+ spin_lock(&area->ma_lock);
area->vm_mm = mm;
if (off + class->size <= PAGE_SIZE) {
/* this object is contained entirely within a page */
@@ -1542,6 +1545,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)

__zs_unmap_object(area, pages, off, class->size);
}
+ spin_unlock(&area->ma_lock);
put_cpu_light();

migrate_read_unlock(zspage);


I don't see this anymore:

|gcc: internal compiler error: Killed (program cc1)
|Please submit a full bug report,
|with preprocessed source if appropriate.
|See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
|Makefile:134: recipe for target 'servconf.o' failed
|make: *** [servconf.o] Error 4
|make: *** Waiting for unfinished jobs....
|
|
|loginrec.c:1722:1: internal compiler error: Bus error
| }
| ^

only
|gcc: internal compiler error: Killed (program cc1)

which is expected.

> Signed-off-by: Mike Galbraith <[email protected]>

Sebastian

2016-10-20 02:59:12

by Mike Galbraith

Subject: Re: [patch v2] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On Wed, 2016-10-19 at 18:54 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-10-19 17:56:30 [+0200], Mike Galbraith wrote:
> > In v4.7, the driver switched to percpu compression streams, disabling
> > preemption via get/put_cpu_ptr(). Use a local lock instead for RT.
> > We also have to fix an RT lock order issue in zram_decompress_page()
> > such that zs_map_object() nests inside of zcomp_stream_put() as it
> > does in zram_bvec_write().
>
> Good. I almost had it myself, so let me take that one. And your previous
> one (the spinlock replacement) also looks reasonable. With this hunk
>
> @@ -313,6 +314,7 @@ struct mapping_area {
> #endif
> char *vm_addr; /* address of kmap_atomic()'ed pages */
> enum zs_mapmode vm_mm; /* mapping mode */
> + spinlock_t ma_lock;
> };
>
> #ifdef CONFIG_COMPACTION
> @@ -1489,6 +1491,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> off = (class->size * obj_idx) & ~PAGE_MASK;
>
> area = per_cpu_ptr(&zs_map_area, get_cpu_light());
> + spin_lock(&area->ma_lock);
> area->vm_mm = mm;
> if (off + class->size <= PAGE_SIZE) {
> /* this object is contained entirely within a page */
> @@ -1542,6 +1545,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>
> __zs_unmap_object(area, pages, off, class->size);
> }
> + spin_unlock(&area->ma_lock);
> put_cpu_light();
>
> migrate_read_unlock(zspage);

Ew, yeah, same as the other one: I thought it was covered, but nope.

-Mike

2016-10-20 09:34:10

by Mike Galbraith

Subject: [rfc patch] hotplug: Call mmdrop_delayed() in sched_cpu_dying() if PREEMPT_RT_FULL

My 64 core box just passed an hour running Steven's hotplug stress
script along with stockfish and futextests (tip-rt.today w. hotplug
hacks you saw a while back), and seems content to just keep on grinding
away. Without it, the box quickly becomes a doorstop.

[ 634.896901] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
[ 634.896902] in_atomic(): 1, irqs_disabled(): 1, pid: 104, name: migration/6
[ 634.896902] no locks held by migration/6/104.
[ 634.896903] irq event stamp: 1208518
[ 634.896907] hardirqs last enabled at (1208517): [<ffffffff816de46c>] _raw_spin_unlock_irqrestore+0x8c/0xa0
[ 634.896910] hardirqs last disabled at (1208518): [<ffffffff81146055>] multi_cpu_stop+0xc5/0x110
[ 634.896912] softirqs last enabled at (0): [<ffffffff81075dd2>] copy_process.part.32+0x672/0x1fc0
[ 634.896913] softirqs last disabled at (0): [< (null)>] (null)
[ 634.896914] Preemption disabled at:[<ffffffff8114629c>] cpu_stopper_thread+0x8c/0x120
[ 634.896914]
[ 634.896915] CPU: 6 PID: 104 Comm: migration/6 Tainted: G E 4.8.2-rt1-rt_debug #23
[ 634.896916] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 634.896918] 0000000000000000 ffff880176fb3c40 ffffffff8139c04d 0000000000000000
[ 634.896919] ffff880176fa8000 ffff880176fb3c68 ffffffff810a8102 ffffffff81c29cc0
[ 634.896919] ffff8803fc825640 ffff8803fc825640 ffff880176fb3c88 ffffffff816de754
[ 634.896920] Call Trace:
[ 634.896923] [<ffffffff8139c04d>] dump_stack+0x85/0xc8
[ 634.896924] [<ffffffff810a8102>] ___might_sleep+0x152/0x250
[ 634.896926] [<ffffffff816de754>] rt_spin_lock+0x24/0x80
[ 634.896928] [<ffffffff810d67f9>] ? __lock_is_held+0x49/0x70
[ 634.896929] [<ffffffff810623ee>] pgd_free+0x1e/0xb0
[ 634.896930] [<ffffffff81074877>] __mmdrop+0x27/0xd0
[ 634.896932] [<ffffffff810b4a0d>] sched_cpu_dying+0x24d/0x2c0
[ 634.896933] [<ffffffff810b47c0>] ? sched_cpu_starting+0x60/0x60
[ 634.896934] [<ffffffff81079864>] cpuhp_invoke_callback+0xd4/0x350
[ 634.896935] [<ffffffff81079e56>] take_cpu_down+0x86/0xd0
[ 634.896936] [<ffffffff81146060>] multi_cpu_stop+0xd0/0x110
[ 634.896937] [<ffffffff81145f90>] ? cpu_stop_queue_work+0x90/0x90
[ 634.896938] [<ffffffff811462a2>] cpu_stopper_thread+0x92/0x120
[ 634.896940] [<ffffffff810a50fe>] smpboot_thread_fn+0x1de/0x360
[ 634.896941] [<ffffffff810a4f20>] ? smpboot_update_cpumask_percpu_thread+0x130/0x130
[ 634.896942] [<ffffffff810a093f>] kthread+0xef/0x110
[ 634.896944] [<ffffffff816df16f>] ret_from_fork+0x1f/0x40
[ 634.896945] [<ffffffff810a0850>] ? kthread_park+0x60/0x60
[ 634.896970] smpboot: CPU 6 is now offline

Signed-off-by: Mike Galbraith <[email protected]>
---
kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7569,6 +7569,9 @@ int sched_cpu_dying(unsigned int cpu)
nohz_balance_exit_idle(cpu);
hrtick_clear(rq);
if (per_cpu(idle_last_mm, cpu)) {
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
+ mmdrop_delayed(per_cpu(idle_last_mm, cpu));
+ else
mmdrop(per_cpu(idle_last_mm, cpu));
per_cpu(idle_last_mm, cpu) = NULL;
}

Subject: Re: [patch v2] mm/zs_malloc: Fix bit spinlock replacement

On 2016-10-19 17:50:38 [+0200], Mike Galbraith wrote:
>
> Do not alter HANDLE_SIZE, memory corruption ensues. The handle is
> a pointer, allocate space for the struct it points to and align it
> ZS_ALIGN. Also, when accessing the struct, mask HANDLE_PIN_BIT.
>
> v2: the mutex is only needed for PREEMPT_RT_FULL; with PREEMPT_RT_RTB,
> preemption is disabled when we take it...
>
> Signed-off-by: Mike Galbraith <[email protected]>

The folded version:


From: Mike Galbraith <[email protected]>
Date: Tue, 22 Mar 2016 11:16:09 +0100
Subject: [PATCH] mm/zsmalloc: copy with get_cpu_var() and locking

get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
The bit spinlocks are replaced with a proper mutex, which requires a slightly
larger struct to allocate.

Signed-off-by: Mike Galbraith <[email protected]>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
mm/zsmalloc.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 74 insertions(+), 6 deletions(-)

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -53,6 +53,7 @@
#include <linux/mount.h>
#include <linux/migrate.h>
#include <linux/pagemap.h>
+#include <linux/locallock.h>

#define ZSPAGE_MAGIC 0x58

@@ -70,9 +71,22 @@
*/
#define ZS_MAX_ZSPAGE_ORDER 2
#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
-
#define ZS_HANDLE_SIZE (sizeof(unsigned long))

+#ifdef CONFIG_PREEMPT_RT_FULL
+
+struct zsmalloc_handle {
+ unsigned long addr;
+ struct mutex lock;
+};
+
+#define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle))
+
+#else
+
+#define ZS_HANDLE_ALLOC_SIZE (sizeof(unsigned long))
+#endif
+
/*
* Object location (<PFN>, <obj_idx>) is encoded as
* as single (unsigned long) handle value.
@@ -327,7 +341,7 @@ static void SetZsPageMovable(struct zs_p

static int create_cache(struct zs_pool *pool)
{
- pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
+ pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_ALLOC_SIZE,
0, 0, NULL);
if (!pool->handle_cachep)
return 1;
@@ -351,10 +365,27 @@ static void destroy_cache(struct zs_pool

static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
{
- return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
- gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+ void *p;
+
+ p = kmem_cache_alloc(pool->handle_cachep,
+ gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+#ifdef CONFIG_PREEMPT_RT_FULL
+ if (p) {
+ struct zsmalloc_handle *zh = p;
+
+ mutex_init(&zh->lock);
+ }
+#endif
+ return (unsigned long)p;
}

+#ifdef CONFIG_PREEMPT_RT_FULL
+static struct zsmalloc_handle *zs_get_pure_handle(unsigned long handle)
+{
+ return (void *)(handle &~((1 << OBJ_TAG_BITS) - 1));
+}
+#endif
+
static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
{
kmem_cache_free(pool->handle_cachep, (void *)handle);
@@ -373,12 +404,18 @@ static void cache_free_zspage(struct zs_

static void record_obj(unsigned long handle, unsigned long obj)
{
+#ifdef CONFIG_PREEMPT_RT_FULL
+ struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+ WRITE_ONCE(zh->addr, obj);
+#else
/*
* lsb of @obj represents handle lock while other bits
* represent object value the handle is pointing so
* updating shouldn't do store tearing.
*/
WRITE_ONCE(*(unsigned long *)handle, obj);
+#endif
}

/* zpool driver */
@@ -467,6 +504,7 @@ MODULE_ALIAS("zpool-zsmalloc");

/* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
+static DEFINE_LOCAL_IRQ_LOCK(zs_map_area_lock);

static bool is_zspage_isolated(struct zspage *zspage)
{
@@ -902,7 +940,13 @@ static unsigned long location_to_obj(str

static unsigned long handle_to_obj(unsigned long handle)
{
+#ifdef CONFIG_PREEMPT_RT_FULL
+ struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+ return zh->addr;
+#else
return *(unsigned long *)handle;
+#endif
}

static unsigned long obj_to_head(struct page *page, void *obj)
@@ -916,22 +960,46 @@ static unsigned long obj_to_head(struct

static inline int testpin_tag(unsigned long handle)
{
+#ifdef CONFIG_PREEMPT_RT_FULL
+ struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+ return mutex_is_locked(&zh->lock);
+#else
return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
}

static inline int trypin_tag(unsigned long handle)
{
+#ifdef CONFIG_PREEMPT_RT_FULL
+ struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+ return mutex_trylock(&zh->lock);
+#else
return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
}

static void pin_tag(unsigned long handle)
{
+#ifdef CONFIG_PREEMPT_RT_FULL
+ struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+ return mutex_lock(&zh->lock);
+#else
bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
}

static void unpin_tag(unsigned long handle)
{
+#ifdef CONFIG_PREEMPT_RT_FULL
+ struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+ return mutex_unlock(&zh->lock);
+#else
bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
}

static void reset_page(struct page *page)
@@ -1423,7 +1491,7 @@ void *zs_map_object(struct zs_pool *pool
class = pool->size_class[class_idx];
off = (class->size * obj_idx) & ~PAGE_MASK;

- area = &get_cpu_var(zs_map_area);
+ area = &get_locked_var(zs_map_area_lock, zs_map_area);
area->vm_mm = mm;
if (off + class->size <= PAGE_SIZE) {
/* this object is contained entirely within a page */
@@ -1477,7 +1545,7 @@ void zs_unmap_object(struct zs_pool *poo

__zs_unmap_object(area, pages, off, class->size);
}
- put_cpu_var(zs_map_area);
+ put_locked_var(zs_map_area_lock, zs_map_area);

migrate_read_unlock(zspage);
unpin_tag(handle);

Sebastian

Subject: Re: [patch v2] drivers/zram: Don't disable preemption in zcomp_stream_get/put()

On 2016-10-19 17:56:30 [+0200], Mike Galbraith wrote:
> In v4.7, the driver switched to percpu compression streams, disabling
> preemption via get/put_cpu_ptr(). Use a local lock instead for RT.

Since you could have multiple streams, which would then all be serialized
by the one local lock, I went for this:

From: Mike Galbraith <[email protected]>
Date: Thu, 20 Oct 2016 11:15:22 +0200
Subject: [PATCH] drivers/zram: Don't disable preemption in
zcomp_stream_get/put()

In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().

Signed-off-by: Mike Galbraith <[email protected]>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
drivers/block/zram/zcomp.c | 12 ++++++++++--
drivers/block/zram/zcomp.h | 1 +
drivers/block/zram/zram_drv.c | 6 +++---
3 files changed, 14 insertions(+), 5 deletions(-)

--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -118,12 +118,19 @@ ssize_t zcomp_available_show(const char

struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
- return *get_cpu_ptr(comp->stream);
+ struct zcomp_strm *zstrm;
+
+ zstrm = *this_cpu_ptr(comp->stream);
+ spin_lock(&zstrm->zcomp_lock);
+ return zstrm;
}

void zcomp_stream_put(struct zcomp *comp)
{
- put_cpu_ptr(comp->stream);
+ struct zcomp_strm *zstrm;
+
+ zstrm = *this_cpu_ptr(comp->stream);
+ spin_unlock(&zstrm->zcomp_lock);
}

int zcomp_compress(struct zcomp_strm *zstrm,
@@ -174,6 +181,7 @@ static int __zcomp_cpu_notifier(struct z
pr_err("Can't allocate a compression stream\n");
return NOTIFY_BAD;
}
+ spin_lock_init(&zstrm->zcomp_lock);
*per_cpu_ptr(comp->stream, cpu) = zstrm;
break;
case CPU_DEAD:
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -14,6 +14,7 @@ struct zcomp_strm {
/* compression/decompression buffer */
void *buffer;
struct crypto_comp *tfm;
+ spinlock_t zcomp_lock;
};

/* dynamic per-device compression frontend */
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -568,6 +568,7 @@ static int zram_decompress_page(struct z
struct zram_meta *meta = zram->meta;
unsigned long handle;
unsigned int size;
+ struct zcomp_strm *zstrm;

zram_lock_table(&meta->table[index]);
handle = meta->table[index].handle;
@@ -579,16 +580,15 @@ static int zram_decompress_page(struct z
return 0;
}

+ zstrm = zcomp_stream_get(zram->comp);
cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
if (size == PAGE_SIZE) {
copy_page(mem, cmem);
} else {
- struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
-
ret = zcomp_decompress(zstrm, cmem, size, mem);
- zcomp_stream_put(zram->comp);
}
zs_unmap_object(meta->mem_pool, handle);
+ zcomp_stream_put(zram->comp);
zram_unlock_table(&meta->table[index]);

/* Should NEVER happen. Return bio error if it does. */

Sebastian

Subject: Re: [rfc patch] hotplug: Call mmdrop_delayed() in sched_cpu_dying() if PREEMPT_RT_FULL

On 2016-10-20 11:34:03 [+0200], Mike Galbraith wrote:
> My 64 core box just passed an hour running Steven's hotplug stress
> script along with stockfish and futextests (tip-rt.today w. hotplug
> hacks you saw a while back), and seems content to just keep on grinding
> away. Without it, the box quickly becomes a doorstop.

This is new; it is v4.7-rc1 new. The hotplug rework moved the notifiers
around, and after

e9cd8fa4fcfd ("sched/migration: Move calc_load_migrate() into CPU_DYING")
f2785ddb5367 ("sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()")

we no longer have CPU_DEAD, just CPU_DYING, which is invoked with
interrupts off. I don't see anything wrong with pushing this over to
RCU.
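
For reference, the RT tree's RCU-deferred variant looks roughly like this
(approximate, not quoted verbatim from the tree):

#ifdef CONFIG_PREEMPT_RT_BASE
extern void __mmdrop_delayed(struct rcu_head *rhp);
/* Defer the final __mmdrop() to an RCU callback, out of the irqs-off path. */
static inline void mmdrop_delayed(struct mm_struct *mm)
{
        if (atomic_dec_and_test(&mm->mm_count))
                call_rcu(&mm->delayed_drop, __mmdrop_delayed);
}
#else
# define mmdrop_delayed(mm)     mmdrop(mm)
#endif

which is what the RFC patch above calls from sched_cpu_dying() in the
PREEMPT_RT_FULL case.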

Sebastian