Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp1381332pxf; Fri, 26 Mar 2021 06:56:09 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzqAqvoRST0PxpwquMK8NNLsasbp+fMNQua04vFIaxDG7WXKVBtjiHBN/TudZ2BDMy7a/ra X-Received: by 2002:a05:6402:34d:: with SMTP id r13mr15347507edw.64.1616766969153; Fri, 26 Mar 2021 06:56:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616766969; cv=none; d=google.com; s=arc-20160816; b=VXggF6fTKauGjacSlHSlin2txP4FsVlVdU7ofDSjgQ1NZd+hBmi+KXtQox/y64RTDN gMkfwu1EkfGbOh0CvxUoXKJ4zmAtoQSmmwRpvFhldNS3B2Z31/CrMrHMummSG5dj3Cg7 qkAd2isJX0sGlUbjKjrP8toPHvDat0+Wb+/snFJNZZedTvEH/xFeBTCLkaFrWKDOOxnR TTam49TD0mC79y89Il4zMXfIfqfvOiiqktLiSFO5mWqKHYOzYR+DQt6pj2ROPed0l/8O r2J3m6AB9tVmi8C6uUrmflRGrNmbtaUnHZMsHfcpSYAILklI0S1Cfv9NesGegF81+99r jtkA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=in55maNvUqroruqu2ACmFsc6adigcRi+fvJiQOCiECg=; b=fuyMuvaC+Mgr12U8Ervwf7vVijN63zhpsuDWoN0hvLTsY0uIfJTaD8mmYj8cWqvYyW sKSPWXNF3+CbwJLf4cKeV0plIgIN2Zj0UU3yMBO+R4hr/niTMbg+cOukNkghCrYhI+Zj LFAL0N7xQ7UU2E/xl3zPr+9kZ6TGaq6ZkpxOJT/yWuBBuqlWuHYs9wyaEhe+5stm80M/ 6H7nkg51xAM7UfRcO/zUE6F77lhu3zkfTIkrHS/B/2hqm2Opnrj8uEEy7Ns1NbAWoLj6 G2CkbnJgtqFaoHNKWWy65+UXFI9SnHwipFNjX4QymOeooN+BrXxyNpuQe1fLLuYSXnH0 d8XQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel-dk.20150623.gappssmtp.com header.s=20150623 header.b=Bx9YxTU4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bd28si6380241edb.570.2021.03.26.06.55.45; Fri, 26 Mar 2021 06:56:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel-dk.20150623.gappssmtp.com header.s=20150623 header.b=Bx9YxTU4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229779AbhCZNyu (ORCPT + 99 others); Fri, 26 Mar 2021 09:54:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43910 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229930AbhCZNye (ORCPT ); Fri, 26 Mar 2021 09:54:34 -0400 Received: from mail-pg1-x52e.google.com (mail-pg1-x52e.google.com [IPv6:2607:f8b0:4864:20::52e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9172C0613B1 for ; Fri, 26 Mar 2021 06:54:33 -0700 (PDT) Received: by mail-pg1-x52e.google.com with SMTP id v3so4685054pgq.2 for ; Fri, 26 Mar 2021 06:54:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=in55maNvUqroruqu2ACmFsc6adigcRi+fvJiQOCiECg=; b=Bx9YxTU4thqcTdCFDbC4H8fg21CcrRwyR/StZleBjYnEj5PZUyYyy+R9fqXcstB/eM In/WVcwhwmrWSpXhze9gsr5kdrDyBc96bJxy8h2zfV7RowgDexoK3yadSgIQX/gUQXgO HIsHrjWHYblqB8P0gBR/MA36abze2QvHWAsDdSGKo5i7X5yXdqEBNlMfGyrGLqkpxh+P dZk7GF2PIjLp8UIDR6qs9RlXN88AOdxRwLDZ8dMZuljoNY4BDfOpGBrDill4elEB3o0K UtcJZ6j4He9AAOENqXLMqiHvX5qQQ0CklvxS6FLsswjvWKAlL9IqWpOx9O2VdH8iyE2B +TmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=in55maNvUqroruqu2ACmFsc6adigcRi+fvJiQOCiECg=; b=aIGUD2B15jWzqbTqtOu/77ObpwV2lCzJ8HqakOl2Tm4m5S6oRwAyEhHLRXWZK+RHiK uJ1e6U40jcw++VCM8OObxhhT0NKsHcxe6lSbqvOHlfDZ00OeK0t4C2Z91fN79r8iYw12 Ulv/qvxtZuePD4d55VT+jFa48yYbtOtqV0dkjMcunEP/jZX8w+jEhH7Pemu/amDx8Kll 75HrD4nyFk8wcpqzpGk2fQlGWmz1oT5GnYk6THtckuwe0wshP+42nosil1G7SNaIo4YB DceBWBE9lSiG5HQWLVxyOwS3ozN7PUWuU18aS05NOwQRcHMgdJSVlGIic6Gf1Wt/LGGl A79g== X-Gm-Message-State: AOAM5318JSCHTzQApC3ZPHrP8CkhHtLoFZ5kjq8+SJCSc0BKN3tUsKNv i/GsZ2mcFgn+MOqeZyff8WBu4LEu3yLf7w== X-Received: by 2002:aa7:9431:0:b029:1f1:52fd:5444 with SMTP id y17-20020aa794310000b02901f152fd5444mr12731159pfo.47.1616766872681; Fri, 26 Mar 2021 06:54:32 -0700 (PDT) Received: from [192.168.1.134] ([66.219.217.173]) by smtp.gmail.com with ESMTPSA id 186sm9929907pfb.143.2021.03.26.06.54.31 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 26 Mar 2021 06:54:32 -0700 (PDT) Subject: Re: [PATCH 0/6] Allow signals for IO threads To: Stefan Metzmacher , io-uring@vger.kernel.org Cc: torvalds@linux-foundation.org, ebiederm@xmission.com, oleg@redhat.com, linux-kernel@vger.kernel.org References: <20210326003928.978750-1-axboe@kernel.dk> <8efd9977-003b-be65-8ae2-4b04d8dd1224@samba.org> From: Jens Axboe Message-ID: <0c91d9e7-82cd-bec2-19ae-cc592ec757c6@kernel.dk> Date: Fri, 26 Mar 2021 07:54:30 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <8efd9977-003b-be65-8ae2-4b04d8dd1224@samba.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 3/26/21 7:31 AM, Stefan Metzmacher wrote: > Am 26.03.21 um 13:56 schrieb Jens Axboe: >> On 3/26/21 5:48 AM, Stefan Metzmacher wrote: >>> >>> Am 26.03.21 um 01:39 schrieb Jens Axboe: >>>> Hi, >>>> >>>> As discussed in a previous thread today, the seemingly much saner approach >>>> is just to allow signals (including SIGSTOP) for the PF_IO_WORKER IO >>>> threads. If we just have the threads call get_signal() for >>>> signal_pending(), then everything just falls out naturally with how >>>> we receive and handle signals. >>>> >>>> Patch 1 adds support for checking and calling get_signal() from the >>>> regular IO workers, the manager, and the SQPOLL thread. Patch 2 unblocks >>>> SIGSTOP from the default IO thread blocked mask, and the rest just revert >>>> special cases that were put in place for PF_IO_WORKER threads. >>>> >>>> With this done, only two special cases remain for PF_IO_WORKER, and they >>>> aren't related to signals so not part of this patchset. But both of them >>>> can go away as well now that we have "real" threads as IO workers, and >>>> then we'll have zero special cases for PF_IO_WORKER. >>>> >>>> This passes the usual regression testing, my other usual 24h run has been >>>> kicked off. But I wanted to send this out early. >>>> >>>> Thanks to Linus for the suggestion. As with most other good ideas, it's >>>> obvious once you hear it. The fact that we end up with _zero_ special >>>> cases with this is a clear sign that this is the right way to do it >>>> indeed. The fact that this series is 2/3rds revert further drives that >>>> point home. Also thanks to Eric for diligent review on the signal side >>>> of things for the past changes (and hopefully ditto on this series :-)) >>> >>> Ok, I'm testing a8ff6a3b20bd16d071ef66824ae4428529d114f9 from >>> your io_uring-5.12 branch. >>> >>> And using this patch: >>> diff --git a/examples/io_uring-cp.c b/examples/io_uring-cp.c >>> index cc7a227a5ec7..6e26a4214015 100644 >>> --- a/examples/io_uring-cp.c >>> +++ b/examples/io_uring-cp.c >>> @@ -116,13 +116,16 @@ static void queue_write(struct io_uring *ring, struct io_data *data) >>> io_uring_submit(ring); >>> } >>> >>> -static int copy_file(struct io_uring *ring, off_t insize) >>> +static int copy_file(struct io_uring *ring, off_t _insize) >>> { >>> + off_t insize = _insize; >>> unsigned long reads, writes; >>> struct io_uring_cqe *cqe; >>> off_t write_left, offset; >>> int ret; >>> >>> +again: >>> + insize = _insize; >>> write_left = insize; >>> writes = reads = offset = 0; >>> >>> @@ -221,6 +224,12 @@ static int copy_file(struct io_uring *ring, off_t insize) >>> } >>> } >>> >>> + { >>> + struct timespec ts = { .tv_nsec = 999999, }; >>> + nanosleep(&ts, NULL); >>> + goto again; >>> + } >>> + >>> return 0; >>> } >>> >>> Running ./io_uring-cp ~/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file >>> What I see is this: >>> >>> kill -SIGSTOP to any thread I used a worker with pid 2061 here, results in >>> >>> root@ub1704-166:~# head /proc/2061/status >>> Name: iou-wrk-2041 >>> Umask: 0022 >>> State: R (running) >>> Tgid: 2041 >>> Ngid: 0 >>> Pid: 2061 >>> PPid: 1857 >>> TracerPid: 0 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> root@ub1704-166:~# head /proc/2041/status >>> Name: io_uring-cp >>> Umask: 0022 >>> State: T (stopped) >>> Tgid: 2041 >>> Ngid: 0 >>> Pid: 2041 >>> PPid: 1857 >>> TracerPid: 0 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> root@ub1704-166:~# head /proc/2042/status >>> Name: iou-mgr-2041 >>> Umask: 0022 >>> State: T (stopped) >>> Tgid: 2041 >>> Ngid: 0 >>> Pid: 2042 >>> PPid: 1857 >>> TracerPid: 0 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> >>> So userspace and iou-mgr-2041 stop, but the workers don't. >>> 49 workers burn cpu as much as possible. >>> >>> kill -KILL 2061 >>> results in this: >>> - all workers are gone >>> - iou-mgr-2041 is gone >>> - io_uring-cp waits in status D forever >>> >>> root@ub1704-166:~# head /proc/2041/status >>> Name: io_uring-cp >>> Umask: 0022 >>> State: D (disk sleep) >>> Tgid: 2041 >>> Ngid: 0 >>> Pid: 2041 >>> PPid: 1857 >>> TracerPid: 0 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> root@ub1704-166:~# cat /proc/2041/stack >>> [<0>] io_wq_destroy_manager+0x36/0xa0 >>> [<0>] io_wq_put_and_exit+0x2b/0x40 >>> [<0>] io_uring_clean_tctx+0xc5/0x110 >>> [<0>] __io_uring_files_cancel+0x336/0x4e0 >>> [<0>] do_exit+0x16b/0x13b0 >>> [<0>] do_group_exit+0x8b/0x140 >>> [<0>] get_signal+0x219/0xc90 >>> [<0>] arch_do_signal_or_restart+0x1eb/0xeb0 >>> [<0>] exit_to_user_mode_prepare+0x115/0x1a0 >>> [<0>] syscall_exit_to_user_mode+0x27/0x50 >>> [<0>] do_syscall_64+0x45/0x90 >>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae >>> >>> The 3rd problem is that gdb in a ubuntu 20.04 userspace vm hangs forever: >>> >>> root@ub1704-166:~/samba.git# LANG=C strace -o /dev/shm/strace.txt -f -ttT gdb --pid 2417 >>> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 >>> Copyright (C) 2020 Free Software Foundation, Inc. >>> License GPLv3+: GNU GPL version 3 or later >>> This is free software: you are free to change and redistribute it. >>> There is NO WARRANTY, to the extent permitted by law. >>> Type "show copying" and "show warranty" for details. >>> This GDB was configured as "x86_64-linux-gnu". >>> Type "show configuration" for configuration details. >>> For bug reporting instructions, please see: >>> . >>> Find the GDB manual and other documentation resources online at: >>> . >>> >>> For help, type "help". >>> Type "apropos word" to search for commands related to "word". >>> Attaching to process 2417 >>> [New LWP 2418] >>> [New LWP 2419] >>> >>> >>> >>> The related parts of 'pstree -a -t -p': >>> >>> ├─bash,2048 >>> │ └─io_uring-cp,2417 /root/kernel/sn-devel-184-builds/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file >>> │ ├─{iou-mgr-2417},2418 >>> │ └─{iou-wrk-2417},2419 >>> ├─bash,2167 >>> │ └─strace,2489 -o /dev/shm/strace.txt -f -ttT gdb --pid 2417 >>> │ └─gdb,2492 --pid 2417 >>> │ └─gdb,2494 --pid 2417 >>> >>> root@ub1704-166:~# cat /proc/sys/kernel/yama/ptrace_scope >>> 0 >>> >>> root@ub1704-166:~# head /proc/2417/status >>> Name: io_uring-cp >>> Umask: 0022 >>> State: t (tracing stop) >>> Tgid: 2417 >>> Ngid: 0 >>> Pid: 2417 >>> PPid: 2048 >>> TracerPid: 2492 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> root@ub1704-166:~# head /proc/2418/status >>> Name: iou-mgr-2417 >>> Umask: 0022 >>> State: t (tracing stop) >>> Tgid: 2417 >>> Ngid: 0 >>> Pid: 2418 >>> PPid: 2048 >>> TracerPid: 2492 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> root@ub1704-166:~# head /proc/2419/status >>> Name: iou-wrk-2417 >>> Umask: 0022 >>> State: R (running) >>> Tgid: 2417 >>> Ngid: 0 >>> Pid: 2419 >>> PPid: 2048 >>> TracerPid: 2492 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> root@ub1704-166:~# head /proc/2492/status >>> Name: gdb >>> Umask: 0022 >>> State: S (sleeping) >>> Tgid: 2492 >>> Ngid: 0 >>> Pid: 2492 >>> PPid: 2489 >>> TracerPid: 2489 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> root@ub1704-166:~# head /proc/2494/status >>> Name: gdb >>> Umask: 0022 >>> State: t (tracing stop) >>> Tgid: 2494 >>> Ngid: 0 >>> Pid: 2494 >>> PPid: 2492 >>> TracerPid: 2489 >>> Uid: 0 0 0 0 >>> Gid: 0 0 0 0 >>> >>> >>> Maybe these are related and 2494 gets the SIGSTOP that was supposed to >>> be handled by 2419. >>> >>> strace.txt is attached. >>> >>> Just a wild guess (I don't have time to test this), but maybe this >>> will fix it: >>> >>> diff --git a/fs/io-wq.c b/fs/io-wq.c >>> index 07e7d61524c7..ee5a402450db 100644 >>> --- a/fs/io-wq.c >>> +++ b/fs/io-wq.c >>> @@ -503,8 +503,7 @@ static int io_wqe_worker(void *data) >>> if (io_flush_signals()) >>> continue; >>> ret = schedule_timeout(WORKER_IDLE_TIMEOUT); >>> - if (try_to_freeze() || ret) >>> - continue; >>> + try_to_freeze(); >>> if (signal_pending(current)) { >>> struct ksignal ksig; >>> >>> @@ -514,8 +513,7 @@ static int io_wqe_worker(void *data) >>> continue; >>> } >>> /* timed out, exit unless we're the fixed worker */ >>> - if (test_bit(IO_WQ_BIT_EXIT, &wq->state) || >>> - !(worker->flags & IO_WORKER_F_FIXED)) >>> + if (ret == 0 && !(worker->flags & IO_WORKER_F_FIXED)) >>> break; >>> } >>> >>> When the worker got a signal I guess ret is not 0 and we'll >>> never hit the if (signal_pending()) statement... >> >> Right, the logic was a bit wrong there, and we can also just drop >> try_to_freeze() from all of them now, we don't have to special >> case that anymore. > > I tested my version and it fixes the SIGSTOP problem, I guess your > branch will also fix it. > > So the looping workers are gone and doesn't hang anymore. > > But gdb still shows very strange things, with a 5.10 kernel I see this: > > root@ub1704-166:~# LANG=C gdb --pid 1274 > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word". > Attaching to process 1274 > Reading symbols from /root/liburing/examples/io_uring-cp... > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 > 38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory. > (gdb) > > > And now: > > root@ub1704-166:~# LANG=C gdb --pid 1320 > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word". > Attaching to process 1320 > [New LWP 1321] > [New LWP 1322] > > warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386 > > warning: Architecture rejected target-supplied description > syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 > 38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory. > (gdb) > > pstree -a -t -p looks like this: > > │ └─io_uring-cp,1320 /root/kernel/sn-devel-184-builds/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file > │ ├─{iou-mgr-1320},1321 > │ └─{iou-wrk-1320},1322 > > I'm wondering why there's the architecture related stuff... I'm not sure about that either, I'm guessing it's because it's not receiving any valid information on those IO threads and the decoding ends up being a bit nonsensical and the warning too. >> Can you try the current branch? I folded in fixes for that. >> That will definitely fix case 1+3, the #2 with kill -KILL is kind >> of puzzling. I'll try and reproduce that with the current tree and see >> what happens. But that feels like it's either not a new thing, or it's >> the same core issue as 1+3 (though I don't quite see how, unless the >> failure to catch the signal will elude get_signal() forever in the >> worker, I guess that's possible). > > The KILL after STOP deadlock still exists. In which tree? Sounds like you're still on the old one with that incremental you sent, which wasn't complete. > Does io_wq_manager() exits without cleaning up on SIGKILL? No, it should kill up in all cases. I'll try your stop + kill, I just tested both of them separately and didn't observe anything. I also ran your io_uring-cp example (and found a bug in the example, fixed and pushed), fwiw. -- Jens Axboe