Message-ID: <2436d42c-85ca-d060-6508-350c769804f1@gmail.com>
Date: Mon, 2 May 2022 18:40:36 +0100
Subject: Re: [REGRESSION] lxc-stop hang on 5.17.x kernels
To: Jens Axboe, Daniel Harding
Cc: regressions@lists.linux.dev, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
References: <7925e262-e0d4-6791-e43b-d37e9d693414@living180.net> <6ad38ecc-b2a9-f0e9-f7c7-f312a2763f97@kernel.dk> <371c01dd-258c-e428-7428-ff390b664752@kernel.dk>
From: Pavel Begunkov
In-Reply-To: <371c01dd-258c-e428-7428-ff390b664752@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/2/22 18:00, Jens Axboe wrote:
> On 5/2/22 7:59 AM, Jens Axboe wrote:
>> On 5/2/22 7:36 AM, Daniel Harding wrote:
>>> On 5/2/22 16:26, Jens Axboe wrote:
>>>> On 5/2/22 7:17 AM, Daniel Harding wrote:
>>>>> I use lxc-4.0.12 on Gentoo, built with io-uring support
>>>>> (--enable-liburing), targeting liburing-2.1. My kernel config is a
>>>>> very lightly modified version of Fedora's generic kernel config. After
>>>>> moving from the 5.16.x series to the 5.17.x kernel series, I started
>>>>> noticing frequent hangs in lxc-stop. It doesn't happen 100% of the
>>>>> time, but definitely more than 50% of the time. Bisecting narrowed
>>>>> down the issue to commit aa43477b040251f451db0d844073ac00a8ab66ee:
>>>>> io_uring: poll rework. Testing indicates the problem is still present
>>>>> in 5.18-rc5. Unfortunately I do not have the expertise with the
>>>>> codebases of either lxc or io-uring to try to debug the problem
>>>>> further on my own, but I can easily apply patches to any of the
>>>>> involved components (lxc, liburing, kernel) and rebuild for testing or
>>>>> validation. I am also happy to provide any further information that
>>>>> would be helpful with reproducing or debugging the problem.
>>>> Do you have a recipe to reproduce the hang? That would make it
>>>> significantly easier to figure out.
>>>
>>> I can reproduce it with just the following:
>>>
>>> sudo lxc-create --n lxc-test --template download --bdev dir --dir /var/lib/lxc/lxc-test/rootfs -- -d ubuntu -r bionic -a amd64
>>> sudo lxc-start -n lxc-test
>>> sudo lxc-stop -n lxc-test
>>>
>>> The lxc-stop command never exits and the container continues running.
>>> If that isn't sufficient to reproduce, please let me know.
>>
>> Thanks, that's useful! I'm at a conference this week and hence have
>> limited amount of time to debug, hopefully Pavel has time to take a look
>> at this.
>
> Didn't manage to reproduce. Can you try, on both the good and bad
> kernel, to do:

Same here, it doesn't reproduce for me.

> # echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>
> run lxc-stop
>
> # cp /sys/kernel/debug/tracing/trace ~/iou-trace
>
> so we can see what's going on? Looking at the source, lxc is just using
> plain POLL_ADD, so I'm guessing it's not getting a notification when it
> expects to, or it's POLL_REMOVE not doing its job. If we have a trace
> from both a working and broken kernel, that might shed some light on it.

-- 
Pavel Begunkov
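
For anyone collecting the trace requested above, the three manual steps could be wrapped roughly as follows. This is only a sketch, not something posted in the thread: the function name collect_iou_trace, the TRACEFS variable, the 30-second timeout, and the final step that switches the events back off are all assumptions added for illustration. The timeout is there because lxc-stop may never exit on an affected kernel.

```shell
#!/bin/sh
# Sketch only: wraps the manual steps (enable events, run the command, copy the trace).
# collect_iou_trace, TRACEFS and the 30-second limit are hypothetical, not from the thread.

TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

collect_iou_trace() {
    out="$1"
    shift
    # Enable all io_uring trace events.
    echo 1 > "$TRACEFS/events/io_uring/enable" || return 1
    # Run the command under test. On an affected kernel lxc-stop may hang,
    # so bound it with timeout(1) and continue whatever the result.
    timeout 30 "$@"
    status=$?
    # Save the trace buffer, then switch the events off again (added step).
    cp "$TRACEFS/trace" "$out"
    echo 0 > "$TRACEFS/events/io_uring/enable"
    return "$status"
}

# Usage (as root), once on the good kernel and once on the bad one:
#   collect_iou_trace ~/iou-trace-good lxc-stop -n lxc-test
```

Running it once per kernel with different output names gives the pair of traces to compare. An exit status of 124 means timeout(1) had to kill the command, which is itself a sign that the hang reproduced.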