Message-ID: <371c01dd-258c-e428-7428-ff390b664752@kernel.dk>
Date: Mon, 2 May 2022 11:00:38 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1
Subject: Re: [REGRESSION] lxc-stop hang on 5.17.x kernels
Content-Language: en-US
From: Jens Axboe
To: Daniel Harding, Pavel Begunkov
Cc: regressions@lists.linux.dev, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
References: <7925e262-e0d4-6791-e43b-d37e9d693414@living180.net> <6ad38ecc-b2a9-f0e9-f7c7-f312a2763f97@kernel.dk>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding:
7bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/2/22 7:59 AM, Jens Axboe wrote:
> On 5/2/22 7:36 AM, Daniel Harding wrote:
>> On 5/2/22 16:26, Jens Axboe wrote:
>>> On 5/2/22 7:17 AM, Daniel Harding wrote:
>>>> I use lxc-4.0.12 on Gentoo, built with io-uring support
>>>> (--enable-liburing), targeting liburing-2.1. My kernel config is a
>>>> very lightly modified version of Fedora's generic kernel config. After
>>>> moving from the 5.16.x series to the 5.17.x kernel series, I started
>>>> noticed frequent hangs in lxc-stop. It doesn't happen 100% of the
>>>> time, but definitely more than 50% of the time. Bisecting narrowed
>>>> down the issue to commit aa43477b040251f451db0d844073ac00a8ab66ee:
>>>> io_uring: poll rework. Testing indicates the problem is still present
>>>> in 5.18-rc5. Unfortunately I do not have the expertise with the
>>>> codebases of either lxc or io-uring to try to debug the problem
>>>> further on my own, but I can easily apply patches to any of the
>>>> involved components (lxc, liburing, kernel) and rebuild for testing or
>>>> validation. I am also happy to provide any further information that
>>>> would be helpful with reproducing or debugging the problem.
>>> Do you have a recipe to reproduce the hang? That would make it
>>> significantly easier to figure out.
>>
>> I can reproduce it with just the following:
>>
>>     sudo lxc-create --n lxc-test --template download --bdev dir --dir /var/lib/lxc/lxc-test/rootfs -- -d ubuntu -r bionic -a amd64
>>     sudo lxc-start -n lxc-test
>>     sudo lxc-stop -n lxc-test
>>
>> The lxc-stop command never exits and the container continues running.
>> If that isn't sufficient to reproduce, please let me know.
>
> Thanks, that's useful! I'm at a conference this week and hence have
> limited amount of time to debug, hopefully Pavel has time to take a look
> at this.

Didn't manage to reproduce. Can you try, on both the good and bad
kernel, to do:

# echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable

run lxc-stop

# cp /sys/kernel/debug/tracing/trace ~/iou-trace

so we can see what's going on? Looking at the source, lxc is just using
plain POLL_ADD, so I'm guessing it's not getting a notification when it
expects to, or it's POLL_REMOVE not doing its job. If we have a trace
from both a working and broken kernel, that might shed some light on it.

-- 
Jens Axboe
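
The capture steps requested above could be wrapped in a small helper so
that the same sequence runs on both the good and the bad kernel. This is
only a sketch and not something from the thread: the function name
capture_iou_trace, the TRACING_DIR and IOU_TIMEOUT variables, and the use
of timeout (so that a hanging lxc-stop does not block the copy step) are
assumptions added for illustration. It needs root to write under the
tracing directory.

```shell
#!/bin/sh
# Sketch: capture an io_uring event trace around a single command.
# TRACING_DIR, IOU_TIMEOUT and capture_iou_trace are illustrative names,
# not part of lxc, liburing, or the kernel.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"
IOU_TIMEOUT="${IOU_TIMEOUT:-60}"

# usage: capture_iou_trace OUTPUT_FILE COMMAND [ARGS...]
capture_iou_trace() {
    iou_out="$1"
    shift

    # Truncating the trace file clears earlier contents, so the copy
    # below only holds events from this run.
    : > "$TRACING_DIR/trace"

    # Enable all io_uring trace events (the echo from the mail above).
    echo 1 > "$TRACING_DIR/events/io_uring/enable"

    # Run the command, but give up after IOU_TIMEOUT seconds so a hung
    # lxc-stop still lets us save the trace.
    timeout "$IOU_TIMEOUT" "$@"
    iou_status=$?

    # Save the trace (the cp from the mail above), then disable events.
    cp "$TRACING_DIR/trace" "$iou_out"
    echo 0 > "$TRACING_DIR/events/io_uring/enable"

    return "$iou_status"
}
```

On each kernel this might be invoked as
capture_iou_trace ~/iou-trace lxc-stop -n lxc-test, using a different
output name per kernel. With GNU timeout, an exit status of 124 means the
command was still running when the time limit expired.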