From: Bruce Ashfield
Date: Mon, 7 Oct 2019 16:11:07 -0400
Subject: Re: ptrace/strace and freezer oddities and v5.2+ kernels
To: Roman Gushchin
Cc: "linux-kernel@vger.kernel.org", "tj@kernel.org", Richard Purdie, "oleg@redhat.com"
References: <20191002020100.GA6436@castle.dhcp.thefacebook.com>
 <20191002181914.GA7617@castle.DHCP.thefacebook.com>
 <20191004000913.GA5519@castle.DHCP.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 7, 2019 at 8:54 AM Bruce Ashfield wrote:
>
> On Thu, Oct 3, 2019 at 8:09 PM Roman Gushchin wrote:
> >
> > On Wed, Oct 02, 2019 at 05:59:36PM -0400, Bruce Ashfield wrote:
> > > On Wed, Oct 2, 2019 at 2:19 PM Roman Gushchin wrote:
> > > >
> > > > On Wed, Oct 02, 2019 at 12:18:54AM -0400, Bruce Ashfield wrote:
> > > > > On Tue, Oct 1, 2019 at 10:01 PM Roman Gushchin wrote:
> > > > > >
> > > > > > On Tue, Oct 01, 2019 at 12:14:18PM -0400, Bruce Ashfield wrote:
> > > > > > > Hi all,
> > > > > >
> > > > > > Hi Bruce!
> > > > > >
> > > > > > > The Yocto project has an upcoming release this fall, and I've been trying to
> > > > > > > sort through some issues that are happening with kernel 5.2+ .. although
> > > > > > > there is a specific yocto kernel, I'm testing and seeing this with
> > > > > > > normal / vanilla mainline kernels as well.
> > > > > > >
> > > > > > > I'm running into an issue that is *very* similar to the one discussed in the
> > > > > > > [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
> > > > > > > thread from this past May: https://lkml.org/lkml/2019/5/12/272
> > > > > > >
> > > > > > > I can confirm that I have the proposed fix for the initial regression report
> > > > > > > in my build (05b2892637 [signal: unconditionally leave the frozen state
> > > > > > > in ptrace_stop()]), yet I'm still seeing 3 or 4 minute runtimes on a test
> > > > > > > that used to take 3 or 4 seconds.
> > > > > >
> > > > > > So, the problem is that you're experiencing a severe performance regression
> > > > > > in some test, right?
> > > > >
> > > > > Hi Roman,
> > > > >
> > > > > Correct. In particular, running some of the tests that ship with strace itself.
> > > > > The performance change is so drastic that it definitely makes you wonder
> > > > > "What have I done wrong? Since everyone must be seeing this" .. and I
> > > > > always blame myself first.
> > > > >
> > > > > > > This isn't my normal area of kernel hacking, so I've so far come up empty
> > > > > > > at either fixing it myself, or figuring out a viable workaround. (Well, I can
> > > > > > > "fix" it by removing the cgroup_enter_frozen() call in ptrace_stop ...
> > > > > > > but obviously, that is just me trying to figure out what could be causing
> > > > > > > the issue.)
> > > > > > >
> > > > > > > As part of the release, we run tests that come with various applications. The
> > > > > > > ptrace test that is causing us issues can be boiled down to this:
> > > > > > >
> > > > > > > $ cd /usr/lib/strace/ptest/tests
> > > > > > > $ time ../strace -o log -qq -esignal=none -e/clock ./printpath-umovestr>ttt
> > > > > > >
> > > > > > > (I can provide as many details as needed, but I wanted to keep this initial
> > > > > > > email relatively short.)
> > > > > > >
> > > > > > > I'll continue to debug and attempt to fix this myself, but I grabbed the
> > > > > > > email list from the regression report in May to see if anyone has any ideas
> > > > > > > or angles that I haven't covered in my search for a fix.
> > > > > >
> > > > > > I'm definitely happy to help, but it's a bit hard to say anything from what
> > > > > > you've provided. I'm not aware of any open issues with the freezer except
> > > > > > some spurious cgroup frozen<->not frozen transitions which can happen in
> > > > > > some cases. If you describe how I can reproduce the issue, I'll try to take
> > > > > > a look asap.
> > > > >
> > > > > That would be great.
> > > > >
> > > > > I'll attempt to strip all of the build system specifics out of this (and
> > > > > Richard Purdie on the cc of this can probably help provide more details /
> > > > > setup info as well).
> > > > >
> > > > > We are running the built-in tests of strace. So here's a cut and paste of
> > > > > what I did to get the tests available (ignore/skip what is common sense or
> > > > > isn't needed in your test rig):
> > > > >
> > > > > % git clone https://github.com/strace/strace.git
> > > > > % cd strace
> > > > > % ./bootstrap
> > > > > # the --enable flag isn't strictly required, but may break on some build machines
> > > > > % ./configure --enable-mpers=no
> > > > > % make
> > > > > % make check-TESTS
> > > > >
> > > > > That last step will not only build the tests, but run them all .. so ^c the
> > > > > run once it starts, since it is a lot of noise (we carry a patch to strace
> > > > > that allows us to build the tests without running them).
> > > > >
> > > > > % cd tests
> > > > > % time strace -o log -qq -esignal=none -e/clock ./printpath-umovestr > fff
> > > > > real 0m2.566s
> > > > > user 0m0.284s
> > > > > sys 0m2.519s
> > > > >
> > > > > On pre-cgroup2 freezer kernels, you see a run time similar to what I have
> > > > > above. On the newer kernels we are testing, it is taking 3 or 4 minutes to
> > > > > run the test.
> > > > >
> > > > > I hope that is simple enough to set up and try. Since I've been seeing this
> > > > > on both mainline kernels and the yocto reference kernels, I don't think it
> > > > > is something that I'm carrying in the distro/reference kernel that is
> > > > > causing this (but again, I always blame myself first). If you don't see the
> > > > > same run time, then that does point the finger back at what we are doing,
> > > > > and I'll have to apologize for chewing up some of your time.
> > > >
> > > > Thank you for the detailed description!
> > > > I'll try to reproduce the issue and will be back
> > > > by the end of the week.
> > >
> > > Thanks again!
> > >
> > > While discussing the issue with a few yocto folks today, it came up that
> > > someone wasn't seeing the same behaviour on the opensuse v5.2 kernel
> > > (but I've yet to figure out exactly where to find that tree) .. but when I
> > > do, I'll try and confirm that, and will look for patches or config
> > > differences that could explain the results.
> > >
> > > I did confirm that 5.3 shows the same thing today, and I'll do a 5.4-rc1
> > > test tomorrow.
> > >
> > > We are also primarily reproducing the issue on qemux86-64, so I'm also
> > > going to try and rule out qemu (but the same version of qemu with just
> > > the kernel changing shows the issue).
> >
> > Hi Bruce!
> >
> > I've tried to follow your steps, but unfortunately failed to reproduce the
> > issue. I've executed the test on my desktop running 5.2 and cgroup v1
> > (Fedora 30), and also on a qemu vm with vanilla 5.1 and 5.3 and cgroup v2
> > mounted by default. In all cases the test execution time was about 4.5
> > seconds.
>
> Hi Roman,
>
> Thanks for the time you spent on this. I had *thought* that I ruled out my
> config before posting .. but clearly, it is either not my config or something
> else in the environment.
>
> > Looks like something makes your setup special. If you'll provide your
> > build config, qemu arguments or any other configuration files, I can try
> > to reproduce it on my side.
>
> Indeed. I'm going to dive back in and see what I can find. If I can find
> something that is reproducible in a completely different environment, with
> easy-to-configure components, I'll follow up with details.
>
> When I figure out what is going on with the config here, I'll follow up as
> well, so the solution can be captured in any archives.

Actually, now that I think about it: would it be possible to see the .config
that you used for testing (and even how you launched the VM)? I just built a
5.2.17 kernel and the long runtimes persist here. I'm not seeing anything in
my .config that is jumping out, and am now looking at how we are launching
qemu .. but it would be helpful to have a known good baseline for comparison.

If you've already tossed that config, no worries, I'll muddle along and
figure it out eventually.

Bruce

>
> Thanks again,
>
> Bruce
> >
> > Thanks!
> >
> > Roman
>
> --
> - Thou shalt not follow the NULL pointer, for chaos and madness await
> thee at its end
> - "Use the force Harry" - Gandalf, Star Trek II

--
- Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end
- "Use the force Harry" - Gandalf, Star Trek II
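[Editor's note] The discrepancy between the two setups above hinges on whether the kernel under test is actually using the cgroup v2 freezer introduced in v5.2 (commit 76f969e) or the legacy v1 freezer — Roman's desktop was on cgroup v1, his VMs on v2. A minimal sketch of how one might check which hierarchy a system is running; this script is not from the thread, and it assumes only a standard /proc layout:

```shell
#!/bin/sh
# Detect whether a cgroup v2 (unified) hierarchy is mounted.
# The v2 freezer (cgroup.freeze) only exists on kernels v5.2+
# with cgroup2 in use; otherwise the v1 freezer controller applies.

check_freezer() {
    # In /proc/self/mounts, field 2 is the mount point, field 3 the fstype.
    cgroot=$(awk '$3 == "cgroup2" { print $2; exit }' /proc/self/mounts 2>/dev/null)
    if [ -n "$cgroot" ]; then
        echo "cgroup2 mounted at $cgroot"
    else
        echo "no cgroup2 mount found (v1 freezer or no cgroups)"
    fi
}

check_freezer
```

On a v5.2+ kernel booted with cgroup2, every non-root cgroup under the reported mount point also exposes a writable `cgroup.freeze` file, which is the interface the regression discussion concerns.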