From: Masahiro Yamada
Date: Sat, 10 Jun 2023 00:59:36 +0900
Subject: Re: [RFC] [kbuild test robot] random-order parallel building
To: "Liu, Yujie"
Cc: "Li, Philip", "linux-kbuild@vger.kernel.org", lkp, "linux-kernel@vger.kernel.org"
In-Reply-To: <7403dd164cff7d9217999cddb66135db47564c4b.camel@intel.com>

On Fri, Jun 9, 2023 at 5:41 PM Liu, Yujie wrote:
>
> Hi Masahiro,
>
> On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
> > On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> > > Hello, maintainers of the kbuild test robot.
> > >
> > > I have a proposal for the 0day tests.
> >
> > Thanks a lot for the proposal for the shuffle make, we will do some
> > investigation to try this random order parallel build.
> > The gnu make we currently use is 4.3, we will try the 4.4 to see
> > any problem.
> >
> > For the timeline, we may provide update later this month.
>
> We've upgraded to make v4.4.1 in kernel test robot and enabled
> random-order parallel compiling in our randconfig build tests. The shuffle
> seed is generated by hashing the randconfig, so it changes overtime and
> can cover various random orders. We are still doing some internal
> testing and will put it online once everything is done.
>
> > >
> > >
> > > GNU Make traditionally processes the dependency from left to right.
> > >
> > > For example, if you have dependency like this:
> > >
> > >     all: foo bar baz
> > >
> > > GNU Make builds foo, bar, baz, in this order.
> > >
> > >
> > > Some projects that are not capable of parallel builds
> > > rely on that behavior implicitly.
> > >
> > > Kbuild, however, is intended to work well in parallel.
> > > (As the maintainer, I really care about it.)
> > >
> > >
> > > From time to time, people add "just worked for me" code,
> > > but apparently that lacks proper dependency.
> > > Sometimes it requires an expensive CPU to reproduce
> > > parallel build issues.
> > >
> > >
> > > For example, see this report,
> > > https://lkml.org/lkml/2016/11/30/587
> > >
> > > The report says 'make -j112' reproduces the broken parallel build.
> > > Most people do not have such a build machine that comes with 112
> > > cores.
> > > It is difficult to reproduce it (or even notice it).
> > >
> > > (Some time later, it was root-caused by 07a422bb213a)
>
> Thanks a lot for sharing this case. We tried to reproduce it, but looks
> it dates back to v4.9-rc7 and throws some other errors when compiling
> in our kbuild env, so we are not able to reproduce it yet. Not sure if
> it is related with toolchain/compiler version or the kernel config.
>
> This case mentioned that 'make -j112' can reproduce the breakage. We
> assume this is under traditional serial order build.
> Does it imply that
> it is likely to take much less parallel jobs to reproduce the breakage
> when shuffle is set, say 'make --shuffle=SEED -j32', so developers are
> able to reproduce it on an ordinary CPU with less cores?

I think --shuffle will help a build machine with fewer cores catch
such issues, but it is not a full randomization.

In my understanding, --shuffle still traverses the dependency graph
depth-first.

Consider this example:

    all: foo bar
    foo: foo-sub
    bar: bar-sub

Only [1] or [2] can happen:

  [1] foo-sub -> foo -> bar-sub -> bar -> all
  [2] bar-sub -> bar -> foo-sub -> foo -> all

The order

  foo-sub -> bar-sub -> bar -> foo -> all

also satisfies the dependencies, but --shuffle never schedules the
build that way.

> Not sure if there are other known cases of parallel build breakage
> (especially in recent kernels). If any, it would be very kind if you
> could also share them. We can first try reproducing them in the bot to
> confirm our test flow works well.

I do not remember any other real breakage.

>
> Another question is about bisection. Say the bot catches a breakage on
> commit1 which root-caused to a previous commit2. If we keep the options
> "--shuffle= -j" consistent during the whole process of
> bisection, will the breakage 100% show up on all the commits between
> commit2 and commit1, or it is kind of possible to reproduce the
> breakage, but not 100% reproducible on every commit during bisection?

I am not sure, but I _guess_ git-bisect may not point to commit2
if there is a Makefile change in between.

  commit2 (root cause)
   -> commitA (add Makefile change)
   -> commit1 (0-day bot noticed an issue here)

Even if the same --shuffle=SEED is given, the issue may not be
reproducible on commit2..commitA if commitA changes a Makefile.

Thanks for considering this.

> Thanks a lot for this parallel building proposal, and we will keep
> updating the status.
>
> --
> Best Regards,
> Yujie Liu
>
> > >
> > >
> > > GNU Make 4.4 got this option.
> > >
> > >   --shuffle[={SEED|random|reverse|none}]
> > >        Perform shuffle of prerequisites and goals.
> > >
> > >
> > >
> > > 'make --shuffle=reverse' will build in reverse order.
> > > In the example above, baz, bar, foo.
> > >
> > > 'make --shuffle' will randomize the build order.
> > >
> > >
> > > If there exists a missing dependency among foo, bar, baz,
> > > it will fail to build.
> > >
> > >
> > >
> > > We already perform the randconfig daily basis.
> > > So, random-order parallel building is a similar idea.
> > >
> > > Perhaps, it makes sense to add the "--shuffle=SEED" option
> > > but it requires GNU Make 4.4. (or GNU Make 4.4.1)
> > > Is this too new?
> >
> > Our production environment is 4.3 right now. It will take extra
> > time for us to upgrade the environment but it's doable for us.
> >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Masahiro Yamada
> > >

--
Best Regards
Masahiro Yamada
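
P.S. The depth-first point in the reply above (the all/foo/bar example) can be checked with a toy model. The Python sketch below is not GNU Make's implementation. It assumes a serial build in which shuffling only permutes each target's prerequisite list before an ordinary depth-first walk, as described "in my understanding" above. The names DEPS, depth_first_orders and all_valid_orders are made up for this illustration.

```python
from itertools import permutations

# Dependency graph from the example: target -> list of prerequisites.
DEPS = {
    "all": ["foo", "bar"],
    "foo": ["foo-sub"],
    "bar": ["bar-sub"],
    "foo-sub": [],
    "bar-sub": [],
}


def depth_first_orders(node, deps, built=()):
    """Return every serial build order reachable by permuting each
    target's prerequisites and then walking depth-first (toy model)."""
    if node in built:
        return {built}
    results = set()
    for perm in permutations(deps[node]):
        partial = {built}
        for prereq in perm:
            # Finish the whole subtree of `prereq` before the next one.
            partial = {
                order
                for done in partial
                for order in depth_first_orders(prereq, deps, done)
            }
        results |= {done + (node,) for done in partial}
    return results


def all_valid_orders(deps):
    """Return every order in which each prerequisite precedes its target."""
    return {
        order
        for order in permutations(deps)
        if all(
            order.index(prereq) < order.index(target)
            for target in deps
            for prereq in deps[target]
        )
    }


dfs_orders = depth_first_orders("all", DEPS)
valid_orders = all_valid_orders(DEPS)

print("depth-first orders:")
for order in sorted(dfs_orders):
    print("  " + " -> ".join(order))

print("valid orders never produced by the depth-first walk:")
for order in sorted(valid_orders - dfs_orders):
    print("  " + " -> ".join(order))
```

In this model only orders [1] and [2] come out of the depth-first walk, while interleaved orders such as foo-sub -> bar-sub -> bar -> foo -> all satisfy the dependencies but are never produced. With -j, jobs run concurrently, so the model describes scheduling order only, not completion order.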