From: Zhouyi Zhou
Date: Tue, 25 Apr 2023 21:49:50 +0800
Subject: Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail
To: Christophe Leroy
Cc: Joel Fernandes, Peter Zijlstra, Boqun Feng, Segher Boessenkool, Michael Ellerman, linuxppc-dev, rcu, linux-kernel, "lance@osuosl.org", "Paul E. McKenney"
X-Mailing-List: linux-kernel@vger.kernel.org

Hi

On Tue, Apr 25, 2023 at 9:40 PM Christophe Leroy wrote:
>
> On 25/04/2023 at 13:06, Joel Fernandes wrote:
> > On Tue, Apr 25, 2023 at 6:58 AM Zhouyi Zhou wrote:
> >>
> >> hi
> >>
> >> On Tue, Apr 25, 2023 at 6:13 PM Peter Zijlstra wrote:
> >>>
> >>> On Mon, Apr 24, 2023 at 02:55:11PM -0400, Joel Fernandes wrote:
> >>>> This is amazing debugging, Boqun, like a boss! One comment below:
> >>>>
> >>>>>>> Or something simple I haven't thought of? :)
> >>>>>>
> >>>>>> At what points can r13 change? Only when some particular functions are
> >>>>>> called?
> >>>>>>
> >>>>>
> >>>>> r13 is the local paca:
> >>>>>
> >>>>>         register struct paca_struct *local_paca asm("r13");
> >>>>>
> >>>>> , which is a pointer to percpu data.
> >>>>>
> >>>>> So if a task is scheduled from one CPU to another CPU, the value gets
> >>>>> changed.
> >>>>
> >>>> It appears the whole issue, per your analysis, is that the stack
> >>>> checking code in gcc should not cache or alias r13, and must read its
> >>>> most up-to-date value during stack checking, as its value may have
> >>>> changed during a migration to a new CPU.
> >>>>
> >>>> Did I get that right?
> >>>>
> >>>> IMO, even without a reproducer, gcc on PPC should just not do that;
> >>>> that feels terribly broken for the kernel. I wonder what clang does;
> >>>> I'll go poke around with Compiler Explorer after lunch.
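A minimal user-space sketch of the hazard described above, assuming a global pointer stands in for r13. The names `fake_r13`, `fake_pacas`, `migrate_to()` and the two `cpu_seen_via_*()` helpers are hypothetical and are not kernel or GCC code. A copy of the pointer taken before the simulated migration still refers to the old CPU's paca afterwards, whereas re-reading it yields the new CPU's.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for struct paca_struct. */
struct fake_paca {
    int cpu;  /* which CPU this paca belongs to */
};

static struct fake_paca fake_pacas[2] = { { .cpu = 0 }, { .cpu = 1 } };

/* Stand-in for r13. On real PPC64, r13 is a CPU register holding the
 * address of the current CPU's paca; it is not a variable. */
static struct fake_paca *fake_r13 = &fake_pacas[0];

/* Simulated migration: the task resumes on @cpu, so "r13" now points at
 * that CPU's paca. */
static void migrate_to(int cpu)
{
    assert(cpu == 0 || cpu == 1);
    fake_r13 = &fake_pacas[cpu];
}

/* Copies "r13" before migrating and reads through the copy afterwards:
 * returns the CPU the copy points at, i.e. the old CPU. */
static int cpu_seen_via_cached_r13(int dest_cpu)
{
    struct fake_paca *cached = fake_r13;  /* copy taken at function entry */
    migrate_to(dest_cpu);                 /* task migrates mid-function */
    return cached->cpu;
}

/* Re-reads "r13" after migrating: returns the CPU the task is now on. */
static int cpu_seen_via_fresh_r13(int dest_cpu)
{
    migrate_to(dest_cpu);
    return fake_r13->cpu;
}
```

Starting on CPU 0, `cpu_seen_via_cached_r13(1)` returns 0, while `cpu_seen_via_fresh_r13(1)` returns 1.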
> >>>>
> >>>> Adding +Peter Zijlstra as well to join the party as I have a feeling
> >>>> he'll be interested. ;-)
> >>>
> >>> I'm a little confused; the way I understand the whole stack protector
> >>> thing to work is that we push a canary on the stack at call and on
> >>> return check it is still valid. Since in general tasks randomly migrate,
> >>> the per-cpu validation canary should be the same on all CPUs.
> >>>
> >>> Additionally, the 'new' __srcu_read_{,un}lock_nmisafe() functions use
> >>> raw_cpu_ptr() to get 'a' percpu sdp, preferably that of the local cpu,
> >>> but no guarantees.
> >>>
> >>> Both cases use r13 (paca) in a racy manner, and in both cases it should
> >>> be safe.
> >>
> >> New test results today: both gcc built from git (git clone
> >> git://gcc.gnu.org/git/gcc.git) and Ubuntu 22.04's gcc-12.1.0
> >> are immune to the above issue. The assembly code can be seen at
> >> http://140.211.169.189/0425/srcu_gp_start_if_needed-gcc-12.txt
> >>
> >> By contrast, both the native gcc on the PPC VM (gcc version 9.4.0) and
> >> the gcc cross compiler on my x86 laptop (gcc version 10.4.0) reproduce
> >> the bug.
> >
> > Do you know what fixes the issue? I would not declare victory yet. My
> > feeling is that something changes in timing or compiler codegen which
> > hides the issue. So the issue is still there, and it is just a matter
> > of time before someone else reports it.
> >
> > Out of curiosity, for PPC folks: why can't 64-bit PPC use a per-task
> > canary? Michael, is this an optimization? Adding Christophe as well
> > since it came in a few years ago via the following commit:
>
> It uses a per-task canary. But unlike PPC32, PPC64 doesn't have a fixed
> register pointing to 'current' at all times, so the canary is copied into
> a per-cpu struct during _switch().
>
> If GCC keeps an old value of the per-cpu struct pointer, it then gets
> the canary from the wrong CPU struct, and so from a different task.

This is a fruitful learning process for me!
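The mechanism Peter and Christophe describe can be modelled in a few lines of C. This is a hypothetical user-space simulation, not kernel code: `struct task`, `struct paca`, `switch_in()` and `protected_call_survives()` are invented names, `switch_in()` stands in for the canary copy that _switch() performs, and the migration is placed at a fixed point inside the call. It shows that re-reading r13 in the epilogue finds the task's own canary on the new CPU, whereas reusing the prologue's paca pointer finds the canary of whichever task now runs on the old CPU, which is the mismatch that would end in __stack_chk_fail().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical, heavily simplified stand-ins, not the kernel's types. */
struct task { unsigned long canary; };  /* per-task canary */
struct paca { unsigned long canary; };  /* copy of the running task's canary */

static struct paca pacas[NR_CPUS];

/* Models the canary copy done in _switch(): when @next starts running on
 * @cpu, its canary is published in that CPU's paca. Returns the value r13
 * would hold for @next afterwards. */
static struct paca *switch_in(int cpu, const struct task *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    pacas[cpu].canary = next->canary;
    return &pacas[cpu];
}

/* Simulates one stack-protected call in task A during which A migrates from
 * CPU 0 to CPU 1 and task B starts running on CPU 0. If @reload_r13 is
 * false, the epilogue reuses the paca pointer cached in the prologue (the
 * suspected codegen); otherwise it uses r13's current value. Returns true
 * if the canary check passes, false where the kernel would call
 * __stack_chk_fail(). */
static bool protected_call_survives(bool reload_r13)
{
    struct task a = { .canary = 0x1111 };
    struct task b = { .canary = 0x2222 };

    struct paca *r13 = switch_in(0, &a);  /* A starts on CPU 0 */

    /* Prologue: copy the canary into the stack frame. */
    struct paca *cached = r13;
    unsigned long frame_canary = cached->canary;

    /* A is preempted and migrated: B runs on CPU 0, A resumes on CPU 1. */
    switch_in(0, &b);
    r13 = switch_in(1, &a);

    /* Epilogue: compare the frame copy with the "current" canary. */
    const struct paca *check = reload_r13 ? r13 : cached;
    return frame_canary == check->canary;
}
```

With the values used here, `protected_call_survives(true)` returns true and `protected_call_survives(false)` returns false.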
Christophe: Do you think there is still a need to bisect GCC? If so, I would
be very glad to continue.

Cheers
Zhouyi
>
> Christophe
>
> >
> > commit 06ec27aea9fc84d9c6d879eb64b5bcf28a8a1eb7
> > Author: Christophe Leroy
> > Date:   Thu Sep 27 07:05:55 2018 +0000
> >
> >     powerpc/64: add stack protector support
> >
> >     On PPC64, as register r13 points to the paca_struct at all time,
> >     this patch adds a copy of the canary there, which is copied at
> >     task_switch.
> >     That new canary is then used by using the following GCC options:
> >     -mstack-protector-guard=tls
> >     -mstack-protector-guard-reg=r13
> >     -mstack-protector-guard-offset=offsetof(struct paca_struct, canary))
> >
> >     Signed-off-by: Christophe Leroy
> >     Signed-off-by: Michael Ellerman
> >
> > - Joel
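Per GCC's documentation for the RS/6000 and PowerPC options, `-mstack-protector-guard=tls` together with `-mstack-protector-guard-reg` and `-mstack-protector-guard-offset` tells the compiler to read the canary from a fixed offset relative to a base register. The sketch below shows what that load amounts to when the base is r13 and the offset is that of the canary field. `struct paca_sketch`, `CANARY_OFFSET` and `load_guard()` are hypothetical, and the placeholder array means the offset does not match the real struct paca_struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: the real struct paca_struct has many more fields,
 * so its canary offset differs from this one. */
struct paca_sketch {
    unsigned long other_fields[4];  /* placeholder for fields before canary */
    unsigned long canary;           /* copy of the running task's canary */
};

/* The value that -mstack-protector-guard-offset=offsetof(...) would carry. */
#define CANARY_OFFSET offsetof(struct paca_sketch, canary)

/* Emulates the guard load selected by -mstack-protector-guard=tls and
 * -mstack-protector-guard-reg=r13: read an unsigned long located
 * CANARY_OFFSET bytes past the address held in the base register. */
static unsigned long load_guard(const void *base_reg)
{
    unsigned long val;
    memcpy(&val, (const char *)base_reg + CANARY_OFFSET, sizeof(val));
    return val;
}
```

Passing the address of a `struct paca_sketch` whose `canary` is 0xabcd makes `load_guard()` return 0xabcd.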