From: Yu Zhao
Date: Mon, 23 May 2022 14:56:12 -0600
Subject: Re: Alpha: rare random memory corruption/segfault in user space bisected
To: Michael Cree
Cc: Linux-MM, linux-kernel, Hillf Danton, Joonsoo Kim
References: <20220507015646.5377-1-hdanton@sina.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 11, 2022 at 2:37 PM Michael Cree wrote:
>
> On Sat, May 07, 2022 at 11:27:15AM -0700, Yu Zhao wrote:
> > On Fri, May 6, 2022 at 6:57 PM Hillf Danton wrote:
> > >
> > > On Sat, 7 May 2022 09:21:25 +1200 Michael Cree wrote:
> > > > Alpha kernel has been exhibiting rare and random memory
> > > > corruptions/segfaults in user space since the 5.9.y kernel.
> > > > First seen on the Debian Ports build daemon when running 5.10.y kernel
> > > > resulting in the occasional (one or two a day) build failures with gcc
> > > > ICEs either due to self detected corrupt memory structures or segfaults.
> > > > Have been running 5.8.y kernel without such problems for over six months.
> > > >
> > > > Tried bisecting last year but went off track with incorrect good/bad
> > > > determinations due to rare nature of bug. After trying a 5.16.y kernel
> > > > early this year and seen the bug is still present retried the bisection
> > > > and have got to:
> > > >
> > > > aae466b0052e1888edd1d7f473d4310d64936196 is the first bad commit
> > > > commit aae466b0052e1888edd1d7f473d4310d64936196
> > > > Author: Joonsoo Kim
> > > > Date:   Tue Aug 11 18:30:50 2020 -0700
> > > >
> > > >     mm/swap: implement workingset detection for anonymous LRU
> >
> > This commit seems innocent to me. While not ruling out anything, i.e.,
> > this commit, compiler, qemu, userspace itself, etc., my wild guess is
> > the problem is memory barrier related. Two lock/unlock pairs, which
> > imply two full barriers, were removed. This is not a small deal on
> > Alpha, since it imposes no constraints on cache coherency, AFAIK.
> >
> > Can you please try the attached patch on top of this commit? Thanks!
>
> Thanks, I have that running now for a day without any problem showing
> up, but that's not long enough to be sure it has fixed the problem. Will
> get back to you after another day or two of testing.

Any luck? Thanks!
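
As a rough aid for judging how long a clean run needs to be before calling a kernel "good", here is a minimal sketch. It assumes the failures arrive independently at a constant average rate (a Poisson process), which is a modelling assumption and not something established in this thread. The rates of 1 and 2 per day come from the "one or two a day" figure in the original report; the function names are hypothetical.

```python
import math


def prob_clean_run(failures_per_day: float, days: float) -> float:
    """Probability of seeing zero failures in `days` if the bug is still present.

    Assumes a Poisson process with a constant average failure rate
    (an assumption made for illustration only).
    """
    return math.exp(-failures_per_day * days)


def days_needed(failures_per_day: float, max_false_good: float) -> float:
    """Clean-run length after which a still-buggy kernel would look clean
    with probability at most `max_false_good`."""
    return -math.log(max_false_good) / failures_per_day


if __name__ == "__main__":
    for rate in (1.0, 2.0):  # "one or two a day", from the original report
        for days in (1, 3):
            p = prob_clean_run(rate, days)
            print(f"rate={rate}/day, {days} clean day(s): "
                  f"P(no failure | bug present) = {p:.3f}")
        print(f"rate={rate}/day: about {days_needed(rate, 0.01):.1f} "
              f"clean days for a 1% false-good risk")
```

Under these assumptions, at one failure a day a single clean day would still occur roughly 37% of the time with the bug present, which is consistent with a day of testing not being long enough, and with earlier bisection steps having been marked "good" by mistake.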