From: Dmitry Vyukov
Date: Thu, 24 Oct 2019 10:35:32 +0200
Subject: Re: KCSAN: data-race in task_dump_owner / task_dump_owner
To: Marco Elver
Cc: Alexey Dobriyan, syzbot, Andrew Morton, Casey Schaufler, Christian Brauner, Kees Cook, Kent Overstreet, Konstantin Khlebnikov, linux-fsdevel, LKML, Michal Hocko, Shakeel Butt, syzkaller-bugs, Thomas Gleixner, Eric Dumazet
References: <000000000000328b2905951a7667@google.com> <20191017181709.GA5312@avx2>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 23, 2019 at 7:06 PM Dmitry Vyukov wrote:
>
> On Thu, Oct 17, 2019 at 8:33 PM 'Marco Elver' via syzkaller-bugs wrote:
> >
> > On Thu, 17 Oct 2019 at 20:17, Alexey Dobriyan wrote:
> > >
> > > On Thu, Oct 17, 2019 at 02:56:47PM +0200, Marco Elver wrote:
> > > > Hi,
> > > >
> > > > On Thu, 17 Oct 2019 at 14:36, syzbot wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit:    d724f94f x86, kcsan: Enable KCSAN for x86
> > > > > git tree:       https://github.com/google/ktsan.git kcsan
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17884db3600000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c0906aa620713d80
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e392f8008a294fdf8891
> > > > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > > >
> > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > Reported-by: syzbot+e392f8008a294fdf8891@syzkaller.appspotmail.com
> > > > >
> > > > > ==================================================================
> > > > > BUG: KCSAN: data-race in task_dump_owner / task_dump_owner
> > > > >
> > > > > write to 0xffff8881255bb7fc of 4 bytes by task 7804 on cpu 0:
> > > > >   task_dump_owner+0xd8/0x260 fs/proc/base.c:1742
> > > > >   pid_update_inode+0x3c/0x70 fs/proc/base.c:1818
> > > > >   pid_revalidate+0x91/0xd0 fs/proc/base.c:1841
> > > > >   d_revalidate fs/namei.c:765 [inline]
> > > > >   d_revalidate fs/namei.c:762 [inline]
> > > > >   lookup_fast+0x7cb/0x7e0 fs/namei.c:1613
> > > > >   walk_component+0x6d/0xe80 fs/namei.c:1804
> > > > >   link_path_walk.part.0+0x5d3/0xa90 fs/namei.c:2139
> > > > >   link_path_walk fs/namei.c:2070 [inline]
> > > > >   path_openat+0x14f/0x3530 fs/namei.c:3532
> > > > >   do_filp_open+0x11e/0x1b0 fs/namei.c:3563
> > > > >   do_sys_open+0x3b3/0x4f0 fs/open.c:1089
> > > > >   __do_sys_open fs/open.c:1107 [inline]
> > > > >   __se_sys_open fs/open.c:1102 [inline]
> > > > >   __x64_sys_open+0x55/0x70 fs/open.c:1102
> > > > >   do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
> > > > >   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > >
> > > > > write to 0xffff8881255bb7fc of 4 bytes by task 7813 on cpu 1:
> > > > >   task_dump_owner+0xd8/0x260 fs/proc/base.c:1742
> > > > >   pid_update_inode+0x3c/0x70 fs/proc/base.c:1818
> > > > >   pid_revalidate+0x91/0xd0 fs/proc/base.c:1841
> > > > >   d_revalidate fs/namei.c:765 [inline]
> > > > >   d_revalidate fs/namei.c:762 [inline]
> > > > >   lookup_fast+0x7cb/0x7e0 fs/namei.c:1613
> > > > >   walk_component+0x6d/0xe80 fs/namei.c:1804
> > > > >   lookup_last fs/namei.c:2271 [inline]
> > > > >   path_lookupat.isra.0+0x13a/0x5a0 fs/namei.c:2316
> > > > >   filename_lookup+0x145/0x2d0 fs/namei.c:2346
> > > > >   user_path_at_empty+0x4c/0x70 fs/namei.c:2606
> > > > >   user_path_at include/linux/namei.h:60 [inline]
> > > > >   vfs_statx+0xd9/0x190 fs/stat.c:187
> > > > >   vfs_stat include/linux/fs.h:3188 [inline]
> > > > >   __do_sys_newstat+0x51/0xb0 fs/stat.c:341
> > > > >   __se_sys_newstat fs/stat.c:337 [inline]
> > > > >   __x64_sys_newstat+0x3a/0x50 fs/stat.c:337
> > > > >   do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
> > > > >   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > >
> > > > > Reported by Kernel Concurrency Sanitizer on:
> > > > > CPU: 1 PID: 7813 Comm: ps Not tainted 5.3.0+ #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > > Google 01/01/2011
> > > > > ==================================================================
> > > >
> > > > My understanding is that for every access to /proc/<pid>,
> > > > d_revalidate is called, and the /proc-fs implementation simply says that
> > > > pid_revalidate always revalidates by rewriting uid/gid because "owning
> > > > task may have performed a setuid(), etc.", presumably so every access
> > > > to a /proc/<pid> entry always has the right uid/gid (in effect
> > > > updating /proc/<pid> lazily via d_revalidate).
> > > >
> > > > Is it possible that one of the tasks above could be preempted after
> > > > doing its writes to *ruid/*rgid, another thread writing some other
> > > > values (after setuid / seteuid), and then the preempted thread seeing
> > > > the other values? Assertion here should never fail:
> > > > === TASK 1 ===
> > > > | seteuid(1000);
> > > > | seteuid(0);
> > > > | stat("/proc/<pid>", &fstat);
> > > > | assert(fstat.st_uid == 0);
> > > > === TASK 2 ===
> > > > | stat("/proc/<pid>", ...);
> > >
> > > Is it the same as
> > > pid_revalidate() snapshots (uid,gid) correctly
> > > but writeback is done in any order?
> >
> > Yes, I think so.
> > Snapshot is done in RCU reader critical section, but
> > the writes can race with another thread. Is there logic that ensures
> > this doesn't lead to the observable outcome above?
>
>
> I found the case where this leads to an observable bug.
> common_perm_cond() in security/apparmor/lsm.c reads the inode uid and
> uses it for the security check:
>
> static int common_perm_cond(const char *op, const struct path *path, u32 mask)
> {
>         struct path_cond cond = { d_backing_inode(path->dentry)->i_uid,
>                                   d_backing_inode(path->dentry)->i_mode
>         };
>
> Now consider the following test program:
>
> #define _GNU_SOURCE
> #include <pthread.h>
> #include <stdio.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> void *thr(void *arg)
> {
>         for (;;) {
>                 struct stat file_stat;
>                 stat((char*)arg, &file_stat);
>         }
>         return 0;
> }
>
> int main(int argc, char *argv[])
> {
>         char proc[32];
>         sprintf(proc, "/proc/%d", getpid());
>         printf("%s\n", proc);
>         pthread_t th;
>         pthread_create(&th, 0, thr, proc);
>         for (;;) {
>                 seteuid(1000);
>                 usleep(1);
>                 seteuid(0);
>                 struct stat file_stat;
>                 stat(proc, &file_stat);
>         }
>         return 0;
> }
>
> Whenever the main thread does stat, it must observe inode.uid == 0 in
> common_perm_cond().
>
> But since task_dump_owner() does writeback out of order, it can lead
> to non-linearizable executions and the main thread observing inode.uid ==
> 1000.
> This in turn can lead to both false negatives and false positives from
> AppArmor (falsely denying access and falsely permitting access).
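To make the out-of-order writeback concrete, here is a minimal single-threaded sketch that replays one bad schedule step by step. It is a user-space model only: `model_task`, `model_inode`, `snapshot_owner()`, `writeback_owner()` and `replay_racy_schedule()` are hypothetical names invented for illustration and are not kernel types or functions.

```c
#include <assert.h>

/* Hypothetical user-space model; these are NOT kernel types or APIs. */
struct model_task  { unsigned int euid; };  /* stands in for the task's creds */
struct model_inode { unsigned int uid;  };  /* stands in for inode->i_uid     */

/* Step 1 of a revalidation: snapshot the owner from the task. */
unsigned int snapshot_owner(const struct model_task *task)
{
    return task->euid;
}

/* Step 2 of a revalidation: write the snapshot back to the inode. */
void writeback_owner(struct model_inode *inode, unsigned int uid)
{
    inode->uid = uid;
}

/*
 * Replays one bad schedule of the test program and returns the inode uid
 * that the main thread would read after its own revalidation.
 */
unsigned int replay_racy_schedule(void)
{
    struct model_task task = { .euid = 1000 };   /* main: seteuid(1000)      */
    struct model_inode inode = { .uid = 1000 };

    unsigned int stale = snapshot_owner(&task);  /* helper: snapshots 1000   */
    task.euid = 0;                               /* main: seteuid(0)         */
    unsigned int fresh = snapshot_owner(&task);  /* main: snapshots 0        */
    writeback_owner(&inode, fresh);              /* main: writes 0           */
    writeback_owner(&inode, stale);              /* helper: late write, 1000 */

    assert(task.euid == 0);  /* the task really is euid 0 at this point */
    return inode.uid;
}
```

In this model each snapshot is individually correct, yet `replay_racy_schedule()` returns 1000 while the task's euid is 0, which is the non-linearizable outcome described above.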
>
> I don't know how to set up an actual AppArmor profile to do this, but I
> see this guide mentions "owner @{PROC}/[0-9]*" in a policy, so I
> assume it's possible:
> https://gitlab.com/apparmor/apparmor/wikis/Profiling_by_hand
>
> Instead, I added the following check to common_perm_cond() (it's
> dirty, but you get the idea):
>
> @@ -218,6 +218,15 @@ static int common_perm_cond(const char *op, const struct path *path, u32 mask)
>                                   d_backing_inode(path->dentry)->i_mode
>         };
> +       if (op == OP_GETATTR && mask == AA_MAY_GETATTR && cond.uid.val != 0) {
> +               char buf1[64], buf2[64];
> +               char *str = d_path(path, buf1, sizeof(buf1));
> +               sprintf(buf2, "/proc/%d", current->pid);
> +               if (!strcmp(str, buf2))
> +                       pr_err("common_perm_cond: path=%s pid=%d uid=%d\n",
> +                               str, current->pid, cond.uid.val);
> +       }
>
> Now when I run the program, I see how it fires every few seconds:
>
> # ./a.out
> /proc/1548
> [ 123.233107] common_perm_cond: path=/proc/1548 pid=1548 uid=1000
> [ 126.142869] common_perm_cond: path=/proc/1548 pid=1548 uid=1000
> [ 127.048353] common_perm_cond: path=/proc/1548 pid=1548 uid=1000
> [ 128.181873] common_perm_cond: path=/proc/1548 pid=1548 uid=1000
> [ 128.557104] common_perm_cond: path=/proc/1548 pid=1548 uid=1000
> [ 144.690774] common_perm_cond: path=/proc/1548 pid=1548 uid=1000
>
> Which means AppArmor acts based on the wrong UID. This can obviously lead
> to falsely denying access, but also to falsely permitting access.
> Consider the following scenario.
> A process sets the owner UID on a file so that a child process won't be
> able to access it, and after that it starts the child process.
> common_perm_cond() in the child process should observe the new owner
> UID. However, if there is a random other process simply doing stat() or
> something similar on the file, the common_perm_cond() in the child
> can suddenly observe the old UID, which will be permitted by
> AppArmor. Boom!
>
> I've tried to apply "proc: fix inode uid/gid writeback race":
> https://lore.kernel.org/lkml/20191020173010.GA14744@avx2/
> but it does _not_ help because it does not really resolve the
> non-atomic snapshot and writeback of UID.

FTR, here is the corresponding race report:
https://groups.google.com/d/msg/syzkaller-bugs/Cs07ly_Nmtg/avW672LrAQAJ
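For illustration only, one way to think about closing the gap between snapshot and writeback is to tag every snapshot with a generation number and refuse to write back a snapshot older than what the inode already holds. The sketch below is a single-threaded user-space model under that assumption. The generation counter and all names (`versioned_cred`, `versioned_inode`, `model_seteuid()`, `snapshot_cred()`, `writeback_if_newer()`, `replay_versioned_schedule()`) are hypothetical; nothing here is existing kernel code, nor the fix proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: neither these types nor the generation counter
 * exist in the kernel. */
struct versioned_cred  { unsigned int uid; unsigned long gen; };
struct versioned_inode { unsigned int uid; unsigned long gen; };

/* Credential change: the new uid and a bumped generation go together. */
void model_seteuid(struct versioned_cred *cred, unsigned int uid)
{
    cred->uid = uid;
    cred->gen++;
}

/* Snapshot uid and generation as one unit (a struct copy in this model). */
struct versioned_cred snapshot_cred(const struct versioned_cred *cred)
{
    return *cred;
}

/*
 * Write back only if the snapshot is at least as new as what the inode
 * already holds. In real concurrent code this compare-and-store would
 * itself have to be atomic (a lock, or one compare-and-swap on a packed
 * word); the model is single-threaded, so it is not shown here.
 */
bool writeback_if_newer(struct versioned_inode *inode, struct versioned_cred snap)
{
    if (snap.gen < inode->gen)
        return false;              /* stale snapshot: drop it */
    inode->uid = snap.uid;
    inode->gen = snap.gen;
    return true;
}

/* Replays a schedule where the stale snapshot is written back last. */
unsigned int replay_versioned_schedule(void)
{
    struct versioned_cred cred = { .uid = 0, .gen = 0 };
    struct versioned_inode inode = { .uid = 0, .gen = 0 };

    model_seteuid(&cred, 1000);                          /* main: gen 1   */
    struct versioned_cred stale = snapshot_cred(&cred);  /* helper thread */
    model_seteuid(&cred, 0);                             /* main: gen 2   */
    struct versioned_cred fresh = snapshot_cred(&cred);  /* main thread   */

    writeback_if_newer(&inode, fresh);                   /* inode: uid 0, gen 2 */
    bool applied = writeback_if_newer(&inode, stale);    /* gen 1 < gen 2       */
    assert(!applied);                                    /* late write rejected */
    (void)applied;

    return inode.uid;
}
```

With the stale write rejected, `replay_versioned_schedule()` returns 0, matching the task's current euid. Holding a lock across both the snapshot and the writeback would be another option; the versioned variant is shown only because it can be replayed deterministically without threads.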