Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp2094917ybg; Thu, 24 Oct 2019 04:46:12 -0700 (PDT) X-Google-Smtp-Source: APXvYqyCr3fXJyK9B9k2zF8UdUSeSS5kOtRAvWaDypHjz8SKQlrUnI2FMvC2gv9Et6LIVqabCX61 X-Received: by 2002:a17:906:3ecc:: with SMTP id d12mr11748154ejj.52.1571917572519; Thu, 24 Oct 2019 04:46:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571917572; cv=none; d=google.com; s=arc-20160816; b=c28tF0B9RvrdVu2XpLjGkJODW6y4DjUu0RA/Ct+yKHvzC7yeYBKB3z/GQI9xTZz4Yx BZ5KRx54aEBNyDM9T2a0PcGhrCiMbUdqPoPaQJfDfgvyO186uTCOQyumuasGATdGIc16 51qlAGLGOXJ/4+dCwvTzjLVj/0w5YlPJ5X0YqufDbA21icdW209ijL8HqVyPpM3HCBcd Z6JBBWZs9ydgYOub4vLKodbHSGS5yJ97/jKIHTRqorAUSn8oZaLkxbdE07l/d0KmBbxK bN+x8DvglQi0Fo1A7/MxUcoC8zg7QMPJdA3NyD1qMcNDpV4QNaoa4kGnlCGXT+NIaSOa i0qA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=BmJlh4WyBNl5ujqJykwBJPqFEruvIfpYSLEoPjqEaF0=; b=oaNgpGRY5UiHkU58O8i/Xltb3m2Kh2FZck76pHb3y9IhQWHv2z8qV9SOIxNMdgLu3/ jSXfOi/WVuoKpBWSKF8IIJH5Xj8FCd+dVrvqRHieLGb8wAgO8sYLA47nK1Tf6F420RDp vQsVF/4Z4jCKtGqj3P2Dh3wXtpKUFmxE2rYzzyVpT0UEkredlyPwwVzwPz+nnhpe5f2n 34zB7S/tii68dZdp7fPHuGWBrLACXX9DMMSiIvvWptmf5DIM+lFER6YgHjsBaEilRkp1 LDsLSOYLSRUSTAyi8mUNdUI3dhY7+cnOR1FTxaABEZ3pR+QeYgdl+Dh6BbrzKRGia692 SfBA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=a9kbrGD0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z37si8549922edz.281.2019.10.24.04.45.47; Thu, 24 Oct 2019 04:46:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=a9kbrGD0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727526AbfJWRG4 (ORCPT + 99 others); Wed, 23 Oct 2019 13:06:56 -0400 Received: from mail-qk1-f196.google.com ([209.85.222.196]:32927 "EHLO mail-qk1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726812AbfJWRG4 (ORCPT ); Wed, 23 Oct 2019 13:06:56 -0400 Received: by mail-qk1-f196.google.com with SMTP id 71so16579013qkl.0 for ; Wed, 23 Oct 2019 10:06:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=BmJlh4WyBNl5ujqJykwBJPqFEruvIfpYSLEoPjqEaF0=; b=a9kbrGD0VSKrvdNU25RKScBnDnByLA4+uj/kwwXdk+N2049BF3bn/9X8wxg/ptPAMI QrwmPUCoCr2vlIwbMjD0s94a5ZRtq3UnE85Ty+T7aavcI8LsBUT41tSH+WWsIq3UgEd9 3stFaNvJhzE3ykEPC/vWgsyEBZ2BKeKLnlI9tFn/8TyLFcC6n3qMFwmu/pPEtuOzY4ft 9PkZkf7Lfy0jsAx2AvV9MKrTU6R2S8hbY86LznBATQx6OVkQO6r5czyyjHCFiL/pKDeL SicEgj+Bq+Kcf5GVp4U+jUi/zO5wIaG7p9Tr5ZJyKdxmp7HFKjaftYR7S0h767gjdxry 0z3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BmJlh4WyBNl5ujqJykwBJPqFEruvIfpYSLEoPjqEaF0=; b=P42CHrfVfpuFBf56MVblhNdDU/r02NtYIDJuHnSJ8C53NI6NV0esKl1eWbSZXX5rfb JTtIlD6z6UF74HyjEnMlkrGQUlMaxhsDg74YWgPz8JPmGex5YV3ElTcO03BgJjVU/vSK s070/MkbRi4IQhTAPonoyhCMIr8ajuu+JZENeMfEWaNiFWZnDcxxKQ+zgpOioFRtwJD5 qsZe2rfwLj6H1Xto8U21YSHLbUMg2EoztjVnpsfzGtlCMtiAlcFAWo4iDiOlFl02eSav 99TCOj6uUc0AvBS+y5tbZAVm68BttTw2hhMms+iIpA76613q9iASUuqr40zwjniTQcEP pJbA== X-Gm-Message-State: APjAAAWerD4A/OtI1pMZHjx5umrAVH2NDZMBfefPzwfC2cFEy9txbW0w fDhhYVyX72RehEak7FLRbd0a4tvXwceJidHiAig7Qg== X-Received: by 2002:a05:620a:16a6:: with SMTP id s6mr8708369qkj.407.1571850414641; Wed, 23 Oct 2019 10:06:54 -0700 (PDT) MIME-Version: 1.0 References: <000000000000328b2905951a7667@google.com> <20191017181709.GA5312@avx2> In-Reply-To: From: Dmitry Vyukov Date: Wed, 23 Oct 2019 19:06:43 +0200 Message-ID: Subject: Re: KCSAN: data-race in task_dump_owner / task_dump_owner To: Marco Elver Cc: Alexey Dobriyan , syzbot , Andrew Morton , Casey Schaufler , Christian Brauner , Kees Cook , Kent Overstreet , Konstantin Khlebnikov , linux-fsdevel , LKML , Michal Hocko , Shakeel Butt , syzkaller-bugs , Thomas Gleixner , Eric Dumazet Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Oct 17, 2019 at 8:33 PM 'Marco Elver' via syzkaller-bugs wrote: > > On Thu, 17 Oct 2019 at 20:17, Alexey Dobriyan wrote: > > > > On Thu, Oct 17, 2019 at 02:56:47PM +0200, Marco Elver wrote: > > > Hi, > > > > > > On Thu, 17 Oct 2019 at 14:36, syzbot > > > wrote: > > > > > > > > Hello, > > > > > > > > syzbot found the following crash on: > > > > > > > > HEAD commit: d724f94f x86, kcsan: Enable KCSAN for x86 > > > > git tree: https://github.com/google/ktsan.git kcsan > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17884db3600000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c0906aa620713d80 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e392f8008a294fdf8891 > > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > > > > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > > > Reported-by: syzbot+e392f8008a294fdf8891@syzkaller.appspotmail.com > > > > > > > > ================================================================== > > > > BUG: KCSAN: data-race in task_dump_owner / task_dump_owner > > > > > > > > write to 0xffff8881255bb7fc of 4 bytes by task 7804 on cpu 0: > > > > task_dump_owner+0xd8/0x260 fs/proc/base.c:1742 > > > > pid_update_inode+0x3c/0x70 fs/proc/base.c:1818 > > > > pid_revalidate+0x91/0xd0 fs/proc/base.c:1841 > > > > d_revalidate fs/namei.c:765 [inline] > > > > d_revalidate fs/namei.c:762 [inline] > > > > lookup_fast+0x7cb/0x7e0 fs/namei.c:1613 > > > > walk_component+0x6d/0xe80 fs/namei.c:1804 > > > > link_path_walk.part.0+0x5d3/0xa90 fs/namei.c:2139 > > > > link_path_walk fs/namei.c:2070 [inline] > > > > path_openat+0x14f/0x3530 fs/namei.c:3532 > > > > do_filp_open+0x11e/0x1b0 fs/namei.c:3563 > > > > do_sys_open+0x3b3/0x4f0 fs/open.c:1089 > > > > __do_sys_open fs/open.c:1107 [inline] > > > > __se_sys_open fs/open.c:1102 [inline] > > > > __x64_sys_open+0x55/0x70 fs/open.c:1102 > > > > do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296 > > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > write to 0xffff8881255bb7fc of 4 bytes by task 7813 on cpu 1: > > > > task_dump_owner+0xd8/0x260 fs/proc/base.c:1742 > > > > pid_update_inode+0x3c/0x70 fs/proc/base.c:1818 > > > > pid_revalidate+0x91/0xd0 fs/proc/base.c:1841 > > > > d_revalidate fs/namei.c:765 [inline] > > > > d_revalidate fs/namei.c:762 [inline] > > > > lookup_fast+0x7cb/0x7e0 fs/namei.c:1613 > > > > walk_component+0x6d/0xe80 fs/namei.c:1804 > > > > lookup_last fs/namei.c:2271 [inline] > > > > path_lookupat.isra.0+0x13a/0x5a0 fs/namei.c:2316 > > > > filename_lookup+0x145/0x2d0 fs/namei.c:2346 > > > > user_path_at_empty+0x4c/0x70 fs/namei.c:2606 > > > > user_path_at include/linux/namei.h:60 [inline] > > > > vfs_statx+0xd9/0x190 fs/stat.c:187 > > > > vfs_stat include/linux/fs.h:3188 [inline] > > > > __do_sys_newstat+0x51/0xb0 fs/stat.c:341 > > > > __se_sys_newstat fs/stat.c:337 [inline] > > > > __x64_sys_newstat+0x3a/0x50 fs/stat.c:337 > > > > do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296 > > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > CPU: 1 PID: 7813 Comm: ps Not tainted 5.3.0+ #0 > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > > > Google 01/01/2011 > > > > ================================================================== > > > > > > My understanding is, that for every access to /proc/, > > > d_revalidate is called, and /proc-fs implementation simply says that > > > pid_revalidate always revalidates by rewriting uid/gid because "owning > > > task may have performed a setuid(), etc." presumably so every access > > > to a /proc/ entry always has the right uid/gid (in effect > > > updating /proc/ lazily via d_revalidate). > > > > > > Is it possible that one of the tasks above could be preempted after > > > doing its writes to *ruid/*rgid, another thread writing some other > > > values (after setuid / seteuid), and then the preempted thread seeing > > > the other values? Assertion here should never fail: > > > === TASK 1 === > > > | seteuid(1000); > > > | seteuid(0); > > > | stat("/proc/", &fstat); > > > | assert(fstat.st_uid == 0); > > > === TASK 2 === > > > | stat("/proc/", ...); > > > > Is it the same as > > pid_revalidate() snapshots (uid,gid) correctly > > but writeback is done in any order? > > Yes, I think so. Snapshot is done in RCU reader critical section, but > the writes can race with another thread. Is there logic that ensures > this doesn't lead to the observable outcome above? I found the case where this leads to an observable bug. common_perm_cond() in security/apparmor/lsm.c reads the inode uid and uses it for the security check: static int common_perm_cond(const char *op, const struct path *path, u32 mask) { struct path_cond cond = { d_backing_inode(path->dentry)->i_uid, d_backing_inode(path->dentry)->i_mode }; Now consider the following test program: #define _GNU_SOURCE #include #include #include #include #include #include void *thr(void *arg) { for (;;) { struct stat file_stat; stat((char*)arg, &file_stat); } return 0; } int main(int argc, char *argv[]) { char proc[32]; sprintf(proc, "/proc/%d", getpid()); printf("%s\n", proc); pthread_t th; pthread_create(&th, 0, thr, proc); for (;;) { seteuid(1000); usleep(1); seteuid(0); struct stat file_stat; stat(proc, &file_stat); } return 0; } Whenever the main thread does stat, it must observe inode.uid == 0 in common_perm_cond(). But since task_dump_owner() does writeback out of order, it can lead to non-linearizable executions and main thread observing inode.uid == 1000. This in turn can lead to both false negatives and false positives from AppArmour (false denying access and falsely permitting access). I don't know how to setup actual AppArmour profile to do this, but I see this guide mentions "owner @{PROC}/[0-9]*" in a policy, so I assume it's possible: https://gitlab.com/apparmor/apparmor/wikis/Profiling_by_hand Instead, I added the following check to common_perm_cond() (it's dirty, but you get the idea): @@ -218,6 +218,15 @@ static int common_perm_cond(const char *op, const struct path *path, u32 mask) d_backing_inode(path->dentry)->i_mode }; + if (op == OP_GETATTR && mask == AA_MAY_GETATTR && cond.uid.val != 0) { + char buf1[64], buf2[64]; + char *str = d_path(path, buf1, sizeof(buf1)); + sprintf(buf2, "/proc/%d", current->pid); + if (!strcmp(str, buf2)) + pr_err("common_perm_cond: path=%s pid=%d uid=%d\n", + str, current->pid, cond.uid.val); + } Now when I run the program, I see how it fires every few seconds: # ./a.out /proc/1548 [ 123.233107] common_perm_cond: path=/proc/1548 pid=1548 uid=1000 [ 126.142869] common_perm_cond: path=/proc/1548 pid=1548 uid=1000 [ 127.048353] common_perm_cond: path=/proc/1548 pid=1548 uid=1000 [ 128.181873] common_perm_cond: path=/proc/1548 pid=1548 uid=1000 [ 128.557104] common_perm_cond: path=/proc/1548 pid=1548 uid=1000 [ 144.690774] common_perm_cond: path=/proc/1548 pid=1548 uid=1000 Which means AppArmour acts based on the wrong UID. Obviously can lead to falsely denying access, but also falsely permitting access. Consider the following scenario. A process sets owner UID on a file so that a child process won't be able to access it, after that it starts the child process. common_perm_cond() in the child process should observe the new owner UID. However, if there a random other process simply doing stat() or something similar on the file, now the common_perm_cond() in the child can suddenly observe the old UID, which will be permitted by AppArmour. Boom! I've tried to apply "proc: fix inode uid/gid writeback race": https://lore.kernel.org/lkml/20191020173010.GA14744@avx2/ but it does _not_ help because it does not really resolve the non-atomic snapshot and writeback of UID.