Date: Fri, 11 Jan 2019 18:20:39 +1100
From: Dave Chinner
To: Andy Lutomirski
Cc: Linus Torvalds, Dominique Martinet, Jiri Kosina, Matthew Wilcox,
    Jann Horn, Andrew Morton, Greg KH, Peter Zijlstra, Michal Hocko,
    Linux-MM, kernel list, Linux API
Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
Message-ID: <20190111072039.GO27534@dastard>
In-Reply-To: <6955E7C1-A61C-49F3-8BB6-0624D5A70BD6@amacapital.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 10, 2019 at 08:08:37PM -0800, Andy Lutomirski wrote:
>
> > On Jan 10, 2019, at 8:04 PM, Dave Chinner wrote:
> >
> >> On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds wrote:
> >>> On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner wrote:
> >>>
> >>>> On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote:
> >>>> And we *can* do sane things about RWF_NOWAIT. For
> >>>> example, we could start async IO on RWF_NOWAIT, and suddenly
> >>>> it would go from "probe the page cache" to "probe and fill",
> >>>> and be much harder to use as an attack vector..
> >>>
> >>> We can only do that if the application submits the read via
> >>> AIO and has an async IO completion reporting mechanism.
> >>
> >> Oh, no, you misunderstand.
> >>
> >> RWF_NOWAIT has a lot of situations where it will potentially
> >> return early (the DAX and direct IO ones have their own), but I
> >> was thinking of the one in generic_file_buffered_read(), which
> >> triggers when you don't find a page mapping. That looks like
> >> the obvious "probe page cache" case.
> >>
> >> But we could literally move that test down just a few lines.
> >> Let it start read-ahead.
> >>
> >> .. and then it will actually trigger on the *second* case
> >> instead, where we have
> >>
> >>    if (!PageUptodate(page)) {
> >>            if (iocb->ki_flags & IOCB_NOWAIT) {
> >>                    put_page(page);
> >>                    goto would_block;
> >>            }
> >>
> >> and that's where RWF_NOWAIT would act.
> >>
> >> It would still return EAGAIN.
> >>
> >> But it would have started filling the page cache. So now the
> >> act of probing would fill the page cache, and the attacker
> >> would be left high and dry - the fact that the page cache now
> >> exists is because of the attack, not because of whatever it was
> >> trying to measure.
> >>
> >> See?
> >
> > Except for fadvise(POSIX_FADV_RANDOM) which triggers this code
> > in page_cache_sync_readahead():
> >
> >    /* be dumb */
> >    if (filp && (filp->f_mode & FMODE_RANDOM)) {
> >            force_page_cache_readahead(mapping, filp, offset, req_size);
> >            return;
> >    }
> >
> > So it will only read the single page we tried to access and
> > won't perturb the rest of the message encoded into subsequent
> > pages in file.
>
> There are two types of attacks. One is an intentional side
> channel where two cooperating processes communicate. This is,
> under some circumstances, a problem,

Yes, that's the covert communication channel that can cross
container and machine boundaries without any required privileges.

> but it’s not one
> we’re about to solve in general. The other is an attacker
> monitoring an unwilling process.

Which uses exactly the same mechanisms as the first case, i.e.
controlled invalidation and page cache residency monitoring. If we
aren't going to solve the first problem case, then we aren't going
to solve the second, because they are one and the same problem...

However, I suspect you have misunderstood the monitoring mechanism
here - dispatching IO for this page doesn't prevent the information
leak about that page. It's when we return EAGAIN that we leak
information about page cache residency.

What Linus is attempting to do is perturb the nearby state of the
page cache by triggering async readahead in the EAGAIN case. Async
readahead will fill all the holes in the readahead window and hence
destroy the information about where the page fault landed and
instantiated the page cache. That would prevent the attacker from
determining what code the executable is running, as they would only
be able to check a single page in an executable at a time, and that
makes the attack highly impractical.

But if the attacker uses FADV_RANDOM, readahead is only triggered
for the page the attacker is trying to read.
Hence it does not disturb the nearby page cache residency pattern
the executable's page faults left behind, and so doesn't destroy the
information that they are trying to extract from the unwilling
process.

Sure, Linus's change makes monitoring the executable file after
FADV_RANDOM a "read-once" mechanism. However, detection of what code
is executing is a repeated invalidate-and-sweep exercise to begin
with, so it basically doesn't change the information or the rate at
which the monitoring process can extract information from the file.

/me hasn't thought about this sort of stuff since he was running
page cache invalidation attacks on Irix system libraries way back in
2002, when he found a libc bug that killed the init process and
panicked the kernel. This isn't my first rodeo - it's been well
known for a long, long time that the system page cache can be
exploited to monitor executing code...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
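
The readahead argument in the message can be pictured with a toy model. The Python sketch below is not kernel code and is not taken from the thread: `ToyPageCache`, `probe_nowait`, `sweep`, `run_attack` and the 4-page `READAHEAD_WINDOW` are hypothetical names and values chosen for illustration. It assumes the proposed behaviour (a NOWAIT miss starts readahead and then returns EAGAIN) and models FADV_RANDOM as shrinking that readahead to the single probed page.

```python
from typing import Iterable, List, Set

READAHEAD_WINDOW = 4  # assumed fixed window, in pages (toy value)


class ToyPageCache:
    """Residency state of one file, tracked as a set of page indices."""

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self.resident: Set[int] = set()

    def invalidate(self) -> None:
        """Attacker-controlled eviction of every cached page."""
        self.resident.clear()

    def fault_in(self, pages: Iterable[int]) -> None:
        """Pages instantiated by the victim's page faults."""
        self.resident.update(pages)

    def probe_nowait(self, page: int, random_hint: bool) -> bool:
        """Model a NOWAIT read: True means data returned, False means EAGAIN.

        On a miss, readahead is started before EAGAIN is returned. With
        random_hint set (standing in for FADV_RANDOM) only the probed
        page is read; otherwise the whole assumed window is filled.
        """
        if page in self.resident:
            return True
        window = 1 if random_hint else READAHEAD_WINDOW
        last = min(page + window, self.num_pages)
        self.resident.update(range(page, last))
        return False


def sweep(cache: ToyPageCache, random_hint: bool) -> List[int]:
    """Probe every page in order; return the pages that looked resident."""
    return [p for p in range(cache.num_pages)
            if cache.probe_nowait(p, random_hint)]


def run_attack(victim_pages: Iterable[int], num_pages: int,
               random_hint: bool) -> List[int]:
    """One invalidate, victim-runs, sweep cycle."""
    cache = ToyPageCache(num_pages)
    cache.invalidate()
    cache.fault_in(victim_pages)
    return sweep(cache, random_hint)
```

With victim pages `[2, 9, 10]` in a 16-page file, `run_attack([2, 9, 10], 16, random_hint=True)` returns exactly `[2, 9, 10]`, whereas the sweep with the default window also reports pages that the attacker's own readahead pulled in. After either sweep every page is resident, which is the "read-once" property. Real readahead sizing is far more involved than a fixed window, so treat this only as a picture of the argument.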