Date: Mon, 7 Jan 2019 05:32:27 +0100
From: Dominique Martinet
To: Linus Torvalds
Cc: Matthew Wilcox, Jann Horn, Jiri Kosina, Andrew Morton, Greg KH, Peter Zijlstra, Michal Hocko, Linux-MM, kernel list, Linux API
Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
Message-ID: <20190107043227.GA3325@nautica>

Linus Torvalds wrote on Sat, Jan 05, 2019:
> But I think my patch to just rip out all that page lookup, and just
> base it on the page table state has the fundamental advantage that it
> gets rid of code. Maybe I should jst commit it, and see if anything
> breaks? We do have options in case things break, and then we'd at
> least know who cares (and perhaps a lot more information of _why_ they
> care).

There actually are many tools like fincore which depend on mincore to
tell whether a file is "loaded in cache" or not (I personally use
vmtouch[1], and I know that at least nocache[2] uses it as well, so
that it only tries to evict pages that are actually in use).

[1] https://hoytech.com/vmtouch/
[2] https://github.com/Feh/nocache

I mostly use these to either fadvise(POSIX_FADV_DONTNEED) or
prefetch/lock whole files, so my "production" use-cases don't actually
rely on the mincore part of them; but when playing with these actions
it's fairly useful to be able to visualize which parts of a file ended
up in cache, or to monitor how a file's content evolves in cache...

There are various non-obvious behaviours where being able to poke
around is enlightening (e.g. fadvise dontneed is actually a hint, so
even if nothing uses the file Linux sometimes keeps the data around if
it thinks that would be useful, and nocache has a mode to call fadvise
multiple times, and things like that...)

Anyway, I agree the use of mincore for this is rather ugly, and frankly
some "cache management API" might be better in the long run, if only
for performance reasons (don't try these tools on a hundred TB sparse
file...), but until that pipe dream comes true I think mincore as it
was is useful for system admins.

Linus Torvalds wrote on Sun, Jan 06, 2019:
> I decided to just apply that patch. It is *not* marked for stable,
> very intentionally, because I expect that we will need to wait and see
> if there are issues with it, and whether we might have to do something
> entirely different (more like the traditional behavior with some extra
> "only for owner" logic).

FWIW I personally don't care much about "only for owner" or depending
on mmap options; honestly I don't understand much of the security
implications, so I'm not sure how these limitations actually help.
On the other hand, a simple CAP_SYS_ADMIN check that selects between
the two behaviours should be safe and would cover what I described
above.

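To make the use case above concrete, here is a minimal sketch of the kind of residency check such tools perform: map the file, ask mincore() which pages are resident, and optionally hint the kernel to drop them with posix_fadvise(). It is not taken from vmtouch, fincore or nocache; the function names resident_pages() and drop_cache_hint() are made up for illustration, and the result only reflects the page cache on kernels where mincore() keeps its traditional behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for mincore() and posix_fadvise() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages of `path` that mincore() reports
 * as resident. Stores the file's page count in *total_pages.
 * Returns the resident count, or -1 on error.
 */
static long resident_pages(const char *path, long *total_pages)
{
    assert(total_pages != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    *total_pages = (long)npages;
    if (npages == 0) {          /* empty file: nothing to map */
        close(fd);
        return 0;
    }

    long resident = -1;
    unsigned char *vec = NULL;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    vec = malloc(npages);       /* one status byte per page */
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* bit 0 set: page is resident */
    }

    free(vec);
    munmap(map, len);
out_close:
    close(fd);
    return resident;
}

/*
 * Hypothetical helper: ask the kernel to drop cached pages of `path`.
 * This is only a hint; the kernel may keep the data around.
 * Returns 0 on success, an errno-style value on failure, -1 if open fails.
 */
static int drop_cache_hint(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* offset 0, length 0 means "the whole file" */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return err;
}
```

Calling resident_pages() before and after drop_cache_hint() on the same file is roughly the "poke around" workflow described above: the second count is often lower, but not necessarily zero, since the fadvise call is only a hint.
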
(By the way, while we are discussing permissions: a regular user can
use fadvise dontneed on files it doesn't own as well, as long as it can
open them for reading; I'm not sure whether that would need restricting
too in the context of the security issue.
Frankly, even without mincore someone could likely tell the difference
through timing, if they just do it a few times. Do magic, probe, flush
out, repeat until satisfied.)

Thanks,
-- 
Dominique Martinet | Asmadeus