From: Jann Horn
Date: Tue, 3 Mar 2020 17:57:58 +0100
Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]
To: Greg Kroah-Hartman
Cc: Miklos Szeredi, Karel Zak, David Howells, Ian Kent, Christian Brauner,
 James Bottomley, Steven Whitehouse, Miklos Szeredi, viro, Christian Brauner,
 "Darrick J. Wong", Linux API, linux-fsdevel, lkml
In-Reply-To: <20200303165103.GA731597@kroah.com>
References: <1509948.1583226773@warthog.procyon.org.uk>
 <20200303113814.rsqhljkch6tgorpu@ws.net.home>
 <20200303130347.GA2302029@kroah.com>
 <20200303131434.GA2373427@kroah.com>
 <20200303134316.GA2509660@kroah.com>
 <20200303142958.GB47158@kroah.com>
 <20200303165103.GA731597@kroah.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 3, 2020 at 5:51 PM Greg Kroah-Hartman wrote:
> On Tue, Mar 03, 2020 at 03:40:24PM +0100, Jann Horn wrote:
> > On Tue, Mar 3, 2020 at 3:30 PM Greg Kroah-Hartman wrote:
> > > On Tue, Mar 03, 2020 at 03:10:50PM +0100, Miklos Szeredi wrote:
> > > > On Tue, Mar 3, 2020 at 2:43 PM Greg Kroah-Hartman wrote:
> > > > >
> > > > > On Tue, Mar 03, 2020 at 02:34:42PM +0100, Miklos Szeredi wrote:
> > > > >
> > > > > > If buffer is too small to fit the whole file, return error.
> > > > >
> > > > > Why? What's wrong with just returning the bytes asked for?
> > > > > If someone only wants 5 bytes from the front of a file, it should be
> > > > > fine to give that to them, right?
> > > >
> > > > I think we need to signal in some way to the caller that the result
> > > > was truncated (see readlink(2), getxattr(2), getcwd(2)), otherwise the
> > > > caller might be surprised.
> > >
> > > But that's not the way a "normal" read works. Short reads are fine, if
> > > the file isn't big enough. That's how char device nodes work all the
> > > time as well, and this kind of is like that, or some kind of "stream" to
> > > read from.
> > >
> > > If you think the file is bigger, then you, as the caller, can just pass
> > > in a bigger buffer if you want to (i.e. you can stat the thing and
> > > determine the size beforehand.)
> > >
> > > Think of the "normal" use case here, a sysfs read with a PAGE_SIZE
> > > buffer. That way userspace "knows" it will always read all of the data
> > > it can from the file, we don't have to do any seeking or determining
> > > real file size, or anything else like that.
> > >
> > > We return the number of bytes read as well, so we "know" if we did a
> > > short read, and also, you could imply, if the number of bytes read are
> > > the exact same as the number of bytes of the buffer, maybe the file is
> > > either that exact size, or bigger.
> > >
> > > This should be "simple", let's not make it complex if we can help it :)
> > >
> > > > > > Verify that the number of bytes read matches the file size, otherwise
> > > > > > return error (may need to loop?).
> > > > >
> > > > > No, we can't "match file size" as sysfs files do not really have a sane
> > > > > "size". So I don't want to loop at all here, one-shot, that's all you
> > > > > get :)
> > > >
> > > > Hmm. I understand the no-size thing. But looping until EOF (i.e.
> > > > until read return zero) might be a good idea regardless, because short
> > > > reads are allowed.
> > >
> > > If you want to loop, then do a userspace open/read-loop/close cycle.
> > > That's not what this syscall should be for.
> > >
> > > Should we call it: readfile-only-one-try-i-hope-my-buffer-is-big-enough()? :)
> >
> > So how is this supposed to work in e.g. the following case?
> >
> > [...]
> > int maps = open("/proc/self/maps", O_RDONLY);
> > static char buf[0x100000];
> > int res;
> > do {
> >   res = read(maps, buf, sizeof(buf));
> > } while (res > 0);
> > }
> > [...]
> >
> > The kernel is randomly returning short reads *with different lengths*
> > that are vaguely around PAGE_SIZE, no matter how big the buffer
> > supplied by userspace is. And while repeated read() calls will return
> > consistent state thanks to the seqfile magic, repeated readfile()
> > calls will probably return garbage with half-complete lines.
>
> Ah crap, I forgot about seqfile, I was only considering the "simple"
> cases that sysfs provides.
>
> Ok, Miklos, you were totally right, I'll loop and read until the end of
> file or buffer, which ever comes first.

I wonder what we should do when one of the later reads returns an
error code. As in, we start the first read, get a short read (maybe
because a signal arrived), try a second read, get -EINTR. Do we just
return the error code? That'd probably work fine for most usecases -
e.g. if "top" is reading stuff from procfs, and that gets interrupted
by SIGWINCH or so, it doesn't matter that we've already started the
first read; the only thing "top" really needs to know is that the read
was a short read and it has to retry.
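
To make the "just return the error code" option concrete, here is a rough userspace sketch of the looping behaviour. It is only an emulation built on open()/read()/close(), not the proposed syscall, and the name readfile_sketch() and its return convention (byte count on success, negative errno on failure) are made up for illustration. It reads until EOF or until the buffer is full, and if any read fails, including a later one after data has already arrived, it drops the partial result and returns the error.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace emulation of the looping behaviour discussed
 * above; this is NOT the proposed readfile() syscall.
 *
 * Reads from path until EOF or until buf is full, whichever comes first.
 * If any read() fails, including a later one after data was already
 * received, the partial data is discarded and -errno is returned.
 */
static ssize_t readfile_sketch(const char *path, char *buf, size_t size)
{
	size_t done = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;

	while (done < size) {
		ssize_t n = read(fd, buf + done, size - done);

		if (n < 0) {
			int err = errno;	/* save before close() can clobber it */

			close(fd);
			return -err;
		}
		if (n == 0)	/* EOF */
			break;
		done += (size_t)n;
	}

	close(fd);
	return (ssize_t)done;
}
```

With this convention, a caller like "top" that gets -EINTR back would simply call readfile_sketch() again from scratch. A return value equal to the buffer size still cannot distinguish "file exactly fits" from "file is bigger", which is the truncation question raised earlier in the thread.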