From: "Michael Kerrisk (man-pages)"
Reply-To: mtk.manpages@gmail.com
To: Kees Cook
Cc: Michael Kerrisk, lkml, Andrew Morton, "linux-man@vger.kernel.org", Randy Dunlap
Date: Wed, 20 May 2015 16:08:37 +0200
Subject: Re: sysctl_writes_strict documentation + an oddity?
In-Reply-To: <554DCB33.8080101@gmail.com>

Hello Kees,

Ping on the below!

Cheers,

Michael

On 9 May 2015 at 10:54, Michael Kerrisk (man-pages) wrote:
> Hi Kees,
>
> I discovered that you added /proc/sys/kernel/sysctl_writes_strict in
> Linux 3.16. In passing, I'll just mention that this was an API change that
> should have been CCed to linux-api@vger.kernel.org.
>
> Anyway, I've tried to write this file up for the proc(5) man page,
> and I have two requests:
>
> 1) Could you review this text?
> 2) I've found some behavior that surprised me, and I am wondering if it
> is intended. Could you let me know your thoughts?
>
> ===== 1) man-page text =====
>
> The man-page text, heavily based on your text in
> Documentation/sysctl/kernel.txt, is as follows:
>
> /proc/sys/kernel/sysctl_writes_strict (since Linux 3.16)
>        The value in this file determines how the file offset
>        affects the behavior of updating entries in files under
>        /proc/sys. The file has three possible values:
>
>        -1  This provides legacy handling, with no printk warnings.
>            Each write(2) must fully contain the value to
>            be written, and multiple writes on the same file
>            descriptor will overwrite the entire value, regardless
>            of the file position.
>
>        0   (default) This provides the same behavior as for -1,
>            but printk warnings are written for processes that
>            perform writes when the file offset is not 0.
>
>        1   Respect the file offset when writing strings into
>            /proc/sys files. Multiple writes will append to the
>            value buffer. Anything written beyond the maximum
>            length of the value buffer will be ignored. Writes to
>            numeric /proc/sys entries must always be at file offset
>            0 and the value must be fully contained in the
>            buffer provided to write(2).
>
> ===== 2) Behavior puzzle (a) =====
>
> The last sentence quoted from the man page was based on your sentence
>
>     Writes to numeric sysctl entries must always be at file position 0
>     and the value must be fully contained in the buffer sent in the write
>     syscall.
>
> So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
> mean that if one writes into a numeric /proc/sys file at an offset
> other than zero, the write() will fail with some kind of error.
> But this seems not to be the case. Instead, the write() succeeds,
> but the file is left unmodified. That's surprising, I find. So, I'm
> wondering whether the implementation deviates from your intention.
>
> There's a test program below, which takes arguments as follows:
>
>     ./a.out pathname offset string
>
> And here's a test run that demonstrates the behavior:
>
>     $ sudo sh -c "echo 1 > /proc/sys/kernel/sysctl_writes_strict"
>     $ cat /proc/sys/kernel/pid_max
>     32768
>     $ sudo dmesg --clear
>     $ sudo ./a.out /proc/sys/kernel/pid_max 1 3000
>     write() succeeded (return value 4)
>     $ cat /proc/sys/kernel/pid_max
>     32768
>     $ dmesg
>
> As you can see above, an attempt was made to write into the
> /proc/sys/kernel/pid_max file at offset 1.
> The write() returned successfully (reporting 4 bytes written)
> but the file contents were unchanged, and no printk() warning
> was issued. Is this intended behavior?
>
> ===== 2) Behavior puzzle (b) =====
>
> In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:
>
>     This adds the sysctl kernel.sysctl_writes_strict to control the write
>     behavior. The default (0) reports when VFS position is non-0 on a
>     write, but retains legacy behavior, -1 disables the warning, and 1
>     enables the position-respecting behavior.
>
>     The long-term plan here is to wait for userspace to be fixed in response
>     to the new warning and to then switch the default kernel behavior to the
>     new position-respecting behavior.
>
> (That last para was added to the commit message by AKPM, I see.)
>
> But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
> is going to help with the long-term plan. The problem is that in
> warn_sysctl_write(), pr_warn_once() is used. This means that only
> the first offending user-space application that writes to *any*
> /proc/sys file will generate the printk warning. If that application
> isn't fixed, then none of the other "broken" applications will be
> discovered. It therefore seems possible that it could be a very long
> time before we could "switch the default kernel behavior to the
> new position-respecting behavior".
>
> Looking over old mails
> (http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
> I see that you're aware of the problem, but it seems to me that
> the switch to pr_warn_once() (for fear of spamming the log) likely
> dooms the long-term plan to failure. Your thoughts?
>
> Cheers,
>
> Michael
>
>
> 8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--
>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int
> main(int argc, char *argv[])
> {
>     char *pathname;
>     off_t offset;
>     char *string;
>     int fd;
>     ssize_t numWritten;
>
>     if (argc != 4) {
>         fprintf(stderr, "Usage: %s pathname offset string\n", argv[0]);
>         exit(EXIT_FAILURE);
>     }
>
>     pathname = argv[1];
>     /* Base 0: the offset may be given in decimal, octal, or hex */
>     offset = strtoll(argv[2], NULL, 0);
>     string = argv[3];
>
>     fd = open(pathname, O_RDWR);
>     if (fd == -1)
>         errExit("open");
>
>     /* Set the file offset that the following write() will use */
>     if (lseek(fd, offset, SEEK_SET) == -1)
>         errExit("lseek");
>
>     /* The return value reports bytes accepted, not whether the value changed */
>     numWritten = write(fd, string, strlen(string));
>     if (numWritten == -1)
>         errExit("write");
>
>     printf("write() succeeded (return value %zd)\n", numWritten);
>
>     exit(EXIT_SUCCESS);
> }
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/