Date: Mon, 6 Aug 2007 08:39:09 +0200
From: Ingo Molnar
To: Dave Jones, Alan Cox, Jörn Engel, Jeff Garzik, Linus Torvalds,
	Peter Zijlstra, linux-mm@kvack.org, Linux Kernel Mailing List,
	miklos@szeredi.hu, akpm@linux-foundation.org, neilb@suse.de,
	dgc@sgi.com, tomoki.sekiyama.qu@hitachi.com, nikita@clusterfs.com,
	trond.myklebust@fys.uio.no, yingchao.zhou@gmail.com,
	richard@rsk.demon.co.uk, david@lang.hm
Subject: Re: [PATCH 00/23] per device dirty throttling -v8
Message-ID: <20070806063909.GB31321@elte.hu>
References: <46B4C0A8.1000902@garzik.org>
	<20070804191205.GA24723@lazybastard.org>
	<20070804192130.GA25346@elte.hu>
	<20070804211156.5f600d80@the-village.bc.nu>
	<20070804202830.GA4538@elte.hu>
	<20070804210351.GA9784@elte.hu>
	<20070804225121.5c7b66e0@the-village.bc.nu>
	<20070805072141.GA4414@elte.hu>
	<20070805184408.GB22639@redhat.com>
In-Reply-To: <20070805184408.GB22639@redhat.com>

* Dave Jones wrote:

> > btw., Mutt does not go boom, i use it myself. It works just fine
> > and notices new mails even on a noatime,nodiratime filesystem.
>
> It still fails miserably for me.
>
> If I hit 'C' and '?' I get a list of my mail folders, with some of
> them marked 'N' if they have new mail. Without atime, those N's never
> show up and every mbox looks like it has no new mail.

does it work with the "atime on steroids" patch below? (no need to
configure anything, just apply the patch and go.)

	Ingo

----------------------->
Subject: [patch] implement smarter atime updates support
From: Ingo Molnar

change relatime updates to be performed once per day. This makes
relatime a compatible solution for HSM, mailer-notification and
tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
"relatime" the default for all mounts without an extra kernel
boot option.

add the "default_relatime=0" boot option to turn this off.

also add the /proc/sys/fs/default_relatime flag which can be changed
at runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

   # date
   Sun Aug  5 22:55:14 CEST 2007

   # date -s "Tue Aug 7 22:55:14 CEST 2007"
   Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and
it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar
---
 Documentation/kernel-parameters.txt |    8 +++++
 fs/Kconfig                          |   22 ++++++++++++++
 fs/inode.c                          |   53 +++++++++++++++++++++++++++---------
 fs/namespace.c                      |   24 ++++++++++++++++
 include/linux/mount.h               |    3 ++
 kernel/sysctl.c                     |   17 +++++++++++
 6 files changed, 114 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -525,6 +525,10 @@ and is between 256 and 4096 characters.
 			This is a 16-member array composed of values
 			ranging from 0-255.
 
+	default_relatime=
+			[FS] mount all filesystems with relative atime
+			updates by default.
+
 	default_utf8=	[VT]
 			Format=<0|1>
 			Set system-wide default UTF-8 mode for all tty's.
@@ -1468,6 +1472,10 @@ and is between 256 and 4096 characters.
 			Format: [,[,...]]
 			See arch/*/kernel/reboot.c or arch/*/kernel/process.c
 
+	relatime_interval=
+			[FS] relative atime update frequency, in seconds.
+			(default: 1 day: 86400 seconds)
+
 	reserve=	[KNL,BUGS] Force the kernel to ignore some iomem area
 
 	reservetop=	[X86-32]
Index: linux/fs/Kconfig
===================================================================
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,28 @@ config 9P_FS
 
 endmenu
 
+config DEFAULT_RELATIME
+	bool "Mount all filesystems with relatime by default"
+	default y
+	help
+	  If you say Y here, all your filesystems will be mounted
+	  with the "relatime" mount option. This eliminates many atime
+	  ('file last accessed' timestamp) updates (which otherwise
+	  are performed on every file access and generate a write
+	  IO to the inode) and thus speeds up IO. Atime is still updated,
+	  but only once per day.
+
+	  The mtime ('file last modified') and ctime ('inode last changed')
+	  timestamps are unaffected by this change.
+
+	  Use the "default_relatime=0" kernel boot option to turn off
+	  this feature.
+
+config DEFAULT_RELATIME_VAL
+	int
+	default "1" if DEFAULT_RELATIME
+	default "0"
+
 if BLOCK
 menu "Partition Types"
 
Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -1162,6 +1162,41 @@ sector_t bmap(struct inode * inode, sect
 }
 EXPORT_SYMBOL(bmap);
 
+/*
+ * Relative atime updates frequency (default: 1 day):
+ */
+int relatime_interval __read_mostly = 24*60*60;
+
+/*
+ * With relative atime, only update atime if the previous atime is
+ * earlier than either the ctime or mtime, or if it is older than
+ * relatime_interval seconds.
+ */
+static int relatime_need_update(struct inode *inode, struct timespec now)
+{
+	/*
+	 * Is mtime younger than atime? If yes, update atime:
+	 */
+	if (timespec_compare(&inode->i_mtime, &inode->i_atime) >= 0)
+		return 1;
+	/*
+	 * Is ctime younger than atime? If yes, update atime:
+	 */
+	if (timespec_compare(&inode->i_ctime, &inode->i_atime) >= 0)
+		return 1;
+
+	/*
+	 * Is the previous atime value older than a day? If yes,
+	 * update atime:
+	 */
+	if ((long)(now.tv_sec - inode->i_atime.tv_sec) >= relatime_interval)
+		return 1;
+	/*
+	 * Good, we can skip the atime update:
+	 */
+	return 0;
+}
+
 /**
  * touch_atime - update the access time
  * @mnt: mount the inode is accessed on
@@ -1191,22 +1226,14 @@ void touch_atime(struct vfsmount *mnt, s
 			return;
 		if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
 			return;
-
-		if (mnt->mnt_flags & MNT_RELATIME) {
-			/*
-			 * With relative atime, only update atime if the
-			 * previous atime is earlier than either the ctime or
-			 * mtime.
-			 */
-			if (timespec_compare(&inode->i_mtime,
-						&inode->i_atime) < 0 &&
-			    timespec_compare(&inode->i_ctime,
-						&inode->i_atime) < 0)
+	}
+	now = current_fs_time(inode->i_sb);
+	if (mnt) {
+		if (mnt->mnt_flags & MNT_RELATIME)
+			if (!relatime_need_update(inode, now))
 				return;
-		}
 	}
 
-	now = current_fs_time(inode->i_sb);
 	if (timespec_equal(&inode->i_atime, &now))
 		return;
 
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c
+++ linux/fs/namespace.c
@@ -1107,6 +1107,7 @@ int do_add_mount(struct vfsmount *newmnt
 		goto unlock;
 
 	newmnt->mnt_flags = mnt_flags;
+
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
 
@@ -1362,6 +1363,24 @@ int copy_mount_options(const void __user
 }
 
 /*
+ * Allow users to disable (or enable) atime updates via a .config
+ * option or via the boot line, or via /proc/sys/fs/default_relatime:
+ */
+int default_relatime __read_mostly = CONFIG_DEFAULT_RELATIME_VAL;
+
+static int __init set_default_relatime(char *str)
+{
+	get_option(&str, &default_relatime);
+
+	printk(KERN_INFO "Mount all filesystems with "
+		"default relative atime updates: %s.\n",
+		default_relatime ? "enabled" : "disabled");
+
+	return 1;
+}
+__setup("default_relatime=", set_default_relatime);
+
+/*
  * Flags is a 32-bit value that allows up to 31 non-fs dependent flags to
  * be given to the mount() call (ie: read-only, no-dev, no-suid etc).
  *
@@ -1409,6 +1428,11 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_NODIRATIME;
 	if (flags & MS_RELATIME)
 		mnt_flags |= MNT_RELATIME;
+	else if (default_relatime &&
+				!(flags & (MS_NOATIME | MS_NODIRATIME))) {
+		mnt_flags |= MNT_RELATIME;
+		flags |= MS_RELATIME;
+	}
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
 		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME);
Index: linux/include/linux/mount.h
===================================================================
--- linux.orig/include/linux/mount.h
+++ linux/include/linux/mount.h
@@ -103,5 +103,8 @@ extern void shrink_submounts(struct vfsm
 extern spinlock_t vfsmount_lock;
 extern dev_t name_to_dev_t(char *name);
 
+extern int default_relatime;
+extern int relatime_interval;
+
 #endif
 #endif /* _LINUX_MOUNT_H */
Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1206,6 +1207,22 @@ static ctl_table fs_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "default_relatime",
+		.data		= &default_relatime,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "relatime_interval",
+		.data		= &relatime_interval,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
 	{
 		.ctl_name	= CTL_UNNUMBERED,
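
for anyone who wants to play with the update rule outside the kernel,
here is a minimal userspace sketch of the decision made by
relatime_need_update() in the patch. it is not kernel code: the struct
and function names (struct stamps, need_update) are made up for
illustration, timestamps are plain seconds rather than struct timespec,
and the interval is hardcoded to the patch's one-day default.

```c
/* Hypothetical stand-in for the three inode timestamps, in seconds. */
struct stamps {
	long atime_sec;	/* last access */
	long mtime_sec;	/* last data modification */
	long ctime_sec;	/* last inode change */
};

/* Same default as relatime_interval in the patch: one day, in seconds. */
static long relatime_interval = 24 * 60 * 60;

/*
 * Userspace mirror of the patch's rule: return 1 when atime should be
 * written back, 0 when the update can be skipped.
 */
static int need_update(const struct stamps *s, long now_sec)
{
	/* mtime is at or after atime: update */
	if (s->mtime_sec >= s->atime_sec)
		return 1;
	/* ctime is at or after atime: update */
	if (s->ctime_sec >= s->atime_sec)
		return 1;
	/* atime is at least one interval old: update */
	if (now_sec - s->atime_sec >= relatime_interval)
		return 1;
	/* otherwise the atime write can be skipped */
	return 0;
}
```

with atime newer than both mtime and ctime, a read one minute later is
skipped, a read two days later updates atime once, and any read after
the file was modified updates it as well. that matches the
date-forward test in the changelog.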