Partial page discard requests are ignored and the documentation on why this
is correct behaviour sucks. A readahead patch looked like a "regression" to
a random IO storage benchmark because posix_fadvise() was used incorrectly
to force IO requests to go to disk. In reality, the benchmark sucked but
it was non-obvious why. Patch 1 updates the kernel comment in case someone
"fixes" either readahead or fadvise for inappropriate reasons. Patch 2
updates the relevant man page on the rough off chance that application
developers do not read kernel source comments.
--
2.1.2
It is not obvious from the interface that partial page discard requests
are ignored. It should be spelled out.
Signed-off-by: Mel Gorman <[email protected]>
---
man2/posix_fadvise.2 | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/man2/posix_fadvise.2 b/man2/posix_fadvise.2
index 25d0c50..07313a9 100644
--- a/man2/posix_fadvise.2
+++ b/man2/posix_fadvise.2
@@ -144,6 +144,11 @@ A program may periodically request the kernel to free cached data
that has already been used, so that more useful cached pages are not
discarded instead.
+Requests to discard partial pages are ignored. It is preferable to preserve
+needed data than discard unneeded data. If the application requires that
+data be considered for discarding then \fIoffset\fP and \fIlen\fP must be
+page-aligned.
+
Pages that have not yet been written out will be unaffected, so if the
application wishes to guarantee that pages will be released, it should
call
A random seek IO benchmark appeared to regress because of a change to
readahead but the real problem was the benchmark. To ensure the IO request
accesssed disk, it used fadvise(FADV_DONTNEED) on a block boundary (512K)
but the hint is ignored by the kernel. This is correct but not necessarily
obvious behaviour. As much as I dislike comment patches, the explanation
for this behaviour predates current git history. Clarify why it behaves
like this in case someone "fixes" fadvise or readahead for the wrong reasons.
Signed-off-by: Mel Gorman <[email protected]>
---
mm/fadvise.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 3bcfd81d..c908c72 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -117,7 +117,11 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
__filemap_fdatawrite_range(mapping, offset, endbyte,
WB_SYNC_NONE);
- /* First and last FULL page! */
+ /*
+ * First and last FULL page! Partial pages are deliberately
+ * preserved on the expectation that it is better to preserve
+ * needed memory than to discard unneeded memory.
+ */
start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT;
end_index = (endbyte >> PAGE_CACHE_SHIFT);
--
2.1.2
On 12/03/2014 01:42 AM, Mel Gorman wrote:
> It is not obvious from the interface that partial page discard requests
> are ignored. It should be spelled out.
Thanks, Mel. Applied.
Cheers,
Michael
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> man2/posix_fadvise.2 | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/man2/posix_fadvise.2 b/man2/posix_fadvise.2
> index 25d0c50..07313a9 100644
> --- a/man2/posix_fadvise.2
> +++ b/man2/posix_fadvise.2
> @@ -144,6 +144,11 @@ A program may periodically request the kernel to free cached data
> that has already been used, so that more useful cached pages are not
> discarded instead.
>
> +Requests to discard partial pages are ignored. It is preferable to preserve
> +needed data than discard unneeded data. If the application requires that
> +data be considered for discarding then \fIoffset\fP and \fIlen\fP must be
> +page-aligned.
> +
> Pages that have not yet been written out will be unaffected, so if the
> application wishes to guarantee that pages will be released, it should
> call
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
On Wed, Dec 3, 2014 at 1:42 AM, Mel Gorman <[email protected]> wrote:
> Partial page discard requests are ignored and the documentation on why this
> is correct behaviour sucks. A readahead patch looked like a "regression" to
> a random IO storage benchmark because posix_fadvise() was used incorrectly
> to force IO requests to go to disk. In reality, the benchmark sucked but
> it was non-obvious why. Patch 1 updates the kernel comment in case someone
> "fixes" either readahead or fadvise for inappropriate reasons. Patch 2
> updates the relevant man page on the rough off chance that application
> developers do not read kernel source comments.
It feels like that last sentence should have made LWN quote of the week :-/.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/