2012-02-12 12:54:36

by Dmitry Adamushko

Subject: Fwd: [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags

[ resent to lkml in 'plain-text' format ]

On 10 February 2012 01:16, John Stultz <[email protected]> wrote:

[ ... ]

> --- /dev/null
> +++ b/mm/volatile.c
> @@ -0,0 +1,314 @@
> +/* mm/volatile.c
> + *
> [ ... ]
>
> +
> +#define range_on_lru(range) (!(range)->purged)
> +
> +
> +static inline void volatile_range_shrink(struct volatile_range *range,
> +				pgoff_t start_index, pgoff_t end_index)
> +{
> +	size_t pre = range_size(range);
> +
> +	range->range_node.start = start_index;
> +	range->range_node.end = end_index;
> +


I think this opens up a whole set of races with volatile_shrink():
.start and .end are updated here without volatile_lru_mutex held, so
the shrinker may see a range in the middle of an update.


>
> +	if (range_on_lru(range)) {


here volatile_shrink() can run and set range->purged to 1, then call
__lru_del() => lru_count gets updated.

>
> +		mutex_lock(&volatile_lru_mutex);
> +		lru_count -= pre - range_size(range);
> +		mutex_unlock(&volatile_lru_mutex);


and then lru_count gets updated once more, for the same 'range' object.


>
> +	}
> +}
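
To make both remarks above concrete, here is a rough userspace sketch
of what I mean: the range update and the purged check happen under the
same lock the shrinker takes. All names are made up for illustration,
pthread_mutex_t stands in for volatile_lru_mutex, and I am assuming
that range_size() is end - start + 1 and that __lru_del() subtracts the
range size from lru_count. It is not meant as a drop-in fix.

```c
#include <pthread.h>
#include <stdint.h>

/* Illustrative stand-in for struct volatile_range (not the patch's type). */
struct example_range {
	uint64_t start;		/* first page index, inclusive */
	uint64_t end;		/* last page index, inclusive */
	int purged;		/* set once the shrinker has purged the range */
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t lru_count;

/* Assumption: a range covers end - start + 1 pages. */
static uint64_t example_range_size(const struct example_range *r)
{
	return r->end - r->start + 1;
}

/* Shrink a range; the update and the accounting share one critical section. */
static void example_range_shrink(struct example_range *r,
				 uint64_t start, uint64_t end)
{
	uint64_t pre;

	pthread_mutex_lock(&lru_lock);
	pre = example_range_size(r);
	r->start = start;
	r->end = end;
	/* Checked under the lock, so a concurrent purge cannot slip in between. */
	if (!r->purged)
		lru_count -= pre - example_range_size(r);
	pthread_mutex_unlock(&lru_lock);
}

/* What the shrinker side would do for one range. */
static void example_purge(struct example_range *r)
{
	pthread_mutex_lock(&lru_lock);
	if (!r->purged) {
		r->purged = 1;
		lru_count -= example_range_size(r);
	}
	pthread_mutex_unlock(&lru_lock);
}
```

With this ordering a range is accounted for exactly once: either the
shrink adjusts lru_count before the purge, or the purge happens first
and the shrink leaves lru_count alone.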


>
> [ ... ]


>
>
> +static int volatile_shrink(struct shrinker *ignored, struct shrink_control *sc)
> +{
> +	struct volatile_range *range, *next;
> +	unsigned long nr_to_scan = sc->nr_to_scan;
> +	const gfp_t gfp_mask = sc->gfp_mask;
> +
> +	/* We might recurse into filesystem code, so bail out if necessary */
> +	if (nr_to_scan && !(gfp_mask & __GFP_FS))
> +		return -1;
> +	if (!nr_to_scan)
> +		return lru_count;


So it's u64 -> int here, and int is signed and typically only 32 bits
wide. Can't that lead to inconsistent results (truncated or negative
counts) once lru_count gets large?
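
One way to avoid this would be to saturate the value before returning
it. A minimal sketch, with a made-up helper name and plain C99 types
instead of the kernel ones:

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): report a 64-bit counter
 * through an int return value without truncation or sign flips.
 */
static int count_to_int(uint64_t count)
{
	/* Saturate instead of letting the conversion drop the high bits. */
	if (count > (uint64_t)INT_MAX)
		return INT_MAX;
	return (int)count;
}
```

Without something like this, a count of 0xFFFFFFFF would typically come
out as -1, which is the same value the function uses above to signal
"bail out".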

>
> +
> +	mutex_lock(&volatile_lru_mutex);
> +	list_for_each_entry_safe(range, next, &volatile_lru_list, lru) {
> +		struct inode *inode = range->mapping->host;
> +		loff_t start, end;
> +
> +
> +		start = range->range_node.start * PAGE_SIZE;
> +		end = (range->range_node.end + 1) * PAGE_SIZE - 1;


PAGE_CACHE_SHIFT was used in fadvise() to calculate the .start and .end
indexes, but here PAGE_SIZE is used to convert them back to byte
offsets. Isn't that inconsistent, at the very least?
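
What I have in mind is that both directions of the conversion should
go through the same constant. A small userspace sketch, where
EXAMPLE_PAGE_SHIFT and the helper names are mine and 4 KiB pages are
only an assumption for the example:

```c
#include <stdint.h>

/* Assumption for this example only: 4 KiB pages. */
#define EXAMPLE_PAGE_SHIFT 12

/* Byte offset -> page index (what the fadvise() side computes). */
static uint64_t example_byte_to_index(uint64_t offset)
{
	return offset >> EXAMPLE_PAGE_SHIFT;
}

/* Page index -> first byte of that page. */
static uint64_t example_first_byte(uint64_t index)
{
	return index << EXAMPLE_PAGE_SHIFT;
}

/* Page index -> last byte of that page (inclusive end). */
static uint64_t example_last_byte(uint64_t index)
{
	return ((index + 1) << EXAMPLE_PAGE_SHIFT) - 1;
}
```

With a single shift used everywhere, index -> bytes -> index always
round-trips, whatever the page cache granularity is.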

>
> +
> +		/*
> +		 * XXX - calling vmtruncate_range from a shrinker causes
> +		 * lockdep warnings. Revisit this!
> +		 */
> +		vmtruncate_range(inode, start, end);
> +		range->purged = 1;
> +		__lru_del(range);
> +
> +		nr_to_scan -= range_size(range);


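hmm, unsigned long -= u64: the u64 gets truncated on 32-bit, and the
result wraps around if range_size() is larger than nr_to_scan.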

>
> +		if (nr_to_scan <= 0)


nr_to_scan is "unsigned long", so this check can only ever be true
for nr_to_scan == 0 :-))
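
Something along these lines would keep the budget from wrapping. Again
just a sketch with a made-up helper name, using uint64_t where the
patch has u64:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): take 'size' pages off an
 * unsigned scan budget, stopping at zero instead of wrapping around.
 */
static unsigned long consume_scan_budget(unsigned long nr_to_scan,
					 uint64_t size)
{
	/* Compare before subtracting; an unsigned result is never negative. */
	if (size >= nr_to_scan)
		return 0;
	/* size < nr_to_scan here, so it fits in an unsigned long. */
	return nr_to_scan - (unsigned long)size;
}
```

The loop could then simply stop once the returned budget is 0.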

[ ... ]

> +arch_initcall(volatile_init);
> --
> 1.7.3.2.146.gca209
>

--

-- Dmitry