Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753749AbZIRGxZ (ORCPT ); Fri, 18 Sep 2009 02:53:25 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752111AbZIRGxY (ORCPT ); Fri, 18 Sep 2009 02:53:24 -0400 Received: from courier.cs.helsinki.fi ([128.214.9.1]:57884 "EHLO mail.cs.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751155AbZIRGxX (ORCPT ); Fri, 18 Sep 2009 02:53:23 -0400 Subject: Re: [PATCH 2/4] send callback when swap slot is freed From: Pekka Enberg To: Nitin Gupta Cc: Greg KH , Andrew Morton , Ed Tomlinson , linux-kernel , linux-mm , linux-mm-cc , hugh.dickins@tiscali.co.uk, kamezawa.hiroyu@jp.fujitsu.com, nishimura@mxp.nes.nec.co.jp In-Reply-To: <1253227412-24342-3-git-send-email-ngupta@vflare.org> References: <1253227412-24342-1-git-send-email-ngupta@vflare.org> <1253227412-24342-3-git-send-email-ngupta@vflare.org> Date: Fri, 18 Sep 2009 09:53:25 +0300 Message-Id: <1253256805.4959.8.camel@penberg-laptop> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 7bit X-Mailer: Evolution 2.24.3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4630 Lines: 135 On Fri, 2009-09-18 at 04:13 +0530, Nitin Gupta wrote: > Currently, we have "swap discard" mechanism which sends a discard bio request > when we find a free cluster during scan_swap_map(). This callback can come a > long time after swap slots are actually freed. > > This delay in callback is a great problem when (compressed) RAM [1] is used > as a swap device. So, this change adds a callback which is called as > soon as a swap slot becomes free. For above mentioned case of swapping > over compressed RAM device, this is very useful since we can immediately > free memory allocated for this swap page. > > This callback does not replace swap discard support. It is called with > swap_lock held, so it is meant to trigger action that finishes quickly. > However, swap discard is an I/O request and can be used for taking longer > actions. > > It is preferred to use this callback for ramzswap case even if discard > mechanism could be improved such that it can be called as often as required. > This is because, allocation of 'bio'(s) is undesirable since ramzswap always > operates under low memory conditions (its a swap device). Also, batching of > discard bio requests is not optimal since stale data can accumulate very > quickly in ramzswap devices, pushing system further into low memory state. > > Links: > [1] http://compcache.googlecode.com/ > > Signed-off-by: Nitin Gupta > > --- > include/linux/swap.h | 5 +++++ > mm/swapfile.c | 34 ++++++++++++++++++++++++++++++++++ > 2 files changed, 39 insertions(+), 0 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index 7c15334..64796fc 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -8,6 +8,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -20,6 +21,8 @@ struct bio; > #define SWAP_FLAG_PRIO_MASK 0x7fff > #define SWAP_FLAG_PRIO_SHIFT 0 > > +typedef void (swap_free_notify_fn) (struct block_device *, unsigned long); > + > static inline int current_is_kswapd(void) > { > return current->flags & PF_KSWAPD; > @@ -155,6 +158,7 @@ struct swap_info_struct { > unsigned int max; > unsigned int inuse_pages; > unsigned int old_block_size; > + swap_free_notify_fn *swap_free_notify_fn; > }; > > struct swap_list_t { > @@ -295,6 +299,7 @@ extern sector_t swapdev_block(int, pgoff_t); > extern struct swap_info_struct *get_swap_info_struct(unsigned); > extern int reuse_swap_page(struct page *); > extern int try_to_free_swap(struct page *); > +extern void set_swap_free_notify(struct block_device *, swap_free_notify_fn *); > struct backing_dev_info; > > /* linux/mm/thrash.c */ > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 74f1102..b165db0 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -554,6 +554,38 @@ out: > return NULL; > } > > +/* > + * Sets callback for event when swap_map[offset] == 0 > + * i.e. page at this swap offset is no longer used. > + */ > +void set_swap_free_notify(struct block_device *bdev, > + swap_free_notify_fn *notify_fn) > +{ > + unsigned int i; > + struct swap_info_struct *sis; > + > + spin_lock(&swap_lock); > + for (i = 0; i <= nr_swapfiles; i++) { > + sis = &swap_info[i]; > + if (!(sis->flags & SWP_USED)) > + continue; > + if (sis->bdev == bdev) > + break; > + } > + > + /* swap device not found */ > + if (i > nr_swapfiles) { > + spin_unlock(&swap_lock); > + return; > + } > + > + BUG_ON(!sis || sis->swap_free_notify_fn); > + sis->swap_free_notify_fn = notify_fn; > + spin_unlock(&swap_lock); > + return; > +} > +EXPORT_SYMBOL_GPL(set_swap_free_notify); > + > static int swap_entry_free(struct swap_info_struct *p, > swp_entry_t ent, int cache) > { > @@ -585,6 +617,8 @@ static int swap_entry_free(struct swap_info_struct *p, > swap_list.next = p - swap_info; > nr_swap_pages++; > p->inuse_pages--; > + if (p->swap_free_notify_fn) > + p->swap_free_notify_fn(p->bdev, offset); > } > if (!swap_count(count)) > mem_cgroup_uncharge_swap(ent); OK, this hits core kernel code so we need to CC some more mm/swapfile.c people. The set_swap_free_notify() API looks strange to me. Hugh, I think you mentioned that you're okay with an explicit hook. Any suggestions how to do this cleanly? Pekka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/