Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Date:   Thu, 4 Apr 2019 08:13:54 +1100
From:   "Tobin C. Harding" <me@tobin.cc>
To:     Roman Gushchin <guro@fb.com>
Cc:     "Tobin C. Harding" <tobin@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Christoph Lameter <cl@linux.com>,
        Pekka Enberg <penberg@kernel.org>,
        David Rientjes <rientjes@google.com>,
        Joonsoo Kim <iamjoonsoo.kim@lge.com>,
        Matthew Wilcox <willy@infradead.org>,
        "linux-mm@kvack.org" <linux-mm@kvack.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 2/7] slob: Respect list_head abstraction layer
Message-ID: <20190403211354.GC23288@eros.localdomain>
References: <20190402230545.2929-1-tobin@kernel.org>
 <20190402230545.2929-3-tobin@kernel.org>
 <20190403180026.GC6778@tower.DHCP.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190403180026.GC6778@tower.DHCP.thefacebook.com>
User-Agent: Mutt/1.11.4 (2019-03-13)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

On Wed, Apr 03, 2019 at 06:00:30PM +0000, Roman Gushchin wrote:
> On Wed, Apr 03, 2019 at 10:05:40AM +1100, Tobin C. Harding wrote:
> > Currently we reach inside the list_head.  This is a violation of the
> > layer of abstraction provided by the list_head.  It makes the code
> > fragile.  More importantly it makes the code wicked hard to understand.
> > 
> > The code reaches into the list_head structure to counteract the fact
> > that the list _may_ have been changed during slob_page_alloc().  Instead
> > of this we can add a return parameter to slob_page_alloc() to signal
> > that the list was modified (list_del() called with page->lru to remove
> > page from the freelist).
> > 
> > This code is concerned with an optimisation that counters the tendency
> > for first fit allocation algorithm to fragment memory into many small
> > chunks at the front of the memory pool.  Since the page is only removed
> > from the list when an allocation uses _all_ the remaining memory in the
> > page then in this special case fragmentation does not occur and we
> > therefore do not need the optimisation.
> > 
> > Add a return parameter to slob_page_alloc() to signal that the
> > allocation used up the whole page and that the page was removed from the
> > free list.  After calling slob_page_alloc() check the return value just
> > added and only attempt optimisation if the page is still on the list.
> > 
> > Use list_head API instead of reaching into the list_head structure to
> > check if sp is at the front of the list.
> > 
> > Signed-off-by: Tobin C. Harding <tobin@kernel.org>
> > ---
> >  mm/slob.c | 51 +++++++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 37 insertions(+), 14 deletions(-)
> > 
> > diff --git a/mm/slob.c b/mm/slob.c
> > index 307c2c9feb44..07356e9feaaa 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -213,13 +213,26 @@ static void slob_free_pages(void *b, int order)
> >  }
> >  
> >  /*
> > - * Allocate a slob block within a given slob_page sp.
> > + * slob_page_alloc() - Allocate a slob block within a given slob_page sp.
> > + * @sp: Page to look in.
> > + * @size: Size of the allocation.
> > + * @align: Allocation alignment.
> > + * @page_removed_from_list: Return parameter.
> > + *
> > + * Tries to find a chunk of memory at least @size bytes big within @page.
> > + *
> > + * Return: Pointer to memory if allocated, %NULL otherwise.  If the
> > + *         allocation fills up @page then the page is removed from the
> > + *         freelist, in this case @page_removed_from_list will be set to
> > + *         true (set to false otherwise).
> >   */
> > -static void *slob_page_alloc(struct page *sp, size_t size, int align)
> > +static void *slob_page_alloc(struct page *sp, size_t size, int align,
> > +			     bool *page_removed_from_list)
> 
> Hi Tobin!
> 
> Isn't it better to make slob_page_alloc() return a bool value?
> Then it's easier to ignore the returned value, no need to introduce "_unused".
> 
> Thanks!
> 
> >  {
> >  	slob_t *prev, *cur, *aligned = NULL;
> >  	int delta = 0, units = SLOB_UNITS(size);
> >  
> > +	*page_removed_from_list = false;
> >  	for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) {
> >  		slobidx_t avail = slob_units(cur);
> >  
> > @@ -254,8 +267,10 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align)
> >  			}
> >  
> >  			sp->units -= units;
> > -			if (!sp->units)
> > +			if (!sp->units) {
> >  				clear_slob_page_free(sp);
> > +				*page_removed_from_list = true;
> > +			}
> >  			return cur;
> >  		}
> >  		if (slob_last(cur))
> > @@ -269,10 +284,10 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align)
> >  static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
> >  {
> >  	struct page *sp;
> > -	struct list_head *prev;
> >  	struct list_head *slob_list;
> >  	slob_t *b = NULL;
> >  	unsigned long flags;
> > +	bool _unused;
> >  
> >  	if (size < SLOB_BREAK1)
> >  		slob_list = &free_slob_small;
> > @@ -284,6 +299,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
> >  	spin_lock_irqsave(&slob_lock, flags);
> >  	/* Iterate through each partially free page, try to find room */
> >  	list_for_each_entry(sp, slob_list, lru) {
> > +		bool page_removed_from_list = false;
> >  #ifdef CONFIG_NUMA
> >  		/*
> >  		 * If there's a node specification, search for a partial
> > @@ -296,18 +312,25 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
> >  		if (sp->units < SLOB_UNITS(size))
> >  			continue;
> >  
> > -		/* Attempt to alloc */
> > -		prev = sp->lru.prev;
> > -		b = slob_page_alloc(sp, size, align);
> > +		b = slob_page_alloc(sp, size, align, &page_removed_from_list);
> >  		if (!b)
> >  			continue;
> >  
> > -		/* Improve fragment distribution and reduce our average
> > -		 * search time by starting our next search here. (see
> > -		 * Knuth vol 1, sec 2.5, pg 449) */
> > -		if (prev != slob_list->prev &&
> > -				slob_list->next != prev->next)
> > -			list_move_tail(slob_list, prev->next);
> > +		/*
> > +		 * If slob_page_alloc() removed sp from the list then we
> > +		 * cannot call list functions on sp.  If so allocation
> > +		 * did not fragment the page anyway so optimisation is
> > +		 * unnecessary.
> > +		 */
> > +		if (!page_removed_from_list) {
> > +			/*
> > +			 * Improve fragment distribution and reduce our average
> > +			 * search time by starting our next search here. (see
> > +			 * Knuth vol 1, sec 2.5, pg 449)
> > +			 */
> > +			if (!list_is_first(&sp->lru, slob_list))
> > +				list_rotate_to_front(&sp->lru, slob_list);

According to 0day test robot this is triggering an error from
CHECK_DATA_CORRUPTION when the kernel is built with CONFIG_DEBUG_LIST.
I think this is because list_rotate_to_front() puts the list into an
invalid state before it calls __list_add().  The thing that has me
stumped is why this was not happening before this patch series was
applied?  ATM I'm not able to get my test module to trigger this but I'm
going to try a bit harder today.  If I'm right one solution is to modify
list_rotate_to_front() to _not_ call __list_add() but do it manually,
this solution doesn't sit well with me though.

So, summing up, I think the patch is correct in that it does the correct
thing but I think the debugging code doesn't like it because we are
violating typical usage - so the patch is wrong :)

thanks,
Tobin.