Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp584975ybl; Fri, 10 Jan 2020 03:32:59 -0800 (PST) X-Google-Smtp-Source: APXvYqxbAGa7i1c3xByqf6jXZu4LxaczNOucW117fo4UXQhrjNNjDkwG2NzSjlowTHSnK1BWdtPp X-Received: by 2002:a9d:66ca:: with SMTP id t10mr2161952otm.352.1578655979751; Fri, 10 Jan 2020 03:32:59 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578655978; cv=none; d=google.com; s=arc-20160816; b=t7TSvSKEvQjpl1gu7LGqnyfLgO6i4WvUQkClu/ely62+sOwfLOwKnYl915L0USFYkD VBrFc6+OHQSYPRD3Wv2CBY8t1s1gFEg4bdHZOswOoFgbD7TF5/HTadYBh2irHahKoIZM bcB4SgRyL4lDoW8Hj+v/W0zLbzoRvyuGDV9gU75WeJEcFhSeqccd82Nu6kglBrqinRhA 9UCHX/sdQthvKoH8c1ERFdiJk0XbMy8Z2wOI38sh/eiXEa9rKwsYY6Uj3ku7kzIzZq37 epfLeI+8AkLbgsgpnvbpb6gpkGsFppR6fCzSOaceJaMaQwqDlWJe4SRlokHm5P1t2rNY /DAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=2To09vgTYnKpMXzy6m6qVl1xvlUAlJvY27EC8LYs6zM=; b=L5Z1y1dvt89+iEjyZr6aF2Q3bO69unxUSo0PpGrrZAsE9/BRt2PgA1EQhPXDbQ1Qcc saeZrjqyHoqnn+1bkrtR9PbEZrFZoPWlwuWRgIO+2w5Ln+fdnet57EWsFONon4INx4R+ mhclYyPq7yhgbXaadQPXMJJbIuvyHpVphYspH66zPZX1dcRg48lG/AIvlMgkbgL/47Af uUPRvtrdl9wcigZ+QfCbZbpNcLfW6S2MCAdvt3yyXsh25r1mDVnZRQXVhZLKldp8tVt2 618zK6Loa1bcrnT2ClVbOB3LHkcLNs5Jy3jRnm3dee86bcXAxrqi/daKZxZ6D3/KSfOL ZEtg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g26si1118035otn.180.2020.01.10.03.32.47; Fri, 10 Jan 2020 03:32:58 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727806AbgAJLbm (ORCPT + 99 others); Fri, 10 Jan 2020 06:31:42 -0500 Received: from mail-wr1-f66.google.com ([209.85.221.66]:38419 "EHLO mail-wr1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727457AbgAJLbm (ORCPT ); Fri, 10 Jan 2020 06:31:42 -0500 Received: by mail-wr1-f66.google.com with SMTP id y17so1475829wrh.5 for ; Fri, 10 Jan 2020 03:31:40 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=2To09vgTYnKpMXzy6m6qVl1xvlUAlJvY27EC8LYs6zM=; b=Y1CdTsjHbZQ3gRrW4Gddnc9VdhOMH31g886AyFni0UL0cMRgcNQaCnesQlaJaOEFYB d7m4/0Cx13aRqcLY35O3G+iKwMOGIlar0wp5i2DhKvNU85WkMcgwKNQbW8Zu8d8+ggHt pG48CSfGfsSMAehaK7YmLvabbRYZpE4t+RnIxSMWEAzkEI5KoFMp8TWSzAY/eVAY4ofp K/rvenwaeA8pCwmRU/0a2M7u1Rkxtv2BslAXvYlWRwWZXD8+oa3F1jXZZYyY7brEQdiK 2YUc9iy9u7+QawI8/dE/2OiUwjDs1pgop0G3YmTeuCDOUuqWsA8FCtN1ALefjr+FjP+E AOHQ== X-Gm-Message-State: APjAAAXZzA68KRWHRm84fc74Q67ow60nRRWkkoeCldfVuVOKYhr3pRR0 Sa2MosmMtujKauy7tRe9H6sdl+q2 X-Received: by 2002:adf:8b4f:: with SMTP id v15mr3086675wra.231.1578655899881; Fri, 10 Jan 2020 03:31:39 -0800 (PST) Received: from localhost (ip-37-188-146-105.eurotel.cz. [37.188.146.105]) by smtp.gmail.com with ESMTPSA id b15sm1857297wmj.13.2020.01.10.03.31.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 10 Jan 2020 03:31:38 -0800 (PST) Date: Fri, 10 Jan 2020 12:31:37 +0100 From: Michal Hocko To: David Hildenbrand Cc: Andrew Morton , Scott Cheloha , linux-kernel@vger.kernel.org, "Rafael J. Wysocki" , Greg Kroah-Hartman , nathanl@linux.ibm.com, ricklind@linux.vnet.ibm.com, Scott Cheloha Subject: Re: [PATCH v4] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup Message-ID: <20200110113137.GE29802@dhcp22.suse.cz> References: <20200109142758.659c1545cb8df2d05f299a4a@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.12.2 (2019-09-21) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 10-01-20 10:32:01, David Hildenbrand wrote: > On 09.01.20 23:35, David Hildenbrand wrote: > >> Am 09.01.2020 um 23:28 schrieb Andrew Morton : > >> > >> On Thu, 9 Jan 2020 23:17:09 +0100 David Hildenbrand wrote: > >> > >>> > >>> > >>>>> Am 09.01.2020 um 23:00 schrieb Andrew Morton : > >>>> > >>>> On Thu, 9 Jan 2020 15:25:16 -0600 Scott Cheloha wrote: > >>>> > >>>>> Searching for a particular memory block by id is an O(n) operation > >>>>> because each memory block's underlying device is kept in an unsorted > >>>>> linked list on the subsystem bus. > >>>>> > >>>>> We can cut the lookup cost to O(log n) if we cache the memory blocks in > >>>>> a radix tree. With a radix tree cache in place both memory subsystem > >>>>> initialization and memory hotplug run palpably faster on systems with a > >>>>> large number of memory blocks. > >>>>> > >>>>> ... > >>>>> > >>>>> @@ -56,6 +57,13 @@ static struct bus_type memory_subsys = { > >>>>> .offline = memory_subsys_offline, > >>>>> }; > >>>>> > >>>>> +/* > >>>>> + * Memory blocks are cached in a local radix tree to avoid > >>>>> + * a costly linear search for the corresponding device on > >>>>> + * the subsystem bus. > >>>>> + */ > >>>>> +static RADIX_TREE(memory_blocks, GFP_KERNEL); > >>>> > >>>> What protects this tree from racy accesses? > >>> > >>> I think the device hotplug lock currently (except during boot where no races can happen). > >>> > >> > >> So this? > >> > >> --- a/drivers/base/memory.c~drivers-base-memoryc-cache-blocks-in-radix-tree-to-accelerate-lookup-fix > >> +++ a/drivers/base/memory.c > >> @@ -61,6 +61,9 @@ static struct bus_type memory_subsys = { > >> * Memory blocks are cached in a local radix tree to avoid > >> * a costly linear search for the corresponding device on > >> * the subsystem bus. > >> + * > >> + * Protected by mem_hotplug_lock in mem_hotplug_begin(), and by the guaranteed > >> + * single-threadness at boot time. > >> */ > >> static RADIX_TREE(memory_blocks, GFP_KERNEL); > >> > >> > >> But are we sure this is all true? > > > > I think the device hotplug lock, not the memory hotplug lock. Will double check later. > > So all writers either hold the device_hotplug_lock or run during boot. > Documented e.g., for memory_dev_init(), create_memory_block_devices(), > remove_memory_block_devices(). > > The readers are mainly > - find_memory_block() > -> called via online_pages()/offline_pages() where we hold the > device_hotplug_lock > -> called from arch/powerpc/platforms/pseries/hotplug-memory.c, > where we always hold the device_hotplug_lock > - walk_memory_blocks() > -> Callers in drivers/acpi/acpi_memhotplug.c either hold the > device_hotplug_lock or are called early during boot > -> Callers in mm/memory_hotplug.c either hold the > device_hotplug_lock or are called early during boot > -> link_mem_sections() is called either early during boot or via > add_memory_resource() (whereby all callers either hold the > device_hotplug_lock or are called early during boot) > -> Callers in arch/powerpc/platforms/powernv/memtrace.c hold the > device_hotplug_lock > > So we are fine. > > I suggest we document that expected behavior via Thanks for documenting this! Adding a comment as you suggest makes sense. Overall the locking expectations would be similar to subsys_find_device_by_id which doesn't use any internal locking so the radix tree which follows the lifetime of those objects should be compatible with the current implementation (so no new races at least). > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index 799b43191dea..8c8dc081597e 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -585,6 +585,8 @@ static struct memory_block > *find_memory_block_by_id(unsigned long block_id) > * tree or something here. > * > * This could be made generic for all device subsystems. > + * > + * Called under device_hotplug_lock. > */ > struct memory_block *find_memory_block(struct mem_section *section) > { > @@ -837,6 +839,8 @@ void __init memory_dev_init(void) > * > * In case func() returns an error, walking is aborted and the error is > * returned. > + * > + * Called under device_hotplug_lock. > */ > int walk_memory_blocks(unsigned long start, unsigned long size, > void *arg, walk_memory_blocks_func_t func) > > > Please note that the memory hotplug lock is not safe on the reader side. > But also not on the writer side after > https://lkml.kernel.org/r/157863061737.2230556.3959730620803366776.stgit@dwillia2-desk3.amr.corp.intel.com > > > -- > Thanks, > > David / dhildenb -- Michal Hocko SUSE Labs