Date: Tue, 6 Nov 2018 09:49:11 +0100
From: Peter Zijlstra
To: Daniel Jordan
Cc: linux-mm@kvack.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    aarcange@redhat.com, aaron.lu@intel.com, akpm@linux-foundation.org,
    alex.williamson@redhat.com, bsd@redhat.com, darrick.wong@oracle.com,
    dave.hansen@linux.intel.com, jgg@mellanox.com, jwadams@google.com,
    jiangshanlai@gmail.com, mhocko@kernel.org, mike.kravetz@oracle.com,
    Pavel.Tatashin@microsoft.com, prasad.singamsetty@oracle.com,
    rdunlap@infradead.org, steven.sistare@oracle.com, tim.c.chen@intel.com,
    tj@kernel.org, vbabka@suse.cz, "Rafael J. Wysocki"
Subject: Re: [RFC PATCH v4 01/13] ktask: add documentation
Message-ID: <20181106084911.GA22504@hirez.programming.kicks-ass.net>
References: <20181105165558.11698-1-daniel.m.jordan@oracle.com>
    <20181105165558.11698-2-daniel.m.jordan@oracle.com>
In-Reply-To: <20181105165558.11698-2-daniel.m.jordan@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 05, 2018 at 11:55:46AM -0500, Daniel Jordan wrote:
> +Concept
> +=======
> +
> +ktask is built on unbound workqueues to take advantage of the thread management
> +facilities it provides: creation, destruction, flushing, priority setting, and
> +NUMA affinity.
> +
> +A little terminology up front: A 'task' is the total work there is to do and a
> +'chunk' is a unit of work given to a thread.

So I hate on the task naming. We already have a task, let's not overload
that name.

> +To complete a task using the ktask framework, a client provides a thread
> +function that is responsible for completing one chunk. The thread function is
> +defined in a standard way, with start and end arguments that delimit the chunk
> +as well as an argument that the client uses to pass data specific to the task.
> +
> +In addition, the client supplies an object representing the start of the task
> +and an iterator function that knows how to advance some number of units in the
> +task to yield another object representing the new task position. The framework
> +uses the start object and iterator internally to divide the task into chunks.
> +
> +Finally, the client passes the total task size and a minimum chunk size to
> +indicate the minimum amount of work that's appropriate to do in one chunk. The
> +sizes are given in task-specific units (e.g. pages, inodes, bytes). The
> +framework uses these sizes, along with the number of online CPUs and an
> +internal maximum number of threads, to decide how many threads to start and how
> +many chunks to divide the task into.
> +
> +For example, consider the task of clearing a gigantic page. This used to be
> +done in a single thread with a for loop that calls a page clearing function for
> +each constituent base page. To parallelize with ktask, the client first moves
> +the for loop to the thread function, adapting it to operate on the range passed
> +to the function. In this simple case, the thread function's start and end
> +arguments are just addresses delimiting the portion of the gigantic page to
> +clear. Then, where the for loop used to be, the client calls into ktask with
> +the start address of the gigantic page, the total size of the gigantic page,
> +and the thread function. Internally, ktask will divide the address range into
> +an appropriate number of chunks and start an appropriate number of threads to
> +complete these chunks.

I see no mention of padata anywhere; I also don't see mention of the
async init stuff. Both appear to me to share, at least in part, the same
reason for existence.
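
To make the chunking scheme in the quoted Concept section concrete, here
is a rough user-space model in Python. It is not kernel code and not the
ktask API. The names plan_threads, split, clear_chunk and run_task are
made up for this sketch, the thread-count policy is an assumption (the
quoted text does not spell it out), and page indices stand in for the
addresses the real thread function would receive.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_threads(task_size, min_chunk, online_cpus, max_threads):
    """Pick a thread count. This policy is an assumption, not ktask's."""
    # Never start more threads than there are min-sized chunks to hand out.
    max_chunks = max(1, task_size // min_chunk)
    return max(1, min(online_cpus, max_threads, max_chunks))


def split(start, task_size, chunk_size):
    """Yield (start, end) pairs covering [start, start + task_size)."""
    end = start + task_size
    pos = start
    while pos < end:
        nxt = min(pos + chunk_size, end)
        yield pos, nxt
        pos = nxt


def clear_chunk(pages, start, end):
    """Thread function: the old single-threaded loop, limited to one chunk."""
    for i in range(start, end):
        pages[i] = 0


def run_task(pages, min_chunk, online_cpus, max_threads):
    """Divide the whole range into chunks and run them on worker threads."""
    nthreads = plan_threads(len(pages), min_chunk, online_cpus, max_threads)
    # Ceiling division, but never go below the minimum chunk size.
    chunk_size = max(min_chunk, -(-len(pages) // nthreads))
    chunk_list = list(split(0, len(pages), chunk_size))
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # Each worker runs the thread function on one chunk.
        list(pool.map(lambda c: clear_chunk(pages, *c), chunk_list))
    return nthreads, chunk_list


# Stand-in for one gigantic page made of 16 base pages.
pages = [0xFF] * 16
nthreads, chunk_list = run_task(pages, min_chunk=4, online_cpus=8, max_threads=4)
print(nthreads, chunk_list)
```

The caller only supplies the range, the minimum chunk size and the thread
function; how many workers run and where the chunk boundaries fall is
decided in run_task, which is the split of responsibilities the quoted
text describes.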

> +Scheduler Interaction
> +=====================
> +
> +Even within the resource limits, ktask must take care to run a number of
> +threads appropriate for the system's current CPU load. Under high CPU usage,
> +starting excessive helper threads may disturb other tasks, unfairly taking CPU
> +time away from them for the sake of an optimized kernel code path.
> +
> +ktask plays nicely in this case by setting helper threads to the lowest
> +scheduling priority on the system (MAX_NICE). This way, helpers' CPU time is
> +appropriately throttled on a busy system and other tasks are not disturbed.
> +
> +The main thread initiating the task remains at its original priority so that it
> +still makes progress on a busy system.
> +
> +It is possible for a helper thread to start running and then be forced off-CPU
> +by a higher priority thread. With the helper's CPU time curtailed by MAX_NICE,
> +the main thread may wait longer for the task to finish than it would have had
> +it not started any helpers, so to ensure forward progress at a single-threaded
> +pace, once the main thread is finished with all outstanding work in the task,
> +the main thread wills its priority to one helper thread at a time. At least
> +one thread will then always be running at the priority of the calling thread.

What isn't clear is whether this calling thread is waiting or not. Only
do this inheritance trick if it is actually waiting on the work. If it
is not, nobody cares.

> +Cgroup Awareness
> +================
> +
> +Given the potentially large amount of CPU time ktask threads may consume, they
> +should be aware of the cgroup of the task that called into ktask and
> +appropriately throttled.
> +
> +TODO: Implement cgroup-awareness in unbound workqueues.

Yes, that needs to be done.

> +Power Management
> +================
> +
> +Starting additional helper threads may cause the system to consume more energy,
> +which is undesirable on energy-conscious devices. Therefore ktask needs to be
> +aware of cpufreq policies and scaling governors.
> +
> +If an energy-conscious policy is in use (e.g. powersave, conservative) on any
> +part of the system, that is a signal that the user has strong power management
> +preferences, in which case ktask is disabled.
> +
> +TODO: Implement this.

No, don't do that, it's broken. Also, we're trying to move to a single
cpufreq governor for all. Sure we'll retain 'performance', but powersave
and conservative and all that nonsense should go away eventually.

That's not saying you don't need a knob for this; but don't look at
cpufreq for this.
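
Returning to the priority hand-off in the Scheduler Interaction section,
the condition asked for above (only inherit when the caller is actually
waiting) can be written down as a toy model in Python. It is a sketch
under stated assumptions, not the patch's code: helper_nice_values and
its parameters are hypothetical names, and the only fact taken from the
kernel is that MAX_NICE is 19.

```python
MAX_NICE = 19  # lowest scheduling priority (highest nice value) on Linux


def helper_nice_values(caller_nice, unfinished_helpers, caller_waiting):
    """Return the nice value of each unfinished helper.

    Helpers normally run at MAX_NICE. Only if the caller has no work of
    its own left and is blocked waiting on the helpers does exactly one
    helper take over the caller's priority, so the task still advances
    at a single-threaded pace.
    """
    nice = [MAX_NICE] * unfinished_helpers
    if caller_waiting and unfinished_helpers > 0:
        # One helper at a time inherits; the rest stay throttled.
        nice[0] = caller_nice
    return nice


print(helper_nice_values(0, 3, caller_waiting=True))
print(helper_nice_values(0, 3, caller_waiting=False))
```

When the caller is not waiting, every helper stays at MAX_NICE and no
hand-off happens, which is the case where nobody cares.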