Date: Tue, 22 Jul 2014 09:23:37 +0200
From: Daniel Vetter
To: Oded Gabbay
Cc: Jerome Glisse, Andrew Lewycky, Michel Dänzer,
    "linux-kernel@vger.kernel.org", "dri-devel@lists.freedesktop.org",
    linux-mm, Evgeny Pinchuk, Alexey Skidanov, Andrew Morton
Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
Message-ID: <20140722072337.GG15237@phenom.ffwll.local>
In-Reply-To: <53CD68BF.4020308@amd.com>

On Mon, Jul 21, 2014 at 10:23:43PM +0300, Oded Gabbay wrote:
> But Jerome, the core problem still remains in effect, even with your
> suggestion. If an application, either via userspace queue or via ioctl,
> submits a long-running kernel, then the CPU in general can't stop the
> GPU from running it. And if that kernel does while(1); then that's it,
> game's over, no matter how you submitted the work. So I don't really
> see the big advantage in your proposal. Only in CZ can we stop this wave
> (by CP H/W scheduling only). What you are saying is basically that I
> won't allow people to use compute on a Linux KV system because it _may_
> get the system stuck.
>
> So even if I really wanted to, and I may agree with you theoretically on
> that, I can't fulfill your desire to make the "kernel being able to
> preempt at any time and be able to decrease or increase user queue
> priority so overall kernel is in charge of resources management and it
> can handle rogue client in proper fashion". Not in KV, and I guess not
> in CZ as well.

At least on Intel, the execlist stuff which is used for preemption can be
used by both the CPU and the firmware scheduler. So we can actually
preempt when doing CPU scheduling.

It sounds like current AMD hw doesn't have any preemption at all. And
without preemption I don't think we should ever consider allowing
userspace to submit stuff directly to the hw and overload it. IMO the
kernel _must_ sit in between and reject clients that don't behave. Of
course you can only ever react (worst case with a GPU reset, there's code
floating around for that on intel-gfx), but at least you can do
something. If userspace has a direct submit path to the hw then this gets
really tricky, if not impossible.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
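
The point about the kernel sitting between clients and the hardware can be illustrated with a toy model. This is only a sketch under stated assumptions, not code from amdkfd or i915: the `Mediator` class, the `submit` method, and the `TIMEOUT_TICKS` budget are all hypothetical names invented for illustration. It models hardware without preemption, where the kernel cannot stop a running job and can only react after the fact by resetting the GPU and refusing further work from the offending client.

```python
from dataclasses import dataclass, field

# Hypothetical watchdog budget: how long a job may run before the
# kernel-side mediator treats it as hung.
TIMEOUT_TICKS = 5


@dataclass
class Mediator:
    """Toy model of a kernel-mediated submission path (names are made up)."""
    banned: set = field(default_factory=set)
    resets: int = 0

    def submit(self, client: str, job_ticks: float) -> str:
        # A client that misbehaved before never reaches the hardware again.
        if client in self.banned:
            return "rejected"
        # No preemption: the job cannot be stopped early. The mediator can
        # only notice the overrun and react afterwards.
        if job_ticks > TIMEOUT_TICKS:
            self.resets += 1         # worst case: reset the GPU
            self.banned.add(client)  # and reject this client from now on
            return "reset"
        return "completed"


if __name__ == "__main__":
    m = Mediator()
    print(m.submit("good", 3))               # well-behaved job
    print(m.submit("rogue", float("inf")))   # models a while(1); kernel
    print(m.submit("rogue", 1))              # rogue client is now refused
```

With a direct userspace submit path there is no `submit` call for the kernel to intercept, so the rejection step in this model has nowhere to live; that is the part that gets tricky.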