Message-ID: <553A66EB.3050802@redhat.com>
Date: Fri, 24 Apr 2015 11:53:15 -0400
From: Rik van Riel <riel@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Christoph Lameter <cl@linux.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Jerome Glisse <j.glisse@gmail.com>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, jglisse@redhat.com, mgorman@suse.de,
        aarcange@redhat.com, airlied@redhat.com, benh@kernel.crashing.org,
        aneesh.kumar@linux.vnet.ibm.com,
        Cameron Buschardt <cabuschardt@nvidia.com>,
        Mark Hairgrove <mhairgrove@nvidia.com>,
        Geoffrey Gerfin <ggerfin@nvidia.com>,
        John McKenna <jmckenna@nvidia.com>, akpm@linux-foundation.org
Subject: Re: Interacting with coherent memory on external devices
References: <20150421214445.GA29093@linux.vnet.ibm.com> <alpine.DEB.2.11.1504211839120.6294@gentwo.org> <20150422000538.GB6046@gmail.com> <alpine.DEB.2.11.1504211942040.6294@gentwo.org> <20150422131832.GU5561@linux.vnet.ibm.com> <alpine.DEB.2.11.1504221105130.24979@gentwo.org> <20150422170737.GB4062@gmail.com> <alpine.DEB.2.11.1504221306200.26217@gentwo.org> <20150422185230.GD5561@linux.vnet.ibm.com> <alpine.DEB.2.11.1504230910190.32297@gentwo.org> <20150423192456.GQ5561@linux.vnet.ibm.com> <alpine.DEB.2.11.1504240859080.7582@gentwo.org>
In-Reply-To: <alpine.DEB.2.11.1504240859080.7582@gentwo.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1703
Lines: 40

On 04/24/2015 10:01 AM, Christoph Lameter wrote:
> On Thu, 23 Apr 2015, Paul E. McKenney wrote:
> 
>>> As far as I know Jerome is talkeing about HPC loads and high performance
>>> GPU processing. This is the same use case.
>>
>> The difference is sensitivity to latency.  You have latency-sensitive
>> HPC workloads, and Jerome is talking about HPC workloads that need
>> high throughput, but are insensitive to latency.
> 
> Those are correlated.
> 
>>> What you are proposing for High Performacne Computing is reducing the
>>> performance these guys trying to get. You cannot sell someone a Volkswagen
>>> if he needs the Ferrari.
>>
>> You do need the low-latency Ferrari.  But others are best served by a
>> high-throughput freight train.
> 
> The problem is that they want to run 2000 trains at the same time
> and they all must arrive at the destination before they can be send on
> their next trip. 1999 trains will be sitting idle because they need
> to wait of the one train that was delayed. This reduces the troughput.
> People really would like all 2000 trains to arrive on schedule so that
> they get more performance.

So you run 4000 or even 6000 trains, and have some subset of them
run at full steam, while others are waiting on memory accesses.

In reality the overcommit factor is likely much smaller, because
the GPU threads run and block on memory in smaller, more manageable
numbers, say a few dozen at a time.

-- 
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/