From: Mauricio Faria de Oliveira
To: Kent Overstreet
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: aio: questions with ioctx_alloc() and large num_possible_cpus()
Date: Wed, 5 Oct 2016 14:21:27 -0300
Message-Id: <965fc993-97c2-48b9-82e3-6c3444d0ffe5@linux.vnet.ibm.com>
In-Reply-To: <20161005063435.mtw2keukyxwbwo2k@kmo-pixel>
References: <20161005063435.mtw2keukyxwbwo2k@kmo-pixel>

Hi Kent,

Thanks for commenting. Trying to make sense of your point helped me
understand more of the code, but some things are still unclear to me;
could you please help a bit more?

Can you describe how a single thread might not be able to use all the
slots because 'up to about half of the reqs_available slots might be on
other percpu reqs_available'?

I see that the thread might be scheduled on different CPUs (say, with
only 2 possible CPUs) and call get_reqs_available() on both, but that
only gives one req_batch to each CPU. For req_batch to be half of
reqs_available, its denominator would need to be 2, which doesn't happen
with num_possible_cpus() * 4, which is 8 in this case. So I'm a bit
confused here.

	atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
	ctx->req_batch = (ctx->nr_events - 1) / (num_possible_cpus() * 4);

On 10/05/2016 03:34 AM, Kent Overstreet wrote:
>> - why "num_possible_cpus() * 4", and why "max(nr_events, )" ?

> For the scheme to work - percpu allocation of slots - we have to ensure that
> there aren't too many unused slots stranded on other CPUs. The stranding is
> limited to 1/4th of the slots [snip]

By 'unused slots', do you mean the slots included in the batch allocated
to a particular CPU but not actually used by a thread on that CPU?
(e.g., get_reqs_available() called once, unused_slots == req_batch - 1)

Can you please explain in a bit more detail how "num_possible_cpus() * 4"
ensures the limit of 1/4th of the slots, and which scenario the math is
based on? I've been thinking about it and trying out values for a while
now, and couldn't figure out where or how it occurs.

Thanks for your support,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center