From: Mauricio Faria de Oliveira
To: Benjamin LaHaise, Kent Overstreet
Cc: Alexander Viro, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-kernel@vger.kernel.org
Subject: aio: questions with ioctx_alloc() and large num_possible_cpus()
Date: Tue, 4 Oct 2016 19:55:12 -0300

Hi Benjamin, Kent, and others,

Could you please comment on this possible problem?
Any feedback is appreciated.

Since commit e1bdd5f27a5b ("aio: percpu reqs_available") the maximum
number of aio nr_events may be a function of num_possible_cpus() and
actually be /inversely proportional/ to it (i.e., more CPUs lead to
fewer system-wide aio nr_events). This is a problem on larger systems.

That's because if "nr_events < num_possible_cpus() * 4" (for example
nr_events == 1), the request counts as "num_possible_cpus() * 4" into
aio_nr and against aio_max_nr:

    static struct kioctx *ioctx_alloc(unsigned nr_events)
    ...
            nr_events = max(nr_events, num_possible_cpus() * 4);
            nr_events *= 2;
    ...
            /* limit the number of system wide aios */
    ....
            if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
    ...
                    err = -EAGAIN;
    ...
            aio_nr += ctx->max_reqs;
    ...

The problem is easily noticeable on a common POWER8 system: 160 CPUs
(2 sockets * 10 cores/socket * 8 threads/core = 160 CPUs) limit the
max AIO contexts created with "io_setup(1, )" to 102 out of 64k
(default aio_max_nr):

    # cat /sys/devices/system/cpu/possible
    0-159

    # cat /proc/sys/fs/aio-max-nr
    65536

    # echo $(( 65536 / (160 * 4) ))
    102

test-case snippet & output:

    for (i = 0; i < 65536; i++)
            if ((rc = io_setup(1, &ioctx[i])) != 0)
                    break;

    printf("rc = %d, i = %d\n", rc, i);

    > rc = -11, i = 102

(another problem is that the sysctl aio-nr grows larger than
aio-max-nr, since it's checked against "aio_max_nr * 2")

So, I've been trying to understand/fix this, but soon got stuck on
the options, as I didn't quite get a few points. If you could provide
some insight, that would be really helpful:

- why "num_possible_cpus() * 4", and why "max(nr_events, )" ?

  Is it just related to req_batch, in the form of a reasonable
  constant, or are there other implications (e.g., related to
  "up to half of slots on other cpu's percpu counters" -- it would
  be nice to understand the reason for that too)?

- "struct kioctx" says max_reqs

  " is what userspace passed to io_setup(), it's not used for
    anything but counting against the global max_reqs quota. "

  However, we see it incremented by the modified nr_events, thus not
  really the value from userspace anymore, and used to derive
  nr_events in aio_setup_ring().

  Is the comment wrong nowadays, or is the code's usage of max_reqs
  wrong/abusing it, or... ? :)

- what is aio-nr really expected to count: nr_events (er.. the value
  actually requested by userspace?) or the number of times
  io_setup(N, ) returned successfully (say, io contexts), regardless
  of the total/sum of their nr_events?

- any other comments/suggestions are appreciated.

Thanks in advance,

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center