Date: Tue, 29 Jun 2010 20:42:47 +0400
From: Dan Kruchinin
To: LKML
Cc: Steffen Klassert, Herbert Xu
Subject: [PATCH 0/2] padata: Separate cpumasks for cb_cpus and parallel workers
Message-ID: <20100629204247.392855dc@leibniz>

Hello.

The main point of my patches is to make two separate cpumasks: one for
parallel workers and another for serial workers (callback CPUs). This makes
it possible to bind non-intersecting groups of CPUs to serial and parallel
workers and to tune the padata subsystem more finely.

My tests show that a proper configuration of the serial and parallel
cpumasks gives somewhat better performance. For example (aes-asm,
sha1-generic, two 16-core machines):

1) One point-to-point connection:
   Unmodified padata gives ~650Mbit of TCP and ~780Mbit of UDP.
   When I exclude the callback CPUs from the parallel cpumask, padata
   gives ~750Mbit of TCP and ~900Mbit of UDP.

2) Two IPsec tunnels between the 16-core machines and 4 clients
   communicating with each other via the tunnels:
   Unmodified padata gives ~1.5Gbit of UDP.
   padata with non-intersecting cpumasks for parallel and serial workers
   gives ~1.8Gbit.

Besides the performance gain, there may be situations where a serial job
takes a lot of time.
For example, if I add several dozen firewall rules, the serial worker will
run slower, while padata_do_parallel will continue to enqueue requests into
the parallel queue of the CPU that the serial worker executes on. This may
significantly slow down parallelization and reordering, because one CPU (the
one shared by both parallel and serial workers) will always have more
requests in its parallel queue than the other CPUs (because serialization
takes a lot of time). In such cases the user may exclude the callback CPUs
from the cpumask for parallel workers.

-- 
W.B.R.
Dan Kruchinin
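
To illustrate the mask arithmetic behind "excluding callback CPUs from the
parallel cpumask", here is a minimal userspace sketch. It is only a model:
the cpu_mask_t type and the helper names are hypothetical, a plain 32-bit
word stands in for a cpumask, and none of this is the kernel cpumask or
padata API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model: bit i set means CPU i is in the mask.
 * This is an illustration only, not the kernel cpumask API.
 */
typedef uint32_t cpu_mask_t;

/* Remove every callback (serial) CPU from the parallel mask. */
static cpu_mask_t exclude_cb_cpus(cpu_mask_t parallel, cpu_mask_t cb)
{
	return parallel & ~cb;
}

/* True when no CPU serves as both a parallel and a serial worker. */
static bool masks_disjoint(cpu_mask_t parallel, cpu_mask_t cb)
{
	return (parallel & cb) == 0;
}

/* Count the CPUs present in a mask by clearing the lowest set bit. */
static unsigned int mask_weight(cpu_mask_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}
```

On a 16-core machine with CPUs 0 and 1 reserved for callbacks,
exclude_cb_cpus(0xFFFF, 0x0003) yields 0xFFFC, which leaves 14 CPUs for
parallel workers and makes masks_disjoint() true for the two masks.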