From: Hamid Nassiby
Subject: Re: Fwd: crypto accelerator driver problems
Date: Sat, 1 Oct 2011 12:38:19 +0330
To: Herbert Xu, linux-crypto@vger.kernel.org, Steffen Klassert
References: <20101230211900.GA22742@gondor.apana.org.au> <20110126070939.GA18150@gondor.apana.org.au> <20110126233315.GB26664@gondor.apana.org.au> <20110705065351.GA31107@gondor.apana.org.au>
In-Reply-To: <20110705065351.GA31107@gondor.apana.org.au>

Hi all,

Following up on my previous posts to the crypto list about our hardware AES accelerator project: I have finally managed to deploy the device in IPsec successfully.
As I mentioned earlier, my driver registers itself in the kernel as a blkcipher for cbc(aes) as follows:

static struct crypto_alg my_cbc_alg = {
	.cra_name		= "cbc(aes)",
	.cra_driver_name	= "cbc-aes-my",
	.cra_priority		= 400,
	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_NEED_FALLBACK,
	.cra_init		= fallback_init_blk,
	.cra_exit		= fallback_exit_blk,
	.cra_blocksize		= AES_MIN_BLOCK_SIZE,
	.cra_ctxsize		= sizeof(struct my_aes_op),
	.cra_alignmask		= 15,
	.cra_type		= &crypto_blkcipher_type,
	.cra_module		= THIS_MODULE,
	.cra_list		= LIST_HEAD_INIT(my_cbc_alg.cra_list),
	.cra_u			= {
		.blkcipher = {
			.min_keysize	= AES_MIN_KEY_SIZE,
			.max_keysize	= AES_MIN_KEY_SIZE,
			.setkey		= my_setkey_blk,
			.encrypt	= my_cbc_encrypt,
			.decrypt	= my_cbc_decrypt,
			.ivsize		= AES_IV_LENGTH,
		}
	}
};

The my_cbc_encrypt function, shown as a mix of pseudo and real code for simplicity, is:

static int my_cbc_encrypt(struct blkcipher_desc *desc,
			  struct scatterlist *dst, struct scatterlist *src,
			  unsigned int nbytes)
{
	/* common preparation and initialization (omitted) */

	spin_lock_irqsave(&mylock, myflags);

	/* Send the request to the device. After processing it, the
	 * device writes the result to the destination. */
	send_request_to_device(&dev);

	/* Busy-wait on a flag in the device register space that
	 * indicates completion. */
	while (!readl(complete_flag))
		;

	spin_unlock_irqrestore(&mylock, myflags);

	return 0;
}

With the above code I can successfully test an IPsec gateway equipped with our hardware and get 200 Mbps throughput with Iperf.

Now I am facing another problem. As I mentioned earlier, our hardware has 4 AES engines built in. With the above code I only utilize one of them, because the lock is held from submission until completion, so only one request is in the device at a time. From this point, we want to go a step further and utilize more than one of the device's AES engines. The simplest solution, as it appears to me, is to deploy pcrypt/padata, written by Steffen Klassert. First I instantiate it on a dual-core gateway:

	modprobe tcrypt alg="pcrypt(authenc(hmac(md5),cbc(aes)))" type=3

and test again.
Running Iperf now gives me a very low throughput of about 20 Mbps, while dmesg shows the following:

BUG: workqueue leaked lock or atomic: kworker/0:1/0x00000001/10
    last function: padata_parallel_worker+0x0/0x80
Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
Call Trace:
 [] ? printk+0x18/0x1b
 [] process_one_work+0x177/0x370
 [] ? padata_parallel_worker+0x0/0x80
 [] worker_thread+0x127/0x390
 [] ? worker_thread+0x0/0x390
 [] kthread+0x74/0x80
 [] ? kthread+0x0/0x80
 [] kernel_thread_helper+0x6/0x10
BUG: scheduling while atomic: kworker/0:1/10/0x00000002
Modules linked in: pcrypt my_aes2 binfmt_misc bridge stp bnep sco rfcomm l2cap crc16 bluetooth rfkill ppdev acpi_cpufreq mperf cpufreq_stats cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave freq_table pci_slot sbs container video output sbshc battery iptable_filter ip_tables x_tables decnet ctr twofish_i586 twofish_generic twofish_common camellia serpent blowfish cast5 aes_i586 aes_generic xcbc rmd160 sha512_generic sha256_generic crypto_null af_key ac lp snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm_oss evdev snd_mixer_oss snd_pcm psmouse serio_raw snd_seq_dummy pcspkr parport_pc parport snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event option usb_wwan snd_seq usbserial snd_timer snd_seq_device button processor iTCO_wdt iTCO_vendor_support snd intel_agp soundcore intel_gtt snd_page_alloc agpgart shpchp pci_hotplug ext3 jbd mbcache sr_mod cdrom sd_mod sg ata_generic pata_jmicron ata_piix pata_acpi libata floppy r8169 mii scsi_mod uhci_hcd ehci_hcd usbcore thermal fan fuse
Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
Call Trace:
 [] __schedule_bug+0x59/0x70
 [] schedule+0x6a7/0xa70
 [] ? show_trace_log_lvl+0x47/0x60
 [] ? dump_stack+0x6e/0x75
 [] ? process_one_work+0x1c8/0x370
 [] ? padata_parallel_worker+0x0/0x80
 [] worker_thread+0x1df/0x390
 [] ? worker_thread+0x0/0x390
 [] kthread+0x74/0x80
 [] ? kthread+0x0/0x80
 [] kernel_thread_helper+0x6/0x10

I must emphasize again that the goal of deploying pcrypt/padata is to have more than one request present in our hardware at the same time (e.g. on a quad-CPU system we would have 4 encryption and 4 decryption requests sent into our hardware).

I also tried using pcrypt/padata on a single-CPU system with one change in the pcrypt_init_padata function of pcrypt.c: passing 4 as the max_active parameter of alloc_workqueue. That is, I called alloc_workqueue as:

	alloc_workqueue(name, WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE, 4);

instead of:

	alloc_workqueue(name, WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE, 1);

But this did not give me 4 concurrent encryption requests.

I know that one promising solution might be to choose the ablkcipher scheme over blkcipher, but as we need a quicker solution and are pressed for time, I would like your comments on my problem. Can I solve it with pcrypt/padata at all, with some change in my current blkcipher driver's encrypt/decrypt functions or in pcrypt itself? Or should I take another approach? Please keep in mind that we strongly prefer minor changes to our current solution, because we have little time.

Thanks in advance,
Hamid.
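
To make the goal of several requests in flight more concrete, here is a rough user-space sketch in plain C11 (not kernel code, and not taken from my driver) of the kind of dispatch I have in mind: each request claims one idle engine instead of all requests serializing on a single lock. The names claim_engine, release_engine and engine_busy are hypothetical, and the sketch assumes the four engines can be driven independently, which would still have to hold for the real register interface.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_ENGINES 4	/* the device has 4 AES engines */

/* One busy flag per engine (hypothetical stand-in for per-engine state). */
static atomic_flag engine_busy[NUM_ENGINES] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Claim the first idle engine; return its index, or -1 if all are busy. */
static int claim_engine(void)
{
	for (int i = 0; i < NUM_ENGINES; i++) {
		/* test_and_set returns the previous value: false means it was idle */
		if (!atomic_flag_test_and_set(&engine_busy[i]))
			return i;
	}
	return -1;
}

/* Mark an engine idle again once its request has completed. */
static void release_engine(int id)
{
	assert(id >= 0 && id < NUM_ENGINES);
	atomic_flag_clear(&engine_busy[id]);
}
```

In this model a caller would claim an engine, submit and wait on that engine only, and then release it, so up to four callers could be waiting on the hardware at once. What to do when claim_engine returns -1 (spin, queue, or fall back to software) is left open here.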