Date: Fri, 8 Jan 2010 07:45:39 +0000
From: Jarek Poplawski
To: Michael Breuer
Cc: Stephen Hemminger, David Miller, akpm@linux-foundation.org, flyboy@gmail.com, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit()
Message-ID: <20100108074539.GA6205@ff.dom.local>
In-Reply-To: <4B466A26.5070506@majjas.com>

On Thu, Jan 07, 2010 at 06:11:34PM -0500, Michael Breuer wrote:
> Results:
> * no MMAP, mtu=1500, neither alternative patch loaded: adapter crashed:
> Jan 7 15:44:23 mail kernel: DRHD: handling fault status reg 2
> Jan 7 15:44:23 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
> fault addr fffb7bffe000
> Jan 7 15:44:23 mail kernel: DMAR:[fault reason 06] PTE Read access is
> not set
> Jan 7 15:44:23 mail kernel: sky2 0000:06:00.0: error interrupt
> status=0x80000000
> Jan 7 15:44:23 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
> Jan 7 15:44:24 mail smbd[6572]: [2010/01/07 15:44:24, 0]
> lib/util_sock.c:539(read_fd_with_timeout)
> Jan 7 15:44:24 mail smbd[6572]: [2010/01/07 15:44:24, 0]
> lib/util_sock.c:1491(get_peer_addr_internal)
> Jan 7 15:44:24 mail smbd[6572]: getpeername failed. Error was
> Transport endpoint is not connected
> Jan 7 15:44:24 mail smbd[6572]: read_fd_with_timeout: client 0.0.0.0
> read error = Connection timed out.
> Jan 7 15:44:44 mail kernel: ------------[ cut here ]------------
> Jan 7 15:44:44 mail kernel: WARNING: at net/sched/sch_generic.c:261
> dev_watchdog+0xf3/0x164()
> Jan 7 15:44:44 mail kernel: Hardware name: System Product Name
> Jan 7 15:44:44 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
> queue 0 timed out
> Jan 7 15:44:44 mail kernel: Modules linked in: ip6table_filter
> ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat nf_nat
> iptable_mangle iptable_raw bridge stp appletalk psnap llc nfsd lockd
> nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit
> tunnel4 ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp xt_DSCP xt_dscp
> xt_MARK nf_conntrack_ipv6 xt_multiport ipv6 dm_multipath kvm_intel kvm
> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec
> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device
> snd_pcm gspca_spca505 gspca_main firewire_ohci videodev v4l1_compat
> firewire_core pcspkr v4l2_compat_ioctl32 snd_timer iTCO_wdt i2c_i801
> crc_itu_t iTCO_vendor_support snd soundcore snd_page_alloc sky2 wmi
> asus_atk0110 hwmon fbcon tileblit font bitblit softcursor raid456
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
> raid1 ata_generic pata_acpi pata_marvell nouveau ttm drm_kms_helper drm
> agpgart fb i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfil
> Jan 7 15:44:44 mail kernel: lrect [last unloaded: microcode]
> Jan 7 15:44:44 mail kernel: Pid: 0, comm: swapper Tainted: G W

BTW, was there any other oops saved before this one?

...
> --- adapter dead after this --- rebooted.
> * no MMAP; alternative 1 patch, mtu=1500; no errors; sustained transfer
> rates about 25% lower than what I saw with mmap enabled...(before MMAP
> enabled crashed).

?? Read below...

> * no MMAP mtu=9000; ran ok at low transfer rates - when high rates
> kicked in, got the sky2 interrupt error & things went south:
> Jan 7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt
> status=0x40000008
> Jan 7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt
> status=0x40000008
> After this, remote connections broke and I rebooted... decided to rerun
> w/o MMAP again before going back to MMAP and trying those other sky2
> options...
> * Retest of no MMAP + Alternative 1 - just to confirm consistency.
> Worked - no errors. Only version so far that allows the win7 backup to
> complete.

??? Hmm... Neither alternative 1 nor 2 is even compiled in when MMAP is
disabled, so this definitely needs re-retesting ;-)

> * MMAP + NO DMAR + disable_msi=1... also works w/o errors... leaving
> this one running for a while - also completed a backup successfully.
> Fastest of the lot... about 3x faster than any other version, working or
> not.

Very interesting. It would be nice to give it a really long try, and if
it still holds, try MMAP + NO DMAR alone.

>
> I'm leaving this one running for now. Not retesting jumbo for now. Be
> happy to help dig further.
>
> Tentative recommendations:
>
> 1) The af alternative patch seems rather necessary. First alternative
> seems to be working, I'd suggest that be submitted and backported to
> 2.6.32.
> 2) Steven's pskb_may_pull patch also ought to be included and backported.
> 3) Jumbo frame support for yukon2 should probably be disabled until/if
> fixed.
> 4) When possible I'll test dmar and disable_msi, and no dmar and no
> disable_msi. When I first hit issues, I was running without DMAR, but
> also without the above patches. I suppose the non-working permutations
> need to be either fixed or invalidated (or well documented).
> 5) It would be nice if someone with comparable hardware could reproduce
> these issues. FWIW, I can only recreate the crash running windows backup
> to a cifs share. Copying large files doesn't seem to do it. Could also
> be some other interaction going on here that perhaps others aren't
> running - would be happy to compare notes.
>
> Notes:
> This *could* be coincidental, but maybe not...
> With MMAP+NO DMAR + disable_msi there are far fewer ... actually almost
> no... bind error reports... and no bind format error messages. With
> NOMMAP and alternative one there are a few more bind error messages and
> one format error message during the several hours that version was up.
> All other configurations going back perhaps for two weeks have
> significantly more bind error reports - and all versions show increasing
> frequency of bind format errors (IPV6 only) in the roughly 10-15 minutes
> preceding the lockup/crash/interrupt error messages. There are none
> immediately preceding any crash, but perhaps there is some correlation
> between the network errors and bind ipv6 packets.

OK, for now let's make sure this MMAP + NO DMAR + disable_msi setup is
really, really working.

Thanks,
Jarek P.