2016-07-24 03:45:47

by Alan Curry

[permalink] [raw]
Subject: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

[1.] One line summary of the problem:
network data corruption (bisected to e5a4b0bb803b)

[2.] Full description of the problem/report:
Note: although my bisect ended at a commit from before 3.19, I have the
same symptom in all newer kernels I've tried, up to 4.6.4.

The commit was:

>commit e5a4b0bb803b39a36478451eae53a880d2663d5b
>Author: Al Viro <[email protected]>
>Date: Mon Nov 24 18:17:55 2014 -0500
>
> switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives

The symptom is that downloaded files (http, ftp, and probably other
protocols) have small corrupted segments (about 1-2 kilobytes long) in
random locations. Only downloads that sustain a high speed for at least a
few seconds are corrupted. Anything small enough to be received in less
than about 5 seconds is not affected.

If I download the same file twice in a row, the corruption is in different
places in each copy.

If I try to do a git clone, it fails a few seconds into the "Receiving
objects" stage with a deflate error.

[3.] Keywords: networking, carl9170

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Multiple versions are known to be affected, from 3.19 to 4.6.4

[4.2.] Kernel .config file:
For testing I built with make x86_64_defconfig followed by enabling the
carl9170 driver, which adds these lines:
CONFIG_ATH_COMMON=m
CONFIG_ATH_CARDS=m
CONFIG_CARL9170=m
CONFIG_CARL9170_LEDS=y
CONFIG_CARL9170_WPC=y

[5.] Most recent kernel version which did not have the bug:
That would be the predecessor of e5a4b0bb803b39a36478451eae53a880d2663d5b
which is v3.18-rc6-1620-g17836394e578

[6.] no Oops

[7.] A small shell script or example program which triggers the
problem (if possible)

This command fails reliably for me when running an affected kernel:

git clone git://git.kernel.org/pub/scm/git/git.git

(I'm including all the standard format stuff suggested by REPORTING-BUGS,
but I think you can skip from here to section 8.7 without missing anything
relevant)

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)

Mostly Debian 8.5 stable packages here.

GNU C 4.9.2
GNU Make 4.0
Binutils 2.25
Util-linux 2.25.2
Mount 2.25.2
Quota-tools 4.01
Linux C Library 2.19
Dynamic linker (ldd) 2.19
Procps 2.0.11
Kbd 1.15.5
Console-tools 1.15.5
Sh-utils 8.23
Modules Loaded <snipped since I'm not running the affected kernel now>

[8.2.] Processor information (from /proc/cpuinfo):
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Pentium(R) CPU G620 @ 2.60GHz
stepping : 7
microcode : 0x14
cpu MHz : 1599.914
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts
bugs :
bogomips : 5188.45
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Pentium(R) CPU G620 @ 2.60GHz
stepping : 7
microcode : 0x14
cpu MHz : 2340.304
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts
bugs :
bogomips : 5189.38
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

[8.3.] Module information (from /proc/modules):

When I tested with the x86_64_defconfig + carl9170 kernel, there were
hardly any modules built, and I reproduced the problem after booting with
init=/bin/sh, so no unnecessary modules were loaded. Currently running a
normal 4.6.4 kernel which is showing the bug, there are many more modules:

ip6t_REJECT 1524 1 - Live 0xffffffffa0393000
nf_reject_ipv6 2744 1 ip6t_REJECT, Live 0xffffffffa038f000
ip6table_filter 1419 1 - Live 0xffffffffa038b000
ip6_tables 13135 1 ip6table_filter, Live 0xffffffffa0383000
nf_nat_tftp 1158 0 - Live 0xffffffffa037f000
nf_conntrack_tftp 4017 1 nf_nat_tftp, Live 0xffffffffa037b000
nf_nat_irc 1574 0 - Live 0xffffffffa0377000
nf_conntrack_irc 3787 1 nf_nat_irc, Live 0xffffffffa0373000
nf_nat_ftp 1836 0 - Live 0xffffffffa036f000
nf_conntrack_ftp 6335 1 nf_nat_ftp, Live 0xffffffffa036a000
ipt_REJECT 1457 2 - Live 0xffffffffa0366000
nf_reject_ipv4 2483 1 ipt_REJECT, Live 0xffffffffa0362000
xt_TCPMSS 3187 1 - Live 0xffffffffa035e000
iptable_mangle 1544 1 - Live 0xffffffffa035a000
nf_log_ipv4 3667 1 - Live 0xffffffffa0356000
nf_log_common 2837 1 nf_log_ipv4, Live 0xffffffffa0352000
xt_LOG 1487 1 - Live 0xffffffffa034e000
xt_tcpudp 2522 5 - Live 0xffffffffa034a000
iptable_filter 1480 1 - Live 0xffffffffa0346000
iptable_nat 1786 0 - Live 0xffffffffa0342000
nf_conntrack_ipv4 10891 1 - Live 0xffffffffa033b000
nf_defrag_ipv4 1475 1 nf_conntrack_ipv4, Live 0xffffffffa0337000
nf_nat_ipv4 4349 1 iptable_nat, Live 0xffffffffa0332000
nf_nat 10019 4 nf_nat_tftp,nf_nat_irc,nf_nat_ftp,nf_nat_ipv4, Live 0xffffffffa032a000
nf_conntrack 59739 9 nf_nat_tftp,nf_conntrack_tftp,nf_nat_irc,nf_conntrack_irc,nf_nat_ftp,nf_conntrack_ftp,nf_conntrack_ipv4,nf_nat_ipv4,nf_nat, Live 0xffffffffa030f000
ip_tables 12753 3 iptable_mangle,iptable_filter,iptable_nat, Live 0xffffffffa0307000
x_tables 14824 10 ip6t_REJECT,ip6table_filter,ip6_tables,ipt_REJECT,xt_TCPMSS,iptable_mangle,xt_LOG,xt_tcpudp,iptable_filter,ip_tables, Live 0xffffffffa02fd000
ide_generic 1377 0 [permanent], Live 0xffffffffa02f9000
ide_core 71087 1 ide_generic, Live 0xffffffffa02da000
snd_hda_codec_hdmi 28560 1 - Live 0xffffffffa02cd000
snd_hda_codec_realtek 50872 1 - Live 0xffffffffa02b7000
snd_hda_codec_generic 44234 1 snd_hda_codec_realtek, Live 0xffffffffa02a5000
snd_hda_intel 16223 1 - Live 0xffffffffa029b000
snd_hda_codec 66159 4 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_intel, Live 0xffffffffa027c000
snd_hda_core 37571 5 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_intel,snd_hda_codec, Live 0xffffffffa0267000
snd_pcm_oss 29084 0 - Live 0xffffffffa025a000
snd_pcm 64191 5 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core,snd_pcm_oss, Live 0xffffffffa023e000
snd_timer 16920 1 snd_pcm, Live 0xffffffffa0234000
snd_mixer_oss 12283 2 snd_pcm_oss, Live 0xffffffffa022d000
snd 48509 8 snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_intel,snd_hda_codec,snd_pcm_oss,snd_pcm,snd_timer,snd_mixer_oss, Live 0xffffffffa0217000
soundcore 4527 2 snd, Live 0xffffffffa0211000
af_packet 26418 6 - Live 0xffffffffa0205000
arc4 2040 2 - Live 0xffffffffa0201000
carl9170 65428 0 - Live 0xffffffffa01ea000
ehci_pci 3783 0 - Live 0xffffffffa01e6000
ath 16832 1 carl9170, Live 0xffffffffa01dd000
led_class 3520 1 carl9170, Live 0xffffffffa01d8000
mac80211 319536 1 carl9170, Live 0xffffffffa0170000
cfg80211 178240 3 carl9170,ath,mac80211, Live 0xffffffffa0132000
rfkill 13258 1 cfg80211, Live 0xffffffffa0129000
xts 3015 2 - Live 0xffffffffa0125000
gf128mul 5879 1 xts, Live 0xffffffffa0120000
dm_crypt 15756 1 - Live 0xffffffffa0118000
dm_mod 75130 16 dm_crypt, Live 0xffffffffa00f9000
raid1 23138 1 - Live 0xffffffffa00ef000
md_mod 88740 2 raid1, Live 0xffffffffa00ce000
coretemp 4358 0 - Live 0xffffffffa00c9000
hwmon 2866 1 coretemp, Live 0xffffffffa00c4000
hid_generic 1321 0 - Live 0xffffffffa00c0000
usbhid 22332 1 - Live 0xffffffffa00b5000
hid 54261 2 hid_generic,usbhid, Live 0xffffffffa00a1000
xhci_pci 4024 0 - Live 0xffffffffa009d000
xhci_hcd 84050 1 xhci_pci, Live 0xffffffffa0082000
uhci_hcd 18422 0 - Live 0xffffffffa0079000
ohci_hcd 16591 0 - Live 0xffffffffa0070000
ehci_hcd 35022 1 ehci_pci, Live 0xffffffffa0062000
usbcore 142047 9 carl9170,ehci_pci,usbhid,xhci_pci,xhci_hcd,uhci_hcd,ohci_hcd,ehci_hcd, Live 0xffffffffa002b000
usb_common 2222 1 usbcore, Live 0xffffffffa0027000
loop 14910 0 - Live 0xffffffffa001e000
fuse 65423 0 - Live 0xffffffffa0005000
rtc 5294 0 - Live 0xffffffffa0000000

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

0000-0000 : PCI Bus 0000:00
0000-0000 : dma1
0000-0000 : pic1
0000-0000 : timer0
0000-0000 : timer1
0000-0000 : keyboard
0000-0000 : PNP0800:00
0000-0000 : keyboard
0000-0000 : rtc
0000-0000 : dma page reg
0000-0000 : pic2
0000-0000 : dma2
0000-0000 : PNP0C04:00
0000-0000 : fpu
0000-0000 : ide_generic
0000-0000 : ide_generic
0000-0000 : ide_generic
0000-0000 : PCI Bus 0000:00
0000-0000 : vga+
0000-0000 : PCI Bus 0000:00
0000-0000 : ide_generic
0000-0000 : ACPI PM1a_EVT_BLK
0000-0000 : ACPI PM1a_CNT_BLK
0000-0000 : ACPI PM_TMR
0000-0000 : ACPI CPU throttle
0000-0000 : ACPI GPE0_BLK
0000-0000 : ACPI PM2_CNT_BLK
0000-0000 : pnp 00:06
0000-0000 : pnp 00:05
0000-0000 : pnp 00:04
0000-0000 : pnp 00:05
0000-0000 : pnp 00:01
0000-0000 : pnp 00:01
0000-0000 : pnp 00:01
0000-0000 : PCI conf1
0000-0000 : PCI Bus 0000:00
0000-0000 : pnp 00:05
0000-0000 : PCI Bus 0000:01
0000-0000 : 0000:01:00.0
0000-0000 : 0000:00:02.0
0000-0000 : 0000:00:1f.3
0000-0000 : 0000:00:1f.2
0000-0000 : ahci
0000-0000 : 0000:00:1f.2
0000-0000 : ahci
0000-0000 : 0000:00:1f.2
0000-0000 : ahci
0000-0000 : 0000:00:1f.2
0000-0000 : ahci
0000-0000 : 0000:00:1f.2
0000-0000 : ahci

00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : Video ROM
00000000-00000000 : reserved
00000000-00000000 : System ROM
00000000-00000000 : System RAM
00000000-00000000 : Kernel code
00000000-00000000 : Kernel data
00000000-00000000 : Kernel bss
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : ACPI Tables
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : reserved
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : reserved
00000000-00000000 : RAM buffer
00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : reserved
00000000-00000000 : 0000:00:02.0
00000000-00000000 : PCI Bus 0000:01
00000000-00000000 : 0000:01:00.0
00000000-00000000 : 0000:01:00.0
00000000-00000000 : PCI MMCONFIG 0000 [bus 00-ff]
00000000-00000000 : reserved
00000000-00000000 : pnp 00:00
00000000-00000000 : reserved
00000000-00000000 : 0000:00:02.0
00000000-00000000 : PCI Bus 0000:03
00000000-00000000 : 0000:03:00.0
00000000-00000000 : xhci-hcd
00000000-00000000 : PCI Bus 0000:02
00000000-00000000 : 0000:02:00.0
00000000-00000000 : xhci-hcd
00000000-00000000 : 0000:00:1b.0
00000000-00000000 : ICH HD audio
00000000-00000000 : 0000:00:1f.3
00000000-00000000 : 0000:00:1f.2
00000000-00000000 : ahci
00000000-00000000 : 0000:00:1d.0
00000000-00000000 : ehci_hcd
00000000-00000000 : 0000:00:1a.0
00000000-00000000 : ehci_hcd
00000000-00000000 : 0000:00:16.0
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : IOAPIC 0
00000000-00000000 : HPET 0
00000000-00000000 : PNP0103:00
00000000-00000000 : reserved
00000000-00000000 : pnp 00:05
00000000-00000000 : reserved
00000000-00000000 : pnp 00:00
00000000-00000000 : reserved
00000000-00000000 : pnp 00:05
00000000-00000000 : pnp 00:00
00000000-00000000 : reserved
00000000-00000000 : pnp 00:00
00000000-00000000 : reserved
00000000-00000000 : pnp 00:00
00000000-00000000 : Local APIC
00000000-00000000 : reserved
00000000-00000000 : pnp 00:05
00000000-00000000 : System RAM
00000000-00000000 : RAM buffer

[8.5.] PCI information ('lspci -vvv' as root)

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
Subsystem: Biostar Microtech Int'l Corp Device 3108
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information: Len=0c <?>
Kernel driver in use: snb_uncore

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
Subsystem: Biostar Microtech Int'l Corp Device 110d
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at fe000000 (64-bit, non-prefetchable) [size=4M]
Region 2: Memory at c0000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at f000 [size=64]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a4] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-

00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
Subsystem: Biostar Microtech Int'l Corp Device 3108
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at fe608000 (64-bit, non-prefetchable) [size=16]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000

00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
Subsystem: Biostar Microtech Int'l Corp Device 3108
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fe607000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Debug port: BAR=1 offset=00a0
Capabilities: [98] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: ehci-pci

00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
Subsystem: Biostar Microtech Int'l Corp Device 8229
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 33
Region 0: Memory at fe600000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0300c Data: 41a2
Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=1 ArbSelect=Fixed TC/VC=22
Status: NegoPending- InProgress-
Capabilities: [130 v1] Root Complex Link
Desc: PortNumber=0f ComponentID=00 EltType=Config
Link0: Desc: TargetPort=00 TargetComponent=00 AssocRCRB- LinkType=MemMapped LinkValid+
Addr: 00000000fed1c000
Kernel driver in use: snd_hda_intel

00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: 00000000d0000000-00000000d00fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM- Surprise- LLActRep+ BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee0300c Data: 41a1
Capabilities: [90] Subsystem: Biostar Microtech Int'l Corp Device 3108
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: pcieport

00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fe500000-fe5fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #2, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM- Surprise- LLActRep+ BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #1, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee0300c Data: 41b1
Capabilities: [90] Subsystem: Biostar Microtech Int'l Corp Device 3108
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: pcieport

00:1c.2 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fe400000-fe4fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #3, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM- Surprise- LLActRep+ BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #2, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities: [90] Subsystem: Biostar Microtech Int'l Corp Device 3108
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
Subsystem: Biostar Microtech Int'l Corp Device 3108
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 23
Region 0: Memory at fe606000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Debug port: BAR=1 offset=00a0
Capabilities: [98] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: ehci-pci

00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
Subsystem: Biostar Microtech Int'l Corp Device 3108
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information: Len=0c <?>

00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05) (prog-if 01 [AHCI 1.0])
Subsystem: Biostar Microtech Int'l Corp Device 5207
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 26
Region 0: I/O ports at f0b0 [size=8]
Region 1: I/O ports at f0a0 [size=4]
Region 2: I/O ports at f090 [size=8]
Region 3: I/O ports at f080 [size=4]
Region 4: I/O ports at f060 [size=32]
Region 5: Memory at fe605000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee0300c Data: 41d1
Capabilities: [70] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
Capabilities: [b0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
Subsystem: Biostar Microtech Int'l Corp Device 3108
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin C routed to IRQ 3
Region 0: Memory at fe604000 (64-bit, non-prefetchable) [size=256]
Region 4: I/O ports at f040 [size=32]

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Subsystem: Biostar Microtech Int'l Corp Device 230a
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at e000 [size=256]
Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
Not readable
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 3b-03-00-00-68-4c-e0-00

02:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (prog-if 30 [XHCI])
Subsystem: Biostar Microtech Int'l Corp Device 6300
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at fe500000 (64-bit, non-prefetchable) [size=32K]
Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00002080
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <2us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Kernel driver in use: xhci_hcd

03:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (prog-if 30 [XHCI])
Subsystem: Biostar Microtech Int'l Corp Device 6300
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at fe400000 (64-bit, non-prefetchable) [size=32K]
Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00002080
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <2us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Kernel driver in use: xhci_hcd

[8.6.] SCSI information (from /proc/scsi/scsi)

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: TSSTcorp Model: CDDVDW SH-222BB Rev: SB00
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD10EZEX-00W Rev: 1A01
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD10EZEX-00W Rev: 1A01
Type: Direct-Access ANSI SCSI revision: 05

[8.7.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

lsusb identifies my network device as:

Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link TL-WN821N v2 802.11n [Atheros AR9170]

I have version 1.9.9 of carl9170-1.fw in /lib/firmware

--
Alan Curry


2016-07-26 18:26:56

by alexmcwhirter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On 2016-07-26 09:59, Christian Lamparter wrote:
> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2
> devices.
> I did not see any corruptions in any of the tests though. Can you tell
> me
> something about your wireless network too? I would like to know what
> router
> and firmware are you using? Also important: what's your wireless
> configuration?
> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
>
> Probably the quickest and easiest way to get that information is by
> running
> the following commands as root, when you are connected to your wifi
> network
> and post the results:
> # iw dev wlan0 link
> # iw dev wlan0 scan dump
>
> (You can of course remove your device's MACs, but please do it
> consistently).
>
> Regards,
> Christian

I wonder if this is something that is only affecting certain NIC's? For
example, it see this issue on sun4u boxes with tigon3 and hme
interfaces. But on my sun4v boxes that have e1000 interfaces everything
works fine.

2016-07-26 04:57:31

by Alan Curry

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Al Viro wrote:
> On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
>
> > > The symptom is that downloaded files (http, ftp, and probably other
> > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > random locations. Only downloads that sustain a high speed for at least a
> > > few seconds are corrupted. Anything small enough to be received in less
> > > than about 5 seconds is not affected.
>
> Can that sucker be reproduced with netcat? That would eliminate all issues
> with multi-iovec recvmsg(2), narrowing the things down quite bit.

netcat seems to be immune. Comparing strace results, I didn't see any
recvmsg() calls in the other programs that have had the problem, but there
is an interesting difference: netcat calls select() to wait for the socket
to be ready for reading, where my other test programs just call read() and
let it block until ready.

So I wrote a small test program to isolate that difference. It downloads
a file using only read() and write() and a hardcoded HTTP request. It has
a select mode (main loop alternates read() and select() on the TCP socket)
and a noselect mode (main loop just read()s the TCP socket).

The program is included at the bottom of this message.

I ran it several times in both modes and got corruption if and only if the
noselect mode was used.

>
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:
>
> diff --git a/net/core/datagram.c b/net/core/datagram.c

Will try that patch soon. Meanwhile, here's my test:

/* Demonstration program "dlbug".
Usage: dlbug select > outfile
or
dlbug noselect > outfile
outfile will contain the full HTTP response. Edit out the HTTP headers
and what's left should be a valid gzip if the download worked. */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <sys/select.h>

int main(int argc, char **argv)
{
const char *request =
"GET /debian/dists/stable/main/Contents-amd64.gz HTTP/1.0\r\n"
"Host: ftp.us.debian.org\r\n"
"\r\n";
ssize_t request_len = strlen(request), w, r, copied;
struct addrinfo hints, *host;
int sock, err, doselect;
char buf[10240];

if(argc!=2 || (!strcmp(argv[1], "select") && !strcmp(argv[1], "noselect"))) {
fprintf(stderr, "Usage: %s {select|noselect}\n", argv[0]);
return 1;
}

doselect = !strcmp(argv[1], "select");

memset(&hints, 0, sizeof hints);
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;

err = getaddrinfo("ftp.us.debian.org", 0, &hints, &host);
if(err) {
fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
return 1;
}

sock = socket(host->ai_family, host->ai_socktype, host->ai_protocol);
if(sock < 0) {
perror("socket");
return 1;
}

((struct sockaddr_in *)host->ai_addr)->sin_port = htons(80);

if(connect(sock, host->ai_addr, host->ai_addrlen) < 0) {
perror("connect");
return 1;
}

while(request_len) {
w = write(sock, request, request_len);
if(w < 0) {
perror("write to socket");
return 1;
}
request += w;
request_len -= w;
}

while((r = read(sock, buf, sizeof buf))) {
if(r < 0) {
perror("read from socket");
return 1;
}

copied = 0;
while(copied < r) {
w = write(1, buf+copied, r-copied);
if(w < 0) {
perror("write to stdout");
return 1;
}
copied += w;
}

if(doselect) {
fd_set rfds;
FD_ZERO(&rfds);
FD_SET(sock, &rfds);
select(sock+1, &rfds, 0, 0, 0);
}
}

return 0;
}

--
Alan Curry

2016-07-26 04:58:25

by alexmcwhirter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

> Thanks for the detailed bug-report. I looked around the web to see if
> it
> was already reported or not. If found that this issue was reported
> before:
> [0], [1] and [2] by the same person (CC'ed). One difference is that the
> reporter had this issue with rsync on multiple SPARC systems. I ran a
> git grep on a 4.7.0-rc7+ (wt-2016-07-21-15-g97bd3b0). But it didn't
> find
> any patches directly referencing the commit. I'm not sure if this issue
> has been fixed by now or not. I would greatly appreciate any comment
> about this from the "people of netdev" (Al Viro? Alex Mcwhirter?).

I can confirm the issue i was having with this commit still exists on
sparc with the latest mainline kernel.

2016-07-28 00:37:35

by alexmcwhirter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On 2016-07-27 20:31, Al Viro wrote:
> On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:
>
>> > I highly expect both my issue and OP's issue to revolve not around
>> > commit e5a4b0bb803b specifically, but around other code that no longer
>> > behaves as expected because of it.
>>
>> Indeed, and that fault address rounding bug occurs two other times
>> in arch/sparc/lib/user_fixup.c
>>
>> The mentioned patchwork patch should fix the bug and I'll get that
>> into my sparc tree, merged, and queued up for -stable ASAP.
>
> Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
> bugs in case of x86_64...

I'm going to go ahead and say this is where my issue and the op's issue
begin to branch apart from one another. He's seeing this on all incoming
data, whereas i am only seeing it on ssl data and not on sun4v.

At this point i would say data from my issue is only going to cloud this
issue as they seem to be two completely different issues revolving
around the same commit. If i come across any relevant data for x86_64
ill be sure to post it if this isn't resolved by then, but for now i'm
going to refrain from submitting anything sparc related.

2016-07-28 00:31:12

by Al Viro

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:

> > I highly expect both my issue and OP's issue to revolve not around
> > commit e5a4b0bb803b specifically, but around other code that no longer
> > behaves as expected because of it.
>
> Indeed, and that fault address rounding bug occurs two other times
> in arch/sparc/lib/user_fixup.c
>
> The mentioned patchwork patch should fix the bug and I'll get that
> into my sparc tree, merged, and queued up for -stable ASAP.

Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
bugs in case of x86_64...

2016-07-27 23:45:47

by David Miller

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

From: [email protected]
Date: Wed, 27 Jul 2016 19:02:40 -0400

> On 2016-07-27 14:04, [email protected] wrote:
>> Just to add some more information to this, the corruption seems to
>> effect ssh as well.
>> Using a sun hme interface, occasionally upon an ssh connection it will
>> refuse to authenticate a client with either password or cert
>> authentication. Using wireshark to capture and decrypt the packets
>> between the two machines, the data coming from the server seems good,
>> but the data received by the server from the client is essentially
>> garbage. Note that the client is sending valid data, but the server is
>> corrupting it upon receipt. Closing the connection and starting a new
>> one will remedy the login issue, but you do occasionally see
>> corruption on the server side sporadically.
>> So far this only seems to occur on incoming data, outgoing data seems
>> fine.
>
> Also, there is another patch the references this commit on sparc64 at
> least.
>
> https://patchwork.kernel.org/patch/9221895/
>
> I highly expect both my issue and OP's issue to revolve not around
> commit e5a4b0bb803b specifically, but around other code that no longer
> behaves as expected because of it.

Indeed, and that fault address rounding bug occurs two other times
in arch/sparc/lib/user_fixup.c

The mentioned patchwork patch should fix the bug and I'll get that
into my sparc tree, merged, and queued up for -stable ASAP.

2016-07-27 06:39:46

by Kalle Valo

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

[email protected] writes:

> On 2016-07-26 09:59, Christian Lamparter wrote:
>> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2
>> devices.
>> I did not see any corruptions in any of the tests though. Can you
>> tell me
>> something about your wireless network too? I would like to know what
>> router
>> and firmware are you using? Also important: what's your wireless
>> configuration?
>> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
>>
>> Probably the quickest and easiest way to get that information is by
>> running
>> the following commands as root, when you are connected to your wifi
>> network
>> and post the results:
>> # iw dev wlan0 link
>> # iw dev wlan0 scan dump
>>
>> (You can of course remove your device's MACs, but please do it
>> consistently).
>>
>> Regards,
>> Christian
>
> I wonder if this is something that is only affecting certain NIC's?
> For example, it see this issue on sun4u boxes with tigon3 and hme
> interfaces. But on my sun4v boxes that have e1000 interfaces
> everything works fine.

Kernel configuration might also make a difference. Trying to find an
Kconfig option which affects this could be useful.

--
Kalle Valo

2016-07-27 10:32:41

by Alan Curry

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Al Viro wrote:
>
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:

I tried it on top of v4.6.4 and on top of the very recent v4.7-2509-g59ebc44
from Linus, and still got corruption.

>
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index b7de71f..0ee5995 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
> if (!chunk)
> return 0;
>
> - if (msg_data_left(msg) < chunk) {
> + if (iov_iter_single_seg_count(&msg->msg_iter) < chunk) {
> if (__skb_checksum_complete(skb))
> goto csum_error;
> if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
>

--
Alan Curry

2016-07-27 18:14:59

by alexmcwhirter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Just to add some more information to this, the corruption seems to
effect ssh as well.

Using a sun hme interface, occasionally upon an ssh connection it will
refuse to authenticate a client with either password or cert
authentication. Using wireshark to capture and decrypt the packets
between the two machines, the data coming from the server seems good,
but the data received by the server from the client is essentially
garbage. Note that the client is sending valid data, but the server is
corrupting it upon receipt. Closing the connection and starting a new
one will remedy the login issue, but you do occasionally see corruption
on the server side sporadically.

So far this only seems to occur on incoming data, outgoing data seems
fine.

2016-07-27 23:13:26

by alexmcwhirter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On 2016-07-27 14:04, [email protected] wrote:
> Just to add some more information to this, the corruption seems to
> effect ssh as well.
>
> Using a sun hme interface, occasionally upon an ssh connection it will
> refuse to authenticate a client with either password or cert
> authentication. Using wireshark to capture and decrypt the packets
> between the two machines, the data coming from the server seems good,
> but the data received by the server from the client is essentially
> garbage. Note that the client is sending valid data, but the server is
> corrupting it upon receipt. Closing the connection and starting a new
> one will remedy the login issue, but you do occasionally see
> corruption on the server side sporadically.
>
> So far this only seems to occur on incoming data, outgoing data seems
> fine.

Also, there is another patch the references this commit on sparc64 at
least.

https://patchwork.kernel.org/patch/9221895/

I highly expect both my issue and OP's issue to revolve not around
commit e5a4b0bb803b specifically, but around other code that no longer
behaves as expected because of it.

2016-07-27 01:14:42

by Alan Curry

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Christian Lamparter wrote:
> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 devices.
> I did not see any corruptions in any of the tests though. Can you tell me
> something about your wireless network too? I would like to know what router
> and firmware are you using? Also important: what's your wireless configuration?

The router/access-point is a Comcast-issued Technicolor cable modem, model
TC8305C. The only thing I can find on it that looks like it might identify
a firmware version is this:

System Software Version

eMTA & DOCSIS Software Version: 01.E6.01.22.25
Packet Cable: 2.0

I assume Comcast pushes firmware updates whenever they feel like it.

There is possibly another clue. I get this message from the kernel sometimes:

ieee80211 phy0: invalid plcp cck rate (0).

I had this message appearing long before the data corruption bug started.
It never correlated with any actual problems, so I turned down the priority
level of the message to get it off the console, and forgot about it. I
was unable to discover what a "plcp" or "cck" is so the message means
nothing to me.

> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
>
> Probably the quickest and easiest way to get that information is by running
> the following commands as root, when you are connected to your wifi network
> and post the results:
> # iw dev wlan0 link
> # iw dev wlan0 scan dump

Connected to cc:03:fa:bf:e9:ea (on wlan0)
SSID: HOME-E9EA
freq: 2462
RX: 20726719 bytes (106483 packets)
TX: 5902478 bytes (44707 packets)
signal: -43 dBm
tx bitrate: 54.0 MBit/s

bss flags: short-slot-time
dtim period: 1
beacon int: 100

BSS cc:03:fa:bf:e9:ea(on wlan0) -- associated
TSF: 236407205748 usec (2d, 17:40:07)
freq: 2462
beacon interval: 100 TUs
capability: ESS Privacy ShortSlotTime (0x0411)
signal: -33.00 dBm
last seen: 634452 ms ago
Information elements from Probe Response frame:
SSID: HOME-E9EA
Supported rates: 1.0* 2.0* 5.5* 11.0* 18.0 24.0* 36.0 54.0
DS Parameter set: channel 11
ERP: <no flags>
ERP D4.0: <no flags>
RSN: * Version: 1
* Group cipher: TKIP
* Pairwise ciphers: CCMP TKIP
* Authentication suites: PSK
* Capabilities: 16-PTKSA-RC 1-GTKSA-RC (0x000c)
Extended supported rates: 6.0* 9.0 12.0* 48.0
HT capabilities:
Capabilities: 0x18bd
RX LDPC
HT20
SM Power Save disabled
RX Greenfield
RX HT20 SGI
TX STBC
No RX STBC
Max AMSDU length: 7935 bytes
DSSS/CCK HT40
Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
Minimum RX AMPDU time spacing: 8 usec (0x06)
HT RX MCS rate indexes supported: 0-23
HT TX MCS rate indexes are undefined
HT operation:
* primary channel: 11
* secondary channel offset: no secondary
* STA channel width: 20 MHz
* RIFS: 1
* HT protection: nonmember
* non-GF present: 1
* OBSS non-GF present: 1
* dual beacon: 0
* dual CTS protection: 0
* STBC beacon: 0
* L-SIG TXOP Prot: 0
* PCO active: 0
* PCO phase: 0
WPS: * Version: 1.0
* Wi-Fi Protected Setup State: 2 (Configured)
* Response Type: 3 (AP)
* UUID: 6d1b1911-14a9-391c-cdee-89850a5aa1ef
* Manufacturer: Technicolor
* Model: Technicolor
* Model Number: 123456
* Serial Number: 0000001
* Primary Device Type: 6-0050f204-1
* Device name: TechnicolorAP
* Config methods: Display
* RF Bands: 0x1
* Unknown TLV (0x1049, 6 bytes): 00 37 2a 00 01 20
WPA: * Version: 1
* Group cipher: TKIP
* Pairwise ciphers: CCMP TKIP
* Authentication suites: PSK
* Capabilities: 16-PTKSA-RC 1-GTKSA-RC (0x000c)
WMM: * Parameter version 1
* u-APSD
* BE: CW 15-1023, AIFSN 3
* BK: CW 15-1023, AIFSN 7
* VI: CW 7-15, AIFSN 2, TXOP 3008 usec
* VO: CW 3-7, AIFSN 2, TXOP 1504 usec

--
Alan Curry

2016-07-28 01:22:59

by Al Viro

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On Wed, Jul 27, 2016 at 08:26:48PM -0400, [email protected] wrote:

> I'm going to go ahead and say this is where my issue and the op's issue
> begin to branch apart from one another. He's seeing this on all incoming
> data, whereas i am only seeing it on ssl data and not on sun4v.
>
> At this point i would say data from my issue is only going to cloud this
> issue as they seem to be two completely different issues revolving around
> the same commit. If i come across any relevant data for x86_64 ill be sure
> to post it if this isn't resolved by then, but for now i'm going to refrain
> from submitting anything sparc related.

Which just might mean that we have *three* issues here -
(1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
(2) your ssl-only corruption
(3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
anywhere, and no multi-segment recvmsg(). Which would strongly argue in
favour of some kind of copy_page_to_iter() breakage triggered when handling
a fragmented skb, as in (1). Except that I don't see anything similar in
x86_64 uaccess primitives...

2016-07-26 14:00:01

by Christian Lamparter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On Tuesday, July 26, 2016 4:57:03 AM CEST Alan Curry wrote:
> Al Viro wrote:
> > On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
> >
> > > > The symptom is that downloaded files (http, ftp, and probably other
> > > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > > random locations. Only downloads that sustain a high speed for at least a
> > > > few seconds are corrupted. Anything small enough to be received in less
> > > > than about 5 seconds is not affected.
> >
> > Can that sucker be reproduced with netcat? That would eliminate all issues
> > with multi-iovec recvmsg(2), narrowing the things down quite bit.
>
> netcat seems to be immune. Comparing strace results, I didn't see any
> recvmsg() calls in the other programs that have had the problem, but there
> is an interesting difference: netcat calls select() to wait for the socket
> to be ready for reading, where my other test programs just call read() and
> let it block until ready.
>
> So I wrote a small test program to isolate that difference. It downloads
> a file using only read() and write() and a hardcoded HTTP request. It has
> a select mode (main loop alternates read() and select() on the TCP socket)
> and a noselect mode (main loop just read()s the TCP socket).
>
> The program is included at the bottom of this message.
>
> I ran it several times in both modes and got corruption if and only if the
> noselect mode was used.
>
> >
> > Another thing (and if that works, it's *NOT* a proper fix - it would be
> > papering over the problem, but at least it would show where to look for
> > it) - try (on top of mainline) the following delta:
> >
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
>
> Will try that patch soon. Meanwhile, here's my test:
>
> /* Demonstration program "dlbug".
> Usage: dlbug select > outfile
> or
> dlbug noselect > outfile
> outfile will contain the full HTTP response. Edit out the HTTP headers
> and what's left should be a valid gzip if the download worked. */
> [...]
Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 devices.
I did not see any corruptions in any of the tests though. Can you tell me
something about your wireless network too? I would like to know what router
and firmware are you using? Also important: what's your wireless configuration?
(WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)

Probably the quickest and easiest way to get that information is by running
the following commands as root, when you are connected to your wifi network
and post the results:
# iw dev wlan0 link
# iw dev wlan0 scan dump

(You can of course remove your device's MACs, but please do it consistently).

Regards,
Christian

2016-07-24 17:45:24

by Christian Lamparter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Hello,

I added Al Viro to the CC (probably not necessary...)

On Sunday, July 24, 2016 3:35:14 AM CEST Alan Curry wrote:
> [1.] One line summary of the problem:
> network data corruption (bisected to e5a4b0bb803b)
>
> [2.] Full description of the problem/report:
> Note: although my bisect ended at a commit from before 3.19, I have the
> same symptom in all newer kernels I've tried, up to 4.6.4.
>
> The commit was:
>
> >commit e5a4b0bb803b39a36478451eae53a880d2663d5b
> >Author: Al Viro <[email protected]>
> >Date: Mon Nov 24 18:17:55 2014 -0500
> >
> > switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives
>
> The symptom is that downloaded files (http, ftp, and probably other
> protocols) have small corrupted segments (about 1-2 kilobytes long) in
> random locations. Only downloads that sustain a high speed for at least a
> few seconds are corrupted. Anything small enough to be received in less
> than about 5 seconds is not affected.
>
> If I download the same file twice in a row, the corruption is in different
> places in each copy.
>
> If I try to do a git clone, it fails a few seconds into the "Receiving
> objects" stage with a deflate error.

Thanks for the detailed bug-report. I looked around the web to see if it
was already reported or not. If found that this issue was reported before:
[0], [1] and [2] by the same person (CC'ed). One difference is that the
reporter had this issue with rsync on multiple SPARC systems. I ran a
git grep on a 4.7.0-rc7+ (wt-2016-07-21-15-g97bd3b0). But it didn't find
any patches directly referencing the commit. I'm not sure if this issue
has been fixed by now or not. I would greatly appreciate any comment
about this from the "people of netdev" (Al Viro? Alex Mcwhirter?).

As for carl9170: I'm not sure what the driver or firmware can do about
this at this time. You can try to disable the hardware crypto by setting
nohwcrypt via the module option. However, this might not do anything at all.

> [3.] Keywords: networking, carl9170
>
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> Multiple versions are known to be affected, from 3.19 to 4.6.4
>
> [4.2.] Kernel .config file:
> For testing I built with make x86_64_defconfig followed by enabling the
> carl9170 driver, which adds these lines:
> CONFIG_ATH_COMMON=m
> CONFIG_ATH_CARDS=m
> CONFIG_CARL9170=m
> CONFIG_CARL9170_LEDS=y
> CONFIG_CARL9170_WPC=y
>
> [5.] Most recent kernel version which did not have the bug:
> That would be the predecessor of e5a4b0bb803b39a36478451eae53a880d2663d5b
> which is v3.18-rc6-1620-g17836394e578
>
> [6.] no Oops
>
> [7.] A small shell script or example program which triggers the
> problem (if possible)
>
> This command fails reliably for me when running an affected kernel:
>
> git clone git://git.kernel.org/pub/scm/git/git.git
>
> (I'm including all the standard format stuff suggested by REPORTING-BUGS,
> but I think you can skip from here to section 8.7 without missing anything
> relevant)
Yes, I removed it for the most part. If anyone is interested in the details:
Here's a link to the original post @LKML [3].

>
> [8.] Environment
> [8.1.] Software (add the output of the ver_linux script here)
>
> Mostly Debian 8.5 stable packages here.
>
> [8.3.] Module information (from /proc/modules):
>
> When I tested with the x86_64_defconfig + carl9170 kernel, there were
> hardly any modules built, and I reproduced the problem after booting with
> init=/bin/sh, so no unnecessary modules were loaded. Currently running a
> normal 4.6.4 kernel which is showing the bug.
>
> [...]
> [8.7.] Other information that might be relevant to the problem
> (please look in /proc and include all information that you
> think to be relevant):
>
> lsusb identifies my network device as:
>
> Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link TL-WN821N v2 802.11n [Atheros AR9170]
>
> I have version 1.9.9 of carl9170-1.fw in /lib/firmware
Just one additional question: Is the TL-WN821N connected to a USB3 port?

Regards,
Christian

[0] <https://lists.debian.org/debian-sparc/2016/06/msg00160.html>
[1] <https://marc.info/?l=gentoo-sparc&m=145766845820114&w=2>
[2] <http://permalink.gmane.org/gmane.linux.ports.sparc/22507>
[3] <https://lkml.org/lkml/2016/7/23/184>




2016-07-26 04:32:25

by Alan Curry

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Christian Lamparter wrote:
>
> As for carl9170: I'm not sure what the driver or firmware can do about
> this at this time. You can try to disable the hardware crypto by setting
> nohwcrypt via the module option. However, this might not do anything at all.

The nohwcrypt parameter didn't make any difference.

> >
> > lsusb identifies my network device as:
> >
> > Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link TL-WN821N v2 802.11n [Atheros AR9170]
> >
> > I have version 1.9.9 of carl9170-1.fw in /lib/firmware
> Just one additional question: Is the TL-WN821N connected to a USB3 port?

It never has been before. I tried it today and it made no difference.

--
Alan Curry

2016-07-24 19:02:42

by Al Viro

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:

> > The symptom is that downloaded files (http, ftp, and probably other
> > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > random locations. Only downloads that sustain a high speed for at least a
> > few seconds are corrupted. Anything small enough to be received in less
> > than about 5 seconds is not affected.

Can that sucker be reproduced with netcat? That would eliminate all issues
with multi-iovec recvmsg(2), narrowing the things down quite bit.

Another thing (and if that works, it's *NOT* a proper fix - it would be
papering over the problem, but at least it would show where to look for
it) - try (on top of mainline) the following delta:

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..0ee5995 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
if (!chunk)
return 0;

- if (msg_data_left(msg) < chunk) {
+ if (iov_iter_single_seg_count(&msg->msg_iter) < chunk) {
if (__skb_checksum_complete(skb))
goto csum_error;
if (skb_copy_datagram_msg(skb, hlen, msg, chunk))

2016-08-03 23:41:36

by Alan Curry

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Christian Lamparter wrote:
>
> One thing I noticed in your previous post is that you "might" not have
> draft-802.11n enabled. Do you see any "disabling HT/VHT due to WEP/TKIP use."
> in your dmesg logs? If so, check if you can force your AP to use WPA2
> with CCMP/AES only.
>

Yes, I've had that message. The reason wan't on the AP though. My
wpa_supplicant.conf only had TKIP enabled, because that's what was in the
sample configuration file I started with. Adding CCMP there worked, and in
that mode I'm no longer getting any corrupted packets.

If I'd paid attention to the encryption options when setting up this network
originally, I would have had CCMP the whole time, with no corrupted packets,
and never would have found the iov iterator bug...

--
Alan Curry

2016-08-03 03:50:21

by Alan Curry

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Al Viro wrote:
>
> Which just might mean that we have *three* issues here -
> (1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
> (2) your ssl-only corruption
> (3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
> anywhere, and no multi-segment recvmsg(). Which would strongly argue in
> favour of some kind of copy_page_to_iter() breakage triggered when handling
> a fragmented skb, as in (1). Except that I don't see anything similar in
> x86_64 uaccess primitives...
>

I think I've solved (3) at least...

Using the twin weapons of printk and stubbornness, I have built a working
theory of the bug. I haven't traced it all the way through, so my explanation
may be partly wrong. I do have a patch that eliminates the symptom in all my
tests though. Here's what happens:

A corrupted packet somehow arrives in skb_copy_and_csum_datagram_msg().
During downloads at reasonably high speed, about 0.1% of my incoming
packets are bad. Probably because the access point is that suspicious
Comcast thing.

skb_copy_and_csum_datagram_msg() does this:

if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter,
chunk, &csum))
goto fault;
if (csum_fold(csum))
goto csum_error;

skb_copy_and_csum_datagram() copies the bad data, computes the checksum,
and *advances the iterator*. The checksum is bad, so it goes to
csum_error, which returns without indicating success to userspace, but the
bad data is in the userspace buffer, and since the iterator has advanced,
the proper data doesn't get written to the proper place when it arrives in a
retransmission. The same iterator is still used because we're still in the
same syscall (I guess - this is one of the parts I didn't check out).

My ugly patch fixes this in the most obvious way: make a local copy of
msg->msg_iter before the call to skb_copy_and_csum_datagram(), and copy it
back if the checksum is bad, just before "goto csum_error;". (I wonder if the
other failure exits from this function might need to do the same thing.)

You can probably reproduce this problem if you deliberately inject some
bad TCP checksums into a stream. Just make sure the receiving machine is
in a blocking read() on the socket when the bad packet arrives. You may
need to resend the offending packet afterward with the checksum corrected.

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..574d4bf 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -730,6 +730,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
{
__wsum csum;
int chunk = skb->len - hlen;
+ struct iov_iter save_iter;

if (!chunk)
return 0;
@@ -741,11 +742,14 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
goto fault;
} else {
csum = csum_partial(skb->data, hlen, skb->csum);
+ memcpy(&save_iter, &msg->msg_iter, sizeof save_iter);
if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter,
chunk, &csum))
goto fault;
- if (csum_fold(csum))
+ if (csum_fold(csum)) {
+ memcpy(&msg->msg_iter, &save_iter, sizeof save_iter);
goto csum_error;
+ }
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
netdev_rx_csum_fault(skb->dev);
}

--
Alan Curry

2016-08-03 12:44:35

by Christian Lamparter

[permalink] [raw]
Subject: Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

On Wednesday, August 3, 2016 3:49:26 AM CEST Alan Curry wrote:
> Al Viro wrote:
> >
> > Which just might mean that we have *three* issues here -
> > (1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
> > (2) your ssl-only corruption
> > (3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
> > anywhere, and no multi-segment recvmsg(). Which would strongly argue in
> > favour of some kind of copy_page_to_iter() breakage triggered when handling
> > a fragmented skb, as in (1). Except that I don't see anything similar in
> > x86_64 uaccess primitives...
> >
>
> I think I've solved (3) at least...
>
> Using the twin weapons of printk and stubbornness, I have built a working
> theory of the bug. I haven't traced it all the way through, so my explanation
> may be partly wrong. I do have a patch that eliminates the symptom in all my
> tests though. Here's what happens:
>
> A corrupted packet somehow arrives in skb_copy_and_csum_datagram_msg().
> During downloads at reasonably high speed, about 0.1% of my incoming
> packets are bad. Probably because the access point is that suspicious
> Comcast thing.

Thanks for being very persistent with this. I think I'm able to reproduce
this now (on any hardware... like r8169 ethernet) as long as the following
"traffic policy" is enacted on the HTTP - Server:

# tc qdisc add dev eth0 root netem corrupt 0.1%

(This needs the "Network Emulation" Sched CONFIG_NET_SCH_NETEM [0].)

With your tool (changed to point to my apache local server). I'm seeing
corruptions in the "noselect" case. Running it in "select" mode however
and the resulting files have no corruptions.

About AR9170 corruption issues: I know of one report that the AR9170's
Encryption Engine can cause corruptions [1]. In this case outgoing
data was corrupted which lead to deauths/disassocs since the AP was
basically sending out multicast deauths/disassocs with bad addresses.
However, "nohwcrypt" should have made a difference there since the
software decryption would discard the faulty package due the message
integrety checks.

Another source for corruptions could be the USB-PHY (FUSB200) in the
AR9170 [2]. I know it's causing problems for the ath9k_htc. However
not everyone is affected.

One thing I noticed in your previous post is that you "might" not have
draft-802.11n enabled. Do you see any "disabling HT/VHT due to WEP/TKIP use."
in your dmesg logs? If so, check if you can force your AP to use WPA2
with CCMP/AES only.

Regards,
Christian

[0] <http://www.spinics.net/lists/linux-wireless/msg60104.html>
[1] <https://wiki.linuxfoundation.org/networking/netem>
[2] <https://github.com/qca/open-ath9k-htc-firmware/wiki/usb-related-issues>