Message-ID: <54B28A3F.3000904@gmail.com>
Date: Sun, 11 Jan 2015 15:35:43 +0100
From: François Valenduc
To: Larry Finger, linux-wireless@vger.kernel.org
Subject: Re: Kernel crash while copying big files since kernel 3.18
References: <54AA3953.20603@gmail.com> <54AAC925.6050602@lwfinger.net> <54AADC22.9050105@gmail.com> <54AAE532.6050104@lwfinger.net>
In-Reply-To: <54AAE532.6050104@lwfinger.net>

On 05/01/15 20:25, Larry Finger wrote:
> On 01/05/2015 12:46 PM, François Valenduc wrote:
>> On 05/01/15 18:25, Larry Finger wrote:
>>> On 01/05/2015 01:12 AM, François Valenduc wrote:
>>>> Hello everybody,
>>>>
>>>> Since kernel 3.18, I encounter a kernel crash every time I copy a
>>>> big file (around 12 GB) from an external USB drive to the hard drive
>>>> of my laptop.
>>>> I tried a bisection between kernels 3.17 and 3.18 and I was surprised
>>>> to find that this has to do with the driver of the wireless card
>>>> (rtl8188ee). However, I don't have problems if I copy the file while
>>>> the rtl8188 module is not loaded. Unfortunately, the results of
>>>> git-bisect are not totally conclusive because the kernel crashes
>>>> during boot when the wireless connection is established.
>>>> Here are the last steps of the bisection:
>>>>
>>>> # bad: [c151aed6aa146e9587590051aba9da68b9370f9b] rtlwifi: rtl8188ee: Update driver to match Realtek release of 06282014
>>>> git bisect bad c151aed6aa146e9587590051aba9da68b9370f9b
>>>> # good: [fd09ff958777cf583d7541f180991c0fc50bd2f7] rtlwifi: Remove extra workqueue for enter/leave power state
>>>> git bisect good fd09ff958777cf583d7541f180991c0fc50bd2f7
>>>> # skip: [9afa2e44f4d8f9d031f815c32bb8f225f0f6746b] rtlwifi: Modify base.{c,h} for new drivers
>>>> git bisect skip 9afa2e44f4d8f9d031f815c32bb8f225f0f6746b
>>>> # skip: [3c67b8f9f3b5bb1207c9bb198e5ef04ff56921dd] rtlwifi: Modify cam.{c,h} and efuse.{c,h} for new drivers
>>>> git bisect skip 3c67b8f9f3b5bb1207c9bb198e5ef04ff56921dd
>>>> # skip: [f7953b2ad66cc5fc66e13d5c0a40e61b45cdfca8] rtlwifi: Modify core.c for new drivers
>>>> git bisect skip f7953b2ad66cc5fc66e13d5c0a40e61b45cdfca8
>>>> # skip: [d3feae41a3473a0f7b431d6af4e092865d586e52] rtlwifi: Update power-save routines for 062814 driver
>>>> git bisect skip d3feae41a3473a0f7b431d6af4e092865d586e52
>>>> # skip: [38506ecefab911785d5e1aa5889f6eeb462e0954] rtlwifi: rtl_pci: Start modification for new drivers
>>>> git bisect skip 38506ecefab911785d5e1aa5889f6eeb462e0954
>>>> # skip: [f3a97e93814aeac3f13e857a0071726acc9bd626] rtlwifi: Finish modifying core routines for new drivers
>>>> git bisect skip f3a97e93814aeac3f13e857a0071726acc9bd626
>>>> # only skipped commits left to test
>>>> # possible first bad commit:
>>>> [c151aed6aa146e9587590051aba9da68b9370f9b] rtlwifi: rtl8188ee: Update driver to match Realtek release of 06282014
>>>> # possible first bad commit:
>>>> [f3a97e93814aeac3f13e857a0071726acc9bd626] rtlwifi: Finish modifying core routines for new drivers
>>>> # possible first bad commit:
>>>> [d3feae41a3473a0f7b431d6af4e092865d586e52] rtlwifi: Update power-save routines for 062814 driver
>>>> # possible first bad commit:
>>>> [3c67b8f9f3b5bb1207c9bb198e5ef04ff56921dd] rtlwifi: Modify cam.{c,h} and efuse.{c,h} for new drivers
>>>> # possible first bad commit:
>>>> [9afa2e44f4d8f9d031f815c32bb8f225f0f6746b] rtlwifi: Modify base.{c,h} for new drivers
>>>> # possible first bad commit:
>>>> [f7953b2ad66cc5fc66e13d5c0a40e61b45cdfca8] rtlwifi: Modify core.c for new drivers
>>>> # possible first bad commit:
>>>> [38506ecefab911785d5e1aa5889f6eeb462e0954] rtlwifi: rtl_pci: Start modification for new drivers
>>>>
>>>> Can somebody explain what's happening? I do the copy via Dolphin in
>>>> KDE; the screen goes black and the computer becomes totally
>>>> unresponsive, so I don't have access to the logs to see the trace of
>>>> the problem.
>>>>
>>>> Thanks in advance for your help,
>>>
>>> There is a bug in 3.18 that is triggered when an order-3 memory
>>> allocation fails. There is a patch to fix this at
>>> http://marc.info/?l=linux-netdev&m=141999680927473&w=2 that has been
>>> merged into wireless-drivers as commit
>>> e9538cf4f90713eca71b1d6a74b4eae1d445c664. It will be applied to 3.18.X
>>> when it makes it into mainline 3.19-rcY, but that has not yet happened.
>>>
>>> You could manually apply that patch to your kernel source, or you
>>> could pull the git repo at http://github.com/lwfinger/rtlwifi_new.git.
>>> That code has this patch already applied.
>>>
>>> If this patch does not fix the problem, you might be able to capture
>>> at least part of the backtrace by starting the transfer and then
>>> switching to the logging console. When a crash happens, photograph the
>>> screen. On my system, I display it with CTRL-ALT-F10. I return to the
>>> normal graphical console with CTRL-ALT-F7, but your distro may use
>>> different virtual consoles.
>>>
>>> Larry
>>>
>>>
>> Thanks for your help, it seems that your patch solves the problem. Now,
>> the system doesn't crash anymore after copying the same large file as
>> yesterday.
>> I also see this message in the log:
>> rtl_pci: Allocation of new skb failed in _rtl_pci_rx_interrupt
>> which is added by your patch.
>> Should I worry about this failure? Or is it expected?
>
> That is the positive proof that the new patch worked. Getting to that
> condition without the patch would have crashed the system. That printk
> is there to see if we were actually getting the condition and
> recovering. As the code is obviously working now, that line will be
> removed soon.
>
> Thanks,
>
> Larry
>

Do you still intend to remove the line about the allocation failure from
the log? I made a backup of my root partition compressed with pixz, and
that line appeared 1350 times, so I removed the code that adds it. Is it
really expected to occur so often? pixz uses multithreading to compress
files, so at least 3 of the 4 CPUs are busy for around 20 minutes, but
are you sure there is no other problem?

Thanks for your help,
François Valenduc
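
For reference, here is a minimal sketch of how the count above could be reproduced from a saved kernel log, and how often the message fires per minute. The message text is the one quoted in this thread. The function name `summarize` and the `[seconds.micros]` timestamp prefix (the usual dmesg style) are assumptions for illustration only. As a rough check, 1350 messages over about 20 minutes is on the order of 67 per minute.

```python
import re

# Message text as quoted in this thread.
MESSAGE = "rtl_pci: Allocation of new skb failed in _rtl_pci_rx_interrupt"

# Assumption: lines carry a dmesg-style "[  123.456789]" prefix (seconds since boot).
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def summarize(lines):
    """Return (count, per_minute) for log lines containing MESSAGE.

    per_minute is None when fewer than two timestamps are available
    or when all matching lines share the same timestamp.
    """
    count = 0
    stamps = []
    for line in lines:
        if MESSAGE not in line:
            continue
        count += 1
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(float(match.group(1)))
    if len(stamps) < 2 or max(stamps) == min(stamps):
        return count, None
    minutes = (max(stamps) - min(stamps)) / 60.0
    return count, count / minutes


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_skb_failures.py saved_dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        total, rate = summarize(log)
    print("occurrences:", total)
    print("per minute:", rate)
```

A steady rate during the compression run and none afterwards would at least suggest the failures are tied to memory pressure from the backup rather than to a separate problem, but that is only a guess on my side.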