Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753073AbbBYJ2Q (ORCPT ); Wed, 25 Feb 2015 04:28:16 -0500 Received: from mail-ig0-f181.google.com ([209.85.213.181]:58229 "EHLO mail-ig0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753013AbbBYJ2N (ORCPT ); Wed, 25 Feb 2015 04:28:13 -0500 MIME-Version: 1.0 X-Originating-IP: [62.245.219.245] In-Reply-To: <20150224013752.GA25909@spacedout.fries.net> References: <20150224013752.GA25909@spacedout.fries.net> Date: Wed, 25 Feb 2015 10:28:12 +0100 Message-ID: Subject: Re: Fwd: w1/slaves/w1_therm: null-ptr access of sl->family_data From: Thorsten Bschorr To: David Fries Cc: linux-kernel@vger.kernel.org, Evgeniy Polyakov Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5018 Lines: 124 David, I can try to prepare a patch fixing the null-ptr access. Thanks for the hint with the search_count, I'll try it. 2015-02-24 2:37 GMT+01:00 David Fries : > Thorsten, > Are you planning on submitting a patch? > > FYI, I load the one wire module with > wire search_count=1 > in > /etc/modules > which tells it to search the bus once only. That works for me as > devices don't come and go on my bus, and it isn't scanning the bus > every few seconds just to find out nothing changed. If you are > finding devices vanish even when they aren't maybe it will be useful > to you as well, though the bigger problem might be why they are > vanishing in the first place. > > On Mon, Feb 23, 2015 at 06:09:16PM +0100, Thorsten Bschorr wrote: >> Hi, >> >> I have observed a null-pointer access in w1/slaves/w1_therm on my Raspberry >> Pi running 3.18.5 with several DS18S20 temperature sensors. I have already >> discussed the problem with Evgeniy. >> >> @Evgeniy & David: sorry for re-sending my message, my email >> accidentally contained a HTML part. >> >> >> >> w1_therm: w1_slave_show checks if the sensor uses external power. If this >> is the case, the mutex on dev->bus_mutex is unlocked while waiting 750 ms >> for the sensor to convert the temperature. Before reading the temperature, >> the mutex is again locked. >> >> During this sleep-time, the sensor could get detached, for example by w1.c: >> w1_search_process_cb not finding the sensor (*): >> !test_bit(W1_SLAVE_ACTIVE, &sl->flags) ==true. >> This triggers w1_slave_detach, and hence w1_therm_remove_slave frees and >> nulls sl->family_data. >> >> w1_slave_show does not check if familiy_data is null after re-acquiring the >> bus_mutex resulting in a null-ptr access when storing data. >> >> After I've added checks for family_data!=0, I did not observe any more >> crashes; the other data of struct w1_slave seem to be valid as long as any >> thread executes w1_slave_show. >> >> >> I have added debug-output to w1.c and w1_therm.c, here's a log of a >> potential crash: >> >> [184199.510227] w1_master_driver w1_bus_master2: w1_search_process_cb, >> !W1_SLAVE_ACTIVE, calling w1_slave_detach >> [184199.510276] w1_slave_driver 10-000802d9c9e4: w1_slave_detach >> destroy_now 1 >> [184199.510297] w1_slave_driver 10-000802d9c9e4: w1_unref_slave refcnt 0 >> [184199.510321] w1_slave_driver 10-000802d9c9e4: w1_unref_slave: detaching >> 10-000802d9c9e4 [d5fb8800]. >> [184199.510347] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -> >> w1_family_notify >> [184199.510365] w1_slave_driver 10-000802d9c9e4: w1_family_notify calling >> remove_slave >> [184199.510382] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave >> [184199.510400] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave >> refcnt -1 >> [184200.049745] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after >> sleep), family_data==0 >> [184200.137133] w1_slave_driver 10-000802d9c9e4: w1_slave_show (before >> sleep), family_data==0 >> [184200.889551] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after >> sleep), family_data==0 >> [184200.930866] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after >> sleep), family_data==0 >> [184200.930907] w1_slave_driver 10-000802d9c9e4: Read failed CRC check >> [184200.931002] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -> >> device_unregister >> [184200.931169] w1 w1_unref_slave -> kfree >> >> Note: When this crash happened, multiple threads were reading the sensor. >> >> >> I could trigger the problem several times, and each time device_unregister >> in w1_unref_slave was executed *after* w1_slave_show. In one case, the >> logged time-difference between the first family_data==0 message and >> device_unregister was about 8 seconds! >> I have not observed a w1_slave_show call after w1_unref_slave (as long as >> the device was not re-attached again). >> >> >From my observation, the w1_slave data seem to be valid as long as >> w1_slave_show is executed. >> My guess is that the call of sysfs_remove_groups in w1_family_notify hits a >> mutex (I did not add debug output here). >> >> >> (*) On my tiny raspberry, this happens from time to time with high CPU and >> external disc load (timing and/or electrical issues); it seems that the >> sensor does not respond in time to the (periodic) search. >> >> >> Please email me if you need further information. >> >> >> >> Best regards, >> Thorsten Bschorr >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > > -- > David Fries PGP pub CB1EE8F0 > http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/