
Thread: Lenovo ThinkPad P70, data corruption issues?

  1. #1
    Join Date
    Jun 2017
    Posts
    29

    I recently bought a refurbished ThinkPad P70 (Xeon E3-1505M v5, NVIDIA M4000M, UHD screen, 2x16GB RAM), and am having an odd data corruption issue with openSUSE Leap 42.3 (with either the stock 4.4.79+ kernel or a more recent 4.12). Specifically, anything that uses sockets -- network or UNIX domain -- gets sporadic data corruption. Sometimes it's one or two flipped bits, sometimes it's several bits in a byte. The bad bytes appear to be roughly clustered, but there are no runs of consecutive bad bytes.

    One pattern I did pick out is that the bottom 5 bits of all of the affected file offsets (using scp to copy the files) are all ones.
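
A minimal sketch of how such a pattern could be checked, assuming a known-good copy and a corrupted copy of the same file are both available locally. The function names are illustrative and not taken from any existing tool. It lists each differing byte offset, tallies the low 5 bits of those offsets, and counts how many bits flipped in each bad byte:

```python
from collections import Counter


def diff_offsets(good: bytes, bad: bytes):
    """Return (offset, xor_mask) for every position where the two buffers differ."""
    n = min(len(good), len(bad))
    return [(i, good[i] ^ bad[i]) for i in range(n) if good[i] != bad[i]]


def summarize(diffs):
    """Tally the low 5 bits of each bad offset and the number of flipped bits per byte."""
    low5 = Counter(offset & 0x1F for offset, _ in diffs)
    bits_flipped = Counter(bin(mask).count("1") for _, mask in diffs)
    return low5, bits_flipped


def compare_files(good_path: str, bad_path: str):
    """Read both files fully (fine for modest sizes) and summarize the differences."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    return summarize(diff_offsets(good, bad))
```

If every bad offset really has its low 5 bits set, the first tally returned by `summarize` contains the single key 31 (0x1f).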

    This happens with inbound, outbound, and local rsync (I'm using rsync because it has some additional checks): whether I rsync from a remote machine, between two directories on the local machine (which uses a UNIX domain socket), or to a remote machine, I get the same kinds of comparison or protocol failures. I've also seen it with inbound FTP (downloading RPMs, which of course carry checksums).

    I have not seen any errors with simple file copy using cp -r or tar through a pipe.
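
One way to narrow this down without involving the network or the disk might be to push the same random payload through a UNIX domain socket pair and through a pipe, then compare what comes out. This is only a sketch: the helper names and the 64 KiB chunk size are arbitrary assumptions, and a clean run proves little about an intermittent fault, so it would need to be repeated many times.

```python
import hashlib
import os
import socket
import threading

CHUNK = 64 * 1024  # arbitrary transfer size, chosen for illustration


def _pump(send, close, payload: bytes) -> None:
    """Writer side: send the payload in chunks, then close so the reader sees EOF."""
    try:
        for i in range(0, len(payload), CHUNK):
            send(payload[i:i + CHUNK])
    finally:
        close()


def through_socketpair(payload: bytes) -> bytes:
    """Round-trip the payload through a connected socket pair (AF_UNIX on Linux)."""
    a, b = socket.socketpair()
    writer = threading.Thread(target=_pump, args=(a.sendall, a.close, payload))
    writer.start()
    received = bytearray()
    while True:
        chunk = b.recv(CHUNK)
        if not chunk:
            break
        received += chunk
    writer.join()
    b.close()
    return bytes(received)


def through_pipe(payload: bytes) -> bytes:
    """Round-trip the payload through an anonymous pipe, as a control case."""
    r, w = os.pipe()

    def write_all(chunk: bytes) -> None:
        # os.write may write fewer bytes than requested, so loop until done.
        while chunk:
            written = os.write(w, chunk)
            chunk = chunk[written:]

    writer = threading.Thread(target=_pump, args=(write_all, lambda: os.close(w), payload))
    writer.start()
    received = bytearray()
    while True:
        chunk = os.read(r, CHUNK)
        if not chunk:
            break
        received += chunk
    writer.join()
    os.close(r)
    return bytes(received)


def count_mismatches(transport, rounds: int = 100, size: int = 1 << 20) -> int:
    """Return how many rounds delivered data whose SHA-256 differs from what was sent."""
    bad = 0
    for _ in range(rounds):
        payload = os.urandom(size)
        sent = hashlib.sha256(payload).digest()
        got = hashlib.sha256(transport(payload)).digest()
        bad += sent != got
    return bad
```

Calling `count_mismatches(through_socketpair)` and `count_mismatches(through_pipe)` on an affected boot would, if the symptoms above hold, be expected to show mismatches only for the socket case.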

    This does not happen if I boot Knoppix 7.7.1 (based on kernel 4.7.9).

    I have run a full pass of memtest86 with no errors. I have tried using just one DIMM at a time and changing which slot I use, and using the rear panel slots vs. the under-keyboard slots; no change in the symptoms. The BIOS is up to date, presumably with the microcode fix.

    I am presently running diagnostics; that will take a while longer. I am also going to try a vanilla kernel (provided by openSUSE RPMs) to see if that makes a difference.

    What I have to decide is whether I return it (and most likely eat a 15% restocking fee; I didn't see any indication of this under Windows, which I don't plan to use but I kept the SSD with it installed), get the mobo replaced under warranty (have to ship to Lenovo, presumably at my expense), or find a solution on my own to this. I have not been able to find anything on the net about a problem like this, either. It's an odd one; the symptoms look generally memory-ish, but it happened with two different DIMMs in different slots, and it's only happening with sockets. It's also apparently happening above the transport layer, since TCP checksums aren't catching it.
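
For reference, the TCP checksum is the 16-bit one's-complement Internet checksum described in RFC 1071, and any single flipped bit in a segment changes it. The sketch below is a plain-Python illustration of that arithmetic, not the kernel's implementation. It says nothing about UNIX domain sockets, which carry no checksum at all, and on real hardware the checksum may be computed or verified by the NIC rather than the CPU.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over big-endian 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)


def single_flips_detected(segment: bytes) -> bool:
    """Check that every possible single-bit flip changes the checksum."""
    reference = internet_checksum(segment)
    return all(
        internet_checksum(flip_bit(segment, i, bit)) != reference
        for i in range(len(segment))
        for bit in range(8)
    )
```

So if the payload were being damaged between the sender computing the checksum and the receiver verifying it, single-bit flips should show up as TCP checksum errors; corruption that goes unnoticed is consistent with damage occurring before the checksum is computed or after it is verified.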

    Anyone have any thoughts here?

  2. #2
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    32,337
    Blog Entries
    15

    Hi
    A Samsung SSD? If so, another user with the same issue... https://forums.opensuse.org/showthre...-in-RAM-memory
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE

  3. #3
    Join Date
    Jun 2017
    Posts
    29

    malcolmlewis wrote:
    > Hi
    > A Samsung SSD? If so, another user with the same issue... https://forums.opensuse.org/showthre...-in-RAM-memory

    Doesn't look like the same thing at all, and in any event, this happened with two different SSDs (one may have been a Samsung; the other, which I'm currently using, is a Crucial MX300).

  4. #4
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    32,337
    Blog Entries
    15

    rlk wrote:
    > Doesn't look like the same thing at all, and in any event, this happened with two different SSDs (one may have been a Samsung; the other, which I'm currently using, is a Crucial MX300).

    Hi
    I have a Crucial running on one of my 42.3 test systems;
    Code:
    Model Family:     Crucial/Micron RealSSD C300/M500
    Device Model:     Crucial_CT120M500SSD1
    Have you checked the SSD with smartctl, and is the firmware up to date?

    If you really want to test, use prime95, best tool for stress testing

  5. #5
    Join Date
    Jun 2017
    Posts
    29

    malcolmlewis wrote:
    > Hi
    > I have a Crucial running on one of my 42.3 test systems;
    > Code:
    > Model Family:     Crucial/Micron RealSSD C300/M500
    > Device Model:     Crucial_CT120M500SSD1
    > Have you checked the SSD with smartctl, and is the firmware up to date?
    >
    > If you really want to test, use prime95, best tool for stress testing

    Again, I've seen the same problem with an installation on two different SSDs (different brands and models -- I'm at home now, and the other one's a SanDisk M400), and two separate DIMMs (singly and in combination). And it only affects uses of sockets; I have not seen it in any other context. I have no reason to think it's related to disk at all.

  6. #6
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    32,337
    Blog Entries
    15

    rlk wrote:
    > Again, I've seen the same problem with an installation on two different SSDs (different brands and models -- I'm at home now, and the other one's a SanDisk M400), and two separate DIMMs (singly and in combination). And it only affects uses of sockets; I have not seen it in any other context. I have no reason to think it's related to disk at all.

    Hi
    As I said, a prime95 stress test will exercise your RAM and confirm whether that's the cause.

  7. #7
    Join Date
    Jun 2017
    Posts
    29

    So, some more information overnight.

    It appears that if I remove the xf86-video-nouveau package and use the vanilla kernel (4.12.9-1.gf2ab6ba-vanilla), this problem goes away. This seems distinctly odd, and this holds even if I boot to runlevel 3 and never start the X server to begin with (and blacklist the nouveau kernel driver). However, with either nouveau installed or using the default kernel of the same vintage I have the data corruption issue I described.

    I ran a full pass of the Lenovo diagnostics in addition to memtest86, and found nothing. But it has me at a loss for explanation. The failure is robust against SSD and memory configuration, and appears confined to something both very specific and very general (use of sockets). It also happens regardless of whether I have hyperthreading enabled in the BIOS (and the BIOS is up to date in any event). But with two software changes, one of which should be completely unrelated, the problem appears to reliably go away.

    This is making me nervous; if I can't find an explanation and fix, I'll certainly have to return the machine even if I have to eat the restocking fee.

  8. #8
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    32,337
    Blog Entries
    15

    rlk wrote:
    > So, some more information overnight.
    >
    > It appears that if I remove the xf86-video-nouveau package and use the vanilla kernel (4.12.9-1.gf2ab6ba-vanilla), this problem goes away. This seems distinctly odd, and this holds even if I boot to runlevel 3 and never start the X server to begin with (and blacklist the nouveau kernel driver). However, with either nouveau installed or using the default kernel of the same vintage I have the data corruption issue I described.
    >
    > I ran a full pass of the Lenovo diagnostics in addition to memtest86, and found nothing. But it has me at a loss for explanation. The failure is robust against SSD and memory configuration, and appears confined to something both very specific and very general (use of sockets). It also happens regardless of whether I have hyperthreading enabled in the BIOS (and the BIOS is up to date in any event). But with two software changes, one of which should be completely unrelated, the problem appears to reliably go away.
    >
    > This is making me nervous; if I can't find an explanation and fix, I'll certainly have to return the machine even if I have to eat the restocking fee.

    Hi
    There have been some threads about the nouveau driver, what about the standard 42.3 kernel, can you duplicate? If so, then I would create a bug report;
    openSUSE:Submitting bug reports - openSUSE (http://en.opensuse.org/openSUSE:Submitting_bug_reports)

  9. #9
    Join Date
    Jun 2017
    Posts
    29

    malcolmlewis wrote:
    > Hi
    > There have been some threads about the nouveau driver, what about the standard 42.3 kernel, can you duplicate? If so, then I would create a bug report;
    > openSUSE:Submitting bug reports - openSUSE (http://en.opensuse.org/openSUSE:Submitting_bug_reports)

    I can reproduce it with the standard 42.3 kernel (either the 4.4-based one or the 4.12-based standard kernel) with or without the nouveau driver being installed. With the vanilla 4.12 kernel, I can't. The vanilla 4.4 kernel does not, I believe, have proper Skylake support so I don't think I can test that.

    I'm going to look for the threads in question, but do you have any links handy?

  10. #10
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    32,337
    Blog Entries
    15

    On Wed 30 Aug 2017 05:46:01 PM CDT, rlk wrote:

    > malcolmlewis;2836194 wrote:
    > > Hi
    > > There have been some threads about the nouveau driver, what about the
    > > standard 42.3 kernel, can you duplicate? If so, then I would create a
    > > bug report;
    > > 'openSUSE:Submitting bug reports - openSUSE'
    > > (http://en.opensuse.org/openSUSE:Submitting_bug_reports)
    >
    > I can reproduce it with the standard 42.3 kernel (either the 4.4-based
    > one or the 4.12-based standard kernel) with or without the nouveau
    > driver being installed. With the vanilla 4.12 kernel, I can't. The
    > vanilla 4.4 kernel does not, I believe, have proper Skylake support so I
    > don't think I can test that.
    >
    > I'm going to look for the threads in question, but do you have any links
    > handy?


    Hi
    I would file a bug report with all those details and see what happens.
    Also, post the bug number back here for others to reference.
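
As a rough aid for collecting those details, here is a small sketch that records the running kernel release and whether the nouveau kernel module is currently loaded, so each test run can be labeled consistently. The function names and the returned dictionary keys are made up for illustration, and it assumes a Linux system exposing /proc/modules.

```python
import platform


def loaded_modules(proc_modules_text: str) -> set:
    """Parse /proc/modules content; the first field of each line is the module name."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def describe_config(proc_modules_path: str = "/proc/modules") -> dict:
    """Return the kernel release and whether nouveau is loaded (None if unknown)."""
    try:
        with open(proc_modules_path) as f:
            modules = loaded_modules(f.read())
    except OSError:
        modules = None  # not Linux, or the file is unreadable
    return {
        "kernel_release": platform.release(),
        "nouveau_loaded": None if modules is None else "nouveau" in modules,
    }
```

This only reports the kernel module state; whether the xf86-video-nouveau package is installed is a separate question that `rpm -q xf86-video-nouveau` can answer.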


