TransWikia.com

My production server has bad blocks, now what?

Server Fault Asked on December 11, 2021

My server has crashed for the Nth time in a couple of months, so I decided to run a bad-block test. I used fsck to detect and mark bad blocks, and it did find some. If I understand correctly, this means the filesystem will no longer use those blocks to store data.

But what happens to the data that was already there? Has it been moved? It was probably corrupted to begin with, so the files that were using those blocks are probably broken. That leaves me with several open questions:

  1. Can I detect which files have been affected?
  2. How can I check whether those files are corrupted after fsck?
  3. Is there any way to tell my distribution (Ubuntu 14.04) to “reinstall all packages, as they are cached in the system”? (That is, no upgrades, just a re-installation of the current versions, without overwriting any configuration files.)

Note: for completeness, here is the output of the fsck run:

root@rescue:~# fsck -vcck /dev/sda2
fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done                                                 
/dev/sda2: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes

Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 8: 119060233 119060234 119060592 119060615 119060616 119060617 119060618 119060619 119060620 119060621 119060623 119060624 119060625 119060626 119060632 119060633 119060635 119060636 119060637 119060638 119060639 119061755
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 0 inodes containing multiply-claimed blocks.)

File <The journal inode> (inode #8, mod time Mon May  5 14:17:18 2014) 
  has 22 multiply-claimed block(s), shared with 1 file(s):
        <The bad blocks inode> (inode #1, mod time Thu Aug  7 19:11:37 2014)
Clone multiply-claimed blocks<y>? yes
Error reading block 119060233 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060234 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060592 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060615 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060616 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060617 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060618 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060619 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060620 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060621 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060623 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060624 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060625 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060626 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060632 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060633 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060635 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060636 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060637 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060638 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060639 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119061755 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (23499, counted=23477).
Fix<y>? yes
Free blocks count wrong for group #2016 (23956, counted=23961).
Fix<y>? yes
Free blocks count wrong for group #3633 (65514, counted=0).
Fix<y>? yes
Free blocks count wrong (231534163, counted=231534168).
Fix<y>? yes

/dev/sda2: ***** FILE SYSTEM WAS MODIFIED *****

      154609 inodes used (0.26%, out of 59736064)
          47 non-contiguous files (0.0%)
           9 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 154209/10
     7404456 blocks used (3.10%, out of 238938624)
          99 bad blocks
           2 large files

      126167 regular files
       27996 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
         437 symbolic links (382 fast symbolic links)
           0 sockets
------------
      154600 files

One Answer

First, take a look at the Bad Block HOWTO for smartmontools:

https://www.smartmontools.org/wiki/BadBlockHowto
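
For ext2/3/4, the usual way to get from a failing block to a file is to map the filesystem block to its inode with debugfs `icheck`, then map the inode to a path with `ncheck`. Here is a minimal sketch of that, assuming the e2fsck output was saved to a file (the name `fsck.log` and the helper function names are made up for illustration) and that the filesystem is `/dev/sda2` as in the question:

```shell
#!/bin/sh
# Hypothetical helpers; only sed, sort and debugfs are real tools here.

# Read an e2fsck log on stdin and print each unreadable block number once,
# sorted numerically, one per line.
bad_blocks_from_log() {
    sed -n 's/^Error reading block \([0-9][0-9]*\) .*/\1/p' | sort -n -u
}

# Print the inode that owns each given block. debugfs opens the device
# read-only unless -w is passed, but it still needs root.
inodes_for_blocks() {
    device=$1
    shift
    debugfs -R "icheck $*" "$device"
}

# Print the path name(s) that belong to each given inode number.
paths_for_inodes() {
    device=$1
    shift
    debugfs -R "ncheck $*" "$device"
}

# Example (run as root from the rescue system):
#   inodes_for_blocks /dev/sda2 $(bad_blocks_from_log < fsck.log)
#   paths_for_inodes /dev/sda2 <inode numbers printed by the previous step>
```

In the log from the question, all 22 unreadable blocks were claimed by the journal inode (#8), so they would not be expected to map to regular files; the summary reports 99 bad blocks in total, and `dumpe2fs -b /dev/sda2` should print the full list recorded by the filesystem. For a damaged file that belongs to a package, `dpkg -S <path>` names the owning package and `apt-get install --reinstall <package>` reinstalls it, assuming that version is still available in the local cache or the archive. dpkg normally leaves existing configuration files in place on such a reinstall.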

Second, if you don't already have one, it is time to implement a working backup strategy.

If you need a certain level of availability from your server, you might also want to consider setting up RAID-1 (mirroring).
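
On Linux, software mirroring is commonly done with md via `mdadm`. A mirror only helps if a failed member gets noticed, so here is a rough sketch of a check that reads `/proc/mdstat` and lists arrays with a missing member. The function name `degraded_arrays` is made up, and the `mdadm --create` line in the comment uses placeholder device names:

```shell
#!/bin/sh
# Creating a two-disk mirror typically looks like this (placeholder devices;
# this destroys existing data on them):
#   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1

# Read /proc/mdstat-style text on stdin and print the name of every array
# whose member status (e.g. "[U_]") shows a missing device.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }    # array header, e.g. "md0 : active raid1 ..."
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Example:
#   degraded_arrays < /proc/mdstat
```

A check like this could run from cron and send mail when it prints anything; `mdadm --monitor` is the built-in alternative.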

Either way, it is time to get rid of the old hard disk drive and get a new one. It has already failed you enough times, and it is very unlikely to get any better in the future.
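
The drive's own SMART counters can back up that decision. As a sketch, assuming smartmontools is installed and the disk is `/dev/sda`, the raw value of an attribute such as `Reallocated_Sector_Ct` can be pulled out of the `smartctl -A` table. The helper name `smart_raw_value` is invented here, and attribute names vary between drive models:

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value (10th column)
# of the attribute named in $1. Prints nothing if the attribute is absent.
smart_raw_value() {
    awk -v attr="$1" '$2 == attr { print $10 }'
}

# Example (as root):
#   smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct
#   smartctl -A /dev/sda | smart_raw_value Current_Pending_Sector
```

A non-zero and growing count for either attribute is usually taken as a sign that the disk is wearing out; `smartctl -H /dev/sda` prints the drive's overall self-assessment.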

Answered by Marc Stürmer on December 11, 2021
