[mythtv-users] linux software raid / mdadm UUID problems

Thu Dec 31 16:53:06 UTC 2009

On Thu, Dec 31, 2009 at 7:47 AM, John Drescher <drescherjm at gmail.com> wrote:
>> I upgraded the motherboard in my storage system today, and reinstalled
>> Ubuntu 9.10. There are 2 Linux software raid arrays in the system,
>> /dev/md0 which is comprised of 8 devices, and /dev/md1 which is
>> comprised of 2 devices. I can get mdadm to automatically assemble
>> /dev/md1 on boot, but it won't auto assemble /dev/md0. For some reason
>> (possibly because I had a 4 disk array with some of the same disks
>> years ago), mdadm finds two instances of /dev/md0. One is correctly
>> comprised of /dev/sd[abcdefgh]1, the other is incorrectly comprised of
>> /dev/sd[bc] and two missing devices:
>>
>> mike at storage1:~$ sudo mdadm --examine --scan --verbose
>> ARRAY /dev/md0 level=raid5 num-devices=4
>> UUID=7aa116e9:6de51617:82eda4d1:8807c2ac
>>   devices=/dev/sdc,/dev/sdb
>> ARRAY /dev/md1 level=raid1 num-devices=2
>> UUID=5e4d8185:a0cc963f:796f3e2c:ec945a20
>>   devices=/dev/sde3,/dev/sda3
>> ARRAY /dev/md0 level=raid5 num-devices=8
>> UUID=afdf5b45:8b930e27:66ac6d52:07eb4ce7
>>   spares=1   devices=/dev/sdh1,/dev/sdg1,/dev/sdf1,/dev/sde1,/dev/sdd1,/dev/sdc1,/dev/sdb1,/dev/sda1
>>
>> I'm not sure why these mis-marked partitions weren't a problem for me before.
>>
>> When I use fdisk to look for partitions marked as Linux raid, I see
>> exactly what I would expect:
>> mike at storage1:~$ sudo fdisk -l |grep auto
>> /dev/sda1   *           1       36481   293033601   fd  Linux raid autodetect
>> /dev/sda3           36743       38913    17438557+  fd  Linux raid autodetect
>> /dev/sdb1   *           1       36481   293033601   fd  Linux raid autodetect
>> /dev/sdc1   *           1       36481   293033601   fd  Linux raid autodetect
>> /dev/sdd1               1       36481   293033601   fd  Linux raid autodetect
>> /dev/sde1   *           1       36481   293033601   fd  Linux raid autodetect
>> /dev/sde3           36743       38913    17438557+  fd  Linux raid autodetect
>> /dev/sdf1   *           1       36481   293033601   fd  Linux raid autodetect
>> /dev/sdg1   *           1       36481   293033601   fd  Linux raid autodetect
>> /dev/sdh1               1       36481   293033601   fd  Linux raid autodetect
>>
>> Note that sd[ae]3 are part of /dev/md1
>>
>> I can manually start the array with:
>> sudo mdadm --assemble --uuid=afdf5b45:8b930e27:66ac6d52:07eb4ce7
>> /dev/md0 , so I know the array is intact.
>>
>
> How about cat /proc/mdstat?
>
>> How do I get rid of the erroneous /dev/md0?  It looks like I may be
>> able to use the --zero-superblock option on /dev/sd[bc], but I value
>> the data on the array, and am wary of doing anything that could
>> jeopardize the data on /dev/sd[bc]1.
>>
>
> At work and home I manage dozens of mdadm arrays over the last 5 to 7
> years and I have never had to zero a superblock. I have moved entire
> arrays from machine to machine, replaced individual drives,
> controllers and grown arrays.
>
> Did you edit your /etc/mdadm.conf? Do you even have one?
>
> John
>

John,

Thanks for the reply.

I looked at my mdadm.conf file, which was automatically generated. It
contained both the valid and the invalid description of md0. I removed
the invalid one and rebooted, but the automatic RAID detection still
finds the invalid array. The mdadm.conf file still contains only the
valid entry, but sudo mdadm --examine --scan --verbose gives me the
same invalid results as before.
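
For reference, mdadm --examine --scan reads the superblocks directly
from the devices, so removing an ARRAY line from the config file would
not be expected to change its output. On Debian and Ubuntu the config
normally lives at /etc/mdadm/mdadm.conf and a copy is embedded in the
initramfs, so an edit may only affect boot-time assembly after
update-initramfs -u has been run. A minimal sketch of filtering the
stale entry out of the scan output follows. The function name
drop_stale_array and the /tmp path are made up for illustration, and
it assumes one ARRAY line per array (the non-verbose output format).

```shell
#!/bin/sh
# Hypothetical helper: drop ARRAY lines that carry a stale UUID from
# non-verbose "mdadm --examine --scan" output read on stdin.
# Assumption: each array is described on a single line.

# UUID of the 4-device md0 reported earlier in this thread.
STALE_UUID="7aa116e9:6de51617:82eda4d1:8807c2ac"

drop_stale_array() {
    # -F: match a fixed string, -v: keep only lines that do NOT match.
    grep -v -F "UUID=$1"
}

# Intended use (not run here; needs root and only writes a candidate file):
#   sudo mdadm --examine --scan | drop_stale_array "$STALE_UUID" > /tmp/mdadm.arrays
#   ...review the result, merge it into /etc/mdadm/mdadm.conf, then:
#   sudo update-initramfs -u
```

This only changes what the config file asks for. Whether the boot-time
auto-detection then ignores the stale superblocks on the whole disks
is not something the sketch can guarantee.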

As for mdstat, this is what I get when I first boot the machine:

mike at storage1:~$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5]
[raid4] [raid10]
md1 : active raid1 sde3[0] sda3[1]
      17438464 blocks [2/2] [UU]

md0 : inactive sdb[3](S) sdc[0](S)
      586072192 blocks

It shows the invalid md0 as inactive.  I can start the correct md0
array by doing:

$ sudo mdadm --stop /dev/md0
$ sudo mdadm --assemble --uuid=afdf5b45:8b930e27:66ac6d52:07eb4ce7 /dev/md0
mdadm: SET_ARRAY_INFO failed for /dev/md0: Device or resource busy

The strange thing here is that I get the message above even though the
LVM volume within the array shows up in /dev/mapper, and I can mount
the XFS filesystem on that volume.

One thread I found on another board
(http://www.mail-archive.com/debian-user@lists.debian.org/msg532311.html)
indicates that I should be able to zero the superblocks on sdb and
sdc. Again, I'm a little wary of doing this because I don't really
understand superblocks or partitions that well. Where is the superblock
stored? If I have sdb and sdb1, where sdb1 is the only partition on the
disk and is as large as it can possibly be, how many blocks are left
for sdb itself? I thought only a single 512-byte block, and I didn't
think a superblock was stored there. Am I way off on that?
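
On the "where is the superblock stored" question, a rough sketch may
help. It assumes these arrays use version 0.90 metadata, which the
thread does not confirm. In that format the superblock occupies the
last 64 KiB-aligned 64 KiB block of the member device, so a whole disk
and a partition that ends slightly before the end of that disk keep
their superblocks at different places. The function name sb_offset_kib
is made up for illustration.

```shell
#!/bin/sh
# Sketch: offset of a version 0.90 md superblock within a member device.
# Assumption: 0.90 metadata (superblock near the END of the device).

# sb_offset_kib SIZE_KIB
# Prints the superblock offset in KiB from the start of the device:
# round the device size down to a multiple of 64 KiB, then step back
# one more 64 KiB block.
sb_offset_kib() {
    size_kib=$1
    echo $(( (size_kib / 64) * 64 - 64 ))
}

# Example with the size fdisk reports for sdb1 (293033601 blocks of 1 KiB):
#   sb_offset_kib 293033601    # prints 293033536
```

The offset is relative to the start of the device it was computed for,
so the value for sdb1 is measured from the start of the partition, not
from the start of the disk. The size of the whole disk sdb is not shown
in this thread, so its own offset cannot be worked out from the fdisk
listing alone.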

I suppose I could manually get the correct array started, fail sdb1,
remove it, zero the entire disk with dd, recreate the partition, add
it back to the array, allow it to rebuild, and then repeat with the
other offending disk. However, I'm not sure I want to lose the
redundancy for the day or so of high disk activity needed for the
resync.
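
The per-disk cycle described above can be written out as a dry run.
This is only a sketch: it prints the commands instead of running them,
the function name plan_cycle is made up for illustration, and it
assumes the array member is always the first partition on the disk.
The partition has to be recreated by hand, so that step appears as a
comment.

```shell
#!/bin/sh
# Dry-run sketch: print (do NOT execute) the fail/remove/zero/re-add
# cycle for one disk. Device names are the ones from this thread.

# plan_cycle ARRAY DISK
plan_cycle() {
    array=$1
    disk=$2
    part="${disk}1"    # assumption: the member is partition 1
    cat <<EOF
mdadm $array --fail $part
mdadm $array --remove $part
dd if=/dev/zero of=$disk bs=1M
# recreate $part by hand (for example with fdisk), type fd, same size as before
mdadm $array --add $part
# wait until /proc/mdstat shows the rebuild has finished before the next disk
EOF
}

plan_cycle /dev/md0 /dev/sdb
```

Running the same function with /dev/sdc prints the plan for the second
disk. Each pass leaves the array without redundancy until its rebuild
completes, which is the drawback already noted.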

Regards,
Mike