LaCie 5big Network 2
Latest revision as of 15:27, 13 October 2019
LaCie 5big Network 2, 10 TB
- CPU armv5tel (Feroceon 88FR131 rev 1 (v5l))
- RAM 512 MB
- Flash ROM
- NIC 2x Marvell Gigabit Ethernet 10/100/1000 Base-TX
- USB
- internal HDD supports BASIC (1 drive), RAID 0 or 1 (2 drives), RAID 0 or 5 (3 drives), RAID 0, 5 or 6 (4 or 5 drives)
- SATA Controller
- Drive Capacity 0, 5, 10, and 15 TB capacities available
- Fan Ultra-quiet cooling system with self-stabilizing oil-pressure bearing technology
- Initial firmware v2.0.5, upgraded to v2.2.8
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid5 sda2[0] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      7805956096 blocks super 1.0 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
md3 : active raid1 sda5[0] sde5[4] sdd5[3] sdc5[2] sdb5[1]
      255936 blocks [5/5] [UUUUU]
md2 : active raid1 sda9[0] sde9[4] sdd9[3] sdc9[2] sdb9[1]
      875456 blocks [5/5] [UUUUU]
md1 : active raid1 sda8[0] sde8[4] sdd8[3] sdc8[2] sdb8[1]
      843328 blocks [5/5] [UUUUU]
md0 : active raid1 sde7[3] sdd7[4] sdc7[2] sdb7[1] sda7[0]
      16000 blocks [5/5] [UUUUU]
Links
Unofficial wiki
Most of the info here comes from that wiki
- Section for Lacie 2big network 2
- Section for Lacie 5big network 2 contains much less info, but most of the 2big information also applies to the 5big
Root
Custom capsule
I created a custom capsule, a custom firmware which is very easy to do thanks to the provided script.
update: the website seems to have disappeared, see the original page and script on Archive.org.
Not much to say, just execute the script and answer to a few questions.
Then to flash it, you can use the LaCieNetworkAssistant, or the following method which I prefer, as it depends less on network operations:
- In the dashboard, create a share named "Share"
- Create a folder in that share named "Update"
- Drop the capsule file into the share Share/Update
- Reboot the NAS
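The four steps above can be sketched as a small script. This is only an illustration: `$SHARE` is a hypothetical stand-in for wherever the NAS share "Share" is mounted on your client (e.g. an SMB mount point), and `$CAPSULE` for your capsule file.

```shell
# Sketch of the flash-via-share method above; adjust SHARE to your real mount point.
SHARE="${SHARE:-./Share}"                            # e.g. /media/lacie/Share on a real client
CAPSULE="${CAPSULE:-5bignetwork2_2.2.8.1.capsule}"   # the capsule produced by the script

mkdir -p "$SHARE/Update"            # the NAS looks for capsules in Share/Update
if [ -f "$CAPSULE" ]; then
    cp "$CAPSULE" "$SHARE/Update/"  # drop the capsule into the share
fi
# Finally, reboot the NAS (e.g. from the dashboard) so it picks up the capsule.
```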
Note that if you use LaCieNetworkAssistant and it fails to update the firmware, the advice is to disable IPConf support. To do that on Linux, once the assistant is launched, right-click its icon in the task bar => Preferences.
Example:
New capsule built: '/home/phil/Downloads/lacie/capsule/lacie/5bignetwork2_2.2.8.1.capsule'

After upgrading you can:
- use SSH/SFTP with root privileges on your LaCie NAS
  (login: root | password: [same as admin password] | port: 2222)
- use the Transmission Web Interface (url: http://YOUR.LACIE.NAS.IP:9091/transmission/web/);
  don't forget to change the transmission download path
- access, after configuring port forwarding in your router, from an external network:
  - LaCie Dashboard: http://YOUR.NETWORK.EXTERNAL.IP/
  - Transmission Web Interface: http://YOUR.NETWORK.EXTERNAL.IP:9091/transmission/web/
The script has been reported to work with capsule 2.2.10.1 producing a rooted capsule 2.2.10.1.1
Authentication
root password can be permanently changed by editing /usr/lib/python2.6/site-packages/unicorn/authentication/local/user.py & looking for 'root:$1...'
This step is automated when you create a custom capsule, see sshd.i.txt
SSH runs on port 2222 and you can make use of /root/ssh/authorised_keys as usual
Note that direct edits to user.py may be reverted by a firmware update, unless you customize the new capsule directly.
Note that some (all?) firmwares have an extra user called "partner" with the same rights as root.
Default passwords for root and partner are unknown AFAIK.
Their respective md5crypt hashes are:
$1$$1RDUuTsVHjre9juUvuICX.
$1$AhmQ/2rZ$1cYuUexBvzYmM.Zk4R/6y.
The partner account can be removed by editing /usr/lib/python2.6/site-packages/unicorn/authentication/local/user.py (search for "partner") and running deluser partner, or possibly by changing ExecMode in /usr/lib/python2.6/site-packages/exec_mode/exec_mode.py.
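For reference, an md5crypt hash in the same $1$... format as the ones above can be generated with `openssl passwd -1` and swapped into user.py with sed. A sketch, run here against a stand-in copy of the file (the salt, password, and file name are made up for the demo; the real file is the user.py path mentioned above):

```shell
# Generate a new md5crypt hash; salt and password here are demo values
NEWHASH=$(openssl passwd -1 -salt AhmQ2rZx 'mysecret')

# Stand-in copy of the relevant user.py line, using the known root hash
printf "password = 'root:%s'\n" '$1$$1RDUuTsVHjre9juUvuICX.' > user.py.demo

# Swap the old hash for the new one ('|' delimiter avoids clashing with '/' in the hash)
sed -i "s|root:[^']*|root:${NEWHASH}|" user.py.demo

grep -qF 'root:$1$' user.py.demo && echo replaced
```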
Misc
Serial port
There is probably a serial port on board giving access to the U-Boot console
New disks
In case we need to play with new disks, keep in mind their UUID must be injected into the EEPROM, see here
LaCieNetworkAssistant
These are the tools provided on the CD-ROM. A Linux version is included, in several packagings: self-extracting, tar.gz, rpm and deb
But they are only for i386 and won't work as such on an amd64 system.
Note that I think we can live without them, everything can be done via the web interface and the shared drive.
To try the deb, we have to force things a bit (it would probably be better to repackage it):
sudo dpkg -i --force-architecture --force-depends /media/cdrom/Linux/LaCieNetworkAssistant-1.4.1-Linux.deb
It provides a few binaries:
/usr/bin/LaCieNetworkAssistant
/usr/bin/zsudo
/usr/bin/tarTine
/usr/bin/LCtftpd
Installing the existing ia32 libraries is not enough, some are missing:
$ ldd LaCieNetworkAssistant | grep "not found"
        libsmbclient.so.0 => not found
To solve it, you can download the i386 version and copy libsmbclient.so.0 to /usr/lib32
But this one has its own dependencies:
$ ldd libsmbclient.so.0 | grep "not found"
        libtalloc.so.2 => not found
        libwbclient.so.0 => not found
So, same thing, download & copy libsmbclient.so.0 libtalloc.so.2 libtalloc.so.2.0.7 libwbclient.so.0 to /usr/lib32
I also got an error linked to libtdb1, which is in ia32-libs, so again, get it and copy libtdb.so.1 and libtdb.so.1.2.9 to /usr/lib32
And now:
export GTK_PATH='/usr/lib32/gtk-2.0'
LaCieNetworkAssistant
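This kind of dependency chasing can be semi-automated. A small helper (assumes glibc's `ldd` is available; `/bin/ls` is used only as an example of a healthy binary):

```shell
# Print the shared libraries a binary needs but the loader cannot find
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# A healthy system binary should report nothing missing;
# run it against LaCieNetworkAssistant instead to see what to copy next.
missing_libs /bin/ls
```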
Reset
See the LaCie website; it's possible to reset to the default firmware, with or without data loss
Reset without data loss will move all data into /Share and will make it accessible only to the admin.
Admin password gets reset in the process.
Reset Without Data Loss
Caution: Following these steps will erase all Dashboard data such as users, groups, shares, and settings. It will also reset the machine name to the default and reset the network settings to DHCP.
Caution: After following these steps, all files will be moved to a folder called Recovery in Share and so by default will be available only to the administrator.
- Make sure the product is turned OFF.
- Press and hold down the front button. Without releasing the front button, turn the product on by pressing the power switch on the rear of the product.
- Keep the front button pressed until the front LED becomes solid red, then release it. (The LED should become solid red about 10 seconds after you turn on the product.)
- The front LED will blink blue. When it becomes static blue, press the front button once within 5 seconds to confirm reset.
If any of the steps are omitted, the product will boot normally without resetting.
Transmission
I restored a vanilla transmission via the custom capsule to get the web interface.
NEVER launch or stop the transmission daemon via the LaCie web interface: it would restore settings.json to its defaults. Note that this might be permanently changed by mangling /etc/initng/transmission.i and /usr/lib/python2.6/site-packages/unicorn/download/torrent.py
There should be a way to disable the LaCie interface; see /usr/lib/unicorn/webapp2/controller/download.py, /usr/lib/unicorn/updaterapp/modules/download.py, /usr/lib/unicorn/webapp/modules/neko/download.py and /usr/lib/unicorn/unicorn.conf
Once the web interface is active, you can also activate the remote control interface:
- Stop the daemon
ngc --stop transmission
- Edit /lacie/torrent_dir/transmission/settings.json
"rpc-enabled": true, "rpc-password": "your_password", # note that it will be encrypted next time automatically "rpc-port": 9091, "rpc-username": "your_name", "rpc-whitelist-enabled": "false", "rpc-authentication-required": "true",
Options are explained here
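The key change can also be applied non-interactively with sed. A sketch, run here against a stand-in file (on the NAS the real file is /lacie/torrent_dir/transmission/settings.json, and the daemon must be stopped first):

```shell
SETTINGS="${SETTINGS:-./settings.json.demo}"   # stand-in; real path: /lacie/torrent_dir/transmission/settings.json

# Create a minimal stand-in file for the demo if none exists
[ -f "$SETTINGS" ] || printf '{\n  "rpc-enabled": false,\n  "rpc-port": 9091\n}\n' > "$SETTINGS"

# Enable the RPC interface in place (stop transmission before editing the real file)
sed -i 's/"rpc-enabled": *false/"rpc-enabled": true/' "$SETTINGS"
```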
ngc --start transmission
Now you can use a remote client:
apt-get install transgui
And edit the other settings, amongst others the download-dir to some /shares/...
HTTP server
There is already an HTTP server running on ports 80 and 443.
To add one:
/etc/lighttpd/lighttpd-dune.conf
server.modules = (
  "mod_expire",
  "mod_compress",
  "mod_rewrite",
  "mod_setenv",
)

server.document-root = "/shares/Share/@dune/@yamj/Jukebox"
server.port = 8000
server.errorlog = "/var/log/lighttpd-dune-error.log"
server.pid-file = "/var/run/lighttpd-dune.pid"
server.upload-dirs = ( "/lacie/tmp" )

compress.allowed-encodings = ("gzip", "deflate")
compress.cache-dir = "/var/cache/lighttpd-dune"
compress.filetype = ("text/plain", "text/html", "text/javascript", "text/css", "text/xml")

index-file.names = (
  "index.html", "index.php"
)

$HTTP["url"] =~ "index\.html" {
  setenv.add-response-header = ( "Cache-Control" => "no-cache, no-store" )
}

$HTTP["url"] =~ "(gif|png|jpg|css)$" {
  expire.url = ( "" => "access 1 months" )
  setenv.add-response-header = ( "Cache-Control" => "public" )
}

mimetype.assign = (
  ".gif"  => "image/gif",
  ".jpg"  => "image/jpeg",
  ".jpeg" => "image/jpeg",
  ".png"  => "image/png",
  ".css"  => "text/css",
  ".html" => "text/html",
  ".htm"  => "text/html",
  ".js"   => "text/javascript",
  # default mime type
  ""      => "application/octet-stream",
)
/etc/initng/httpdune.i
#!/sbin/itype
# This is an .i file, used by initng, parsed by install_service

daemon httpdune {
  need = virtual/net;
  exec daemon = /usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd-dune.conf;
  respawn;
}
ngc --start httpdune
Install service: edit /etc/initng/runlevel/default.runlevel and add httpdune
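The runlevel edit can be made idempotent so repeated runs don't duplicate the entry. A sketch, run here against a stand-in file (the real path is the /etc/initng/runlevel/default.runlevel mentioned above):

```shell
RUNLEVEL="${RUNLEVEL:-./default.runlevel.demo}"  # real path: /etc/initng/runlevel/default.runlevel
touch "$RUNLEVEL"

# Append httpdune only if it is not already listed
grep -qx httpdune "$RUNLEVEL" || echo httpdune >> "$RUNLEVEL"
```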
Two faulty disks on a 5-disk RAID5
Situation
That's pretty shitty.
One drive failed and the box sent me an email to tell the array was degraded.
That's the second time this has happened, while the LaCie 5big is barely one year old.
So I wrote to support again and they sent me a new drive upfront, good.
But adding the new drive and rebuilding obviously implied a thorough read of all the other drives, and... yet another drive returned hardware read errors and the array collapsed completely.
I got a laconic email saying "array is inactive", and on the web interface all data seemed to have disappeared.
Fortunately I had rooted my box, so I could SSH in and look at the logs.
/var/log/messages looked like this (excerpts):
ata1.15: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x6
ata1.15: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
ata1.01: status: { DRDY ERR }
ata1.01: error: { UNC }
end_request: I/O error, dev sdc, sector 2107042224
raid5:md4: read error not correctable (sector 2102993328 on sdc2).
raid5: Disk failure on sdc2, disabling device.
raid5: Operation continuing on 3 devices.
I/O error in filesystem ("md4") meta-data dev md4 block 0x0 ("xfs_unmountfs_writesb") error 5 buf count 4096
I/O error in filesystem ("md4") meta-data dev md4 block 0x1d171d2b8 ("xlog_iodone") error 5 buf count 4096
Filesystem "md4": Log I/O Error Detected.  Shutting down filesystem: md4
LaCie-5big hald: unmounted /dev/md4 from '/media/internal_1' on behalf of uid 0
Structure of the array is the following:
[root@LaCie-5big /]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid1 sdd5[3] sda5[0] sde5[4] sdc5[2] sdb5[1]
      255936 blocks [5/5] [UUUUU]
md2 : active raid1 sdd9[3] sda9[0] sde9[4] sdc9[2] sdb9[1]
      875456 blocks [5/5] [UUUUU]
md1 : active raid1 sdd8[3] sda8[0] sde8[4] sdc8[2] sdb8[1]
      843328 blocks [5/5] [UUUUU]
md0 : active raid1 sdd7[4] sde7[3] sdc7[2] sdb7[1] sda7[0]
      16000 blocks [5/5] [UUUUU]
And /dev/md4 is missing, normally constructed from /dev/sd[abcde]2
A page I found with some useful tips: https://raid.wiki.kernel.org/index.php/RAID_Recovery
Getting some more info:
mdadm --examine /dev/sda2 >> raid_sdx2.status
mdadm --examine /dev/sdb2 >> raid_sdx2.status
mdadm --examine /dev/sdc2 >> raid_sdx2.status
mdadm --examine /dev/sdd2 >> raid_sdx2.status
mdadm --examine /dev/sde2 >> raid_sdx2.status

$ cat raid_sdx2.status | egrep 'Event|/dev/sd'
/dev/sda2:
         Events : 1306184
/dev/sdb2:
         Events : 1306184
/dev/sdc2:
         Events : 1306177
/dev/sdd2:
         Events : 1306184
/dev/sde2:
         Events : 1306184

[root@LaCie-5big ~]# cat raid_sdx2.status | grep Role
   Device Role : Active device 0
   Device Role : Active device 1
   Device Role : Active device 2
   Device Role : spare
   Device Role : Active device 4

[root@LaCie-5big ~]# cat raid_sdx2.status | grep State
          State : clean
    Array State : AA..A ('A' == active, '.' == missing)
          State : clean
    Array State : AA..A ('A' == active, '.' == missing)
          State : clean
    Array State : AAAAA ('A' == active, '.' == missing)
          State : clean
    Array State : AA..A ('A' == active, '.' == missing)
          State : clean
    Array State : AA..A ('A' == active, '.' == missing)
So /dev/sdc2 had dropped and is out-of-sync.
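The stale member is the one whose Events counter lags behind the others. A small pipeline to extract device/counter pairs; sample `mdadm --examine` output is inlined here so it runs anywhere, but on the box you would pipe the real command instead:

```shell
# Inlined sample of `mdadm --examine /dev/sdX2` output, for demonstration only
examine_sample() {
cat <<'EOF'
/dev/sda2:
         Events : 1306184
/dev/sdc2:
         Events : 1306177
EOF
}

# Print "device: event-count" pairs; the lagging count marks the stale member
examine_sample | awk '/^\/dev\// {dev=$1} /Events/ {print dev, $NF}'
```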
Getting data back
Before messing with it, better to shut down the Transmission server:
ngc --stop transmission
As a first attempt I tried to force reassembling the array:
mdadm --assemble --force /dev/md4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2
mdadm: forcing event count in /dev/sdc2(2) from 1306177 upto 1306184
mdadm: clearing FAULTY flag for device 2 in /dev/md4 for /dev/sdc2
mdadm: /dev/md4 has been started with 4 drives (out of 5) and 1 spare.
As soon as the array appears again, the box mounts the corresponding shares which become accessible again.
It also starts trying to resync the new drive (/dev/sdd2) and... it crashes again after a few hours, when it hits the hardware errors on /dev/sdc2:
[root@LaCie-5big ~]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid5 sda2[0] sdd2[6] sde2[5] sdc2[2] sdb2[1]
      7805956096 blocks super 1.0 level 5, 512k chunk, algorithm 2 [5/4] [UUU_U]
      [>....................]  recovery =  0.0% (306560/1951489024) finish=30321.4min speed=1072K/sec
So it's better to start the array without the new drive, so that at least we have a chance to save as much data as possible:
mdadm --stop /dev/md4
mdadm --assemble --force /dev/md4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sde2
If the array refuses to be stopped, that's because it's in use:
you need to unmount any share mounted over the network, including the one mounted by the media box
tango3[~]# umount /tmp/mnt/smb/0

umount /dev/md4
umount /dev/md4
umount /dev/md4
umount /dev/md4
umount /dev/md4
umount /dev/md4
mdadm --stop /dev/md4
mdadm --assemble --force /dev/md4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sde2
Fixing with ddrescue
At this point I could save some of the data, but as soon as I tried to access files mapped onto the faulty area, the array collapsed again.
So I tried a different approach:
Stop the array.
tango3[~]# umount /tmp/mnt/smb/0

umount /dev/md4
umount /dev/md4
umount /dev/md4
umount /dev/md4
umount /dev/md4
umount /dev/md4
mdadm --stop /dev/md4
Run ddrescue to copy /dev/sdc2 (the faulty drive) to /dev/sdd2 (the new one).
But ddrescue is not available on the box, nor is screen, which would be useful...
So I took them from Debian squeeze (oldstable).
Newer versions require a newer libc and libstdc++6.
Extract the binaries and drop them onto the box. I also copied screenrc to /etc/, not sure whether it's needed.
If the library is left in the current directory, screen has to be invoked as
LD_LIBRARY_PATH=. ./screen
And now we can call ddrescue:
./ddrescue -d /dev/sdc2 /dev/sdd2 /root/ddrescue.log
With the logfile, ddrescue can be interrupted and restarted from where it left off.
Current status
rescued:  1998 GB,  errsize: 13824 B,  current rate:      0 B/s
   ipos:  1082 GB,  errors:        8,  average rate: 39354 kB/s
   opos:  1082 GB,  time from last successful read:     3.5 m
Finished
Not that bad.
# mdadm --assemble /dev/md4 /dev/sda2 /dev/sdb2 /dev/sdd2 /dev/sde2
mdadm: /dev/md4 has been started with 4 drives (out of 5).
And now disk sdc can be removed and replaced.
Hot remove should be ok but let's do it cleanly:
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid5 sda2[0] sde2[5] sdd2[2] sdb2[1]
      7805956096 blocks super 1.0 level 5, 512k chunk, algorithm 2 [5/4] [UUU_U]
md3 : active raid1 sda5[0] sde5[4] sdd5[3] sdc5[2] sdb5[1]
      255936 blocks [5/5] [UUUUU]
md2 : active raid1 sda9[0] sde9[4] sdd9[3] sdc9[2] sdb9[1]
      875456 blocks [5/5] [UUUUU]
md1 : active raid1 sda8[0] sde8[4] sdd8[3] sdc8[2] sdb8[1]
      843328 blocks [5/5] [UUUUU]
md0 : active raid1 sde7[3] sdd7[4] sdc7[2] sdb7[1] sda7[0]
      16000 blocks [5/5] [UUUUU]
# mdadm /dev/md0 --fail /dev/sdc7
mdadm: set /dev/sdc7 faulty in /dev/md0
# mdadm /dev/md0 --remove /dev/sdc7
mdadm: hot removed /dev/sdc7 from /dev/md0
# mdadm /dev/md1 --fail /dev/sdc8
mdadm: set /dev/sdc8 faulty in /dev/md1
# mdadm /dev/md1 --remove /dev/sdc8
mdadm: hot removed /dev/sdc8 from /dev/md1
# mdadm /dev/md2 --fail /dev/sdc9
mdadm: set /dev/sdc9 faulty in /dev/md2
# mdadm /dev/md2 --remove /dev/sdc9
mdadm: hot removed /dev/sdc9 from /dev/md2
# mdadm /dev/md3 --fail /dev/sdc5
mdadm: set /dev/sdc5 faulty in /dev/md3
# mdadm /dev/md3 --remove /dev/sdc5
mdadm: hot removed /dev/sdc5 from /dev/md3
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid5 sda2[0] sde2[5] sdd2[2] sdb2[1]
      7805956096 blocks super 1.0 level 5, 512k chunk, algorithm 2 [5/4] [UUU_U]
md3 : active raid1 sda5[0] sde5[4] sdd5[3] sdb5[1]
      255936 blocks [5/4] [UU_UU]
md2 : active raid1 sda9[0] sde9[4] sdd9[3] sdb9[1]
      875456 blocks [5/4] [UU_UU]
md1 : active raid1 sda8[0] sde8[4] sdd8[3] sdb8[1]
      843328 blocks [5/4] [UU_UU]
md0 : active raid1 sde7[3] sdd7[4] sdb7[1] sda7[0]
      16000 blocks [5/4] [UU_UU]
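The same fail-then-remove sequence can be scripted as a loop over sdc's members. In this sketch `mdadm` is stubbed with a shell function so the loop logic runs anywhere; drop the stub when running it for real on the NAS:

```shell
# Stub so the sketch runs without real arrays; REMOVE this function on the NAS
mdadm() { echo "mdadm $*"; }

# Fail, then remove, each of sdc's members from its mirror
calls=$(
    for pair in md0:sdc7 md1:sdc8 md2:sdc9 md3:sdc5; do
        md="/dev/${pair%%:*}"
        part="/dev/${pair##*:}"
        mdadm "$md" --fail "$part"
        mdadm "$md" --remove "$part"
    done
)
echo "$calls"
```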
Then physically remove sdc.
Checking the remaining drives thoroughly:
# smartctl -t long /dev/sda
# smartctl -t long /dev/sdb
# smartctl -t long /dev/sdd
# smartctl -t long /dev/sde
Five hours later...
# smartctl -l xselftest /dev/sda
# smartctl -l xselftest /dev/sdb
# smartctl -l xselftest /dev/sdd
# smartctl -l xselftest /dev/sde
They all report something like:
smartctl 5.40 2011-04-07 r5807 [arm-unknown-linux-gnueabi] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
General Purpose Logging (GPL) feature set supported
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        20001           -
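Checking the self-test verdict across several drives can be scripted with a simple grep. Sample `smartctl -l xselftest` output is inlined here so the sketch runs anywhere; on the box, pipe the real command for each drive instead:

```shell
# Inlined sample of a smartctl self-test log line, for demonstration only
selftest_sample() {
cat <<'EOF'
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        20001           -
EOF
}

# Report whether the latest extended self-test completed cleanly
if selftest_sample | grep -q 'Completed without error'; then
    echo 'OK'
else
    echo 'CHECK DRIVE'
fi
```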