Ya know, you’d think there would never be a situation when anybody would accidentally enter the following 2 commands in order:
# cd /
# rm -rf *
That’s what I thought, too, until I just did it today. Below is how that happened, plus all the gory details and lessons learned from rebuilding a Pogo so it provides a friendly, working environment for the house monitor stuff (plus a little web server), and how to back up the whole thumb drive for quick repair in case disaster strikes. The details are there in case I ever have to build it up from the ground again.
Much of the history of getting my home monitor system up on a Pogoplug is described here, and it’s been up and running for months. While we were away (at a square dance weekend), I realized I’d forgotten a critical Eagle schematic file I hoped to work on in my spare time. There is no access to the main PC it lived on from the outside, but I could ssh into the Pogoplug 3 feet away from that PC and on the same network. Could I hack into my own PC from the Pogo?
There was of course no Samba stuff installed on the small Archlinux distro on the Pogo. A little googling said I needed smbclient. I soon figured out that apt-get wasn’t the package management tool here, and told pacman to get smbclient. That of course was dependent on other files that needed to be upgraded, and those first needed some other upgrade. At one point, not understanding the impact of what I was doing, I typed pacman -Syu. That of course upgrades everything. And on a 1GB thumb drive there isn’t room for everything, so the disk was soon full and the system almost non-functional.
But worse than that, the upgrade bumped perl up from 5.10 to 5.14. There were apparently some API changes, and the Device::SerialPort module I depend on to talk to the RS485 network reported an undefined symbol and wouldn’t run at all. So in addition to a full disk, the monitor stuff was completely dead. And I didn’t even get smbclient to run to get the file I was after!
The path to stupidity
When I got home and started to rebuild the box, the first thing I found was that the plugapps.com web site I’d always gotten the Linux distro from was gone. It redirected to archlinuxarm.com (not to be confused with archlinux.com!), and while I don’t understand what happened and who the players actually are, I did find everything I needed there.
I had 2 Pogos, each with an OS thumb drive, different hostname and address, etc. The main one now had a completely full disk and, having failed in the middle of a major upgrade, didn’t even have all compatible parts. It would run, but wasn’t very useful.
The second Pogo was still completely functional, and in fact was built on a 2GB drive. The actual boxes were identical, and both had the magic bootloader in flash that allowed booting from a thumb drive. All the following happened on the first physical box, although that’s not relevant.
I booted up from the second thumb drive, and inserted and mounted the compromised one so I could access it. The first thing I did was pull the one and only copy (!) of the latest home automation stuff off the first drive safely onto the second drive. One gig wasn’t enough, so I plugged in a 4 gig drive to build the new system on. I started by copying the files from the original 1 GB drive. It would run, but since trying to finish the aborted upgrade was going to be messy, I decided to reinstall the OS from scratch, upgrading to the latest as long as I was at it.
That meant deleting all the files from the 4GB drive. I booted from the second drive, and mounted the 4GB. I cd‘d to the mount point and looked around to make sure it was the drive I needed to clear. It was. Fine. I was already on that drive, so all I had to do was go to its root and delete everything. So I cd‘d to /, and typed rm -rf *.
Unfortunately, while my brain was chrooted to the 4GB drive, the actual machine didn’t know about that, so my cd went not to the root of the 4GB, but to the real root – of the boot drive with the working OS. It wasn’t until I got a prompt back and ls wouldn’t work that I realized what I had done. Yes, I was trying to go to the root directory and recursively delete everything. Just not on that filesystem. There went my one working system. Damn.
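The trap was that both cd / and rm -rf * depend on where the shell currently is, not on where my head was. A minimal sketch of a safer habit, assuming the drive being cleared is mounted at a known absolute path: name that path explicitly, resolve it, and refuse if it turns out to be the real root. The function name wipe_mounted_drive is hypothetical, not something from the distro.

```shell
#!/bin/sh
# Hypothetical helper: empty a mounted drive without relying on the current
# directory, so a stray "cd /" can't redirect the delete to the boot drive.
wipe_mounted_drive() {
    local target=$1 resolved

    # Insist on an absolute path; "*" and "." mean whatever the cwd is.
    case $target in
        /*) ;;
        *) echo "refusing: '$target' is not an absolute path" >&2; return 1 ;;
    esac

    # Resolve "..", symlinks and trailing slashes to the real directory.
    resolved=$(cd -- "$target" 2>/dev/null && pwd -P) || {
        echo "refusing: cannot enter '$target'" >&2; return 1; }

    # The exact mistake in this story: the target is really the system root.
    if [ "$resolved" = / ]; then
        echo "refusing: '$target' is the real root" >&2
        return 1
    fi

    # Delete everything below the mount point (dotfiles included) but not
    # the mount point itself; -xdev keeps find on this one filesystem.
    find "$resolved" -xdev -mindepth 1 -delete
}
```

For example, wipe_mounted_drive /mnt/usb (mount point hypothetical). On a real box it would also be worth adding mountpoint -q "$resolved" || return 1 (from util-linux), which catches the case where the drive isn’t mounted at all and the path is just an empty directory on the boot drive.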
Rebuild attempts and notes
Here’s some history of what happened next on the way to being able to make (what I hope is) a complete list (The Steps, below) of the changes I had to make to get the whole system up and working as I wanted. I capture some details here to avoid clogging up The Steps with extra words.
I rebooted from the still partially functional 1GB drive, and mounted the 4GB. This time I succeeded in removing the files from the 4GB drive, leaving the OS intact.
I started a basic install by following the steps from
1) Mount new drive to /usb. Download and install Arch Linux ARM:
tar -xzvf ArchLinuxARM-armv5te-*.tar.gz # This will take a long time
sync # Takes a while when using a flash drive
2) Clean up and reboot. Cross your fingers and hope for the best.
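The unpack step in those instructions assumes the shell is already sitting in /usb. A minimal sketch of the same step using tar -C, which names the destination explicitly so the current directory doesn’t matter. The function name and its safety checks are my own assumptions, not part of the Arch Linux ARM instructions.

```shell
#!/bin/sh
# Hypothetical helper: unpack a root filesystem tarball onto a mounted drive.
install_rootfs() {
    local tarball=$1 mnt=$2

    [ -f "$tarball" ] || { echo "no such tarball: $tarball" >&2; return 1; }
    case $mnt in
        /*) ;;
        *) echo "mount point must be an absolute path" >&2; return 1 ;;
    esac
    [ -d "$mnt" ] || { echo "not a directory: $mnt" >&2; return 1; }
    if [ "$(cd -- "$mnt" && pwd -P)" = / ]; then
        echo "refusing to unpack over the running system" >&2
        return 1
    fi

    # -C: extract relative to the drive, whatever the current directory is
    # -p: keep the file permissions recorded in the archive
    tar -xzpf "$tarball" -C "$mnt" || return 1

    # Flush buffered writes before the thumb drive is pulled.
    sync
}
```

For example, install_rootfs ArchLinuxARM-armv5te-*.tar.gz /usb, assuming the glob matches exactly one downloaded tarball.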
Took maybe 40 minutes. I removed the old thumb drive, and on reboot from the 4GB, a serial terminal showed normal looking startup. I logged in as root/root and started setting it back up as I needed it.
For reference, uname -a gave: Linux alarm 2.6.39-ARCH #1 PREEMPT Tue Jun 14 15:55:01 MDT 2011 armv5tel Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux
I mounted the old compromised boot thumb drive as a reference. When I tried to go to /etc on the old drive, I ended up at the real /etc – just like when I clobbered the other thumb drive! Rats – I was supposed to learn from that! But at least it didn’t hurt anything.
I copied old /etc/rc.local over. I think that should get the network up. Rebooted, and yes – now I can ssh in! Rats – no, I left the old thumb drive in and it booted from that instead of the 4GB. Pulled old drive, rebooted. No, didn’t work. Back in thru the serial console. No ifconfig! Or route! Or netstat! WTH? Copied those binaries over from old thumb drive to /usr/bin (rather than where they came from). What happened?
Turns out this linux distro is significantly different from the one that was running before. In particular, “legacy” tools like ifconfig are in the net-tools package (group?), which is not included in the distro, while their iproute2 counterparts are the default tools. Here’s a partial list:
– ifconfig is replaced by ip addr (or ip a – shortened args work!)
– route (like route add…) is replaced by ip route
– netstat is replaced by ss (except -rn, which is ip route show)
I used pacman to install net-tools, so ifconfig and friends are available, but I guess I should start using the iproute2 tools. (I even VPN’d in to work to see if the iproute2 tools are there on a current Red Hat 5 system. They are – and now I wonder if some of the network scripting I’ve done in the past year or three would have been better done based on the new tools!)
The network settings in rc.conf have changed, so copying the old file over was a bad idea. Just modifying the file included with the distro works fine. Details in The Steps.
Changed sshd port in /etc/ssh/sshd_config to the standard non-public value.
Modified /etc/bash.bashrc so command line editing is vi-like and ‘dot’ is in the path.
Tried to install the little web server lighttpd, but pacman couldn’t find it. The repos pointed to by /etc/pacman.d/mirrorlist were still on plugapps.com. That had been redirected to archlinux.com for the main repo, but not the others. Changing the Main Server line to Server = http://archlinuxarm.org/arm/$repo let it find lighttpd, but it recommended upgrading pacman, and that also upgraded to linux-api-headers-3.0.1-1, glibc-2.14-4, pacman-3.5.4-3. Fine. Tried lighttpd again and 1.4.29-2 installed OK. It suggested optional upgrades of libxml2, lua, libmysqlclient, sqlite3, which I declined. (Note: installing the latest pacman-mirrorlist updates the mirrorlist file the same way.)
Made some changes to /etc/lighttpd/lighttpd.conf so it would listen on my non-standard port and to let it deal with perl and keep an access log so I can tell when people have downloaded stuff I put up for them.
The start/stop/restart script /etc/rc.d/lighttpd doesn’t work if you kill the daemon manually. If it won’t work, make sure the daemon isn’t running, then remove /var/run/lighttpd/lighttpd-angel.pid and try again. Also, since /usr/sbin (where the lighttpd binary itself lives) is in my path, typing lighttpd runs the binary rather than the script, even from /etc/rc.d, so you have to invoke the script as ./lighttpd <action>. When I added a filename for the access log, lighttpd wouldn’t start until I created the log file and either changed it to 777 or did chown+chgrp to http. Don’t know if it would have created the file more gracefully if I’d rebooted instead of trying to restart the daemon (with either the start script or pkill -HUP). Copied the default page (index.pl) and the hacked Pogo image to the http root directory (/srv/http). Server seems to work fine.
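The stale pid file and the access log permissions can both be handled by a couple of small helpers. A minimal sketch, assuming Linux’s /proc (to check whether the recorded process still exists) and the http user and group from these notes. The helper names are hypothetical, and this is not how /etc/rc.d/lighttpd itself works.

```shell
#!/bin/sh
# Hypothetical: delete a pid file only if the process it names is gone.
clear_stale_pidfile() {
    local pidfile=$1 pid
    [ -f "$pidfile" ] || return 0              # nothing to clean up
    pid=$(cat "$pidfile")
    if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
        echo "pid $pid is still running; stop it first" >&2
        return 1
    fi
    rm -f -- "$pidfile"
}

# Hypothetical: create the access log up front, owned by the web server's
# user and group (default http:http), so lighttpd can write to it.
prepare_access_log() {
    local log=$1 user=${2:-http} group=${3:-http}
    mkdir -p -- "$(dirname -- "$log")" &&
        touch -- "$log" &&
        chown "$user:$group" "$log"
}
```

For example, run clear_stale_pidfile /var/run/lighttpd/lighttpd-angel.pid and prepare_access_log /var/log/lighttpd/access.log before /etc/rc.d/lighttpd start. Checking /proc rather than using kill -0 avoids mistaking another user’s process for a dead one when the helper isn’t run as root.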
While ntp is generally a good idea, it’s especially critical on the Pogo because there’s no battery-backed clock. Installed ntp and verified that its default config file pointed to appropriate time servers (pool.ntp.org). It defaults to -g, allowing a large jump at startup, which is essential since the Pogo always thinks it’s 12/31/69 on boot. But since it takes a few minutes for ntp to make that jump, there’s a problem with at least my pingstat.pl which starts very shortly after boot: When pingstat starts, it’s 1969. A few minutes later, it’s 2011. This results in an inappropriate value for “total run time”. Turns out there’s a very simple solution: The distro already has an ntpdate built in – though not enabled by default. (That deprecated tool is implemented with something like ntpd -g -q.) By putting ntpdate in the DAEMONS list of rc.conf – better before ntpd? – the time is set before pingstat is started by rc.local.
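Putting ntpdate in DAEMONS fixes the ordering at boot. Another option, sketched here as an alternative rather than what I did, is to make whatever starts pingstat.pl wait until the clock looks plausible. This assumes that any year before 2011 means the clock hasn’t been set yet; the function name and the 5-second polling interval are hypothetical.

```shell
#!/bin/sh
# Hypothetical: wait until the clock has left 1969/1970 behind, or give up.
# $1 = earliest plausible year, $2 = maximum seconds to wait.
wait_for_clock() {
    local min_year=${1:-2011} limit=${2:-300} waited=0
    while [ "$(date +%Y)" -lt "$min_year" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "clock still looks unset after ${limit}s" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}
```

In rc.local this might look like (wait_for_clock 2011 300 && /root/perl/pingstat.pl) &, where the pingstat.pl path is an assumption based on where the home automation stuff lives.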
Installed ddclient. My IP doesn’t change much, but since I access the box from outside (via dyndns as home.jimlaurwilliams.org), I hope ddclient will take care of those rare changes. It starts as a daemon from rc.conf. Changed /etc/ddclient/ddclient.conf to check once/day, no mail, use web to get IP addr, use dyndns with my account info. I don’t really know any way to test it.
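One rough way to check ddclient is to compare the address the outside world sees with what the dynamic DNS name resolves to. A minimal sketch, assuming checkip.dyndns.org still answers with text like "Current IP Address: 1.2.3.4" (the same text the web-skip='IP Address' setting keys on), and that wget and getent are installed. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical: pull the dotted-quad address out of a checkip-style reply.
parse_checkip() {
    sed -n 's/.*Current IP Address: *\([0-9.]*\).*/\1/p'
}

# Hypothetical: succeed only if NAME currently resolves to our public address.
# Needs network access; DNS caching can make a fresh update look stale.
ddns_matches() {
    local name=$1 public dns
    public=$(wget -qO- http://checkip.dyndns.org/ | parse_checkip)
    dns=$(getent hosts "$name" | awk '{ print $1; exit }')
    [ -n "$public" ] && [ "$public" = "$dns" ]
}
```

For example, ddns_matches jimshome.dyndns-web.com && echo "DNS is current".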
As long as I was building things up the way I want them, I installed smbclient. It might have required some other upgrades. Took a few tries to get the syntax right, but I finally used it to get into the main PC. Woo hoo! I captured that syntax in a gosmb script in my home directory. Couldn’t get into the laptop, though. I also added the directory and file /etc/samba/smb.conf. It’s empty, but avoids an annoying err msg.
To get a telnet client I installed inetutils.
To get the monitoring stuff working, I first copied * from old /root/perl directory, which is where all the home automation stuff lives. The main perl program (currently 485pollB.pl) is dependent on several perl modules. Not knowing exactly what they were, I started the annoying but necessary cycle of
– start program
– see what it complains about
– fix it
– go to top
I ran cpan and let it find servers, since I figured I’d be pulling a bunch from there. (Turns out I was wrong.)
The first failure was for SendMail.pm. (Capitalization is probably relevant.) Rats – that doesn’t seem to be on CPAN. Do I grab it from elsewhere or rewrite for a mail package I can find on CPAN? Ugh. I’ll grab the old one. Copied SendMail.pm (2.09) to /usr/share/perl5/site_perl.
The next fail was Cron.pm. Made dir /usr/lib/perl5/site_perl/Schedule and copied old Cron.pm (0.98) there.
The next fail was for ParseDate.pm, used in Schedule::Cron.pm. Copied old Time modules CTime.pm (99.06_22_01), ParseDate.pm (2006.0814), DaysInMonth.pm (99.1117), JulianDay.pm (2003.1125), Timezone.pm (2006.0814) to existing /usr/lib/perl5/core_perl/Time. No claim they’re the latest, best, or even all needed. ParseDate is an old favorite of mine, though it isn’t directly used in 485pollB.pl.
The next fail was for the dreaded SerialPort.pm. (Dreaded because it had failed on earlier attempts with an undefined symbol – Perl_Gthr_key_ptr.) Tried to use cpan to install Device::SerialPort.pm. The version on the old filesystem (1.04) was clearly visible on CPAN, but it took several tries to figure out how to ask cpan to get it. Finally “i Device-SerialPort-1.04” found the package, but somehow wouldn’t install it. (Hmm – I wonder if that was because the INSTALL script was missing?) I made a dir /usr/lib/perl5/site_perl/Device and copied SerialPort.pm from the old filesystem to it.
Next fail was for a loadable object for module Device::SerialPort. Found a .bs and a .so in /usr/lib/perl5/site_perl/auto/Device/SerialPort. Copied it all over, but still failed on undefined symbol. Guess I’ll have to build it from source. Went to CPAN and found the URL for the whole .gz and pulled it down with wget. The README said the INSTALL script would install everything – but there was no such file. Seeing a Makefile.PL I made an educated guess and tried perl Makefile.PL. It complained there was no gcc. Pacman found and installed the compiler. Perl Makefile.PL fared better this time, and said I was “ready to type make”. Typed make. No make. Installed make. Typed make again. It worked! Whatever unresolved symbol was in the old binaries was fixed by recompiling. And at that point the main perl program worked!
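The sequence that finally worked is the standard one for a CPAN distribution built with ExtUtils::MakeMaker: perl Makefile.PL, then make, then make install. Rebuilding compiles the XS part against the perl that is actually installed, which is presumably why the undefined symbol went away. A minimal sketch wrapping it, assuming gcc and make are already installed and the tarball unpacks into a single top-level directory. The function name and the default work directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical: unpack and build a CPAN distribution from its tarball.
# $1 = tarball (e.g. Device-SerialPort-1.04.tar.gz), $2 = work directory.
build_cpan_dist() {
    local tarball=$1 workdir=${2:-/opt} top
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }

    # Name of the top-level directory inside the tarball.
    top=$(tar -tzf "$tarball" 2>/dev/null | head -n 1 | cut -d/ -f1)
    [ -n "$top" ] || { echo "empty or unreadable tarball" >&2; return 1; }

    tar -xzf "$tarball" -C "$workdir" || return 1

    # Build in a subshell so the caller's current directory doesn't change.
    (
        cd -- "$workdir/$top" &&
            perl Makefile.PL &&
            make &&
            make install
    )
}
```

For example, build_cpan_dist Device-SerialPort-1.04.tar.gz /opt, run as root so make install can write into perl’s library directories.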
How about auto startup? Copied a couple of lines from the old rc.local to the new one.
On reboot it never came back up. The serial console said no partition table on the thumb drive! I knew everything was right there – but with a corrupted/missing partition table it was dead. Tried to remake the partition table exactly as I had before (not putting the file system on) and a few other things but no luck. Nothing to do but start over from the beginning. Damn.
It went a lot quicker the second time, but still took a long time. I was able to edit some files after they’d been untarred but while tar was still working on others. I was very glad I’d started these notes. When I got through all the steps (again), it all started up (again)!
At some point in the multiple rebuilds, I ran Jeff Doozan’s install_uboot_mtd0.sh to update the bootloader. It said:
## Valid uBoot detected: [pinkpogo jeff-2010-10-23-current]
## The newest uBoot is already installed on mtd0.
so I smiled, said “Thank you!” and went on to the next step.
After everything seemed to be running OK, it soon became apparent that the system was in a loop power cycling the modem and router. That could only mean pingstat.pl thought it couldn’t ping its target. I put entries for the modem and router in /etc/hosts, but the loop persisted. Finally looked at the dumb code – and was reminded that pingstat actually (and very appropriately) pings the router at my isp, not the local one. Added isp’s router to the hosts file and now it’s all happy. I guess if my isp re-addresses stuff that would break, but that should be an exceptionally rare occurrence.
— BASICS TO GET THE BOX UP AND RUNNING ON THE NET —
The following steps got the Pogo on the local network and the internet, enabled ssh into the box, provided a comfortable environment, enabled a web server, enabled samba access via smbclient to local PCs, and set the clock via ntp.
– Changed root password
– Added/changed following in /etc/rc.conf:
DAEMONS=(syslog-ng network netfs crond ntpdate sshd ntpd lighttpd ddclient)
– Added to /etc/resolv.conf:
– Added to /etc/modprobe/modprobe.conf to disable IPv6 on general principles:
#disable ipv6 10/1/11 jw
alias net-pf-10 off
– Updated /etc/pacman.d/mirrorlist, which used to point to plugapps.com:
Server = http://archlinuxarm.org/arm/$repo
(Alternately, install/update pacman-mirrorlist for the same effect.)
– Updated package database with pacman -Sy
– Installed net-tools (pacman -S net-tools), so ifconfig and friends are available (or learn to use iproute2 tools!)
– Installed ntp
– Changed sshd Port in /etc/ssh/sshd_config to the standard non-public value.
– Hacks to /etc/bash.bashrc:
changed PS1 to [\u@\h \w]\$;
set -o vi
– In /etc/lighttpd/lighttpd.conf, I added/changed:
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi", ".scgi" )
server.modules = ( "mod_cgi", "mod_access", "mod_accesslog" )
cgi.assign = ( ".pl" => "/usr/bin/perl", ".cgi" => "/usr/bin/perl" )
accesslog.filename = "/var/log/lighttpd/access.log"
changed server.port to the standard non-public value
added “index.pl” to index-file.names
– touch /var/log/lighttpd/access.log
– chown http /var/log/lighttpd/access.log
– chgrp http /var/log/lighttpd/access.log
– Copied index.pl and PogoTux.jpg to /srv/http
– Installed smbclient – maybe needed some other upgrades?
– Added dir and file /etc/samba/smb.conf. It’s empty, but avoids an annoying err msg
– Created gosmb script in root’s home dir to capture syntax to run smbclient
– Installed ddclient. Changed these in /etc/ddclient/ddclient.conf:
password=<my dyndns passwd>
use=web, web=checkip.dyndns.org/, web-skip='IP Address' # found after IP Address
server=members.dyndns.org, protocol=dyndns2 jimshome.dyndns-web.com
– Installed inetutils to get telnet, ftp etc clients.
— GETTING HOME MONITOR STUFF TO RUN —
The following steps put the home monitoring stuff in place and allowed it to run. Perl (5.12) was included in the original distro. The additional hardware of an FTDI-based USB-RS485 adapter was already in place, and the driver for it was in the distro.
– Copied * from old /root/perl directory, which is where all the home automation stuff lives.
– Copied SendMail.pm (2.09) to /usr/share/perl5/site_perl. Note capitalization – there are lots of sendmail modules!
– Made directory /usr/lib/perl5/site_perl/Schedule and copied old Cron.pm (0.98) there.
– Copied old CTime.pm (99.06_22_01), ParseDate.pm (2006.0814), DaysInMonth.pm (99.1117), JulianDay.pm (2003.1125), Timezone.pm (2006.0814) to existing /usr/lib/perl5/core_perl/Time. There’s probably a better way to manage all of them – like putting them all in site_perl in an easy-to-find bunch.
– Installed gcc and make (to compile SerialPort, below.)
– Device::SerialPort.pm includes both a perl module and .bs and .so binaries. I downloaded the source (Device-SerialPort-1.04.tar.gz) from CPAN to /opt/SerialPort and unpacked it. The following built it:
perl Makefile.PL (which eventually announced that my next step was to run make), then make
It seemed to work and install somewhere the module could find it.
– Added to /etc/rc.local to start automation stuff at boot time:
# start 485 poller
– Don’t forget to add these to /etc/hosts so pingstat.pl can find its targets:
Backing up the whole Pogo system drive
The goal of this backup is to be able to recover from a corrupted thumb drive, rather than backing up data. Having been forced to build up the system as I want it several times in fairly rapid succession, I was very motivated to have a way to back it up. Ideal would be a single tar file, just like the one they ship the distro as, but with all my code and modifications rolled in. Fortunately, booting is handled by code in flash, so the thumb drive with the OS only needs files.
I found a post on archlinuxarm.com (PostOnBackingUpPogo) with the script one guy uses for this. I studied that, read the man and info pages on tar a lot, tested a little, and modified his to make it my own (listed below). It cleans up the pacman cache to reduce size and creates a tar file preserving owner, perms, date with month/date/year as part of the filename. It skips over directories like /proc and /dev. It does a second pass to put a couple of files in /dev, though I don’t know whether they’re needed (and haven’t tested to see). Current file is about 500MB uncompressed. (Compression is pretty slow on the Pogo for big files, so I don’t bother.) Zipping it up on the PC resulted in a ~170MB file. Thanks to smbclient, I can transfer the backup to the PC easily 🙂
# back up full state of pogo
# To reinstall, untar this in root directory of
# a thumb drive with empty ext2 filesystem on partition 1.
# this clears all package cache files for a smaller archive
echo backing up to $tarname ...
tar -cvpf $tarname --exclude=/media/* --exclude=/proc/* --exclude=/lost+found \
--exclude=/sys/* --exclude=/dev/* --exclude=/mnt/* --exclude $tarname/ \
tar -rpvf $tarname /dev/console /dev/null /dev/zero
ls -l $tarname
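The listing above doesn’t show where $tarname is set or the cache cleanup its comment mentions. As a hedged, self-contained sketch of the same idea (not the original script), here is one uncompressed tar of a whole tree that skips the contents of /proc, /sys, /dev, /mnt and /media, followed by a second pass that appends /dev/console, /dev/null and /dev/zero. It assumes GNU tar. The function name and argument order are hypothetical, and the archive should be written outside the tree being archived or under one of the excluded directories.

```shell
#!/bin/sh
# Hypothetical sketch (GNU tar assumed): archive a whole root tree into DEST,
# keeping mount-point directories but not what's mounted on them.
backup_tree() {
    local src=$1 dest=$2 node
    if [ ! -d "$src" ] || [ -z "$dest" ]; then
        echo "usage: backup_tree SRC_DIR DEST_TAR" >&2
        return 1
    fi

    # First pass. Member names look like ./etc/rc.conf, so "./proc/*" skips
    # everything under /proc but keeps the empty /proc directory itself.
    tar -cpf "$dest" -C "$src" \
        --exclude='./proc/*' --exclude='./sys/*' --exclude='./dev/*' \
        --exclude='./mnt/*' --exclude='./media/*' --exclude='./lost+found' \
        . || return 1

    # Second pass: append a few device nodes, if the source tree has them.
    set --
    for node in console null zero; do
        if [ -e "$src/dev/$node" ]; then
            set -- "$@" "./dev/$node"
        fi
    done
    if [ "$#" -gt 0 ]; then
        tar -rpf "$dest" -C "$src" "$@" || return 1
    fi
    return 0
}
```

On the Pogo this would be run as root, for example backup_tree / /media/usb/pogo-$(date +%m%d%y).tar (the destination path and the month/day/year naming are assumptions). Run pacman -Scc first if you want the smaller archive the original gets by clearing the package cache.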
I can drop the resulting file on a thumb drive with nothing but an ext2 filesystem in the first partition, untar it, plug the drive in, and the pogo will boot to a fully functioning house monitor. The only drawback is that I will have lost the 485net and pingstat data since the last backup. I suppose I should think about backing that data up periodically.
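For the 485net and pingstat data, a small dated archive made from cron (crond is already in the DAEMONS list) would narrow that gap. A minimal sketch, assuming the data lives in one directory. The function name, the file naming, and keeping the newest 7 archives are all hypothetical choices.

```shell
#!/bin/sh
# Hypothetical: archive a data directory with a timestamped name and keep
# only the newest $3 archives (default 7) in the destination directory.
backup_data() {
    local src=$1 destdir=$2 keep=${3:-7} stamp
    if [ ! -d "$src" ] || [ ! -d "$destdir" ]; then
        echo "usage: backup_data SRC_DIR DEST_DIR [KEEP]" >&2
        return 1
    fi

    # YYYYmmdd-HHMMSS sorts alphabetically in time order.
    stamp=$(date +%Y%m%d-%H%M%S)
    tar -czf "$destdir/data-$stamp.tar.gz" -C "$src" . || return 1

    # Newest first; delete everything after the first $keep entries.
    ls -1 "$destdir"/data-*.tar.gz | sort -r | tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

It could be called nightly from root’s crontab (for example with a 0 3 * * * entry), with the real source and destination directories filled in.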
I found a nice ext2 filesystem driver for the PC (ext2fsd.com) and I thought I could just manage the ext2 thumb drive with that, but it only almost worked: Since it makes the ext2 volume appear as a normal Windows disk (complete with drive letter), it’s subject to Windows’ (long) file name conventions. Unfortunately a couple of files in the Arch linux distro have names with characters Windows doesn’t like, (like /var/lib/pacman/local/vi-1:050325-1) so it refuses to work with them. Maybe there’s some other driver that doesn’t try to integrate so tightly with Windows that could avoid those problems, but this one doesn’t cut it.
Update 3/21/14: I’m finally getting around to putting a Pogoplug up at Workshop88. I brought a working one (with a serial port) in to see if I could get it running. Ted set up a DMZ and gave me a static address on 192.168.2.
I failed, as the boot sequence hung on ntpdate (really ntpd -q) because it couldn’t contact an ntp server. Unfortunately, even though I had a serial console (through a 3.3V USB-serial adapter and the Dell mini), I couldn’t break out of the hung ntpdate. When I connected it to a network port on the Dell that had been manually set to 192.168.99.99 (the Pogo’s old default router), I could ping the Pogo, but not ssh to it. That’s apparently because the daemon startup order (the DAEMONS list in /etc/rc.conf) starts ntpdate just before sshd, so I can’t ssh in. Boo.
I brought it back home, and it came right up on the network it was built for. I ssh’d in, modified rc.local with the new IP and (guessed) default gateway and hostname w88pogo, and changed the DAEMONS order to start sshd (and lighttpd) before ntpdate. I saved the backup tar file I found there (though I didn’t build a new one with the known working config), then made and saved a backup with the W88 config. I think it will boot correctly when I bring it to the space, and it should at least let me ssh in if there’s a problem, but I hope that found backup will let me get it back up at home if there’s a problem.
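Changing the startup order gets sshd running first, but boot still waits on ntpdate if no time server answers. A minimal sketch of bounding that wait with coreutils timeout, which exits with status 124 when it has to kill the command. The helper name and the 30-second limit in the example are hypothetical, and wiring it into the rc scripts is left as an assumption.

```shell
#!/bin/sh
# Hypothetical: run a boot-time command, but give up after $1 seconds so a
# missing time server can't hold up the rest of startup.
run_bounded() {
    local secs=$1 rc
    shift
    timeout "$secs" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "'$*' timed out after ${secs}s; continuing" >&2
    fi
    return "$rc"
}
```

For example, run_bounded 30 ntpd -g -q would make the one-shot clock set give up after 30 seconds instead of hanging the boot.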
I’m very glad I did the work to get that backup script working!