I’ve been trying to migrate the main server program for the home automation system off the main PC for a couple of years. It’s a hassle to shut everything down gracefully when Microsoft pushes down yet another must-reboot update, and of course I can’t let it reboot automatically.
The target for the migration has been Linux on a cute little Pogoplug. I’ve collected a couple of them over time (so I’d have a spare) and gotten the open source plugbox distribution on all of them. I might even donate one to Workshop 88 for a combo monitoring system (maybe just motion sensors and door-open switch for starters) and a box I can log into and do things like pinging my own system from ‘out on the internet’ (using some kind of dynamic DNS to deal with the fact that we don’t have a static IP at the space).
While the main program is in perl and so should be pretty portable, there were a lot of roadblocks. I’d been using a Win32 serial port module to wiggle RTS to control TX/RX for the half-duplex RS485 serial connection to the monitoring nodes. I’d also been using the Term::ReadKey module that is hooked into some Win32-specific raw keyboard routines so I could have some control over the main program from the window it was running in. The serial hardware was a home-brew RS232-485 converter, and of course there’s no RS232 on the Pogo. I’ve been hacking away, testing solutions to problems one at a time in isolation, but the pieces weren’t integrated until just now.
I found a nice cheap USB serial board (on eBay, from China, of course) with the good old FTDI chip on it, for which I knew the Linux distro had a driver. That has RS232, TTL, and RS485 outputs, and auto-sense TX/RX switching for the RS485, taking away the problem of messing with RTS for direction control. While the wiring around the house for the network of sensor nodes uses CAT-5 cable and RJ45 connectors, the data is just one pair (brown/white). (I use the other 3 pairs to carry unregulated 12V to power the nodes.)
Fortunately, the keyboard input is very localized, so replacing it with something else shouldn’t be too hard. I settled on the brute-force approach of a signal handler in the main program that reads a file of user input, along with a small separate script that gets a command from the user, writes it to the file, and sends a SIGUSR1 to the main program. There’s even a reverse channel to get output from the main program back to the user interface script. (Prototypes of that are tested and working, but not integrated into the real program yet.) I wanted a separate program so I wouldn’t be locked down to stdin/stdout on one window. Aside from problems of running the program from a ssh session that might go away, I wanted to be able to do something like sshing in to the Pogoplug from the netbook while I was outside tuning up the landscape watering which is controlled by the automation system.
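The file-plus-signal idea can be sketched roughly as follows. This is a Python illustration, not the perl in the real program, and it leaves out the reverse channel. The file location, the `pending` list, and the function names are all made up for the example.

```python
import os
import signal
import tempfile

# Hypothetical location; the post does not say where the command file lives.
CMD_FILE = os.path.join(tempfile.gettempdir(), "automation_cmd.txt")

pending = []  # commands picked up by the handler, consumed by the main loop


def on_usr1(signum, frame):
    """Main-program side: read whatever the UI script left in the file."""
    try:
        with open(CMD_FILE) as f:
            cmd = f.read().strip()
    except FileNotFoundError:
        return
    if cmd:
        pending.append(cmd)


def send_command(pid, cmd):
    """UI-script side: write the command to the file, then signal the main program."""
    with open(CMD_FILE, "w") as f:
        f.write(cmd + "\n")
    os.kill(pid, signal.SIGUSR1)


# Install the handler once at startup of the main program.
signal.signal(signal.SIGUSR1, on_usr1)
```

The handler only queues the command, and the main loop would act on `pending` when it next comes around, which keeps the work done inside the signal handler small.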
In the short term, I just ripped the keyboard input section out, so I have no control, and the only output is stdout in the window I start the main perl program from. That will do for a little while until I hook the signal-based user interface in. And the serial channel timing still needs some tuning.
But it’s up and running and sending data to the web site – woo hoo!
Update 5/24/11: First: Huge thanks to commenter Stephen for dramatically improving the stability of my Pogo! It had been failing to boot to Plugbox after a power cycle. The fact that I could get it booted correctly after a couple of tries, or from a reboot issued on the serial console is not very helpful if I’m away on vacation and there’s been a long enough power outage to drain the (very large) UPS batteries and it has to come up on its own. I had another Pogo that didn’t exhibit the symptom, and it eventually dawned on me that I’d hacked the bad one a year or so before the one that worked. I reloaded the latest U-boot bootloader, and voila! – it comes up every time now. If it hadn’t been for the conversations with Stephen, who knows how long it would have taken to figure that out. Thanks!
Well, it was up and running for a day or so. Unfortunately, the cron thread that knows when to turn the sprinkler system on couldn’t talk via the global variables needed to get the main program to actually operate the valves. I just reseeded part of the lawn, and really couldn’t afford to have the sprinklers not work while we went away for the weekend, so I had to go back to running it on the main PC.
[Mildly interesting aside: There's an occasional failure of the ftp append function that adds new data to the big file on the web host every 5 minutes, and that failure occurred some time shortly after we left for the weekend. The failure results in losing all the old data, including a startup line with a time offset needed for anything to work, making the house monitor page useless until I fix it by copying a bunch of older data (with the necessary startup line) from the PC to the web host. I did that (after we got home), but there was an ugly anomaly in the data - basically no data for a day or so. D'oh - that was when the Pogo was collecting data, not the main PC! So I ftp'd the datafile from the Pogo to the main PC (it's the same perl program, so the data format is identical), did a little hand editing to put that block of data in the right place, and pushed the big file back up to the web host. Anomaly gone.]
I’m using the threads::shared perl module, and it allowed globals to work fine between threads on the PC – but not on Linux. After a couple of rounds of hacking, I found the “nofork=>1” option in the Schedule::Cron module I was using to be able to use a crontab-like file to control the sprinklers. Windows doesn’t really have a ‘fork’ capability, though Linux obviously does. When the cron module actually forks, there’s no way for globals to communicate between processes. Telling it to not fork – even though the host O/S supported it – fixed the problem. So while there’s still no user interface, with a little luck the main program will live on the Pogo for the foreseeable future. I will implement some user control (via signals) soon – but it’s less urgent than keeping my new grass from drying out.
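The underlying effect is easy to reproduce outside of perl. Here is a small Python sketch (Python threads share memory by default, unlike perl threads, which need threads::shared, so this only illustrates the fork half of the story). The `valve_request` global and the function names are invented for the example.

```python
import os
import threading

valve_request = {"zone": None}  # stand-in for the shared global


def cron_job():
    """Pretend scheduler job: ask the main loop to open zone 2."""
    valve_request["zone"] = 2


def run_in_thread():
    """Job runs in a thread of the same process: the main program sees the change."""
    valve_request["zone"] = None
    t = threading.Thread(target=cron_job)
    t.start()
    t.join()
    return valve_request["zone"]


def run_in_fork():
    """Job runs in a forked child: it changes only the child's copy of memory."""
    valve_request["zone"] = None
    pid = os.fork()
    if pid == 0:
        cron_job()
        os._exit(0)
    os.waitpid(pid, 0)
    return valve_request["zone"]
```

On Linux, `run_in_thread()` comes back with the zone set, while `run_in_fork()` comes back with the global untouched, which is the symptom described above: the scheduled job ran, but the main program never heard about it.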
Update 6/4/11: Major progress! Inspired by the fact that the bedding plants are planted and I really don’t want to have to worry about watering them, it’s time to get all the landscape watering plumbing hooked up, working, and not leaking too much.
But the last part involves testing the automated system. Back when it was run with X10, I could use an RF X10 remote controller to turn the valves on one by one. But after a couple of flooded gardens due to latching relays and pathologically worst case timing of power outages, it’s all controlled by my own hardware now. There are in fact physical buttons on the sprinkler controller in the basement, but that’s an awful lot of hassle for the several valves and zones I have to test. I should just be able to do it from outside. Unfortunately, the control program on the Pogo didn’t have a user interface (yet).
It does now! I got the SIGUSR1 sender working and hacked almost all the old functionality into the program in response to the signals. In particular, I can now ssh into the Pogo (say, from a netbook over wifi) and control the sprinklers! And stdout is now redirected into a nice file I can do a tail -f on if I want to see it. (Rolls over at midnight, keeps one old file.) And both the main program and the ping stats script now start in an rc script at boot time. It’s really close to its final form. Woo hoo!
Update 6/9/11: Oops. We just had the first power outage since hosting the house monitor on the Pogo. It failed because – it wasn’t plugged into the UPS! During the months of on-and-off (mostly off) development leading to actually running the app on it, it was just sitting out (on top of the printer, actually) so I could get to it easily. When it actually ran, I was so happy I forgot about production details like putting it on the UPS.
I was sitting next to the Pogo (on the main PC) when the power went out. The PC of course was on the UPS, so was perfectly happy (and was the only source of light in the room). But the Pogo was dark. Scrambling around with a flashlight, I got it plugged into a more appropriate outlet. But no network. As part of just throwing it together, the closest network connection was an extra port on a secondary wireless access point. But since that’s not the main router, it’s not on UPS either. More rooting around with a flashlight under the table, and I got that up and the Pogo was on the air again.
Of course one of the corners cut in the minimalist Pogo(/Sheevaplug) design is that there’s no hardware clock. And while I do have ntpd running, when the Pogo booted there was no network, so it couldn’t figure out what time it was. There’s now one “STARTED” line in the log with a timestamp of Wed Dec 31 18:01:16 1969.
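That odd 1969 date is just Unix time zero seen from a time zone west of Greenwich. A quick Python check, assuming the clock started counting from zero at power-up and the box was set to a UTC-6 zone (both are inferences from the timestamp, not something stated above):

```python
from datetime import datetime, timedelta, timezone

# Assumption: local standard time is 6 hours behind UTC.
LOCAL_ZONE = timezone(timedelta(hours=-6))


def log_stamp(epoch_seconds):
    """Format seconds-since-1970 the way the log line above is formatted."""
    moment = datetime.fromtimestamp(epoch_seconds, LOCAL_ZONE)
    return moment.strftime("%a %b %d %H:%M:%S %Y")


print(log_stamp(0))   # the instant the clock starts counting
print(log_stamp(76))  # 76 seconds later
```

Under those assumptions the second line matches the logged stamp, which would mean the “STARTED” entry was written about 76 seconds after the clock started counting.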
It was raining (and thundering and lightninging) when the power went out. I had just yesterday set the cron file up to run all the sprinkler zones. But given the significant rain, I wanted to turn them off for a while. The “disable all watering for N hours” feature of the user interface is one of the very few I didn’t implement. Rats. I hacked the cron file (from a laptop on battery – the main PC was off to conserve UPS battery) and restarted the cron stuff (a feature I had in fact implemented), so at least the sprinklers didn’t go on this morning. Now if I can just remember to unhack the cron file and restart it…
Update 6/30/11: It’s getting more stable. (Actually, I figured it was already pretty stable – wrong.) I tried to clean up the code a little by putting repeated code in little subroutines. And then I misspelled the name of a sub in one place, so it crashed days later when that code leg was executed. It records stdout to a file. I tried several times to make that file roll over (keeping one previous version) at midnight. That mostly just didn’t work due to various errors, and crashed at 12:01 once or twice. It reports a bogus router reset at every restart due to a self-describing data header in the data file that is misinterpreted. I think I fixed that one today. And I finally got pkill to work to clean it up and made a start/restart script that runs it nohup so I’ll have a clue what went wrong next time it crashes.
But it’s a real blessing to be able to ssh in to the box and fix stuff remotely, and another to be able to reboot my PC without going through a graceful shutdown dance, so I’m still very pleased with the current setup. And while it didn’t last nearly as long as I expected on the big UPS during the last big power outage, if I can get the big PC to shut itself down automatically there should be lots of battery for the Pogo (and network stuff). And it worked fine sshing in over wifi when I had to test out some sprinkler system repairs!
Update 7/3/11: Bitten by the “you can’t have it all” thing again. The main perl program redirects stdout to a log file. While it’s sometimes very valuable for troubleshooting, that info isn’t very relevant after a day or so, so I decided to have the file roll over around midnight, keeping one old version (one day’s worth) of the file. (And avoiding a dumb out of disk error down the road.) After *way* more failures than I expected for a simple task, I recently got it working, and thought I was done. Not quite.
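For reference, the keep-one-old-file rollover can be sketched in a few lines. This is a Python illustration of the general idea rather than the perl that actually runs on the Pogo. The class name and the `.old` suffix are made up, and the date can be passed in only so the behavior can be tried without waiting for midnight.

```python
import datetime
import os


class RollingLog:
    """Append to log_path; on the first write of a new day, keep one old copy."""

    def __init__(self, log_path, today=None):
        self.log_path = log_path
        self.day = today or datetime.date.today()
        self.fh = open(log_path, "a")

    def write(self, text, today=None):
        today = today or datetime.date.today()
        if today != self.day:
            self._roll()
            self.day = today
        self.fh.write(text)
        self.fh.flush()

    def flush(self):
        self.fh.flush()

    def close(self):
        self.fh.close()

    def _roll(self):
        # Close first, then rename; os.replace overwrites any previous .old file.
        self.fh.close()
        os.replace(self.log_path, self.log_path + ".old")
        self.fh = open(self.log_path, "a")
```

Checking the date on each write, instead of scheduling something for exactly midnight, means a quiet night simply rolls the file at the first write of the next day.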
The transport code that ftps updates to the web host also reports its successes and failures to stdout. But after a midnight rollover, that information went into the old file – not the new one! Well duh – it forks a separate process for the transport code, so closing and reopening stdout in the main process can’t possibly change where stdout in the child process goes. OK – how about rewriting it so the transport code is a separate thread rather than a separate process – and so can properly share the filehandles? But that won’t work: the reason the transport code was in a separate process (besides the fact that I didn’t know how to use threads when I wrote it) is that it may block for long periods on ftp/network problems – and I don’t want to block the main program for that long. So unless I could find or write a non-blocking ftp package, having a separate process is appropriate.
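The “well duh” part can be shown in miniature. In this Python sketch an ordinary file handle stands in for the redirected stdout, and a pipe is used only to make the child wait until the parent has rolled the log over. File names and the function name are invented for the example.

```python
import os


def rollover_with_forked_child(workdir):
    log = os.path.join(workdir, "main.log")
    old = log + ".old"
    out = open(log, "a")           # stands in for the redirected stdout

    r, w = os.pipe()               # parent uses this to tell the child "go"
    pid = os.fork()
    if pid == 0:                   # child: the "transport" process
        os.close(w)
        os.read(r, 1)              # wait until the parent has rolled over
        out.write("ftp ok\n")      # still the handle inherited at fork time
        out.flush()
        os._exit(0)

    os.close(r)
    out.close()                    # parent: midnight rollover
    os.replace(log, old)
    out = open(log, "a")
    out.write("new day\n")
    out.close()

    os.write(w, b"x")              # now let the child write its report
    os.close(w)
    os.waitpid(pid, 0)

    with open(log) as f:
        new_text = f.read()
    with open(old) as f:
        old_text = f.read()
    return new_text, old_text
```

The child's report lands in the renamed old file, because the handle it inherited still points at the same underlying file no matter what the parent closes, renames, or reopens afterward.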
Yeah I suppose I could have the transport code write its output to a little file, have the main program look for that file and on finding it write it to stdout/log and delete the file. Ugh. Or I could put similar rollover code in the transport code. (But I can’t just copy all the code, since it removes the old log file.) And even so, there’s a race condition at rollover time. And closing and reopening stdout to point to the right file each time it tries to ftp something is asking for trouble as well. It doesn’t write a lot – I suppose it could have its own log file. Ugh. Before I made the log roll over it was all in one place.
Why can’t I just have it all?
Update 10/1/11: After a sad story about trashing the Pogo while trying to use it remotely to hack into my main PC, I’ve rebuilt the little server to the best it’s ever been, complete with a working backup mechanism. It has the latest image from archlinuxarm.com (not plugapps.com any more), Device::SerialPort rebuilt from source, ntp stuff that works great, Samba so it can talk to the PC, all hosted on a 4GB (instead of 1GB) thumb drive.
It’s far enough along that it’s not really a development platform any more, so I’ve moved it from the computer bench to a fairly final home in the basement, near the switch and other network stuff. Cables are remade to just fit, and dressed in a not too ugly manner. The plate mounting the RJ-45 jack where 12VDC is injected into the network had a spare hole, so I added a second jack to connect to the Pogo now that it’s nearby.
It’s an appliance (if a different one than the manufacturer imagined), so it’s appropriate that it should live pretty much out of sight. I have on occasion used it to serve files to people, but there’s one open USB port quite accessible on the front for sticking another thumb drive into. Sure, there will be further software tweaks, but at this point it’s pretty officially “in production”. Yay!