House monitor on Linux!

I’ve been trying to migrate the main server program for the home automation system off the main PC for a couple of years.  It’s a hassle to shut everything down gracefully when Microsoft pushes down yet another must-reboot update, and of course I can’t let it reboot automatically.

The target for the migration has been Linux on a cute little Pogoplug.  I’ve collected a couple of them over time (so I’d have a spare) and gotten the open source plugbox distribution on all of them.  I might even donate one to Workshop 88 for a combo monitoring system (maybe just motion sensors and door-open switch for starters) and a box I can log into and do things like pinging my own system from ‘out on the internet’ (using some kind of dynamic DNS to deal with the fact that we don’t have a static IP at the space).

While the main program is in perl and so should be pretty portable, there were a lot of roadblocks.  I’d been using a Win32 serial port module to wiggle RTS to control TX/RX for the half-duplex RS 485 serial connection to the monitoring nodes.  I’d also been using the Term::ReadKey module that is hooked into some Win32-specific raw keyboard routines so I could have some control over the main program from the window it was running in.  The serial hardware was a home-brew RS232-485 converter, and of course there’s no RS232 on the Pogo.  I’ve been hacking away, testing solutions to problems one at a time in isolation, but the pieces weren’t integrated until just now.

I found a nice cheap USB serial board (on Ebay, from China, of course) with the good old FTDI chip on it for which I knew the linux distro had a driver.  That has RS232, TTL, and RS485 outputs, and auto-sense TX/RX switching for the RS485, taking away the problem of messing with RTS for direction control.  While the wiring around the house for the network of sensor nodes uses CAT-5 cable and RJ45 connectors, the data is just one pair (brown/white).  (I use the other 3 pairs to carry unregulated 12V to power the nodes.)

Fortunately, the keyboard input is very localized, so replacing it with something else shouldn’t be too hard.  I settled on the brute-force approach of a signal handler in the main program that reads a file of user input, along with a small separate script that gets a command from the user, writes it to the file, and sends a SIGUSR1 to the main program.  There’s even a reverse channel to get output from the main program back to the user interface script.  (Prototypes of that are tested and working, but not integrated into the real program yet.)  I wanted a separate program so I wouldn’t be locked down to stdin/stdout on one window.  Aside from problems of running the program from an ssh session that might go away, I wanted to be able to do something like sshing in to the Pogoplug from the netbook while I was outside tuning up the landscape watering, which is controlled by the automation system.
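
For readers who want to see the shape of that signal-plus-file scheme, here is a rough sketch. It is a Python illustration, not the actual perl, and the file name `CMD_FILE` and the functions `send_command` and `poll_commands` are made up for the example. The handler only sets a flag; the main loop reads the file the next time it comes around.

```python
import os
import signal
import tempfile

# Hypothetical path for the example; the real program's file name isn't given.
CMD_FILE = os.path.join(tempfile.gettempdir(), "housemon_cmd.txt")

pending = False  # set by the signal handler, cleared by the main loop


def on_sigusr1(signum, frame):
    """Signal handler: just note that a command is waiting."""
    global pending
    pending = True


def send_command(pid, text):
    """UI-script side: write the command, then poke the main program."""
    with open(CMD_FILE, "w") as f:
        f.write(text + "\n")
    os.kill(pid, signal.SIGUSR1)


def poll_commands():
    """Main-program side: return the waiting command, or None."""
    global pending
    if not pending:
        return None
    pending = False
    with open(CMD_FILE) as f:
        return f.read().strip()


signal.signal(signal.SIGUSR1, on_sigusr1)

if __name__ == "__main__":
    # Demo: send a command to ourselves and pick it up.
    send_command(os.getpid(), "valve 3 on")
    print(poll_commands())
```

Keeping the handler down to setting a flag means the file is never read in the middle of whatever the main loop was doing when the signal arrived.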

In the short term, I just ripped the keyboard input section out, so I have no control, and the only output is stdout in the window I start the main perl program from.  That will do for a little while until I hook the signal-based user interface in.  And the serial channel timing still needs some tuning.

But it’s up and running and sending data to the web site – woo hoo!

Update 5/24/11: First:  Huge thanks to commenter Stephen for dramatically improving the stability of my Pogo!  It had been failing to boot to Plugbox after a power cycle.  The fact that I could get it booted correctly after a couple of tries, or from a reboot issued on the serial console is not very helpful if I’m away on vacation and there’s been a long enough power outage to drain the (very large) UPS batteries and it has to come up on its own.  I had another Pogo that didn’t exhibit the symptom, and it eventually dawned on me that I’d hacked the bad one a year or so before the one that worked.  I reloaded the latest U-boot bootloader, and voila! – it comes up every time now.   If it hadn’t been for the conversations with Stephen, who knows how long it would have taken to figure that out.  Thanks!

Well, it was up and running for a day or so.  Unfortunately, the cron thread that knows when to turn the sprinkler system on couldn’t talk via the global variables needed to get the main program to actually operate the valves.  I just reseeded part of the lawn, and really couldn’t afford to have the sprinklers not work while we went away for the weekend, so I had to go back to running it on the main PC  😡

[Mildly interesting aside: There’s an occasional failure of the ftp append function that adds new data to the big file on the web host every 5 minutes, and that failure occurred some time shortly after we left for the weekend 🙁  The failure results in losing all the old data, including a startup line with a time offset needed for anything to work, making the house monitor page useless until I fix it by copying a bunch of older data (with the necessary startup line) from the PC to the web host.  I did that (after we got home), but there was an ugly anomaly in the data – basically no data for a day or so.  D’oh – that was when the Pogo was collecting data, not the main PC!  So I ftp’d the datafile from the Pogo to the main PC (it’s the same perl program, so the data format is identical), did a little hand editing to put that block of data in the right place, and pushed the big file back up to the web host.  Anomaly gone.]

I’m using the threads::shared perl module, and it allowed globals to work fine between threads on the PC – but not on Linux.  After a couple of rounds of hacking, I found the “nofork=>1” option in the Schedule::Cron module I was using to be able to use a crontab-like file to control the sprinklers.  Windows doesn’t really have a ‘fork’ capability, though Linux obviously does.  When the cron module actually forks, there’s no way for globals to communicate between processes.  Telling it to not fork – even though the host O/S supported it – fixed the problem.  So while there’s still no user interface, with a little luck the main program will live on the Pogo for the foreseeable future.  I will implement some user control (via signals) soon – but it’s less urgent than keeping my new grass from drying out.
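
The fork-versus-thread difference is easy to demonstrate on any Linux box. The sketch below is Python rather than perl, and `valve_request`, `schedule_job`, `run_forked` and `run_threaded` are names invented for the illustration. (Python threads share all globals by default, where perl needs threads::shared, but the process boundary behaves the same way.)

```python
import os
import threading

valve_request = 0  # stands in for a global the scheduler sets


def schedule_job():
    """What a cron job might do: ask the main loop to open valve 3."""
    global valve_request
    valve_request = 3


def run_forked():
    """Run the job in a forked child; return what the parent sees afterwards."""
    global valve_request
    valve_request = 0
    pid = os.fork()
    if pid == 0:          # child: works on its own copy of the globals
        schedule_job()
        os._exit(0)
    os.waitpid(pid, 0)
    return valve_request


def run_threaded():
    """Run the job in a thread of the same process."""
    global valve_request
    valve_request = 0
    t = threading.Thread(target=schedule_job)
    t.start()
    t.join()
    return valve_request


if __name__ == "__main__":
    print("forked:  ", run_forked())    # parent still sees 0
    print("threaded:", run_threaded())  # parent sees 3
```

The forked child changes only its own copy of the variable, so the parent never hears about the request. That matches the symptom described above.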

Update 6/4/11:  Major progress! Inspired by the fact that the bedding plants are planted and I really don’t want to have to worry about watering them, it’s time to get all the landscape watering plumbing hooked up, working, and not leaking too much.

But the last part involves testing the automated system.  Back when it was run with X10, I could use an RF X10 remote controller to turn the valves on one by one.  But after a couple of flooded gardens due to latching relays and pathologically worst case timing of power outages, it’s all controlled by my own hardware now.  There are in fact physical buttons on the sprinkler controller in the basement, but that’s an awful lot of hassle for the several valves and zones I have to test.  I should just be able to do it from outside.    Unfortunately, the control program on the Pogo didn’t have a user interface (yet).

It does now!  I got the SIGUSR1 sender working and hacked almost all the old functionality into the program in response to the signals.  In particular, I can now ssh into the Pogo (say, from a netbook over wifi) and control the sprinklers!  And stdout is now redirected into a nice file I can do a tail -f on if I want to see it.  (Rolls over at midnight, keeps one old file.)  And both the main program and the ping stats script now start in an rc script at boot time.  It’s really close to its final form.  Woo hoo!

Update 6/9/11:  Oops!  We just had the first power outage since hosting the house monitor on the Pogo.  It failed because – it wasn’t plugged into the UPS!  During the months of on-and-off (mostly off) development leading to actually running the app on it, it was just sitting out (on top of the printer, actually) so I could get to it easily.  When it actually ran, I was so happy I forgot about production details like putting it on the UPS.

I was sitting next to the Pogo (on the main PC) when the power went out.  The PC of course was on the UPS, so was perfectly happy (and was the only source of light in the room).  But the Pogo was dark.  Scrambling around with a flashlight, I got it plugged into a more appropriate outlet.  But no network.  As part of just throwing it together, the closest network connection was an extra port on a secondary wireless access point.  But since that’s not the main router, it’s not on UPS either.  More rooting around with a flashlight under the table, and I got that up and the Pogo was on the air again.

Of course one of the corners cut in the minimalist Pogo(/Sheevaplug) design is that there’s no hardware clock.  And while I do have ntpd running, when the Pogo booted there was no network, so it couldn’t figure out what time it was.  There’s now one “STARTED” line in the log with a timestamp of Wed Dec 31 18:01:16 1969.
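
That odd 1969 date is just the Unix epoch seen from a timezone west of Greenwich. A small Python sketch of the arithmetic follows. It assumes the Pogo's local zone is US Central (UTC-6 in winter), which the post doesn't state outright. `log_stamp` and `clock_looks_sane` are hypothetical helpers, and the 2011 cutoff is an arbitrary choice for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Pogo's local zone is US Central, which is UTC-6 in December.
CENTRAL_STANDARD = timezone(timedelta(hours=-6))


def log_stamp(epoch_seconds):
    """Format seconds-since-epoch the way the log line shows it."""
    t = datetime.fromtimestamp(epoch_seconds, tz=CENTRAL_STANDARD)
    return t.strftime("%a %b %d %H:%M:%S %Y")


def clock_looks_sane(epoch_seconds,
                     earliest=datetime(2011, 1, 1, tzinfo=timezone.utc)):
    """True once the clock is past a date the program could plausibly run on."""
    return epoch_seconds >= earliest.timestamp()


if __name__ == "__main__":
    print(log_stamp(0))    # a clock that reads zero
    print(log_stamp(76))   # 76 seconds later: Wed Dec 31 18:01:16 1969
```

Under that assumption, the logged time works out to 76 seconds after a clock that started from zero. A check along the lines of `clock_looks_sane` is one possible way to hold off timestamped output until ntpd has set the clock.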

It was raining (and thundering and lightninging) when the power went out.  I had just yesterday set the cron file up to run all the sprinkler zones.  But given the significant rain, I wanted to turn them off for a while.  The “disable all watering for N hours” feature of the user interface is one of the very few I didn’t implement.  Rats.  I hacked the cron file (from a laptop on battery – the main PC was off to conserve UPS battery) and restarted the cron stuff (a feature I had in fact implemented), so at least the sprinklers didn’t go on this morning.  Now if I can just remember to unhack the cron file and restart it…

Update 6/30/11: It’s getting more stable.  (Actually, I figured it was already pretty stable – wrong.)  I tried to clean up the code a little by putting repeated code in little subroutines.  And then I misspelled the name of a sub in one place, so it crashed days later when that code leg was executed.  It records stdout to a file.  I tried several times to make that file roll over (keeping one previous version) at midnight.  That mostly just didn’t work due to various errors, and crashed at 12:01 once or twice.  It reports a bogus router reset at every restart due to a self-describing data header in the data file that is misinterpreted.  I think I fixed that one today.  And I finally got pkill to work to clean it up and made a start/restart script that runs it nohup so I’ll have a clue what went wrong next time it crashes.

But it’s a real blessing to be able to ssh in to the box and fix stuff remotely, and another to be able to reboot my PC without going through a graceful shutdown dance, so I’m still very pleased with the current setup.  And while it didn’t last nearly as long as I expected on the big UPS during the last big power outage, if I can get the big PC to shut itself down automatically there should be lots of battery for the Pogo (and network stuff).  And it worked fine sshing in over wifi when I had to test out some sprinkler system repairs!

Update 7/3/11: Bitten by the “you can’t have it all” thing again.  The main perl program redirects stdout to a log file.  While it’s sometimes very valuable for troubleshooting, that info isn’t very relevant after a day or so, so I decided to have the file roll over around midnight, keeping one old version (one day’s worth) of the file.  (And avoiding a dumb out of disk error down the road.)  After *way* more failures than I expected for a simple task, I recently got it working, and thought I was done.  Not quite.
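
One way to get a keep-one-old-copy rollover is sketched below, in Python rather than the actual perl. The `DailyLog` class and its method names are invented for the example, and it writes to its own file object instead of redirecting stdout. It checks the date on every write rather than relying on a timer that fires at midnight.

```python
import os
import tempfile
from datetime import date


class DailyLog:
    """Append-only log that rolls over when the date changes, keeping one old file."""

    def __init__(self, path, today=None):
        self.path = path
        self.day = today or date.today()
        self.f = open(path, "a")

    def write(self, line, today=None):
        today = today or date.today()
        if today != self.day:          # first write of a new day
            self._roll()
            self.day = today
        self.f.write(line + "\n")
        self.f.flush()

    def _roll(self):
        self.f.close()
        # os.replace overwrites any previous .old, so only one old file is kept.
        os.replace(self.path, self.path + ".old")
        self.f = open(self.path, "a")

    def close(self):
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = DailyLog(os.path.join(d, "monitor.log"), today=date(2011, 7, 2))
        log.write("late evening", today=date(2011, 7, 2))
        log.write("just after midnight", today=date(2011, 7, 3))
        log.close()
        print(sorted(os.listdir(d)))
```

The `today` parameter exists only so the rollover can be exercised without waiting for midnight.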

The transport code that ftps updates to the web host also reports its successes and failures to stdout.  But after a midnight rollover, that information went into the old file – not the new one!  Well duh – it forks a separate process for the transport code, so closing and reopening stdout in the main process can’t possibly change where stdout in the child process goes.  OK – how about rewriting it so the transport code is a separate thread rather than a separate process – and so can properly share the filehandles?  But that won’t work:  the reason the transport code was in a separate process (besides the fact that I didn’t know how to use threads when I wrote it) is that it may block for long periods on ftp/network problems – and I don’t want to block the main program for that long.  So unless I could find or write a non-blocking ftp package, having a separate process is appropriate.
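
The child-keeps-writing-to-the-old-file effect can be reproduced in a few lines. This is a Python sketch for Linux, not the real transport code. The `demo` function and the file name `monitor.log` are made up, and a pipe is used only to make the child wait until the parent has rolled the log over.

```python
import os
import tempfile


def demo(workdir):
    """Return (old_file_text, new_file_text) after a rollover with a child alive."""
    path = os.path.join(workdir, "monitor.log")
    log = open(path, "a")
    log.write("parent before rollover\n")
    log.flush()

    r, w = os.pipe()              # lets the parent tell the child when to write
    pid = os.fork()
    if pid == 0:                  # child: inherited its own handle on the log
        os.close(w)
        os.read(r, 1)             # wait until the parent has rolled over
        log.write("child after rollover\n")
        log.flush()
        os._exit(0)

    os.close(r)
    # Parent rolls the log over: rename, then reopen under the original name.
    log.close()
    os.replace(path, path + ".old")
    log = open(path, "a")
    log.write("parent after rollover\n")
    log.flush()

    os.write(w, b"x")             # release the child
    os.close(w)
    os.waitpid(pid, 0)
    log.close()

    with open(path + ".old") as old, open(path) as new:
        return old.read(), new.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old_text, new_text = demo(d)
        print("old file:", old_text.splitlines())
        print("new file:", new_text.splitlines())
```

The child's descriptor still refers to the file that was renamed, so its line lands in the `.old` copy no matter what the parent reopens afterwards.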

Yeah I suppose I could have the transport code write its output to a little file, have the main program look for that file and on finding it write it to stdout/log and delete the file.  Ugh.  Or I could put similar rollover code in the transport code.  (But I can’t just copy all the code, since it removes the old log file.)  And even so, there’s a race condition at rollover time.  And closing and reopening stdout to point to the right file each time it tries to ftp something is asking for trouble as well.  It doesn’t write a lot – I suppose it could have its own log file.  Ugh.  Before I made the log roll over it was all in one place.

Why can’t I just have it all?

Update 10/1/11:  After a sad story about trashing the Pogo while trying to use it remotely to hack into my main PC, I’ve rebuilt the little server to the best it’s ever been, complete with a working backup mechanism.  It has the latest image from archlinuxarm.com (not plugapps.com any more), Device::SerialPort rebuilt from source, ntp stuff that works great, Samba so it can talk to the PC, all hosted on a 4GB (instead of 1GB) thumb drive.

It’s far enough along that it’s not really a development platform any more, so I’ve moved it from the computer bench to a fairly final home in the basement, near the switch and other network stuff.  Cables are remade to just fit, and dressed in a not too ugly manner.  The plate mounting the RJ-45 jack where 12VDC is injected into the network had a spare hole, so I added a second jack to connect to the Pogo now that it’s nearby.

It’s an appliance (if a different one than the manufacturer imagined), so it’s appropriate that it should live pretty much out of sight.  I have on occasion used it to serve files to people, but there’s one open USB port quite accessible on the front for sticking another thumb drive into.  Sure, there will be further software tweaks, but at this point it’s pretty officially “in production”.  Yay!


11 Responses to House monitor on Linux!

  1. stephen says:

    Hi,
    May I ask how did you install the FTDI driver to the native PogoPlug system?
    I tried but failed. My application is using the PogoPlug V2 to communicate with a device which has a FTDI chip for serial comm.
    I have tried when Plugapps running in the USB stick and the Linux of Plugapps has the FTDI driver embedded and thus I was successful. But with native PogoPlus, the Linux is old and does not have the FTDI chip installed.
    Thanks!
    Stephen

  2. stephen says:

    Sorry, it should be the Native PogoPlus’s Linux does not have the FTDI driver installed. So my device is not listed in /dev/ttyUSBX.
    Thanks!
    Stephen

  3. Jim says:

    Hi Stephen,

    Sorry I wasn’t clear: I’m running Plugapps linux, so the FTDI driver I’m using is the one you’ve already found to be present there. I’ve never even tried to do anything with the native PogoPlug linux – not even trying to stream media, like it was designed to do.

    Afraid I can’t help. What are your concerns about running Plugapps?

  4. stephen says:

    Hello Jim
    I have installed the PlugApps to the USB stick and it works just fine EXCEPT when it is powered down and next time when it is powered up it goes to the native PogoPlug and you have to unplug it and plug it again in order to let it switch to Plugapps environment. This is an issue called Alternate Booting that has not figured out yet with Plugapps. There are several solutions but none of them seem work.
    If we sell the whole package to the customer, we can not ask the customer to manually plug and unplug the power cord. So if there is a power failure in the field the system will be down. That’s why I want to use the native PogoPlug to do the job.

    Have you tried your Pogo when powered down next time where it goes? I believe it will go back to the PogoPlug and you have to repower it up.

    Thanks!
    Stephen

  5. Jim says:

    Ugh. Yeah, I’ve seen that. Even the second power cycle doesn’t always work. Going in on the serial console and doing a reboot does always seem to work for me. But yes, that’s not an acceptable situation for a commercial environment.

    I haven’t done exhaustive testing, but I sort of think one of my Pogoplugs doesn’t suffer from that problem. I should test it some more.

    I haven’t looked at the potential solutions yet. Could you put something in an rc script that would do a uname -a and conditionally reboot based on that?

  6. stephen says:

    Hi Jim
    Before I could not know how to make the native flash writable. Now I can do:
    mount -o rw,remount /
    I may try to find a place to reboot the system after a minute in native Pogoplug. In PlugApps there is a file called rc.local but I do not know what file it is in Pogoplug. But this method will take too much time.

    Yes you are right you have to plug the power cord three times. The second reboot the system says bad partition (I monitored it via the serial console) and it is stuck there. So if the Pogo goes here the above solution won’t work either. That’s why I just want to modify the native pogo instead of running the Plugapps.

    After more research on-line, I guess I can ipkg the FTDI in the native pogo. I will give a try and get back to you.
    Thanks!
    Stephen

  7. Jim says:

    OK, I tested 2 Pogos, #1 and #2. #2 boots correctly after a power cycle. I tried it 10 times in a row – a couple with it unplugged for a couple of minutes. Every time it booted up to plugapp first try.

    I swapped thumb drives, and # 2 booted fine the 3 times I tried it with #1 thumb drive, so it’s really the hardware (or firmware).

    I booted #1 and it showed an irregular “alternating boot” behavior with both thumb drives.

    Interesting. I set # 1 up a long time ago – a year or more? I set up # 2 more recently. Might there be an updated u-boot? I just reloaded the bootloader on #1. Since reloading, it has rebooted to plugapp successfully after each of 5 power cycles. During boot it reports:
    U-Boot 2010.09 (Oct 23 2010 – 11:51:16)
    uname -a gives:
    Linux jimspogo 2.6.35.1-00164-g23e919b-dirty #1 PREEMPT Wed Aug 11 15:38:50 CDT 2010 armv5tel Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux

    Is that the same version you’re running?

  8. stephen says:

    Thanks for the test Jim,
    I will check my version tonight.
    Last night I tried to install the ftdi and usbserial into the native PogoPlug and it failed.
    the kernel-module-ftdi-xxx and usbserial that I downloaded from web seems for other version of Kernel and other cpu.
    I guess I have to rebuild my kernel with same version as native Pogo (2.6.22.18) and add the ftdi module in my new build and install them from there.
    Thanks!
    Stephen

  9. Jim says:

    Good luck with the kernel rebuild.

    But do check the versions. If what I just installed is a later and _better_ version that actually comes up reliably after a power cycle and already has the ftdi driver, that sounds pretty close to what you need. (OK, there’s still a thumb drive sticking out of the Pogo, but that’s not awful.)

    I stumbled across something (on plugapps.com) that seemed to be under construction for a plugbox install into flash. The name had ‘bit’ in it, but I don’t remember and don’t have time to dig for it now. There were large warnings that it was work in progress, but it would be great if somebody got that to work. Hmm – I sort of care about it, and while I’m not a kernel guy, maybe I could even contribute in some small way. I’ll have to go find it again 🙂

    In any event, you’ve caused me to do stuff that has led to a huge improvement in the stability of my setup. Thanks very much!

  10. stephen says:

    Actually I tried to install the plugbox in the NAND but failed. Like you said it is still progressing.
    The thumb drive out of the PogoPlug is not that concerns to me but the bad thing is when the stick is there while the system is just rebooting or any time before the system has stabilized and if you unplug the stick the stick may be damaged and the only thing to recover is to rebuild your external Plugapps. This make the system very vulnerable. Yes huge improvement is a must here.
    Stephen

  11. Pingback: Rebuilding my Pogoplug house monitor server | Jim's Projects
