This is a work in progress!
Overview
The system starts with several small “sensor nodes” that monitor things (like temperature) or control things (like sprinklers). They’re connected via an RS-485 network at 57.6 Kb/sec on Cat 5 cable. That cable also provides 12V DC to power the nodes.
The other end of the 485 network is the main control or monitor machine. Software running on this box polls the sensors, collects their data, and ftps updates to the web host. Additional software there creates the page we see.
Software structure
There are three main components to the software:
- Node-specific software in the sensor nodes (written in PIC assembler) that performs the node’s tasks and communicates over the RS-485 network.
- The “poll” program running on the monitor machine at home (currently a Pogo) that polls all the sensors over the RS-485 network, collects their information, and ftps updates to datafile.csv on the web host every 5 minutes. Other cooperating programs on that machine provide a user interface, monitor network connectivity, and more.
- The “page display” software running on the web host which paints the main status page and creates the graphs (and text reports) from the data in datafile.csv displayed on that page. It also has links to these Project Notes and other pages served by that GoDaddy-hosted web server.
Node software
As of 4/13, all the nodes are still running on PICs with code in assembler. While I finally structured that code into some sort of skeleton with all the communication stuff plus some node-specific parts, it’s still not fun to work on, and I have to figure it out again each time I need a new node. (And figure out how to run the programmer, and probably download a new version of the IDE…) Some day I expect to get an Arduino (or some Atmel chip) based sensor node running so I can write in C. An SHT15 temperature/humidity sensor that’s been sitting on the shelf for a year is a likely driver for that. Some day.
Update 6/3/14: I’m starting to try to run nodes on Arduinos (or the like). First step is to reverse engineer the protocol. Fortunately, there are fair comments, so it’s not like starting with a sniffer 🙂
The protocol is: [sync/header byte (‘U’)] [src addr byte] [dest addr byte] [opcode hi nibble; length lo nibble] [0 or more data bytes] [CRC lo byte] [CRC hi byte]
Master addr is 0. Slaves are defined uniquely. With a 4-bit nibble each, we can have up to 16 opcodes and a length value up to 15 (so at most 13 data bytes, since the length includes the 2 CRC bytes). Found this in some source:
;------------ OPCODES -------------
#define OCpoll 1
#define OCretr 2
#define OCrst 3
#define OCgeterr 4
#define OCdata 5
#define OCdata.err 6
#define OCack 7
#define MAXOPCODE 7
“Received length” is at least 6. What’s in the protocol length nibble is (num data bytes + 2 for CRC). That’s always 4 less than “received length” (hdr, src, dest, op/len are not counted in the protocol length).
CRC is really just a 16 bit checksum at the moment.
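Here is a minimal sketch of that frame layout, written in Python rather than the PIC assembler or Arduino code the nodes actually run. The checksum is an assumption: the notes only say it is a 16 bit checksum, so this version simply sums every byte ahead of the checksum modulo 65536, and the real nodes may cover different bytes. The function names are hypothetical.

```python
SYNC = ord("U")            # sync/header byte
OC_POLL, OC_DATA = 1, 5    # opcode values from the #define list above


def checksum16(payload: bytes) -> int:
    # ASSUMPTION: plain 16-bit sum of all bytes that precede the checksum.
    return sum(payload) & 0xFFFF


def build_frame(src: int, dest: int, opcode: int, data: bytes = b"") -> bytes:
    length = len(data) + 2             # protocol length = data bytes + 2 CRC bytes
    if not 0 <= opcode <= 15 or length > 15:
        raise ValueError("opcode and length must each fit in a nibble")
    head = bytes([SYNC, src, dest, (opcode << 4) | length]) + bytes(data)
    crc = checksum16(head)
    return head + bytes([crc & 0xFF, crc >> 8])   # CRC lo byte first


def parse_frame(frame: bytes) -> dict:
    if len(frame) < 6 or frame[0] != SYNC:
        raise ValueError("too short or missing sync byte")
    opcode, length = frame[3] >> 4, frame[3] & 0x0F
    if length != len(frame) - 4:       # always 4 less than the received length
        raise ValueError("length nibble does not match received length")
    crc = frame[-2] | (frame[-1] << 8)
    if crc != checksum16(frame[:-2]):
        raise ValueError("checksum mismatch")
    return {"src": frame[1], "dest": frame[2],
            "opcode": opcode, "data": bytes(frame[4:-2])}
```

A poll from the master (addr 0) to node 5 comes out as 6 bytes with an op/len byte of 0x12, which matches the "received length is at least 6" rule.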
6/5/14: OK – got the Mega with a 75176 on a breadboard to send Quick Brown Fox to the USB-485 converter plugged into the Ubuntu Dell Mini 9 at 115200! Used Mega since it has multiple serial ports. Grounds between PC and Dell Mini 9 didn’t seem to be a problem. Wrote up most of a basic message parser/validator. It never transmits on the 485 side. Should echo parsed received messages when plugged into the real 485 net. (And shouldn’t screw the network up!) Basically a protocol aware (but content unaware) 485 net sniffer to see if it works, is fast enough, etc. Currently stuck on parsing message boundaries. I think timeout is essential as terminator. Maybe?
6/6/14: I made up a little breakout to 0.1″ pins for an RJ45 female with the standard 485 net Cat5 pinout. Here it is on the Mega with a driver and the initial pair to a USB-485 adapter. Very next step is to see if it can parse (read only!) live messages on the house network!
Later that day: Wouldn’t read anything at all until I reversed the hard-wired data direction on the 75176 (from initial fox send tests). Oops. I would expect that always-on sender to mess up the network, but surprisingly, it didn’t seem to. While the RJ45 breakout isn’t wired yet, you can see the direction pullup still connected to +5 in the pic above.
After a few tweaks, success! While looking at the parsed messages, I found that the temp2 (addr 5) outside temp sensor seems to constantly respond with opcode 6, which I vaguely recall indicated first response after startup. Looks like it is used for first reply after start as well as somehow to report errors.
The great setTimeout() on serial stuff inheriting from the Arduino 1.x Stream class made the code pretty clean. I’m kind of depending on silence as a message delimiter. Seems to me I had to put delays in maybe to allow for turnaround time in 485 adapters. I put a pot on the Mega so I could play with the timeout, and found that with timeout of 29mS it works pretty consistently, but at 31 it misses some data (gets the polls, though). At 34 it only gets polls. That’s a very useful data point. Since I’ll be managing data direction on the 75176/483 TTL/RS485 (422) driver, I can use any timeout I choose for receive parsing. Let’s pick 10mS.
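To illustrate the silence-as-delimiter idea, here is a small Python sketch (not the Arduino code). It takes a list of (timestamp in ms, byte) pairs, a made-up input format for the example, and starts a new message whenever the gap since the previous byte exceeds the timeout. The function name and the sample timings are assumptions.

```python
def split_on_silence(events, gap_ms=10.0):
    """Group (timestamp_ms, byte) pairs into messages separated by silence."""
    frames, current, last_t = [], bytearray(), None
    for t, b in events:
        # A gap longer than the timeout ends the message in progress.
        if last_t is not None and t - last_t > gap_ms and current:
            frames.append(bytes(current))
            current = bytearray()
        current.append(b)
        last_t = t
    if current:
        frames.append(bytes(current))
    return frames


# About 0.17 ms per byte at 57600 baud (10 bits per byte with start/stop).
BYTE_MS = 0.17
poll = [(i * BYTE_MS, b) for i, b in enumerate(b"U\x00\x05\x12\x6c\x00")]
# Hypothetical reply that starts 31 ms after the poll began.
reply = [(31.0 + i * BYTE_MS, b) for i, b in enumerate(b"U\x05\x00\x52\xac\x00")]
```

With a 10 ms timeout the poll and the reply come out as two messages. With a timeout longer than the node's roughly 30 ms turnaround they run together into one blob, which fits the observation that long timeouts lose the replies.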
Guess I better add a 30mS delay before I respond to polls, though. I should look back through the old node code to look for a delay like that. (later) Found:
;-----------------------------------
; MAIN LOOP PROCESS RECEIVED MESSAGE
;-----------------------------------
rx call tenms
 call tenms
 call tenms
;------------ process opcode--------
Next step: (Read a sensor and) actually respond on the network!
Update 6/9/14: Various bumps along the way, but I got code working on the Mega2560. It decoded existing traffic on the live 485 net (yeah, the first connect was scary). Then I added stuff for the DHT22 temp/humidity sensor, put code in to respond to a poll, and (with the new section in 485pollC.pl,) it all seems to work!
Of course the target is a Tiny 4313 (since it has an actual serial port that has a chance of being close enough to speed (with a crystal or 8MHz ceramic resonator) to work at 57600). First I took out the second serial port stuff and verified that it still worked on the Mega. Then I tried to compile for a 4313. (Using a boards.txt entry for “2313 hacked for 4313”.) Unfortunately, the new Serial.setTimeout() and Serial.readBytes() the code depends on aren’t there in the Tiny4313 core. Rats. Rewriting the code without the timeout was surprisingly hard. Changing readBytes() to use read() was pretty easy. And both sets of code were nicely localized. Got the new version running on the Mega, and then stripped out the second serial port stuff again for another try to compile for 4313.
Got the dreaded i:/program files/arduino-1.0.5-old/hardware/tools/avr/bin/../lib/gcc/avr/4.3.2/../../../../avr/lib/avr25/crttn2313.o:(.init9+0x2): relocation truncated to fit: R_AVR_13_PCREL against symbol `exit’ defined in .fini9 section in i:/program files/arduino-1.0.5-old/hardware/tools/avr/bin/../lib/gcc/avr/4.3.2/avr25\libgcc.a(_exit.o) error. Messed with stuff I’d tried before about importing bin directories from WinAVR 2009 and 2010 – usually getting an error like ‘can’t create process’.
Googled the error messages some more and found an Arduino 1.0.5 install package with WinAVR 2010 tools embedded from Erwin Reid and installed it. And now it compiles! And that tool chain even actually knows about Tiny4313, so I don’t have to hack a 2313 entry for it! The bad news is that the smallest I could get it was about 4380 bytes – too much for the 4313’s 4096 bytes of flash. Tried the Adafruit DHT22 lib, but that was even bigger. Tried turning off the float support by #defining DHT22_NO_FLOAT, but it didn’t help. That extra couple hundred bytes is pretty much a show stopper for using a 4313. Bummer. Yeah, I suppose something without the DHT lib might fit. Grumble.
I did recompile for a Duemilanove, and that works. So I could make up a little shield for a Diavolino so I could put something in the bedroom, (or I suppose, run off a breadboard shield). That might buy some time until I decide to get some 328P SMT 32 pin TQFPs and make an initial 485 PCB with them. Ugh.
Still to do is responding to other opcodes: Retry, Reset, GetErrs, and Data to control stuff. But since it doesn’t have to fit in 4K, the extra code shouldn’t be a problem.
Update 6/26/19: Still working on using the first Arduino node. Motivation is rising, as I have a DHT-22 sensor installed in a perfect location under the eave on the far side of the garage, with cable going to the relocated (currently PIC) node. I have an Arduino node built, with code that mostly works to read the temp/humidity sensor. But since it will also be the rain collector reader, it will need additional code (very likely interrupt driven), and new code in the poll program. But when the code works, it should be a 5 minute job to swap the node out and bring it online. Theoretically.
<more detail needed>
The “poll” program
Currently about 1740 lines of Perl running on a Pogo, this program is the main control of the system. Current version is called 485pollD.pl. It talks through a USB-RS485 converter professionally mounted to the side of the Pogo, polling all nodes once/min over that serial network. While the basic poll message is the same for each node (except of course address and checksum), each node replies with unique data. The poll program has a separate callback for each sensor to decode and deal with its reply. Each sensor also has several associated data structures to store current values, accumulated counts, error counts, etc. Both the callback and data hashes are indexed by the sensor name – “sump”, “rain”, etc.
Update 6/26/19: The “professional” rubber band mounting has been upgraded to be supported by a sheet metal saddle, the other side of which supports a 120 GB SSD. The latter should reduce the USB thumb drive “disk” failures.
For those nodes which can be controlled – primarily the sprinkler controller – there’s a simple external control mechanism. A requirement was to be able to ssh into the Pogo to control them, rather than accessing stdin/stdout on the poll program, which could only be done from the Pogo console. So the poll program now listens for a USR1 signal in its main loop, reads a command file when it receives that signal, and sends control messages based on the file’s (one line ASCII) content. A companion command line program (sendusr1) – which can be accessed by sshing in – provides a very simple user interface, writing the file and sending a signal when <cr> is pressed. The prototypical application of this is logging in from a tablet from outside to control the sprinkler and watering systems while setting them up in the spring.
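The signal-plus-command-file pattern can be sketched like this in Python (the real poll program is Perl, and this is not its code). The handler only sets a flag, and the main loop does the file read on its next pass. All names here are hypothetical, and the command text in the usage note is made up.

```python
import signal

command_pending = False


def on_usr1(signum, frame):
    # Keep the handler tiny: just note that a command file is waiting.
    global command_pending
    command_pending = True


def install_handler():
    # Call once at startup (Unix only; must run in the main thread).
    signal.signal(signal.SIGUSR1, on_usr1)


def read_command(path):
    # The command file holds a single line of ASCII.
    with open(path, encoding="ascii") as fh:
        return fh.readline().strip()


def main_loop_step(path):
    """One pass of the main loop: return a command if USR1 arrived, else None."""
    global command_pending
    if not command_pending:
        return None
    command_pending = False
    return read_command(path)
```

The companion program's side would then be: write one line (say "S3 on") to the command file, then send USR1 to the poll process, for example with os.kill(pid, signal.SIGUSR1).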
Update 6/8/14: With an Arduino-based temp/humidity sensor almost working, I need to add a new block to the main poll program – which I haven’t done in a long time. Working on 485pollC.pl, here’s what I had to do:
- Under SET UP ALL KNOWN SENSORS, create new block for the new sensor. Addr (here 11) must be unique. I think the small ints for etypes, errtypes, retypes should be unique, though that’s not true for sprinkler and water. It needs a unique string name (here th1 for 1st temp/humidity sensor), and a variable name for the function pointer, typically ${the name}ref.
$sensors{th1} = [11,5,$th1ref];
#$sentMsg{th1}=1; # don’t do dead check on me
$errs{th1}=[0,0,0];
$data{th1} = [0,0,0,0,0,0];
$etypes{th1}=24;
$errtypes{th1}=25;
$retypes{th1}=26;
$etypes is critical – that’s what’s sent to web server. Not sure if any others are used yet.
- Create a # SPECIFIC PROCESSING FOR yoursensor TESTING block, typically by copying another block and doing significant modifications reflecting the intimate details of the data provided by the new sensor. Here’s the block for th1:
#---------------------------------------
# SPECIFIC PROCESSING FOR TEMP/HUMIDITY SENSOR 1
# this is the most up to date one 6/9/14 – use as template for others
$th1ref = sub {
$len=@locchars;
# 10=6 standard bytes plus 2 2-byte data values: temp, humidity
lenOcChk(10);
#----Pull out specific data for this sensor----
$temp1lo=$locchars[4];
$temp1hi=$locchars[5];
$temp1=($temp1lo+256*$temp1hi)/10;
$hum1lo=$locchars[6];
$hum1hi=$locchars[7];
$hum1=($hum1lo+256*$hum1hi)/10;
#----- Announcement to 485stdout using data from above ------
printf ("TH1 OK: $detStr $OCstr temp: %0.1f humid: %0.1f",$temp1,$hum1);
#--------- LOG TH1 EVENTS using data from above --------
# event type, time, temp, humidity
# ugh. Gotta keep these two strings in sync!
$logString='$etypes{th1},$timesecs,$temp1,$hum1';
logit("$etypes{th1},$timesecs,$temp1,$hum1");
}; # end th1 sub
The lenOcChk() argument must reflect this sensor’s total byte count including data and CRC. Everything up through the printf() is sensor-specific, and should be pretty obvious. The $logString/logit({args}) lines are the critical part. There are some other remnants in other sections, but this seems to be what’s needed.
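For comparison, the same decode can be sketched in Python (again, not the Perl that actually runs). It assumes the 10 received bytes are laid out as in the th1 block above: 4 header bytes, then temp lo/hi and humidity lo/hi, then the 2 CRC bytes. Values are treated as unsigned tenths, so a below-zero reading would need extra handling that is not shown. The function names and the sample bytes are made up.

```python
def decode_th1(locchars):
    """Return (temp, humidity) from a 10-byte th1 reply."""
    if len(locchars) != 10:   # 6 standard bytes + two 2-byte data values
        raise ValueError("th1 reply must be 10 bytes")
    temp = (locchars[4] + 256 * locchars[5]) / 10   # lo byte first, in tenths
    hum = (locchars[6] + 256 * locchars[7]) / 10
    return temp, hum


def log_line(etype, timesecs, temp, hum):
    # Same field order as the logit() call: event type, time, temp, humidity.
    # ASSUMPTION: "g" formatting stands in for Perl's default number-to-string.
    return f"{etype},{timesecs},{temp:g},{hum:g}"


# Hypothetical reply: temp bytes 0x0D,0x03 = 781 tenths; humidity 0xC2,0x01 = 450.
sample = [0x55, 11, 0, 0x56, 0x0D, 0x03, 0xC2, 0x01, 0x00, 0x00]
```

With the sample bytes above, decode_th1(sample) gives 78.1 and 45.0, and log_line(24, 816, 78.1, 45.0) produces the 4-field line that would go to the web host.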
Update 6/9/14: OK – seems to work with the new Arduino slave node.
Update 6/17/14: If I actually get temp/humidity info from a sensor or 2, I’d like to include tomorrow’s high temp (forecast) to make the WHF decision. That means I need a way to have the poll program interact with other scripts on the monitoring host (like doing a wget and parsing info). There might be other applications for interacting with other code running on the main processor, too – like modulating watering. So I need a general mechanism for that interaction. I suppose it could just be files, but that seems pretty crude.
<needs more…>
Software on the web host
The page is served by an http request to the top directory finding index.php. That file paints the structure of the status page, including the top nav buttons and sections for each of the reports – rain, temperature, power, etc. Index.php used to include graphs.php, which is where most of the work is done. Graphs.php reads and parses datafile.csv, checks the date range, and creates the .pngs for the graphs. But that took annoyingly long each time the page was called, so now the graphs are created right after new data is ftp’d up, by a wget on the Pogo to graphs.php. There’s some more here on that home page speedup.
The actual graphing is done by a php package called jpgraph. That’s pretty good, providing several graph styles, lots of control over appearance, auto scaling, etc. That package failed with a php upgrade when I moved my GoDaddy hosting to a newer server in mid 2019. Downgrading php (from 7.x to 5.4) allowed it to work again, but it took lots of hacks to keep from generating a multi-gigabyte error log. Seems stable now (6/19).
Configuration of the status page is ugly. It was originally accessible from the main page, but that was removed after bad guys did bad stuff. <to be continued>
The home page got hacked by some damn Pakistani hackers in March of 2013, probably through some php back doors I foolishly left visible. My bad. I changed all the passwords, restored the original code, and removed all my back doors. I think I also removed all the several back doors they put in to get back in. Ugh.
Some day, I hope to implement the code on the host as some kind of round-robin database so I don’t have to manually trim the dumb ascii file every so often. rrdtool is a likely candidate. (That’s still a goal/hope as of 6/19).
Other software
Partly out of curiosity about how much of the time my DSL connection was up, and partly to make the system more robust by resetting the modem and router when there are problems, another Perl program on the monitor machine runs a continuous ping to the first router at my ISP every 5 seconds. It keeps track of max, min and average round trip times (though they’re not displayed on the web page) as well as missed pings. After a number of consecutive missed pings (currently 20), it sets a flag (touch ./resetFlag) to have the main poll program send a command to one of the sensor (here, controller) nodes to power cycle the modem and router. There’s an additional delay after such a reset before it will try another reset. I suppose I should import the details about it from this post to have them all in one place. (More details below.)
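The consecutive-miss logic is simple enough to sketch in Python (the real thing is Perl). The limit of 20 misses is from the notes. The length of the hold-off after a reset is not stated anywhere here, so the 600 seconds below is purely an assumption, as are the class and method names.

```python
class ResetWatchdog:
    def __init__(self, miss_limit=20, holdoff_s=600.0):
        self.miss_limit = miss_limit   # consecutive missed pings before a reset
        self.holdoff_s = holdoff_s     # ASSUMED minimum time between resets
        self.misses = 0
        self.last_reset = None

    def record(self, ok, now):
        """Record one ping result; return True when a reset should be requested."""
        if ok:
            self.misses = 0            # any good ping clears the streak
            return False
        self.misses += 1
        if self.misses < self.miss_limit:
            return False
        if self.last_reset is not None and now - self.last_reset < self.holdoff_s:
            return False               # still inside the hold-off after a reset
        self.last_reset = now
        self.misses = 0
        return True
```

When record() returns True, the caller would do the equivalent of touch ./resetFlag and let the main poll program send the power-cycle command.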
That’s fine for detecting and fixing a failure resulting in not being able to access the Internet. But the dumb router – even the new one – apparently gets confused after running for a while (weeks) and stops serving addresses via DHCP. It seems to still pass traffic when in that confused state, so the Pogo, with its static IP address and others whose DHCP leases haven’t run out yet still work. But other devices – like my phone or a recently awakened laptop that request addresses on the fly – suddenly don’t work.
Fortunately, when the router is “confused”, it also fails to respond to http requests for its admin screens. That allows another pingish mechanism to detect confusion: If a wget to the router fails, the router is (probably?) confused and deserves to be reset. Code to do that started to be installed 12/16/11, but couldn’t actually trigger a reset. Working on this is a slow process – I have to wait a couple of weeks until the next incident to do more testing after it’s been reset.
A downside is that the router wget could possibly trigger a reset while something useful was going on. Might just have to live with that.
There have been so many router failures – almost all handled by the router resetter since mid 2017 – that I have a separate post here to record them. I’m sure there have been some that the robot has fixed that we never even knew about. Yay!
On the ping stuff: There’s pingstat.pl, running as a separate process since ~2001. It’s gotten fairly sophisticated, recording average, max, and min ping times as well as missed pings, and automatically adjusting a threshold above which ping times are flagged as extra late. It can report 2 cycles of stats – about every 5 minutes and about once/hr. It logs its exception findings to pingxfer.txt, and by touching sendit2 tells the main poll program to ftp that sparse ping stat file to pingstats.csv (which is not csv) on the web host. Graphs.php handles it from there.
Hardware details
The sensors are implemented on small PC boards with the PIC processor and 10MHz crystal, a voltage regulator, a TTL-RS422 driver chip, some LEDs, I/O connections as appropriate for the task at hand, and an RJ-45 jack to connect to the 485 network. There are several versions, depending on what they needed to do. Later versions changed the ICSP programming connector from 4 pins to 5 to work with a knock-off PICkit 2 programmer I started using a few years ago. The biggest sensor node controls the landscape watering, and has 8 relays and a larger PIC processor. Some have opto-isolated inputs to connect to water sensors on the floor or other devices at dangerous potentials. New types will be created as new needs arise.
I chose RS-485 for its noise-resistance for long runs. The network extends to the detached garage (for the rain sensor), so long runs are part of the requirement. The 57.6 Kb/sec rate was a compromise between speed and reliability. There isn’t a lot of data moving around (yet), so that speed is quite sufficient. Using RJ-45 connectors and Cat 5 cable is dangerous in a place with Ethernet running on similar wiring. All the 485 net cabling is blue, and I work hard to not use that color for ethernet.
Since the half-duplex RS-485 uses only one pair (and a reference ground), I use the other 3 pairs in the cable to provide unregulated 12V DC to power the nodes and provide the ground reference. Here’s the pinout:
Updates and bug fixes
7/8/11: I made a minor fix to the software on the web host that serves the main status page, and realized I didn’t have any place to record details of that (kind of ugly) software. This is the first entry in a sort of change log, and has provided me the opportunity to put all the details of the system in one place.
Today’s fix was simple, but the background is a little convoluted. I wanted to get rid of a harmless but ugly error message that occurred when the page display stuff tried to pull router reset events out of the main datafile.csv if there were no events there. It wasn’t a problem before because there was always at least one “event”. That event was not an actual router reset, but a startup header line that is part of trying to make the data in datafile.csv be sort of self-describing by starting out with a pair of lines – one data, the other field descriptions. The poll program at home (now on the Pogo) posts one such pair of lines for each sensor each time the poll program restarts.
Unfortunately, that first-time-only line was also interpreted as a router reset, so the web page incorrectly reported a router reset each time the poll software was restarted. No problem – I can fix that! Not wanting to disrupt the (as yet unused) self-describing data mechanism by blocking that one entry, I hacked the code (in the poll software) for that one sensor’s handler to modify its initial data description entry (prepending a “#”) so it would not be recognized by the parser in the page display software. That worked to get rid of the bogus router reset report, but created the situation that there were often no entries at all for that data type. Thus when the page display iterated through it to report all the resets, the iteration failed – and thus the ugly error message.
So I just added a check to make sure there were events before I looped through them. All fixed. But as long as I’m making an entry for the web site software, I suppose I should put some info about it. I guess this will evolve into the migration of the little “Home Automation” page to this more proper home.
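The shape of that fix, sketched in Python rather than the PHP on the web host: skip the "#"-prefixed description lines, collect the events of one type, and check that the list is non-empty before looping over it. The function names are hypothetical, and so is the event type number 99 used in the usage note.

```python
def events_of_type(lines, etype):
    """Return the split fields of every data line whose first field is etype."""
    found = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                     # description/header lines are ignored
        fields = line.split(",")
        if fields[0] == str(etype):
            found.append(fields)
    return found


def reset_report(lines, etype):
    events = events_of_type(lines, etype)
    if not events:                       # the added check: nothing to iterate
        return "No router resets recorded."
    return "\n".join("Router reset at " + ev[1] for ev in events)
```

Calling reset_report(["#99,time,count", "7,814,79.7,82.4"], 99) falls through to the "no resets" message instead of failing on an empty list.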
7/22/11: Main perl code crashed yesterday trying to send me a text that it timed out a sensor 5 times. Unfortunately, I couldn’t find SendMail.pm (NOT Sendmail.pm!) when I ported the code to the Pogo, so I’d ripped out the sendSMS() function, and the program crashed when it tried to call it. Found SendMail.pm, copied it over, tested it and put it back into the main code. Also honored a note that the sensor timeout alert needed some kind of reset. Added a walk through all sensors to clear timeout count and clean %sentMsg as an overloaded feature of the “A” turn on alerts control.
The reason the main poll program couldn’t talk to the sensors was that somehow the serial port wasn’t working. It’s a USB-485 adapter, and when I unplugged it and plugged it back in it started working. (Probably restarted the main program, too.) No way to check for that yet, but I probably should. I tried restarting the perl program remotely, but that didn’t help. I think if I’d rebooted remotely it would have fixed it.
Update 7/28/11: The towels that live on the floor by the below-grade family room door to soak up rain leakage were found very wet and smelly yesterday. We don’t get water there much any more, and didn’t check them after the recent big rains. If the floor water/moisture sensors had been in place, we would have been alerted. It’s time to bump up the priority of getting those sensors in place.
The 3 parts required to do that are the actual floor water sensors, PIC nodes with opto-isolated inputs to keep the 485 net from getting ground-referenced when the floor is wet/flooded, and the software to run them.
I haven’t touched the code in a long time, so I guess I better attack that. I just downloaded the latest MPLAB IDE from Microchip, and I’ll use this opportunity to reacquaint myself with the structure of the PIC code and document it here, filling in that very lonely Node software heading above.
I played with water sensor boards a while back. The basic interlocking-fingers design is sound. An early version had the opto isolator on the sensor board. Wrong! That still puts wires directly connected to the 485 net on the ground in the water! The good board has nothing but real estate for water to provide some conductivity to turn on a transistor. Isolators are safely up on the PIC node. Looks like I laid it out for various transistor lead layouts.
I’ve had some boards with 3 channels of opto isolation for a while now, and even have one all made up. That one will probably be the first production water sensor. The isolated water sensors need to turn on the LED in the opto, so they need power. There are leads for a small battery pack – 2xAAA? – on the isolated side for that. I’ll probably spend 2 channels for 2 water sensors, leaving the last for serial from the UPS battery monitor running on a Teensy.
Update 12/5/15: After a router failure, I looked at the home page and saw old data and some php warning messages. Apparently, config.php had gotten corrupted. I hacked the main index.php to remove the option to modify the config file to turn on/off power outage details in hopes of avoiding future corruption. Unfortunately/insecurely, if you clicked the ‘submit’ button near the display/hide details button, you could see the POST URL to config.php. Anybody seeing that could try to inject garbage or worse. By removing that option, I hope to close that door. I changed it to a link to a separate page with the stats, so the info is still available. I think this is a good change.
Update 9/2/18: The datafile.csv file on the GoDaddy host seemed to need to be truncated too frequently. I verified that it was still being restarted (with a “STARTED POLL APP” line) every 24 hours, but noticed the sump and power durations each had ~10 bytes of extraneous decimal data for every log line. I changed these 2 lines in 485pollD.pl:
> #$dur=(256*$locchars[6]+$locchars[5])/5;
> $dur=int((256*$locchars[6]+$locchars[5])/5+0.1);
> #$Pdur=(256*$locchars[10]+$locchars[9])/5;
> $Pdur=int((256*$locchars[10]+$locchars[9])/5+0.1);
I think this lost the last 1/5 sec resolution, but saving ~24 bytes out of each ~116 byte log sequence seemed worth it. Here’s old and new:
3,815,0,0,0,5004,15805.8000000007,0,0,1,10,7500.20000000001
15,815,1
11,816,32,77
7,814,79.7,82.4
5,814,0,3432
-----
3,65,1,3,0,1,3,0,0,1,0,0
15,65,0
11,66,32,77.9
7,126,79.7,81.5
5,127,0,0
I need to keep an eye on it to make sure that didn’t break anything.
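The effect of that change can be seen in a small Python sketch (the real lines are the Perl ones quoted above). It assumes, as those lines suggest, that the node sends a 16-bit count with the lo byte first and that dividing by 5 converts it to seconds. The function names are made up.

```python
def duration_raw(lo, hi):
    # Old behaviour: keeps the fractional part, which costs log bytes.
    return (256 * hi + lo) / 5


def duration_trimmed(lo, hi):
    # New behaviour: whole seconds only. The +0.1 nudge guards against a
    # value landing just under an integer before int() truncates it.
    return int((256 * hi + lo) / 5 + 0.1)
```

For a raw count of 9, duration_raw(9, 0) is 1.8 while duration_trimmed(9, 0) is 1, so the fractional seconds are dropped rather than rounded. That is the lost 1/5 sec resolution mentioned above.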