Sunday, January 3, 2016

A StecaGrid firmware update thriller

For unknown reasons, I always feel this urge to update software to the latest version. That's generally fine for OS updates, or for programs that run under Linux or Windows. But when tweaking firmware on a phone, iPad or Wii it's actually quite risky, because on these locked-down devices you cannot easily recover when something goes wrong.

But fully aware of these risks I still find myself staring at installation progress bars, praying they end successfully. And admittedly, they usually do.

Early in June I found this update for the StecaGrid 3600 on the Photovoltaik forum. It was posted by someone who works at Steca's technical service, and the download link pointed to a hidden location on Steca's website, so I guess you could say it was a trustworthy source. There is no changelog, so I have no clue what has been improved or changed.

Firmware for the StecaGrid Coolcept/Coolcept-x inverter (this link always points to the latest version): http://www.steca.de/data/media/74/7448_0x0_StecaGrid_coolcept_software.zip

This zip contains several img firmware packages for the various Coolcept models, and each of these img files is actually a password-protected zip archive. Since the Steca User Software provides an option to load such a package and start the update process on the inverter they must have the zip password stored somewhere. And it's pretty easy to obtain this from their .NET assemblies, allowing you to check for yourself what's inside the img... which is mostly binary stuff.

In the table below I've listed the version numbers of the various packages as shown in Steca User Software. The 'installed' version is what was present on my particular inverter, and the 'latest' version is the version of latest firmware update (as of 1-1-2016).

Package                   Installed version   Latest version
STM32 BFAPI               2.3.1
SYS FBL protected         1.0.4
SYS APP                   2.19.0              2.27.0
SYS PAR                   4.4.6               4.4.10
SYS OEM StecaGrid 3600    1.1.8
SYS LGP                   5.0.17              7.0.1
PU FBL                    1.0.4
PU APP                    2.14.0              5.4.0
PU PAR                    4.0.0               5.3.0
ENS1 FBL                  1.0.4
ENS1 APP                  3.15.0              3.51.0
ENS1 PAR                  13.0.6              21.0.9
ENS2 FBL                  1.0.4
ENS2 APP                  3.15.0              3.51.0
ENS2 PAR                  13.0.6              21.0.9

So... it was 2:00 am when I loaded the firmware package in the StecaGrid User Software, and hit the 'Update' button.

The update process started immediately and it looked really promising.

But then, after about five minutes, the 6th and largest package stalled around 37% and the progress percentage kept jumping back and forth. I let it continue like that for 10 minutes before hitting that dreaded 'Cancel' button. The update process stopped (at least that worked seemingly gracefully) and the inverter rebooted without a problem.

I quickly checked the package versions on the inverter and found that the first 5 packages had actually been updated to the latest version. However, the ENS APP and ENS PAR packages were entirely gone, and there was only an ENS FBL protected 1.0.4. It was too late to worry about it; I was relieved the inverter was still alive and I just went to bed.

Early next morning I saw bright sunrays through the bedroom curtains, and I quickly checked my little energy dashboard to see whether the inverter was active..... which it was not.... 0 watt in bright sun.... panic mode engaged...

The inverter itself was indeed idle, its display dim. I started planning recovery scenarios: I'd call my supplier and Steca customer service, and then hope either of them could fix the inverter without too much more downtime.

As a last resort I started the firmware update again. I watched every percent tick by, and it even passed the critical point of 37% in the 6th package where it had stalled the night before. But before I could cheer it entered an identical endless loop around 43%. Two thoughts crossed my mind:
(1) perhaps there is some storage capacity issue on my older inverter model
(2) if the moment of failure differs between attempts, this implies that the firmware package itself is probably totally fine and some randomness may be in play

Coincidentally, I had DebugView running in the background, which had captured all kinds of debug output from Steca's firmware update process:

State: Bootloading, ResponseCode: Ok, Progress: 42,639%, Timestamp: 72,754s, Interval: 0,390 
State: Bootloading, ResponseCode: Ok, Progress: 42,918%, Timestamp: 73,225s, Interval: 0,471 
State: Bootloading, ResponseCode: Ok, Progress: 43,198%, Timestamp: 73,643s, Interval: 0,418 
State: Bootloading, ResponseCode: Ok, Progress: 43,215%, Timestamp: 73,971s, Interval: 0,328 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 74,361s, Interval: 0,390 
State: Bootloading, ResponseCode: Ok, Progress: 43,774%, Timestamp: 74,735s, Interval: 0,374 
State: Bootloading, ResponseCode: NoResponse, Progress: 44,054%, Timestamp: 76,482s, Interval: 1,747 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 76,748s, Interval: 0,266 
State: Bootloading, ResponseCode: ReceivedDataInvalid, Progress: 43,774%, Timestamp: 77,028s, Interval: 0,280 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 77,418s, Interval: 0,390 
State: Bootloading, ResponseCode: ReceivedDataInvalid, Progress: 43,774%, Timestamp: 77,793s, Interval: 0,375 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 78,105s, Interval: 0,312 
State: Bootloading, ResponseCode: ReceivedDataInvalid, Progress: 43,774%, Timestamp: 78,417s, Interval: 0,312 
(the last 2 lines repeat...)

So it seems as if there's one timeout (NoResponse), which in turn causes the entire remaining communication to go out-of-sync. I'd certainly consider this a bug in the serial communication implementation, because both ends of the line should be able to agree on the current (successful) progress after an unexpected event such as a timeout occurs.
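To locate the moment things go wrong in a longer capture, a few lines of Python can help. This is only a sketch: the line format is assumed to be exactly that of the excerpt above (including the decimal commas), and the names parse_line and first_failure are made up for this example.

```python
import re

# Pattern for one DebugView line, as it appears in the excerpt above.
LINE_RE = re.compile(
    r"ResponseCode: (?P<code>\w+), Progress: (?P<progress>[\d,]+)%, "
    r"Timestamp: (?P<timestamp>[\d,]+)s"
)


def parse_line(line):
    """Return (response_code, progress_percent, timestamp_s), or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    # The log uses a decimal comma, so swap it for a dot before converting.
    progress = float(match.group("progress").replace(",", "."))
    timestamp = float(match.group("timestamp").replace(",", "."))
    return match.group("code"), progress, timestamp


def first_failure(lines):
    """Return the parsed entry of the first non-Ok response, or None."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry[0] != "Ok":
            return entry
    return None


sample = [
    "State: Bootloading, ResponseCode: Ok, Progress: 43,774%, Timestamp: 74,735s, Interval: 0,374",
    "State: Bootloading, ResponseCode: NoResponse, Progress: 44,054%, Timestamp: 76,482s, Interval: 1,747",
    "State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 76,748s, Interval: 0,266",
]
print(first_failure(sample))
```

On the three sample lines this reports the NoResponse entry at 76,482s, which is where the endless loop begins.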

Well, it didn't help me much further back then. After I had sent my findings in a mail to Steca customer service, I felt miserable and defeated and went back to bed. But before that I had started the firmware update one last time, and I monitored its progress on my iPad from bed.

Slowly it approached the moment-of-doom... it went past 37%... it went past 43% and slowly kept going. You can hardly imagine how I cheered at that progress bar to keep going. Or the intense feeling of relief when it actually completed all 7 packages with a happy 'Update successful' message at the end.

I went back upstairs and was just in time to see the inverter reboot and immediately kick off its power production engines at 2000 watt.

Well, I'm happily running the latest firmware now, but will surely pass for the next update.

Tuesday, October 13, 2015

Kamstrup meter software update

On September 29th around 1:45 at night, a software update appears to have been performed on our Kamstrup smart meter, triggered remotely.

The noticeable result is that the P1 telegrams have different (less) content. And since my monitoring script checks for an expected filesize, all telegrams were rejected and discarded that night. Fortunately only irrelevant/unused fields were removed, so after tweaking the expected filesize next morning everything worked just fine again.

Before:
/KMP5 KA6U001660000000

0-0:96.1.1(204B413655303031363630000000000000)
1-0:1.8.1(02576.028*kWh)
1-0:1.8.2(01115.025*kWh)
1-0:2.8.1(00453.811*kWh)
1-0:2.8.2(01129.910*kWh)
0-0:96.14.0(0001)
1-0:1.7.0(0000.19*kW)
1-0:2.7.0(0000.00*kW)
0-0:17.0.0(999*A)
0-0:96.3.10(1)
0-0:96.13.1()
0-0:96.13.0()
0-1:24.1.0(3)
0-1:96.1.0(3238303131303031333035000000000000)
0-1:24.3.0(150929010000)(08)(60)(1)(0-1:24.2.1)(m3)
(01850.034)
0-1:24.4.0(1)
!

After:
/KMP5 KA6U001660000000

0-0:96.1.1(204B413655303031363630000000000000)
1-0:1.8.1(02577.074*kWh)
1-0:1.8.2(01115.202*kWh)
1-0:2.8.1(00453.811*kWh)
1-0:2.8.2(01137.015*kWh)
0-0:96.14.0(0002)
1-0:1.7.0(0000.00*kW)
1-0:2.7.0(0002.68*kW)
0-0:96.13.1()
0-0:96.13.0()
0-1:96.1.0(3238303131303031333035000000000000)
0-1:24.3.0(150929110000)(08)(60)(1)(0-1:24.2.1)(m3)
(01850.050)
!

This filesize validation was mostly in place to detect a partial telegram due to some glitch in our serial communication. Now it's time to turn this validation into something more robust: we don't want to reject valid and complete telegrams if they ever decide to remove some other obscure field, or even add a new one. Especially since there's such a clean format requirement available: the first line must start with a '/' and the last line must end with a '!', and these symbols are unused in the main body of the telegram.

So instead of the filesize check in mbsolget_p1.sh we could use something like this:

...
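  #Report unexpected filesize (a warning only; it no longer rejects the telegram)
  if [ "$filesize" = "$FSIZE" ]; then
    log "Received successfully"
  else
    log "Warning: Unexpected filesize!"
  fi

  telegram=$(cat "$WORKDIR/p1_temp.log")

  #expr match prints the length of the match, or 0 when there is none.
  #Inside [[ ]] the > operator compares strings, which is enough to tell any positive length from "0".
  if [[ $(expr match "$telegram" '/.*!') > 0 ]]; then

    #Process $WORKDIR/p1_temp.log to p1_daily.tmp
    MESSAGE=`echo "$telegram" | awk -f $AWKSCR > $STORE`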
...
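For comparison, the same framing rule can be sketched in Python. This is not part of mbsolget_p1.sh; the function name is_complete_telegram is made up for this example, and it is slightly stricter than the expr check because it also requires the '!' to be the very last character.

```python
def is_complete_telegram(telegram: str) -> bool:
    """Check the framing only: a leading '/' and a trailing '!'."""
    # Ignore surrounding whitespace such as the final newline of the capture.
    text = telegram.strip()
    return text.startswith("/") and text.endswith("!")


complete = "/KMP5 KA6U001660000000\n\n1-0:1.8.1(02577.074*kWh)\n!\n"
truncated = "/KMP5 KA6U001660000000\n\n1-0:1.8.1(02577"

print(is_complete_telegram(complete))   # complete telegram is accepted
print(is_complete_telegram(truncated))  # partial telegram is rejected
```

Like the shell version, this only looks at the first and last character, so fields can be added or removed without the telegram being rejected.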

Edit:
On October 14th around 11:50 I started seeing 'Unexpected filesize' warnings in the logs - the P1 telegram format change mentioned above had been reverted for some reason. The script kept happily parsing and processing the telegrams, which means that the above tweak has already been useful and works as intended. At the same time I'm unsure what's really going on; why does the telegram content change so frequently all of a sudden? I'll keep an eye on the logs to see if and when it changes again.

Edit 2:
On December 11th around 8:35 the above-mentioned fields were removed from the P1 telegram again. Also, I noticed a small error in the validation check; it should have used double brackets in order to allow the greater-than comparison. I corrected this in the code above.

Thursday, August 20, 2015

Setting up StecaGrid communication

So, this StecaGrid 3600 was a fair bit cheaper than normal, and as always cheapness comes at a price. It turns out that there exist different revisions of the inverter and I received an older version of it. And from what I understand the newer models have an ethernet interface which connects directly to your LAN, allowing you to navigate to a nice little dashboard or read raw xml data.

That was not the case with my model. I had already verified that it came with two RJ45 connections at the bottom, but unfortunately both are used for a funky RS485 serial connection. So the only way forward was to figure out how this serial connection works, and I started research before I even got my solar panels.

The manufacturer Steca Elektronik nicely provided me with the technical documentation of their RS485 interface and this looked really promising; it contained various commands to retrieve almost all metrics that were available via the inverter display itself... among which the daily and total yield values!

So after this inverter had been installed I couldn't wait to get started. But before even thinking about the data and documentation I first had to get the serial connection to work, and by that I mean the actual physical wire that connects the inverter to a computer.


I sacrificed this old 20-meter UTP patch cable and cut off one of the RJ45 connectors, so I could connect the relevant wires to an RS485-USB adapter. I then plugged it into the computer which detected the USB serial port...

And at that point I got stuck.

I found several serial communication software tools, and tried sending some bytes to see if the inverter would send any response, which it did not. I also had the tools listen on the serial port for several days and log every single byte they received, and although there seemed to be spontaneous activity now and then, it mostly consisted of 0x00 bytes and a handful of other random bytes, garbage in proper programmer's terminology.

The technical documentation from Steca was at a higher protocol level than the serial protocol I was currently struggling with. They described command bytes that led to certain response values... perfect... but that command had to be wrapped with all kinds of control and checksum bytes; get one wrong and your command would simply be ignored.

On top of that, there were still more variables in play:
  • Did I attach the correct wires to the A (485+) and B (485-) pins of the adapter?
  • Which of the two RJ45 sockets on the inverter should I use?
  • In what position should I put the tiny line-termination switch at the bottom of the inverter?
  • Did I use the correct serial connection parameters?

I must admit I almost gave up here. I almost even resorted to pulse-counting on the S0 port of the kWh-meter, which turned out to be a trivial Arduino sketch and circuit, but at the same time felt so dirty and disappointing.



And then... two weeks after I had connected the cable... on the German Photovoltaik forum I found a mention of the Steca User Software, which also uses this RS485 connection. After installation, it started its inverter detection on the serial port, and hallelujah, within 3 seconds it reported a "StecaGrid 3600". Communication established, Arduino back in the box, the mission was back on track.


This program could update the inverter's firmware (more on that later), but also retrieve the actual panel power, voltage and frequency measurements, visualize the day curves from the past 30 days, and the yield totals from the past months and years.

Whatever this program could do, I wanted to do myself, but tightly focused on what I really needed and prepared for automated readouts.


It was time to take the next tools from my toolbox: a serial traffic monitor which could capture all bytes on the serial port, and a decompiler for Steca's .NET assemblies. Time to reverse engineer all these data commands.


A final note on the variables that lead to successful communication with the StecaGrid:
  • It doesn't matter which of the two sockets you use since they're internally connected to each other; RS485 was designed to daisy-chain multiple devices over long distances.
  • The line-termination switch can be in either position, but I found that I get more 0x00 padding with the switch in the ON (right-most) position, so I set it to OFF.
  • UTP wire 1 (orange/white) is connected to data A (485+)
    UTP wire 2 (orange) is connected to data B (485-)
    UTP wire 8 (brown) is connected to ground
  • Connection parameters:
    Baud rate: 38400
    Parity: none
    Data bits: 8
    Stop bits: 1
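For automated readouts, these parameters could be captured in code roughly like this. It is a sketch under assumptions: it relies on the third-party pyserial package, the device name /dev/ttyUSB0 is only an example, and SerialSettings and open_port are names invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialSettings:
    """Connection parameters listed above (38400 baud, 8 data bits, no parity, 1 stop bit)."""
    port: str
    baudrate: int = 38400
    bytesize: int = 8
    parity: str = "N"      # 'N' is how pyserial denotes "no parity"
    stopbits: int = 1
    timeout: float = 1.0   # read timeout in seconds, an arbitrary choice


def open_port(settings: SerialSettings):
    """Open the RS485-USB adapter; needs pyserial and the adapter plugged in."""
    import serial  # third-party: pip install pyserial

    return serial.Serial(
        port=settings.port,
        baudrate=settings.baudrate,
        bytesize=settings.bytesize,
        parity=settings.parity,
        stopbits=settings.stopbits,
        timeout=settings.timeout,
    )


# Example device name; yours may differ (for instance a COM port on Windows).
settings = SerialSettings(port="/dev/ttyUSB0")
print(settings)
```

Calling open_port(settings) only opens the port. The command framing and checksum bytes that the inverter expects are a separate matter and are not covered by this sketch.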

Saturday, July 11, 2015

The next chapter: solar energy

On May 13th we had our solar PV panels installed; 12 monocrystalline black panels with a total system size of 3720 watt-peak. The inverter is a StecaGrid 3600 Coolcept, a German brand, not commonly seen.

The array of panels nicely fits below the flat-plate thermal collector that came with the house and had already been located all the way at the top deliberately.

Our roof's azimuth is 146° (south-east) and its tilt 45°. There's no shadow, except for a short moment at the end of summer afternoons when the sun is just about to move to the front of the house. Then the chimney casts a shadow across the entire surface; we'll actually see this effect in our graphs later.


In the weeks before installation I had already been monitoring the values of the electricity meter, visualizing this in separate "power consumption" and "power generation" gauges, the latter of which - of course - had always shown 0 watt. As mentioned in an earlier post, this interpretation had worked fine until the moment the inverter was activated on May 13th.

Let's begin with the basic situation:



We use electricity in our home appliances, and their requested power is pulled from the grid. What we measure at the smart meter is the amount of energy flowing into the house and in this scenario this happens to be exactly equal to our energy consumption. So far so good.

Now when the inverter is activated and it starts pumping amps into our house, we suddenly get the following situation:



I remember seeing my virgin "power generation" gauge go up to 2500 watt, while the "power consumption" value dropped down to 0... Hmmm... In this scenario a portion of the generated power is directly consumed while the rest is pushed out into the power grid.

Now imagine - and this could be just several seconds later - that we turn on the microwave while a cloud blocks the sun:



Our generated power is no longer enough to fulfil the request and the remaining power is drawn from the grid.

It becomes clear that with our P1 telegram measurements we only measure the net energy that passes through the meter, and as we see, this could be in either direction throughout the day. A sad result is that we have lost all insight into our actual consumption (or generation, for that matter).

Fortunately our electricity meter does have separate counters for the two directions in which energy can flow through it, so we can at least distinguish those; let's call these export and import from now on.

Note that at any moment in time you can either import or export energy, but never both simultaneously. Now you may think that our P1 monitoring will always have one value at 0, but remember that we're actually measuring averages over 5 minute periods. And in one such period both import and export could have occurred.

We can toss our variables in a formula to see how they relate to each other:

      Import + Generation = Export + Consumption

This formula makes sense if we assume that we don't store any energy in batteries for later use: all the energy that comes into the house (either generated or imported) must have gone somewhere (either consumed or exported).

Since we're already measuring import and export periodically we only need the value of one additional variable for each period, and we would have a complete picture again. An obvious candidate is generation; both the inverter itself and a separate kWh meter are already keeping track of this.
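Rearranged for the unknown, the formula reads Consumption = Import + Generation - Export. A minimal sketch, with a made-up function name and made-up numbers for one measurement period:

```python
def consumption_wh(import_wh, export_wh, generation_wh):
    """Solve Import + Generation = Export + Consumption for Consumption."""
    return import_wh + generation_wh - export_wh


# Hypothetical 5-minute period: a sunny spell followed by a cloud and the microwave.
imported = 20     # Wh drawn from the grid
exported = 150    # Wh pushed into the grid
generated = 200   # Wh produced by the inverter

print(consumption_wh(imported, exported, generated))  # Wh used in the house
```

With these example values 70 Wh was consumed: 50 Wh came straight from the panels (200 generated minus 150 exported) and 20 Wh from the grid.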

We'll have to figure out a way to obtain this additional measurement automatically and get it to the right place in order to apply the formula. Well, this turned out to be a proper challenge, but a fun adventure...

Sunday, July 5, 2015

Bash and Python to make it happen

Before anything else, I should give credit to Marco Bakker for sharing his scripts that got me started straight away. Within one day after I got my Raspberry Pi, I had it capture P1 telegrams, generate graphs and even upload data to external logging sites. In the weeks that followed I polished the script a lot, and the result of that is what this post is about.

An overview of the most important changes:
  • Calculations are now based on the actual time between measurements. Even though the script is triggered every 5 minutes, the exact time between measurements can vary because of the fact that we may have to wait up to 10 seconds for the telegram to arrive. But it's also possible that the Raspberry Pi has had some downtime and resumes after an unknown period of time.
  • The script was getting so big that it was hard to navigate around, so I split it into smaller parts; one that holds all the configuration, one that contains the long rrdtool commands, another one that outputs data into various file formats, and a last one that contains interfaces to external websites.
  • Clean output, so we can capture it to a useful log.
  • Various readability, robustness and flow tweaks.

Also a little disclaimer; the original script provides support for several features that I don't use myself, and are therefore untested (but perhaps fully functional) in my version of the script:
  • Upload to Xively.
  • P1 DSMR v4 format. In principle, you could easily support any P1 format by adding an appropriate awk script.
  • Output to csv format.
  • Output to html page. The generated html is awfully retro, but I haven't bothered changing it.

This script is distributed under the GNU General Public License v3, and by downloading the script you agree to the terms of this license. Furthermore, usage is at your own risk and without guarantees.

mbsolget_svg  .tar.gz (10kb) .zip (12kb)


Optional dependencies: rrdtool, ncftp, sendemail, mysql

After extraction into its own directory, the only thing you need to do is edit config.sh to set P1PORT to the name of the serial device that connects to the P1 port, and also set the correct WORKDIR (and actually anything else you find in there - but let's keep it simple for now).

Finally, make the main script executable with: chmod +x mbsolget_p1.sh

You're ready to go, so give it a test run: ./mbsolget_p1.sh

No errors? Good, then you'll find a bunch of new files and directories. Note that most interesting stuff is configured to happen on every 5 minute interval of the hour.

The raw captured P1 telegrams are stored as log/p1-YYYYMMDD-hhmm.log
The extracted values of the last P1 telegram are stored in p1_gas.tmp and p1_value.tmp
All calculated statistics are written to debug.tmp; their meaning is explained in comments inside the main script. These statistics are used to build the json, csv, sql or xml data output.
In order to initialize your rrd storage file, you should run the script once with the 'create' argument: ./mbsolget_p1.sh create

When you have it configured the way you want, schedule the script to run automatically with crontab -e, and add the following line (but with the proper path of course):
*/5 * * * * /home/pi/path/to/mbsolget_p1.sh >> ~/log/mbsolget.log 2>&1

At this point - and for me this was early April - the data flow shown in the last post is fully functional (except for the backup part, but this is just another cron job). I'll leave it to you to get creative with the json or xml that is pushed to your webserver every 5 minutes. You could even write a smartphone app around it.

On the right I've included a screenshot of the dashboard I created. It keeps itself up-to-date by requesting the json and graphs periodically via ajax requests.

Next chapter: solar PV panels!

Tuesday, June 23, 2015

Measurement frequency versus resolution

The idea is to use the monotonically increasing values of T1 and T2 to calculate the average power consumption over the period between two subsequent measurements.

Power [W] = (Tnow [Wh] - Tprevious [Wh]) / (tnow [s] - tprevious [s]) * 3600 [s/h]

...where T is the energy meter value, and t the timestamp of the measurement.
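As a small sketch of this calculation (the function name average_power_w and the readings are made up for illustration):

```python
def average_power_w(t_now_wh, t_prev_wh, time_now_s, time_prev_s):
    """Average power in watt between two meter readings.

    t_*_wh are the energy meter values in Wh, time_*_s the timestamps in seconds.
    """
    elapsed_s = time_now_s - time_prev_s
    if elapsed_s <= 0:
        raise ValueError("measurements must be in chronological order")
    # Multiply before dividing; same formula, slightly less floating-point rounding.
    return (t_now_wh - t_prev_wh) * 3600 / elapsed_s


# Hypothetical readings taken 300 seconds (5 minutes) apart.
print(average_power_w(2577099, 2577074, 1300, 1000))
```

In this example the meter advanced by 25 Wh in 300 seconds, which corresponds to an average of 300 watt.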

My initial plan was to capture every single telegram and process it, but after the following considerations I concluded this was a really bad idea:
  • What do we do with the 8640 (!) data points per day? Store them locally? Push them over the network?
  • Will we be stressing the Raspberry Pi hardware and resources if we're processing every telegram? How about CPU load and temperature, and aren't we unnecessarily burning the limited writes of an SD card?
  • I asked myself how real-time this monitoring really needed to be... What am I trying to achieve with it?
  • What would be the resolution of the measurements anyway? Let's see: our meter values are in Wh so the smallest difference we can measure is 1 Wh. Over a timespan of 10 seconds, this would be equivalent to 360 watt.... Wait... so we can only report power consumption in increments of 360 watt? That's entirely useless!

I settled for capturing the P1 telegram every 5 minutes and generating graphs every 15 minutes. The resulting measurements are pushed to external logging services pvoutput.org and mindergas.nl, and also to my own webserver to allow me to build some dashboard around it.
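The resolution argument is easy to check with a few lines. This is a sketch, the function name power_resolution_w is made up, and the 1 Wh meter step is the one assumed in the text above:

```python
def power_resolution_w(interval_s, meter_step_wh=1.0):
    """Smallest power difference (in watt) visible over one measurement interval."""
    return meter_step_wh * 3600 / interval_s


for interval_s in (10, 300):
    print(interval_s, "s ->", power_resolution_w(interval_s), "W")
```

Measuring every 10 seconds gives steps of 360 watt, while the 5-minute interval brings that down to 12 watt.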


This picture pretty much covers the data flow and their frequencies.
Next, we'll check out the code that is running on the Raspberry Pi to make this all happen...

Saturday, June 20, 2015

Of Watts and Joules

In electrical applications we measure power in watts (W), and energy in watt-hours (Wh). The watt is a derived SI unit; the watt-hour is not an SI unit itself, but it converts directly to the SI unit of energy, the joule. It's sometimes hard to wrap your head around the meaning of these units and the relation between them, and I especially find the word 'hour' in watt-hour very confusing.

A quick overview.

Power is the rate at which energy is generated or consumed, and as such is measured in 'energy per time unit'. The unit for energy is joule (J), which means we can express power in joules/second (J/s).

So what's the relation to this watt unit then?

1 W is defined as 1 J/s

Ok, that's easy. How about watt-hours...

Imagine you have a light-bulb with a power of 50 watt and leave it burning for 1 hour: that light-bulb has then consumed 50 watt-hours.

And we can express this in joules as well:

1 Wh = 1 W · 1 hour = 1 J/s · 3600 s = 3600 J

An important thing to realize is that watt-hour is a time-independent unit; it expresses an amount of energy. Note the similarity with 'lightyear', which is a distance and has nothing to do with time either.
It gets really confusing when you say device X used 50 watt-hours in 5 minutes, but it's a perfectly valid statement (and you would by now be able to deduce that device X's power consumption must be 600 watt, right?).
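The 600 watt figure can be verified by going through joules. A small sketch with made-up helper names:

```python
JOULES_PER_WH = 3600  # 1 Wh = 1 J/s * 3600 s


def wh_to_joules(energy_wh):
    """Convert an amount of energy from watt-hours to joules."""
    return energy_wh * JOULES_PER_WH


def power_from_energy_w(energy_wh, duration_s):
    """Power in watt (J/s) when energy_wh is used evenly over duration_s seconds."""
    return wh_to_joules(energy_wh) / duration_s


print(wh_to_joules(50))                  # joules in 50 Wh
print(power_from_energy_w(50, 5 * 60))   # watt for 50 Wh in 5 minutes
```

So 50 Wh equals 180000 J, and spreading that over 300 seconds gives 600 J/s, which is 600 watt.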

In my head I often use this analogy that watt is like the download speed in bytes/second, and watt-hour is like the amount of bytes you have transferred after a certain amount of time. Perhaps a bit of an odd comparison, and I'm stuck forever with this mental image of the microwave downloading energy.

When this is all clear we're ready to go crazy with power and energy statistics.