Sunday, January 3, 2016

A StecaGrid firmware update thriller

For unknown reasons, I always feel this urge to update software to the latest version. That's generally fine for OS updates, or for programs that run under Linux or Windows. But when tweaking firmware on a phone, iPad or Wii, it's actually quite risky, because on these locked-down devices you cannot easily recover when something goes wrong.

But fully aware of these risks I still find myself staring at installation progress bars, praying they end successfully. And admittedly, they usually do.

Early in June I found this update for the StecaGrid 3600 on the Photovoltaik forum. It was posted by someone who works at Steca's technical service, and the download link pointed to a hidden location on Steca's website, so I guess you could say it was a trustworthy source. There is no changelog, so I have no clue what has been improved or changed.

Firmware for the StecaGrid Coolcept/Coolcept-x inverter (this link always points to the latest version): http://www.steca.de/data/media/74/7448_0x0_StecaGrid_coolcept_software.zip

This zip contains several img firmware packages for the various Coolcept models, and each of these img files is actually a password-protected zip archive. Since the Steca User Software provides an option to load such a package and start the update process on the inverter they must have the zip password stored somewhere. And it's pretty easy to obtain this from their .NET assemblies, allowing you to check for yourself what's inside the img... which is mostly binary stuff.
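As a rough illustration of that nested layout, here is a minimal Python sketch that lists what is inside each img without needing the password, since a classic zip keeps its file names unencrypted. The function name is my own invention, and I'm assuming the img files use traditional zip encryption, which is the only kind Python's zipfile module can decrypt. To actually extract a member you would have to pass the password via the `pwd` argument of `ZipFile.read`.

```python
import io
import zipfile


def list_firmware_packages(outer_zip_bytes):
    """Map each .img member of the outer zip to the file names inside it.

    Hypothetical helper: it only reads the directory listing of each
    nested archive, which works without the password.
    """
    contents = {}
    with zipfile.ZipFile(io.BytesIO(outer_zip_bytes)) as outer:
        for name in outer.namelist():
            if not name.lower().endswith(".img"):
                continue
            img_bytes = outer.read(name)
            # Skip anything that carries the .img extension but is not a zip.
            if not zipfile.is_zipfile(io.BytesIO(img_bytes)):
                continue
            with zipfile.ZipFile(io.BytesIO(img_bytes)) as inner:
                contents[name] = inner.namelist()
    return contents
```

You would feed it the raw bytes of the downloaded zip, for example `list_firmware_packages(open(path, "rb").read())`.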

In the table below I've listed the version numbers of the various packages as shown in Steca User Software. The 'installed' version is what was present on my particular inverter, and the 'latest' version is the version in the latest firmware update (as of 1-1-2016).

Package                   Installed version   Latest version
STM32 BFAPI               2.3.1
SYS FBL protected         1.0.4
SYS APP                   2.19.0              2.27.0
SYS PAR                   4.4.6               4.4.10
SYS OEM StecaGrid 3600    1.1.8
SYS LGP                   5.0.17              7.0.1
PU FBL                    1.0.4
PU APP                    2.14.0              5.4.0
PU PAR                    4.0.0               5.3.0
ENS1 FBL                  1.0.4
ENS1 APP                  3.15.0              3.51.0
ENS1 PAR                  13.0.6              21.0.9
ENS2 FBL                  1.0.4
ENS2 APP                  3.15.0              3.51.0
ENS2 PAR                  13.0.6              21.0.9

So... it was 2:00 am when I loaded the firmware package in the StecaGrid User Software, and hit the 'Update' button.

The update process started immediately and it looked really promising.

But then, after about five minutes, the 6th and largest package stalled around 37% and the progress percentage kept jumping back and forth. I let it continue like that for 10 minutes before hitting that dreaded 'Cancel' button. The update process stopped (at least that seemed to work gracefully) and the inverter rebooted without a problem.

I quickly checked the package versions on the inverter and found that the first 5 packages had actually been updated to the latest version. However, the ENS APP and ENS PAR packages were entirely gone, and there was only an ENS FBL protected 1.0.4. It was too late to worry about it; I was relieved the inverter was still alive and I just went to bed.

Early next morning I saw bright sun rays through the bedroom curtains, and I quickly checked my little energy dashboard to see whether the inverter was active... which it was not... 0 watt in bright sun... panic mode engaged...

The inverter itself was indeed idle, its display dim. I started planning recovery scenarios: I'd call my supplier and Steca customer service, and then hope either of them could fix the inverter without too much more downtime.

As a last resort I started the firmware update again. I watched every percent tick by, and it even passed the critical point of 37% in the 6th package where it had stalled the night before. But before I could cheer it entered an identical endless loop around 43%. Two thoughts crossed my mind:
(1) perhaps there is some storage capacity issue on my older inverter model
(2) if the moment of failure differs between attempts, this implies that the firmware package itself is probably totally fine and some randomness may be in play

Coincidentally, I had DebugView running in the background, which had captured all kinds of debug output from Steca's firmware update process:

State: Bootloading, ResponseCode: Ok, Progress: 42,639%, Timestamp: 72,754s, Interval: 0,390 
State: Bootloading, ResponseCode: Ok, Progress: 42,918%, Timestamp: 73,225s, Interval: 0,471 
State: Bootloading, ResponseCode: Ok, Progress: 43,198%, Timestamp: 73,643s, Interval: 0,418 
State: Bootloading, ResponseCode: Ok, Progress: 43,215%, Timestamp: 73,971s, Interval: 0,328 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 74,361s, Interval: 0,390 
State: Bootloading, ResponseCode: Ok, Progress: 43,774%, Timestamp: 74,735s, Interval: 0,374 
State: Bootloading, ResponseCode: NoResponse, Progress: 44,054%, Timestamp: 76,482s, Interval: 1,747 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 76,748s, Interval: 0,266 
State: Bootloading, ResponseCode: ReceivedDataInvalid, Progress: 43,774%, Timestamp: 77,028s, Interval: 0,280 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 77,418s, Interval: 0,390 
State: Bootloading, ResponseCode: ReceivedDataInvalid, Progress: 43,774%, Timestamp: 77,793s, Interval: 0,375 
State: Bootloading, ResponseCode: Ok, Progress: 43,495%, Timestamp: 78,105s, Interval: 0,312 
State: Bootloading, ResponseCode: ReceivedDataInvalid, Progress: 43,774%, Timestamp: 78,417s, Interval: 0,312 
(the last 2 lines repeat...)
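For anyone who wants to sift through such a capture, here is a small Python sketch that parses these lines and reports the first failed response and how often the progress jumped backwards. The field layout is taken from the output above; the function names are mine, and I'm assuming the decimal commas simply come from the locale of the PC.

```python
import re

# Matches one debug line; numbers use a decimal comma, e.g. "43,495".
LINE_RE = re.compile(
    r"State: (?P<state>\w+), ResponseCode: (?P<code>\w+), "
    r"Progress: (?P<progress>[\d,]+)%, Timestamp: (?P<ts>[\d,]+)s, "
    r"Interval: (?P<interval>[\d,]+)"
)


def to_float(text):
    """Convert a decimal-comma number such as '0,390' to a float."""
    return float(text.replace(",", "."))


def parse_log(lines):
    """Return one dict per recognised line; other lines are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        records.append({
            "code": m["code"],
            "progress": to_float(m["progress"]),
            "timestamp": to_float(m["ts"]),
            "interval": to_float(m["interval"]),
        })
    return records


def first_failure(records):
    """First record whose response code is not 'Ok', or None."""
    return next((r for r in records if r["code"] != "Ok"), None)


def count_rollbacks(records):
    """Number of times the reported progress dropped below the previous value."""
    return sum(
        1 for prev, cur in zip(records, records[1:])
        if cur["progress"] < prev["progress"]
    )
```

On a healthy transfer `first_failure` returns None and `count_rollbacks` returns 0; in my capture the first failure is the NoResponse line, and every rollback after it lands on the same progress value.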

So it seems as if there's one timeout (NoResponse), which in turn causes the entire remaining communication to go out-of-sync. I'd certainly consider this a bug in the serial communication implementation, because both ends of the line should be able to agree on the current (successful) progress after an unexpected event such as a timeout occurs.
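To make clear what I mean by agreeing on progress, here is a toy Python sketch. It is emphatically not Steca's protocol, which I don't know; all names are made up. The idea is that every block carries its offset, the receiver only accepts a block at the offset it expects, and every reply states how many bytes are safely stored. After a lost reply the sender just resends, and the next reply tells it where to continue.

```python
class Receiver:
    """Hypothetical device side: accepts a block only at the expected offset."""

    def __init__(self):
        self.data = bytearray()

    def write(self, offset, block):
        if offset == len(self.data):
            self.data += block
        # Always report the number of bytes stored, even for a rejected block,
        # so the sender can resynchronise.
        return len(self.data)


def send_firmware(image, receiver, block_size=4, drop_replies=(), max_attempts=100):
    """Send image block by block and return the number of attempts used.

    drop_replies holds the (0-based) attempt numbers whose reply is lost,
    which simulates a timeout on the serial line.
    """
    offset = 0
    attempts = 0
    while offset < len(image):
        if attempts >= max_attempts:
            raise RuntimeError("transfer did not complete")
        block = image[offset:offset + block_size]
        acked = receiver.write(offset, block)
        reply_lost = attempts in drop_replies
        attempts += 1
        if reply_lost:
            # Timeout: we don't know whether the block arrived, so we resend
            # from the same offset and let the receiver correct us.
            continue
        offset = acked
    return attempts
```

With a 10-byte image, 4-byte blocks and the second reply dropped, the sender needs one extra attempt: the resent block is rejected as a duplicate, the reply reports the true position, and the transfer carries on from there instead of looping.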

Well, it didn't help me much back then. After I had sent my findings in a mail to Steca customer service, I felt miserable and defeated and went back to bed. But before that I had started the firmware update one last time, whose progress I monitored on my iPad from bed.

Slowly it approached the moment-of-doom... it went past 37%... it went past 43% and slowly kept going. You can hardly imagine how I cheered at that progress bar to keep going. Or the intense feeling of relief when it actually completed all 7 packages with a happy 'Update successful' message at the end.

I went back upstairs and was just in time to see the inverter reboot and immediately kick off its power production engines at 2000 watt.

Well, I'm happily running the latest firmware now, but I will surely pass on the next update.

2 comments:

  1. super story! Thanks, it covers at least part of my concerns. I think I'll call the installing company.

  2. Thanks for the story. I have a 3600 model installed in 2014, but I can't figure out how you updated it... I have the software, but it looks like it wants to use a serial cable. Your post makes no reference to how you connected... could you add some detail for reference? I'm hoping by upgrading I get access to the measurements.xml file to enable the generation to be monitored... but I don't know if that's what this will do for me.
