Drive connected Win Apps in Docker

MissMissM (she/her)
Apr 22, 2021

CC-BY-SA image borrowed from Wikipedia Container Ship

And perhaps contribute to your client’s green goals too. I have had clients whose IoT device vendors often supply a sneaky Win32-only application for the “factory” setup, white-label process or recovery of their devices, and it is critical to retain vendor support whilst optimising the workflow at scale.

Or the vendor itself may want to semi-automate its factory line for initialising devices, and the Win32 app the vendor has developed is suddenly something you have to “plug in” to your automation magic.

Or you could be just a regular home user wanting to drive some Windows app automatically because you are simply time poor, or you really hate dialogs and popups and stuff, or using the app itself ..

My example use-case and its associated problems revolve heavily around connected IoT devices that may need a proprietary/legacy Windows app, but the approach is applicable to quite a few other things: if for various reasons you can’t use the Windows UI automation stuff already out there, or if, like me, you simply need to scale headless parallel processing with containers, without the overhead of heavy headed instances.

Ideal isn’t always possible ..

Ideally, of course, one would have the source or appropriate binaries, or the vendor would do the white-labelling.

The real world often involves white-labelling followed by some sort of activation. Keeping things generic before activation would be best, and there are protocols such as Broadband Forum TR-064 for just these kinds of things, but one cannot always avoid our friend the bootloader.

The world is never perfect, and we can’t expect it always to be, especially in transitional situations whilst addressing the risks.

.. but we can certainly help it be so

Many times I’ve quickly prototyped something like this, only for the vendor to suddenly “cave in” and provide a better solution, which would not have happened without a push like this.

Not only that, but the vendor relationship has usually benefited: the initial distrust gives way to increased collaboration, and at best it may help put your client’s business objectives at the top of the priority list.

One could even use the app purely for extra validation if one is really concerned about some identified risk, and return a suspect device to the vendor without bothering the customer, who would otherwise angrily return it and write a rant in a review, hurting success, especially for an early-stage hardware startup with a limited number of other reviews.

Finicky workflows and Apps

The clients in these cases can’t just flick the devices to some magic outsourcing hole, as the workflows using these finicky apps can be painful too: in my experience typically around 40 minutes per device, not counting QA etc.

Leased IoT device churn

It’s not only the setup: IoT devices are often leased and then returned by customers. Even a 2% return rate for re-use can be a lot once you have scaled up to huge volumes.

But it’s not as bad as it could be :)

And with today’s chip shortage this becomes even more important, as these valuable devices can be in short supply.

Numbers matter

Say you have a client who also white-labels the devices, and say we save 35 minutes per device for 100k devices annually: we directly save ~32 FTE.

(35 min × 100k devices) / 60 (min per hr) / 1840 (hr per FTE p.a., typical) ≈ 31.7 FTE
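The arithmetic above can be checked with a few lines of Python. The figures are the ones quoted in the text; the variable names are just mine for illustration.

```python
# Back-of-envelope check of the FTE saving quoted above.
minutes_saved_per_device = 35
devices_per_year = 100_000
fte_hours_per_year = 1840  # typical annual working hours per FTE, as in the text

hours_saved = minutes_saved_per_device * devices_per_year / 60
fte_saved = hours_saved / fte_hours_per_year

print(f"{hours_saved:,.0f} hours saved, about {fte_saved:.1f} FTE")
```

Swap in your own client's volumes and minutes saved to see how quickly it adds up.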

As do the other numbers we sometimes forget

Not only that, but in my experience clients who have gone with the manual workflow have had much higher return/failure rates, due to lack of QA and the human tendency to err in repeated, especially boring, tasks like this (the linked story outlines the statistics), which in turn affects the brand, not to mention the pressure on support costs.

IoT can be risky business

Not only that, but sometimes these things are leased and might return to base for adaptation, which requires the bootloader. Engaging these devices remotely can be risky, which will be its own article.

What not to do

Some might go and reverse-engineer, say, the NAND flash/storage that typically presents a juicy target on these devices, but this would not only break the support but would probably eventually lead to an IoT-pocalypse.

One could record what the app does in its bootloader interaction, but outside the support issue there are a variety of reasons why this isn’t such a good idea: for example, there might be multiple versions of the physical device, each needing its own ROM.

Don’t mess with the magic sauce

There might be some “magic” logic generating a few key/values, or perhaps checksums and signatures, the lot, which may not break immediately but later, with catastrophic consequences ..

Like what may happen in the year 2038 “Epochalypse”, when the 32-bit signed integer used for some timestamps, which has been counting the seconds since the epoch, overflows.
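To make the 2038 problem concrete, here is a small Python sketch. The `wrap_int32` helper is my own illustration of two's-complement wrap-around, not code from any device.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold

def wrap_int32(value: int) -> int:
    """Simulate two's-complement wrap-around of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31

# Last second the counter can represent.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)
# One tick later the counter wraps to a large negative number.
after_wrap = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())   # 1901-12-13T20:45:52+00:00
```

A device that "travels" back to 1901 may keep running, which is exactly the kind of breakage that shows up later rather than immediately.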

IoT reliability is also its own topic.

The IoT Loader Problems

Typical problems to be solved here with IoT “at factory” are:

  • Physical power cycle to engage the bootloader at specific time
  • Finicky “Sun spots are not aligned” App/Device
  • High human error rate with little or no QA
  • Parallelisation problems e.g. with mgmt IPv4 192.168.1.1 collision

Let’s keep our client green

All we need is some Docker chops to be a little nicer to the environment by being able to recycle the devices and QA the whole thing.

Parallelisation for scale

We also might want to parallelise this thing: say, one person can load up 20 IoT devices at a time, just loading/unloading when the computer says so.

In the past I’ve just mapped “loading bays” to physical switch ports, with a VLAN bundle delivered over a tunnel etc. to one or more virtual instances, whatever suits the client environment/use case.

Docker/Wine/Xvfb stack to the rescue

The ingredients we could use to tackle these kinds of problems:

  • Docker container say using Alpine Linux minimal image
  • Wine Win32 compatibility layer for *NIX etc.
  • X virtual framebuffer (Xvfb): a headless X11 display server for *NIX
  • Scripting language e.g. Perl: don’t hate me, it’s just ~always everywhere :)
  • X11 “Driver/Screenshotter” e.g. CPAN X11::GUITest
  • WeMo Power AC Switch or anything you can drive off/on with some RPC
  • Optical Character Recognition (OCR) e.g. GNU OCR (gocr)
  • Image Matching e.g. CPAN Imager::Search
  • Human interface (debug, monitor & control)
  • Script that drives the Win32 App over Wine
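As a rough idea of how the Xvfb and Wine ingredients fit together inside one container, here is a minimal Python sketch (I'd typically use Perl, but the shape is the same). The display number and the application path are hypothetical placeholders, and `launch` assumes Xvfb and Wine are installed in the image.

```python
import os
import subprocess

DISPLAY_NUMBER = 99                  # assumption: any free X display number will do
APP_PATH = "/opt/vendor/loader.exe"  # hypothetical path to the vendor's Win32 app

def xvfb_command(display_number: int, geometry: str = "1024x768x24") -> list[str]:
    """Command line for a headless X server on the given display."""
    return ["Xvfb", f":{display_number}", "-screen", "0", geometry]

def wine_environment(display_number: int) -> dict[str, str]:
    """Copy of the current environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ)
    env["DISPLAY"] = f":{display_number}"
    return env

def launch(display_number: int = DISPLAY_NUMBER, app_path: str = APP_PATH):
    """Start Xvfb, then the Win32 app under Wine on that display."""
    xvfb = subprocess.Popen(xvfb_command(display_number))
    app = subprocess.Popen(["wine", app_path], env=wine_environment(display_number))
    return xvfb, app

# Only print the command here; call launch() inside the container.
print(" ".join(xvfb_command(DISPLAY_NUMBER)))
```

A real script would also wait until Xvfb is accepting connections before starting Wine, and would clean up both processes when the bay is done.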

To make it even better perhaps

We can improve this even further, as I’ve typically done in the past:

  • Record/Learning script instead of hardcoded
  • API to debug/monitor/control the related instance(s)
  • “Human interface” Front-End towards API

So how do all these components play together?

First, each loading bay could be associated with its own:

  • Power control (WeMo)
  • Docker container
  • sub-VLAN tag to separate overlapping IPs into their own virtual LAN

All these would be connected in a bundle to, say, a virtual machine running the container instances, where the script drives the Win32 app over Wine and the X virtual framebuffer with the help of the X driver. The script also handles associated things like power-cycle control: we can submit SOAP calls to the WeMo to turn it off or on.
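The per-bay bundle and the power control could be sketched like this in Python. Everything named here is an assumption for illustration: the `LoadingBay` fields, the addresses and the container names are made up, and the port, path and SOAP action follow the WeMo UPnP interface as commonly documented by the community, so verify them against your actual switch.

```python
from dataclasses import dataclass
from urllib.request import Request

@dataclass(frozen=True)
class LoadingBay:
    name: str         # label the operator sees
    vlan_id: int      # sub-VLAN isolating this device's 192.168.1.1
    container: str    # Docker container driving this bay
    switch_host: str  # address of the bay's power switch

# Assumed WeMo-style SOAP body; {state} is 1 for on, 0 for off.
SOAP_TEMPLATE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
    's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
    '<s:Body><u:SetBinaryState xmlns:u="urn:Belkin:service:basicevent:1">'
    '<BinaryState>{state}</BinaryState>'
    '</u:SetBinaryState></s:Body></s:Envelope>'
)

def power_request(bay: LoadingBay, on: bool) -> Request:
    """Build (but do not send) the HTTP request that switches a bay on or off."""
    body = SOAP_TEMPLATE.format(state=1 if on else 0).encode("utf-8")
    return Request(
        f"http://{bay.switch_host}:49153/upnp/control/basicevent1",  # assumed port/path
        data=body,
        headers={
            "Content-Type": 'text/xml; charset="utf-8"',
            "SOAPACTION": '"urn:Belkin:service:basicevent:1#SetBinaryState"',
        },
        method="POST",
    )

# Three hypothetical bays, each with its own VLAN, container and switch.
bays = [
    LoadingBay(f"bay-{n}", 100 + n, f"loader-{n}", f"10.0.0.{10 + n}")
    for n in range(1, 4)
]
request = power_request(bays[0], on=False)
print(request.full_url)
```

Sending it is a matter of `urllib.request.urlopen(request, timeout=5)`; a power cycle is an off request, a pause, then an on request timed so the app catches the bootloader window.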

OCR/Image Match used for triggering

From the screenshots of the active windows in the X virtual framebuffer, OCR and image matching or recognition can be used to trigger anything, from driving the app forward to dealing with error conditions.

It is important to cover all the error situations, so there is less debugging and restarting, which reduces the toil on the human at the monitoring and control interface.
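The triggering logic itself can be quite small. Below is a minimal Python sketch that maps recognised screen text to the next action, with error rules checked first. The screen texts and action names are hypothetical; in practice the text would come from running OCR over a screenshot of the virtual framebuffer.

```python
import re

# (pattern, action) pairs; error rules come first so they win over progress rules.
# All screen texts and action names here are hypothetical.
RULES = [
    (re.compile(r"device not found|timed? ?out|failed", re.I), "power_cycle_and_retry"),
    (re.compile(r"select firmware", re.I), "choose_firmware"),
    (re.compile(r"press (ok|start) to continue", re.I), "press_enter"),
    (re.compile(r"upgrade (complete|successful)", re.I), "mark_bay_done"),
]

def decide(ocr_text: str) -> str:
    """Pick the next action for the text currently recognised on screen."""
    normalised = " ".join(ocr_text.split())  # OCR output often has odd line breaks
    for pattern, action in RULES:
        if pattern.search(normalised):
            return action
    return "wait"  # nothing recognised yet: take another screenshot later

print(decide("Press  OK\nto continue"))
print(decide("Connection timed out"))
```

Anything that ends up in the fallback too often is a candidate for a new rule, which is how the error coverage grows over time.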

Record/Replay is ideal for client self-maintenance

The script ideally has record and replay functionality so that, whenever the vendor inevitably updates the proprietary Win32 application, an operator at the client can just record the interaction again, with all the expected error conditions.

The recording would use combined OCR and image recognition/matching, as the GUI rendering would typically never change under a virtual framebuffer, compared to, say, literal “screen scraping” from a CRT monitor, which introduces its own artefacts and the like.
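One way to picture record/replay is a recording stored as plain data plus a small replay loop. This is a sketch under my own assumptions: the JSON step format, the field names and the two callables are hypothetical, and the canned screens stand in for OCR of the real framebuffer.

```python
import json

# A hypothetical recording: wait for some text on screen, then perform an action.
RECORDING = """
[
  {"expect": "Select firmware", "action": "click", "x": 120, "y": 240},
  {"expect": "Upgrade complete", "action": "key", "keys": "ENTER"}
]
"""

def replay(steps, read_screen, perform, max_polls=10):
    """Run each recorded step once its expected text shows up on screen."""
    for step in steps:
        for _ in range(max_polls):
            if step["expect"].lower() in read_screen().lower():
                perform(step)
                break
        else:
            # The expected screen never appeared within max_polls attempts.
            raise RuntimeError(f"never saw {step['expect']!r}; hand over to the operator")

# Dry run with canned screens instead of a real framebuffer.
screens = iter(["Loading...", "Select firmware image", "Upgrade complete"])
performed = []
replay(json.loads(RECORDING), lambda: next(screens), performed.append)
print([step["action"] for step in performed])
```

Because the recording is just data, an operator can re-record it after a vendor update without touching the script; a real loop would also sleep between polls and match images as well as text.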

Human Interface can be API driven

Human interface (ideally behind an API, before any front-end) usually needs to:

  • Allow debugging (e.g. accessing screenshots/what is going on)
  • Recycle the ephemeral container driving the instance(s)
  • Show loading bay(s) and QA status for parallel operation monitoring
