This is a technical post in two parts: one readable, the other not (to most).

I will start with an introduction to Sapiengraph CCTV and how we, as a company, set out to add software smarts to a camera.

In the second part, I will copy-paste the README file that Bach, our CTO, wrote while building the product.

Part 1: CCTVs suck

We started with a boatload of data and computer vision technology. We ended up with a camera because we needed to sell something. It is far easier to sell a physical thing than digital bytes in the cloud.

Every business needs CCTV. But honest to goodness, CCTVs are terrible devices. Even the "bleeding edge" Chinese devices from Dahua or Hikvision are antiquated. Let me walk you through why.

The DVR - Digital Video Recorder

Did you know that CCTV systems require a dedicated hardware unit to record video from the cameras? It is called the DVR. Most small businesses settle for one that stores three weeks of recordings, tops. Most small businesses also keep the DVR in the store, because an offsite DVR costs a whole load more money. (I will explain why later.)

  • If your DVR fails, it will not alert you.
  • A thief who understands CCTV networks will brazenly walk into a store, look for the DVR, and take it along with his loot. And bam, the owner has no recordings.
  • If the hard disk in the DVR fails, it will not alert you.
  • If the power is down, nothing is recorded.

The DVR is the bottleneck of CCTV networks.

With Sapiengraph CCTV, we aspired to do it better. We wanted:

  • Lifetime cloud backup storage
  • Battery redundancy for cameras, so power trips are no issue
  • And most importantly, SMS notifications to business owners when the CCTV stops recording

The IP Camera

What if you want to view your cameras remotely? Enter IP cameras. IP cameras connect to the cloud. To connect to the cloud, cameras need internet access, which means every store needs an internet connection through a router. So, to access your camera from OUTSIDE the shop's Wi-Fi network, you have to get past the router's Network Address Translation (NAT). Only then is the camera exposed to the internet, so that your smartphone can view the CCTV footage remotely.

And to accomplish that:

  • The business owner has to set up port forwarding for each CCTV camera in the shop.
  • If the shop's IP address changes, the business owner also has to set up dynamic DNS.

Sounds like a whole lot of technical mumbo jumbo, right? That is because it is. It is done this way because the "correct way", building a "cloud network" of CCTVs, costs a whole lot more.

We would manage the cloud ourselves. But that is super expensive. Imagine a thousand cameras streaming at 1080p, 24/7, 365 days a year, with another hundred business owners live-streaming them. The bandwidth cost would be murderous. But we reckoned we could figure it out, given my background in running a VPN service.
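To put a rough number on "murderous", here is a back-of-the-envelope sketch. Two inputs are illustrative guesses, not measurements from our cameras: a 4 Mbit/s average bitrate for a 1080p stream, and each of the hundred owners watching one stream around the clock.

```python
# Back-of-the-envelope bandwidth estimate for streaming CCTV to the cloud.
# ASSUMPTION: a 1080p H.264 stream averages about 4 Mbit/s. Real cameras vary
# with scene, encoder settings and frame rate, so treat this as an illustration.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s


def yearly_terabytes(streams: int, mbit_per_s: float) -> float:
    """Data volume in decimal terabytes for `streams` streams running 24/7 for a year."""
    bits = streams * mbit_per_s * 1_000_000 * SECONDS_PER_YEAR
    return bits / 8 / 1e12


# Ingest: every camera uploads continuously.
ingest_tb = yearly_terabytes(streams=1000, mbit_per_s=4.0)
# Egress: ASSUMPTION (worst case) that 100 owners each watch one stream nonstop.
egress_tb = yearly_terabytes(streams=100, mbit_per_s=4.0)

print(f"Ingest: {ingest_tb:,.0f} TB/year")  # Ingest: 15,768 TB/year
print(f"Egress: {egress_tb:,.0f} TB/year")  # Egress: 1,577 TB/year
```

Under those assumptions, ingest alone comes to roughly 16 petabytes a year, before anyone watches a single frame.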

Part 2: Hacking a Xiaomi CCTV (Background & Motivation)

Warning: This part is technical.

Hardware & Ecosystem

  • Wyze makes inexpensive IP cameras with many desirable features: PTZ, high resolution, IR night vision.
    They are also hackable.
  • Xiaomi-Dafang-Hacks is an open-source firmware for IP cameras based on the Ingenic T10 or T20 SoC.
    Out of the box, it supports a few models.
    However, it also contains a wealth of documentation on porting it to other cameras based on the same SoC and on compiling other software for it.
  • Ingenic makes SoCs (systems on chip) designed for IP cameras.
    They run Linux, and the SDK is available online.
    The original source of the SDK is questionable.
  • RTSP is a popular protocol for real-time video streaming.
    It is content-agnostic.
    This means the protocol itself does not dictate anything about the video format.
    It can even be used to send and receive arbitrary binary data.
    However, it provides a mechanism for clients to discover metadata about the stream: codec, bitrate, dimensions...
  • v4l2rtspserver is an open-source RTSP server that serves video from a Video4Linux device (/dev/video<n>).
  • live555 provides an open-source library and tools for RTSP.
    Its proxyServer (live555ProxyServer) can be used to forward a stream.
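In RTSP, that metadata usually arrives as an SDP (Session Description Protocol) body in the server's reply to a DESCRIBE request. Each m= line starts a media section, and its a=rtpmap line names the codec. The sketch below builds such a request and extracts the codec names from an SDP body. The camera URL and the sample SDP are hypothetical. A real client would also have to send the request over a socket, handle authentication, and read the reply body.

```python
# Sketch: how an RTSP client learns what a stream contains.
# The URL and SDP below are hypothetical examples.


def describe_request(url: str, cseq: int = 1) -> bytes:
    """Build a minimal RTSP DESCRIBE request that asks for an SDP description."""
    lines = [
        f"DESCRIBE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        "Accept: application/sdp",
        "",  # these two empty entries make the request end with CRLF CRLF,
        "",  # which marks the end of the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def codecs_from_sdp(sdp: str) -> dict[str, str]:
    """Map each media section (video, audio, ...) to the codec in its first a=rtpmap line."""
    codecs: dict[str, str] = {}
    media = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            # "m=video 0 RTP/AVP 96" starts a new media section
            media = line[2:].split()[0]
        elif line.startswith("a=rtpmap:") and media is not None:
            # "a=rtpmap:96 H264/90000" -> "H264/90000" -> "H264"
            _, _, encoding = line.partition(" ")
            if encoding:
                codecs.setdefault(media, encoding.split("/")[0])
    return codecs


SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 127.0.0.1",
    "s=Example session",
    "t=0 0",
    "m=video 0 RTP/AVP 96",
    "a=rtpmap:96 H264/90000",
    "a=control:track1",
    "m=audio 0 RTP/AVP 97",
    "a=rtpmap:97 MPEG4-GENERIC/16000/1",
    "a=control:track2",
])

print(describe_request("rtsp://camera.local:8554/unicast").decode())
print(codecs_from_sdp(SAMPLE_SDP))  # {'video': 'H264', 'audio': 'MPEG4-GENERIC'}
```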

Some background on the messy video ecosystem

Strictly speaking, file extensions such as mp4, mkv, flv... do not denote a video format.
They denote a "container format".
A video file can be grossly simplified as a stream of images plus a stream of sound.
The stream of images is encoded using a codec such as h264, VP8...
The stream of sound is encoded using a codec such as mp3, aac, vorbis...
Together, they are put inside a container, which also provides things like:

  • Metadata: which codecs are used
  • Subtitles
  • Alternate audio streams for other languages
  • Fonts

The problem with this: one cannot know whether a video file is playable before actually reading the container and examining the streams it contains.
This leads to odd APIs such as canPlayType, whose return value can be "probably", "maybe", or an empty string.

Thus, when talking about video especially about conversion, it is important to differentiate between container, video codec and audio codec.

ffmpeg is a suite of tools and libraries for working with audio and video.
It is used in almost every conversion tool, both mobile and desktop.
Most of them either use ffmpeg as a library or call the ffmpeg executable in the background.
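ffprobe, the inspection tool that ships with ffmpeg, makes the container/codec distinction visible. When run with JSON output, it reports the container under "format" and each stream's codec under "streams". The sketch below summarizes such a report. The sample JSON is trimmed and hypothetical. The probe() helper needs ffprobe on the PATH and is not called here, so the snippet runs without ffmpeg installed.

```python
import json
import subprocess


def probe(path: str) -> str:
    """Run ffprobe on a file and return its JSON report (requires ffprobe on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize(probe_json: str) -> dict:
    """Separate the container format from the codecs of the streams inside it."""
    report = json.loads(probe_json)
    summary = {"container": report["format"]["format_name"], "video": [], "audio": []}
    for stream in report.get("streams", []):
        kind = stream.get("codec_type")
        if kind in ("video", "audio"):
            summary[kind].append(stream.get("codec_name"))
    return summary


# Trimmed, hypothetical ffprobe output for an .mp4 file.
SAMPLE_FFPROBE = """{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "h264", "width": 1920, "height": 1080},
    {"index": 1, "codec_type": "audio", "codec_name": "aac"}
  ],
  "format": {"filename": "clip.mp4", "format_name": "mov,mp4,m4a,3gp,3g2,mj2"}
}"""

print(summarize(SAMPLE_FFPROBE))
# {'container': 'mov,mp4,m4a,3gp,3g2,mj2', 'video': ['h264'], 'audio': ['aac']}
```

The container appears as a format family ("mov,mp4,..."), while each stream carries its own codec name.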

We need to quickly develop a service that provides the following:

  • An IP cam that works out of the box when plugged in.
  • Its live stream can be viewed remotely.
  • Past footage is automatically backed up to cloud storage such as S3.

All the pieces are present; most of the work is gluing them together.
Due to the time constraint, some solutions will not be optimal.

Architecture

A camera is a Linux system running v4l2rtspserver, which exposes video over RTSP.
This is a server that listens for connections.
Normally, it is not reachable from outside the local network.
Thus, we use a classic trick: an SSH tunnel.
dropbear is a lightweight SSH implementation commonly used in embedded systems instead of the more popular OpenSSH.

A reverse tunnel is opened from the camera to our backend, exposing the camera's RTSP server there.
For each camera, a live555ProxyServer is spawned to multiplex its stream to multiple clients.
While v4l2rtspserver can handle multiple connections itself, we do not want to strain the camera's limited resources.
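On the camera, the tunnel comes down to one dbclient invocation with a remote port forward (-R). The sketch below only builds the argument list. The server name, user, key path, and port numbers are hypothetical, and 8554 is assumed to be the port v4l2rtspserver listens on. The -N flag is deliberately left out. The server side runs a forced command when the session opens, and -N would skip opening a session.

```python
def dbclient_args(
    server: str,
    user: str,
    key_path: str,
    remote_port: int,
    local_port: int = 8554,
    keepalive_s: int = 30,
) -> list[str]:
    """Build a dbclient command that exposes the camera's RTSP port on the server.

    "-R remote_port:127.0.0.1:local_port" asks the SSH server to listen on
    remote_port and forward each connection back through the tunnel to the
    RTSP server on the camera's own loopback interface.
    """
    return [
        "dbclient",
        "-i", key_path,                 # private key; the server trusts its public half
        "-y",                           # accept an unknown host key (see note below)
        "-K", str(keepalive_s),         # send keepalives so a dead link is noticed
        "-R", f"{remote_port}:127.0.0.1:{local_port}",
        f"{user}@{server}",             # no -N: the session is what runs the forced command
    ]


print(" ".join(dbclient_args("tunnel.example.com", "cam42", "/etc/dropbear/cam_key", 20042)))
```

-y trades security for convenience. Pinning the server's host key in the firmware would be the safer choice.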

Multiple video processors can connect to a proxy server to consume the live stream.
Currently, the only processor we need to develop is an archiver.
It transcodes and downsamples the live stream while segmenting it into small video files and uploading them to S3-compatible storage.
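Below is a minimal sketch of the archiver's ffmpeg stage, assuming ffmpeg's segment muxer does the splitting. The proxy URL, output height, frame rate, and segment length are hypothetical placeholders rather than the archiver's actual settings. The upload step is only described, not shown.

```python
import subprocess


def archiver_cmd(
    source_url: str,
    out_dir: str,
    height: int = 480,
    fps: int = 10,
    segment_seconds: int = 60,
) -> list[str]:
    """Build an ffmpeg command that downsamples an RTSP stream into fixed-length MP4 files."""
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",          # interleave RTP in the RTSP TCP connection
        "-i", source_url,
        "-vf", f"scale=-2:{height}",       # downscale; -2 keeps aspect ratio with an even width
        "-r", str(fps),                    # lower output frame rate
        "-c:v", "libx264", "-preset", "veryfast",
        "-an",                             # drop audio (assumption: video only)
        "-f", "segment",                   # ffmpeg's segment muxer splits the output
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",          # each file starts at t=0
        "-strftime", "1",                  # expand %Y%m%d-%H%M%S in the file name
        f"{out_dir}/%Y%m%d-%H%M%S.mp4",
    ]


def run_archiver(source_url: str, out_dir: str) -> None:
    """Run ffmpeg until the stream ends (requires ffmpeg on PATH)."""
    subprocess.run(archiver_cmd(source_url, out_dir), check=True)
```

Each finished file could then be uploaded and deleted locally. Any S3 client that accepts a custom endpoint works for this, for example boto3's upload_file with endpoint_url set on the client. Segments can only be cut at keyframes, so their durations are approximate.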

For end-user viewing, we provide another live555ProxyServer that aggregates the streams from all the live555ProxyServer instances belonging to a single user.
It exposes all of them under a single public domain (e.g. rtsp://) and behind a single username-password pair.
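For reference, the stock live555ProxyServer names its streams by their position on the command line. A single back-end stream is served as proxyStream, while several are served as proxyStream-1, proxyStream-2, and so on. The sketch below derives viewer-facing URLs from that convention. The host name and port are hypothetical, and our modified proxy may name streams differently.

```python
def public_stream_urls(host: str, backend_urls: list[str], port: int = 554) -> list[str]:
    """Return the URL a viewer would open for each proxied back-end stream,
    following stock live555ProxyServer naming (order matches backend_urls)."""
    if len(backend_urls) == 1:
        names = ["proxyStream"]
    else:
        names = [f"proxyStream-{i}" for i in range(1, len(backend_urls) + 1)]
    return [f"rtsp://{host}:{port}/{name}" for name in names]


print(public_stream_urls("view.example.com", ["rtsp://proxy-a", "rtsp://proxy-b"]))
```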

Camera lifecycle

At boot time, init (from BusyBox) spawns v4l2rtspserver-master and dropbear (more precisely, its client, dbclient).
Both are supervised and restarted when they crash.
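On the camera, this supervision is BusyBox init's job, presumably via respawn entries in /etc/inittab. The Python sketch below only illustrates the same restart-on-exit loop and is not what runs on the device. The max_restarts cap is an addition so that the example terminates.

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int, delay_s: float = 1.0) -> int:
    """Run cmd and restart it every time it exits, up to max_restarts times.

    Returns how many restarts were performed. BusyBox's respawn has no such
    cap; the limit exists only so this example terminates.
    """
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode
        print(f"{cmd[0]} exited with status {code}", file=sys.stderr)
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay_s)  # brief pause so a crashing program does not spin the CPU


# Example: a program that always crashes is started once and restarted twice.
supervise([sys.executable, "-c", "raise SystemExit(3)"], max_restarts=2, delay_s=0.1)
```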

Sidenote: it is not clear where the -master suffix in the custom firmware comes from.
There is no secondary or slave component.
It may refer to the master branch it was built from, which is a useless piece of information since master is not a stable reference.

When a camera connects:

  1. dropbear (dbclient) connects to opensshd using a public/private key pair.
  2. opensshd verifies the key pair and sets up the tunnel.
  3. Through ForceCommand, a program called sapiengraph-cctv-mon is also launched.
  4. sapiengraph-cctv-mon asks k8s to spawn live555ProxyServer and the various processors (currently, only sapiengraph-cctv-archiver).
  5. The processors connect to live555ProxyServer, which in turn connects through the tunnel to v4l2rtspserver-master inside the camera.
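Step 4 amounts to submitting manifests to the Kubernetes API. The sketch below builds a Deployment for one camera's proxy as a plain dict. Every name in it is hypothetical: the image, the labels, the sshd host, and the /unicast stream path. It also assumes the image's entrypoint is live555ProxyServer.

```python
def proxy_deployment(camera_id: str, tunnel_port: int) -> dict:
    """Build a Deployment manifest running one live555ProxyServer for one camera.

    Every name below (image, labels, sshd host, /unicast path) is hypothetical.
    """
    name = f"cctv-proxy-{camera_id}"
    labels = {"app": "cctv-proxy", "camera": camera_id}
    backend_url = f"rtsp://sshd.example.internal:{tunnel_port}/unicast"
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},  # must match the Pod template labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "live555-proxy",
                            "image": "registry.example.com/live555-proxy:latest",
                            # Entrypoint assumed to be live555ProxyServer.
                            # -t: fetch RTP over TCP, since the SSH tunnel only carries TCP.
                            "args": ["-t", backend_url],
                            "ports": [{"containerPort": 554}],
                        }
                    ]
                },
            },
        },
    }
```

With the official kubernetes Python client, a dict like this can be passed as the body of AppsV1Api().create_namespaced_deployment(namespace, body). A matching Service would give the processors a stable address. The forwarded port lives in the sshd Pod, so for other Pods to reach it, sshd would need GatewayPorts enabled (or an equivalent arrangement).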

When a camera disconnects:

  1. The connection from dropbear is dropped.
  2. opensshd destroys the tunnel.
  3. At the same time, it sends SIGHUP to the ForceCommand process, sapiengraph-cctv-mon.
  4. sapiengraph-cctv-mon asks k8s to destroy all resources related to the CCTV before terminating itself.
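The teardown depends on sapiengraph-cctv-mon reacting to that SIGHUP. Below is a minimal, POSIX-only sketch of the pattern. The spawn and destroy callables are hypothetical stand-ins for the real Kubernetes requests.

```python
import signal
from typing import Callable


def monitor(
    camera_id: str,
    spawn: Callable[[str], None],
    destroy: Callable[[str], None],
) -> None:
    """Create per-camera resources, wait for sshd's SIGHUP, then remove them.

    SIGHUP is blocked before anything else and then collected with sigwait(),
    so a hangup that arrives while resources are still being created stays
    pending instead of killing the process before cleanup can run.
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGHUP})
    try:
        spawn(camera_id)                  # hypothetical: e.g. create proxy and archiver
        signal.sigwait({signal.SIGHUP})   # blocks until the SSH session hangs up
    finally:
        # Runs even if spawn failed partway, so destroy should tolerate
        # resources that were never created.
        destroy(camera_id)
```

Blocking the signal and waiting for it synchronously avoids the race that a handler plus a "sleep until signalled" loop would have, where a hangup arriving at the wrong moment is missed.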

Apart from the custom firmware, all components are server-side and deployed on DigitalOcean Kubernetes (DO k8s).
Whenever terms such as Deployment, Pod, Service, or Container are mentioned, they refer to the k8s concepts.

Custom firmware

Based on Xiaomi-Dafang-Hacks.
More details can be found here.

SSH server

Contains an installation of opensshd configured to launch sapiengraph-cctv-mon as its ForceCommand.
More details can be found here.

Proxy server

Based on live555's proxyServer, with some small modifications.
More details can be found here.

sapiengraph-cctv-archiver

This processor transcodes and segments the video stream into multiple files, then uploads them to S3-compatible storage.
More details can be found here.