Quick Start Guide

NOTE:
Code and features discussed here are currently only available to alpha customers.
Apply for access

Development Primer

FØCAL is a web service that allows complex image and video processing pipelines to be built, tested, and deployed on scalable compute resources. Several important performance and usability goals are realized in the FØCAL software architecture. Most importantly, the public API provides a way for third-party developers to build and deploy FØCAL pipelines to suit the needs of their specific applications.

This document describes the basic workflows made available to third-party developers through FØCAL’s public API. Most of the API interactions discussed here are issued through a command line interface or client code. They can also be performed manually through our Dashboard UI, but discussion of Dashboard features is outside the scope of this document.


Project organization

FØCAL maintains repositories for code, documentation, and issue tracking on GitHub.

Code

  • f0cal-bug – Project-wide issue tracking. Please submit everything here.
  • f0cal-cli – The command line interface. Critical for developers.
  • f0cal-spec – Machine-readable API documentation.
  • f0cal-client-py – Official Python client. Generated from f0cal-spec using fullmetal.
  • f0cal-sdk – Required to contribute service-side functionality. Apply for SDK access here.
  • fullmetal – Full-stack code generation for Python. This is what we used to expose the FØCAL API to the web.

Docs

Following emerging best practices – see Swagger, RAML, and others – the FØCAL API is “self-documenting.” The XML instance describing the API can be found in f0cal-spec. Official client bindings are auto-generated from this XML instance using fullmetal.

Comprehensive, up-to-date, human-readable docs are best obtained using f0cal-cli.

$ f0cal doc --help

Web-based documentation is also provided on a per-repository basis.


Getting started

First, some terminology:

  • Pipeline – A series of processing operations through which data is extracted from images or frames of video.
  • Audit – The process of determining how well a given pipeline is doing its intended job.
  • Ground truth – The data with which an audit is performed. This data provides both pre-condition and post-condition information for assessing a pipeline’s fitness, and is typically assembled by human experts.
  • Bundle – A data structure for tracking the performance of one or more pipelines over time.
  • Client credentials – A unique string of characters that allows client code to encrypt and decrypt communication with the API.

CLI installation

The FØCAL command line interface (CLI) is the principal mechanism by which service-bound resources are managed. This includes but is not limited to:

  • Creating accounts
  • Initializing client credentials
  • Listing deployed resources
  • Starting and stopping pipelines
  • Connecting third-party services

The FØCAL CLI is written in Python, and requires both pip and git for installation. We recommend performing the install within a virtual environment.

$ pip install git+https://github.com/f0cal/f0cal-cli

Installing the CLI will also install Python client bindings, the utility of which will be discussed in upcoming sections.

Account setup

The first steps that you’ll take with FØCAL involve establishing the authentication tokens that are required to communicate with the API. These will require a valid email address. For the purposes of our examples, we’ll use foo@bar.com.

$ f0cal create user foo@bar.com # will prompt for password
$ f0cal confirm user foo@bar.com # will prompt for emailed confirmation code
$ f0cal create client
$ f0cal confirm client # will prompt for emailed confirmation code

Working with data

Computer vision systems are concerned with data of two important types:

  • Live data – These are transient data that come from sources external to FØCAL, such as webcams and images submitted to third-party applications.
  • Persisted data – These data exist in persistent storage, and are used for system training, testing, and audits.

The current section is concerned exclusively with persisted data. Working with live data will be discussed in upcoming sections, in the context of pipeline deployment.

Data stores

With so many popular options available for cloud-based image and video storage, FØCAL does not provide its own persistence mechanism. Instead, we integrate Dropbox, Google Drive, Flickr, and other third-party storage services. The following CLI examples illustrate how to add an external data store to your account:

$ f0cal create datastore MyDropbox --service=dropbox --service-id=foo@bar.com
$ f0cal create datastore MyDrive --service=google-drive --service-id=foo@bar.com

Data sets

A data set is a collection of persisted images or videos that a user curates for system training or testing purposes. Data sets can be assembled from content persisted on multiple data stores.

$ f0cal create dataset InputData --from-store=MyDropbox/some_folder
$ f0cal view dataset InputData
$ f0cal update dataset InputData --from-store=MyDrive/another_folder --watch
$ f0cal update dataset InputData --use-gui
$ f0cal check dataset InputData

The check command deserves special mention. Because FØCAL relies on external services to persist your image data, it needs some guarantee that your images aren’t being modified without its knowledge. When FØCAL imports new data, it generates a unique hash from the bytes of each imported image or video. Later workflows can rely on that hash to confirm that the data are exactly the same, bit-for-bit, as when they were imported.
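
The import-time fingerprinting described above can be sketched in a few lines of Python. The source does not say which hash function FØCAL uses, so SHA-256 is an assumption here, as are the helper names fingerprint and is_unmodified.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return a hex digest of the file's bytes (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so that large videos need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, recorded_digest):
    """True if the file still matches the digest recorded at import time."""
    return fingerprint(path) == recorded_digest
```

Any change to the stored file, even a single byte, yields a different digest, which is what makes a later bit-for-bit comparison possible.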


Working with pipelines

Pipelines are the fundamental units of data processing in the FØCAL architecture. Each one encapsulates a series of operations designed by its author to extract information of interest from ingested imagery. The process of designing a pipeline to extract specific information is beyond the scope of this document and will be discussed in future tutorials.

Pipelines can be created from other, pre-existing pipelines.

$ f0cal create pipeline FirstPipeline --from-url=git+https://github.com/f0cal/first-pipeline

Authoring pipelines

Pipelines can also be authored “from scratch.”

One of the core design features of the FØCAL architecture is that even the most complex pipelines can be fully serialized to a single, readable document. All the ingredients necessary to exactly reproduce a particular analytic can be captured in a form that is easy to copy, modify, and reuse. The serialization format that FØCAL relies on is XML.

Pipelines can be authored in a number of ways:

  • By hand – This option involves working directly with XML syntax in your favorite editor. The resultant instance document should conform to our pipeline XML schema.
$ emacs first_pipeline.xml
$ f0cal create pipeline FirstPipeline --from-file=first_pipeline.xml
  • Dashboard GUI – FØCAL provides a graphical user interface for users who prefer interactive feedback about pipeline correctness.
  • Client bindings – Official and third-party bindings allow the pipeline XML instance to be authored more intuitively, and without special knowledge of XML syntax. See next section for examples.
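
For the by-hand route, a quick local well-formedness check can catch syntax slips before the file is submitted. The sketch below uses only Python's standard library. It does not validate against FØCAL's pipeline schema, which is not reproduced in this document, and the <pipeline> and <step> elements in the sample strings are made-up placeholders.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, root_tag) if the text parses as XML, else (False, error)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, root.tag


# Hypothetical placeholder documents; real pipeline XML will look different.
print(is_well_formed("<pipeline><step/></pipeline>"))
print(is_well_formed("<pipeline><step></pipeline>"))  # unclosed <step>
```

A document that passes this check is only syntactically valid XML; conformance to the pipeline schema still has to be established separately.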

Python client bindings

Running pipelines

Pipelines can accept persisted data as input.

$ f0cal create dataset OutputData --store=MyDropbox
$ f0cal exec pipeline FirstPipeline --input=InputData --output=OutputData

Deploying pipelines

Deployment is the process by which a pipeline is readied for live data ingest.

$ f0cal exec pipeline FirstPipeline --live-input

The --live-input directive causes the CLI to return a reference to the live pipeline in the form of a URL. This URL is globally accessible and accepts input data via standard HTTP or WebSockets. Only authorized FØCAL clients are permitted to communicate with a deployed pipeline.

$ f0cal exec pipeline FirstPipeline --live-input > deployed.url
$ cat deployed.url
http://api.f0cal.com/pipeline/3d6d6b02-d4ef-11e5-afe4-af7e2a48f820
$ curl "$(cat deployed.url)" --data-binary @some_image.jpg
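
The same submission can be made from client code. The following is a minimal sketch using Python's standard library, not the official client bindings. The content type is an assumption, and because this document does not describe how client credentials are attached to a request, the sketch omits authentication entirely. The function names are illustrative.

```python
import urllib.request


def build_upload_request(url, image_bytes):
    """Build (but do not send) a POST request carrying raw image bytes."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        # Assumed content type; the service may expect something else.
        headers={"Content-Type": "application/octet-stream"},
    )


def upload(url, image_bytes, timeout=30):
    """Send the request and return the raw response body."""
    request = build_upload_request(url, image_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes it possible to inspect or test the request without contacting a live pipeline.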

Bundling up

FØCAL provides bundles for tracking system performance across structural differences in constituent pipelines, structural changes to individual pipelines, and modifications to training data.

Bundles are important for two reasons:

  • Comparison – Several structurally different pipelines can be imagined to solve any given analysis problem. But which one is best for your unique input data? Bundles were developed to provide an apples-to-apples comparison across different pipelines.
  • Debugging – Computer vision systems rely heavily on numerical algorithms. Bugs in numerical code are pernicious in ways that other software bugs are not. They tend to reveal themselves not with obvious stack traces, but instead in subtle changes to system performance. A best practice for dealing with numerical bugs is to create a kind of regression suite based on ground truth data. Bundles solve this problem.

Basic operations on bundles include creation, adding data sets, adding pipelines, performing audits, and reviewing results.

$ f0cal create bundle MyBundle
$ f0cal update bundle MyBundle --add-dataset=InputData --add-dataset=OutputData
$ f0cal update bundle MyBundle --add-pipeline=FirstPipeline
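
To make the two roles of a bundle concrete, here is a toy sketch of the idea in plain Python. It is not FØCAL's implementation. The class ScoreHistory, its methods, the example scores, and the name SecondPipeline are all hypothetical, and a single numeric score per audit is a simplifying assumption.

```python
from collections import defaultdict


class ScoreHistory:
    """Toy stand-in for a bundle: audit scores per pipeline, in audit order."""

    def __init__(self):
        self._scores = defaultdict(list)

    def record(self, pipeline, score):
        """Append the score from the latest audit of `pipeline`."""
        self._scores[pipeline].append(score)

    def best_pipeline(self):
        """Comparison: the pipeline whose most recent score is highest."""
        return max(self._scores, key=lambda name: self._scores[name][-1])

    def regressions(self, tolerance=0.0):
        """Debugging: pipelines whose latest score dropped by more than tolerance."""
        return sorted(
            name
            for name, scores in self._scores.items()
            if len(scores) >= 2 and scores[-2] - scores[-1] > tolerance
        )


history = ScoreHistory()
history.record("FirstPipeline", 0.90)
history.record("FirstPipeline", 0.84)   # score after a hypothetical code change
history.record("SecondPipeline", 0.88)
print(history.best_pipeline())
print(history.regressions(tolerance=0.01))
```

Because every pipeline is scored against the same data, the latest scores are directly comparable, and a drop between consecutive audits of one pipeline flags a possible numerical regression.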

Performing audits

Audits are a necessary evil. Machine learning systems – of which computer vision systems are a subset – can’t be trusted to generalize well. It is highly likely that (1) deployed systems will encounter input data that are qualitatively different from those that they were designed to handle, and that (2) new error modes will appear as a result. Though latent, these novel error modes will be detrimental to system performance. The only way to identify them is to compare known inputs to desired outputs in a regimented fashion. FØCAL audits support precisely this process.

The first step in performing an audit is establishing a ground truth data set, or a data set that maps known inputs to desired outputs. There are a number of ways to establish ground truth data, and many features of the FØCAL architecture are designed to expedite the process. Discussion of these is beyond the scope of this document.

$ f0cal update bundle MyBundle --add-dataset=GroundTruth
$ f0cal update bundle MyBundle --audit FirstPipeline GroundTruth
$ f0cal get bundle MyBundle --show-performance
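
Conceptually, an audit compares known inputs to desired outputs, as described above. The sketch below illustrates that comparison in plain Python. It is not how FØCAL computes performance: the audit function, the brightness-based toy_pipeline, and the sample ground truth are all hypothetical, and exact-match accuracy is an assumed metric.

```python
def audit(pipeline, ground_truth):
    """Run `pipeline` on each known input and compare with the desired output.

    Returns (accuracy, failures); failures lists (input, expected, actual).
    """
    failures = []
    for known_input, desired_output in ground_truth:
        actual = pipeline(known_input)
        if actual != desired_output:
            failures.append((known_input, desired_output, actual))
    total = len(ground_truth)
    accuracy = (total - len(failures)) / total if total else 0.0
    return accuracy, failures


def toy_pipeline(pixels):
    """Hypothetical stand-in for a pipeline: label an image by mean brightness."""
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"


# Hypothetical ground truth: (known input, desired output) pairs.
ground_truth = [([250, 240], "bright"), ([10, 20], "dark"), ([130, 130], "dark")]
accuracy, failures = audit(toy_pipeline, ground_truth)
print(accuracy, failures)
```

The list of failures matters as much as the score: it points at the specific inputs where a new error mode has appeared.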

Git access

FØCAL data sets, pipelines, and bundles are all backed by Git repositories. The state of these data structures can be updated by modifying the files within the corresponding Git repository and executing a git push.

$ f0cal create dataset NewDataset --clone
Cloning into 'foo@bar.com-NewDataset'...
$ f0cal create pipeline NewPipeline
$ f0cal retrieve pipeline NewPipeline --clone
Cloning into 'foo@bar.com-NewPipeline'...
$ cd foo@bar.com-NewPipeline
$ emacs pipeline.xml
$ git commit -am "Some relevant commit message..."
$ git push origin master

As with any revision-controlled data structure, FØCAL data sets, pipelines, and bundles can all be branched. The FØCAL API supports referencing these data structures by branch name and commit hash. The FØCAL Git service includes a number of hooks that prevent malformed data structures from being submitted.

Discussion of direct manipulation of FØCAL data structures contained in the git repositories is beyond the scope of this document. Please see the relevant technical documentation for:


Conclusion

The FØCAL API, and specifically its client CLI, supports a number of important workflows that third-party developers can pick up quickly. These workflows allow the developer to build, test, and deploy FØCAL resources in the cloud in support of their specific applications. This guide covered data stores, data sets, pipelines, bundles, and audits, with example CLI interactions for each. Developers familiar with similar tools should now be able to start using FØCAL productively.
