🤖 Hosting and deployment

Integration language choice

The Reincubate engineering team primarily use Python. Consequently, the most sophisticated sample client implementations are built in Python, although there is no hard requirement to use any particular language. The API is used by clients running .NET / C#, PHP, Python, Java and JavaScript / Node.js.

Connectivity and datacentre selection

The ricloud API’s infrastructure is hosted by a number of providers in the US, EU and Canada, including Google Cloud Platform, Amazon Web Services and Microsoft Azure.

When consuming data from aschannel in production, the listener process is responsible for receiving a firehose of data, and consequently requires a reliable, high-bandwidth Internet connection of at least 100Mb/s. Slower connections may work but could drop data, as described in the aschannel documentation.
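For a sense of scale, the arithmetic below converts the 100Mb/s minimum into bytes. It is a plain unit conversion using decimal units (1Mb = 1,000,000 bits); the function names are illustrative, and real-world throughput will sit below this theoretical ceiling.

```python
# Rough capacity arithmetic for the listener's link (decimal units: 1 Mb = 10**6 bits).


def megabits_to_bytes_per_second(mbps: float) -> float:
    """Convert a link speed in megabits per second to bytes per second."""
    return mbps * 1_000_000 / 8


def gigabytes_per_hour(mbps: float) -> float:
    """Upper bound on the data a fully saturated link carries in one hour, in GB."""
    return megabits_to_bytes_per_second(mbps) * 3600 / 1_000_000_000


if __name__ == "__main__":
    # At the 100Mb/s minimum: 12.5 MB/s, or about 45 GB per hour at full saturation.
    print(megabits_to_bytes_per_second(100))  # 12500000.0
    print(gigabytes_per_hour(100))            # 45.0
```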

Clients likely to consume large amounts of data should host this part of their infrastructure with providers that have tier 1 peering, to ensure reliable, high-bandwidth access to these hosts. The same advice applies to clients using legacy ricloud 1.x.

As requests to asapi and asmaster are typically low-bandwidth and low-volume, there are no particular hosting requirements for clients of these components.

High-availability strategies for aschannel

As only a single listener process may attach to aschannel at a time, clients needing high availability in their own implementation might consider the open-source keepalived project, which uses VRRP to fail a virtual IP address over to a standby host.
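keepalived handles failover between hosts. Within a single host, a complementary safeguard against accidentally starting a second listener (for example, a manual run alongside the service) is an exclusive file lock. The sketch below is a hypothetical wrapper, not part of ricloud; the `SingleInstanceLock` name and the lock-file path are assumptions, and it relies on `flock` semantics as found on Linux and macOS.

```python
import fcntl
import os


class SingleInstanceLock:
    """Hypothetical guard: hold an exclusive, non-blocking flock on a lock file
    so that only one listener process on this host runs at a time."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fd = None

    def acquire(self) -> bool:
        """Return True if the lock was obtained, False if another holder exists."""
        fd = os.open(self.path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            return False
        self._fd = fd
        return True

    def release(self) -> None:
        """Drop the lock so another process can acquire it."""
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None
```

A wrapper script would call `acquire()` on something like `/var/www-hosts/ricloud-listener.lock` (a hypothetical path) and exit if it returns False. The kernel releases the lock automatically if the process dies, so a crash does not leave a stale lock behind.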

Simultaneous stream consumption

A sensible approach is to run a single aschannel listener in the client’s environment that acts as a reverse proxy, allowing multiple services to connect and consume the stream simultaneously. This works well and supports scenarios such as:

  • Allowing multiple developers or staging environments to connect simultaneously
  • Having a process to parse the stream and another to record the stream for later playback & debugging

Functionality to reverse-proxy streams in this way is built into web servers and load balancers such as nginx and HAProxy. nginx can do this using ngx_stream_core_module, which is not built by default and must be enabled with the --with-stream configure option. The following configuration should suffice:

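worker_processes auto;

error_log /var/log/nginx/error.log info;

events {
    worker_connections  1024;
}

# Requires nginx built with ngx_stream_core_module (--with-stream).
stream {
    upstream backend {
        # With a single upstream server, this hash has no practical effect.
        hash $remote_addr consistent;

        server aschannel.reincubate.com:443;
    }

    server {
        listen 443;
        proxy_connect_timeout 1s;
        # nginx closes the connection if no data passes in either direction
        # for this long; raise it if the stream can be idle for longer.
        proxy_timeout 3s;
        proxy_pass backend;
    }
}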
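The same single-listener, many-consumers pattern can also be sketched in-process. In the hypothetical Python below, one reader publishes each chunk it receives to every subscriber’s queue, so a parser and a recorder see identical data while only one connection to aschannel is held. `StreamFanout` is an illustrative name, not part of ricloud.

```python
import queue
import threading
from typing import List


class StreamFanout:
    """Copy every chunk read by a single upstream listener to any number of
    subscribers, each with its own queue (e.g. a parser and a recorder)."""

    def __init__(self) -> None:
        self._subscribers: List["queue.Queue[bytes]"] = []
        self._lock = threading.Lock()

    def subscribe(self) -> "queue.Queue[bytes]":
        """Register a new consumer and return the queue it should read from."""
        q: "queue.Queue[bytes]" = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, chunk: bytes) -> None:
        """Deliver one chunk from the upstream reader to every subscriber."""
        # Snapshot the list so subscribe() can safely run on another thread.
        with self._lock:
            subscribers = list(self._subscribers)
        for q in subscribers:
            q.put(chunk)


if __name__ == "__main__":
    fanout = StreamFanout()
    parser_q = fanout.subscribe()
    recorder_q = fanout.subscribe()
    fanout.publish(b"chunk-1")
    print(parser_q.get_nowait(), recorder_q.get_nowait())  # b'chunk-1' b'chunk-1'
```

The queues here are unbounded, so a slow subscriber grows memory without limit; a production version would likely bound them and decide what to do when one fills.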

Using the sample client in production

For some use-cases, the ricloud-py sample client can be put straight into production, saving clients from building integrations.

When deployed in production in listener mode, the library will require a machine with storage to buffer received content until the client’s application retrieves and purges it.
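Because buffered content accumulates until it is purged, it may be worth alerting before the disk fills. A minimal sketch using Python’s standard `shutil.disk_usage`; the function name, the path to watch and the threshold are all assumptions to adapt.

```python
import shutil

GIB = 1024 ** 3


def has_buffer_headroom(path: str, min_free_bytes: int = 10 * GIB) -> bool:
    """Return True if the filesystem containing `path` has at least
    `min_free_bytes` free. The 10 GiB default is an arbitrary example."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

For example, `has_buffer_headroom("/var/lib/mysql")` checks the filesystem holding MySQL’s default data directory on Ubuntu, where the listener’s buffered rows would live with a default MySQL install.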

Supported platforms

The supported platform is a default, basic install of Ubuntu Server 16.04 LTS with MySQL 5.7. Once installed, the required packages can be added with the following command:

$ sudo apt-get install python-pip libmysqlclient-dev mysql-server

Unsupported platforms known to work

  • Most server and cloud Linux distros
  • macOS 10.12 with Homebrew’s MySQL package

Preparing the server for production

Ubuntu 16.04 has a number of defaults which may limit functionality of the listener in a production environment.

Maximum number of open files

On a default install, the system-wide limit on open file descriptors is 171,614 (the exact figure depends on the machine’s memory), and the per-session limit is 1,024:

$ cat /proc/sys/fs/file-max
171614
$ ulimit -n
1024

To receive many files concurrently, the listener keeps a file stream open for each file being chunked down. Consequently, if many files are being streamed and chunked at the same time, the 1,024 limit is likely to be exceeded.

The session limit can be raised by creating a file named /etc/security/limits.d/ricloud-listener.conf with the following contents:

* hard nofile 10240
* soft nofile 10240

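These limits take effect for new login sessions. They are applied through PAM, so they do not affect services started by systemd. To raise the limit in the current shell without starting a new session, run the following (an unprivileged user cannot go above the current hard limit):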

$ ulimit -n 10240
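A process can also raise its own soft limit at start-up, up to the hard limit, which is what `ulimit -n` does for a shell. A sketch using Python’s standard `resource` module; the function name is illustrative, and calling something like this from a wrapper around the listener is an assumption, not something ricloud is known to do.

```python
import resource


def raise_nofile_soft_limit(target: int) -> int:
    """Raise this process's soft RLIMIT_NOFILE towards `target`, capped at the
    hard limit (an unprivileged process cannot exceed it). Never lowers the
    limit. Returns the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited; nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `raise_nofile_soft_limit(10240)` mirrors the `ulimit -n 10240` command above, but only for the current process and its children.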

Maximum MySQL packet size

By default, MySQL limits the size of the largest packet it will accept. On the Ubuntu 16.04 installation shown below, the limit is 32MB (33,554,432 bytes):

mysql> show global variables like 'max_allowed%';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_allowed_packet | 33554432 |
+--------------------+----------+
1 row in set (0.11 sec)

Large feed bodies can exceed this size when they are inserted: a large number of granular HealthKit entries can easily add up to several hundred megabytes of JSON. When this happens, the MySQL log will contain a line like the following:

[Note] Aborted connection 277783 to db: 'ricloud' user: 'ricloud' host: 'localhost' (Got a packet bigger than 'max_allowed_packet' bytes)

The following error will also appear in ricloud.log:

2017-05-01 16:23:52,828 root CRITICAL <class '_mysql_exceptions.OperationalError'>: (2006, 'MySQL server has gone away')
OperationalError: (2006, 'MySQL server has gone away')

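This can be resolved by raising max_allowed_packet, which has a maximum permitted value of 1GB (1,073,741,824 bytes). It can be changed at run time as follows; the new value applies only to connections opened after the change and is lost when the server restarts: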

SET GLOBAL max_allowed_packet=1073741824;

To change it permanently, set it in the [mysqld] section of the server’s my.cnf file:

[mysqld]
max_allowed_packet=1073741824

Note

Cloud SQL users on GCP can change this value using database flags in the GCP console. Homebrew users should run cp $(brew --prefix mysql)/support-files/my-default.cnf /usr/local/etc/my.cnf to get a local configuration file to work with, and then run mysql.server restart once it has been modified.
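To spot oversized feed bodies before they reach MySQL, one could compare their encoded size with the server’s max_allowed_packet. The sketch below assumes the body is stored as UTF-8 JSON produced by `json.dumps`; the listener’s actual serialisation may differ, and the surrounding INSERT statement adds some overhead, so sizes close to the limit should be treated as at risk. `fits_in_packet` is a hypothetical helper.

```python
import json


def fits_in_packet(feed_body: object, max_allowed_packet: int) -> bool:
    """Return True if the UTF-8 JSON encoding of `feed_body` is no larger than
    `max_allowed_packet` bytes. Ignores the overhead of the SQL statement
    itself, so this is an optimistic check."""
    encoded = json.dumps(feed_body).encode("utf-8")
    return len(encoded) <= max_allowed_packet
```

With the 32MB limit shown above, `fits_in_packet(body, 33554432)` returning False for a HealthKit feed would predict the "MySQL server has gone away" error.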

Maximum MySQL InnoDB log file size

The MySQL InnoDB storage engine requires that the BLOB/TEXT data written in a single transaction be no larger than 10% of the redo log size. Exceeding this limit triggers the following error:

(1118, 'The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size.')

As the error suggests, the fix is to increase innodb_log_file_size to an appropriate value. The restrictions on this setting’s value are described in the MySQL reference manual entry for innodb_log_file_size.
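As a worked example, the helper below computes the smallest innodb_log_file_size that satisfies the 10% rule, taking the total redo log size as innodb_log_file_size × innodb_log_files_in_group (which defaults to 2). The helper name is illustrative, it rounds up to a whole MiB for a tidy config value, and the result should still be checked against the documented limits for the MySQL version in use.

```python
MIB = 1024 * 1024


def min_innodb_log_file_size(largest_blob_bytes: int, log_files_in_group: int = 2) -> int:
    """Smallest innodb_log_file_size (bytes, rounded up to a whole MiB) such that
    `largest_blob_bytes` is at most 10% of the total redo log, where the total
    redo log is innodb_log_file_size * innodb_log_files_in_group."""
    total_redo_needed = 10 * largest_blob_bytes
    # Integer ceiling division avoids float rounding on large sizes.
    per_file = -(-total_redo_needed // log_files_in_group)
    return -(-per_file // MIB) * MIB
```

For a 300MiB feed body with two log files, `min_innodb_log_file_size(300 * MIB)` returns 1,500MiB, which would correspond to innodb_log_file_size=1500M in my.cnf.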

Configuration as a service

Configuring the listener to run as a service on Ubuntu 16.04 is straightforward.

Add the following to a new file at /etc/systemd/system/ricloud-listener.service:

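[Unit]
Description=ricloud listener
Wants=network-online.target
After=network-online.target

[Service]
WorkingDirectory=/var/www-hosts
ExecStart=/usr/bin/python -m ricloud --listen
Restart=always
KillSignal=SIGQUIT
Type=simple
StandardError=syslog
NotifyAccess=all
User=www-data
Group=www-data
# Services started by systemd do not read /etc/security/limits.d,
# so set the listener's open-file limit here.
LimitNOFILE=10240

[Install]
WantedBy=multi-user.target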

Ensure that the /var/www-hosts/logs folder exists and is writable by the www-data user; any other folder or user could be configured instead. Then reload the systemd configuration, enable the new ricloud-listener service so that it starts at boot, and start it:

$ sudo mkdir -p /var/www-hosts/logs
$ sudo chown -R www-data:www-data /var/www-hosts
$ sudo systemctl daemon-reload
$ sudo systemctl enable ricloud-listener
$ sudo systemctl start ricloud-listener

Important

Clients should monitor the listener’s log files and configure log rotation for them. With the service description above, the log is written to /var/www-hosts/logs/ricloud.log.
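As one simple monitoring aid, the sketch below scans the listener log for CRITICAL entries such as the MySQL error shown earlier. It assumes the `<date> <time> <logger> <LEVEL> <message>` layout seen in that excerpt; `critical_lines` is a hypothetical helper, and multi-line tracebacks are only matched on their first line.

```python
from typing import Iterator


def critical_lines(log_path: str) -> Iterator[str]:
    """Yield lines from the listener log whose level field is CRITICAL.

    Assumes lines shaped like
    '2017-05-01 16:23:52,828 root CRITICAL <message>'."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            # Split off date, time, logger and level; keep the message intact.
            fields = line.split(None, 4)
            if len(fields) >= 4 and fields[3] == "CRITICAL":
                yield line.rstrip("\n")
```

Running `list(critical_lines("/var/www-hosts/logs/ricloud.log"))` from a cron job or monitoring agent, and alerting when it is non-empty, is one way to act on this.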

Firewalling the sample client

Inbound

  • No inbound Internet access is required for normal operation.
  • In listener mode, a client’s setup may require inbound LAN traffic on TCP/3306 for the local MySQL instance, and on other ports for any filesystem sharing.

Outbound

  • The ricloud library makes outbound requests on TCP/443 to three hostnames when retrieving data: asmaster.reincubate.com, asapi.reincubate.com, and aschannel.reincubate.com.
  • Traffic to asmaster and asapi can safely be proxied if need be. Proxying aschannel is not recommended, given its bandwidth and reliability requirements (see Connectivity and datacentre selection).
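Before going live, outbound reachability of the three hosts can be checked from the client’s network. This minimal sketch only confirms that a TCP connection to port 443 opens within a timeout; it does not perform a TLS handshake or detect an intercepting proxy. The function names are illustrative.

```python
import socket
from typing import Dict

RICLOUD_HOSTS = (
    "asmaster.reincubate.com",
    "asapi.reincubate.com",
    "aschannel.reincubate.com",
)


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals and timeouts
        return False


def check_ricloud_hosts() -> Dict[str, bool]:
    """Map each ricloud hostname to whether TCP/443 is reachable."""
    return {host: can_connect(host) for host in RICLOUD_HOSTS}
```

Any False entry from `check_ricloud_hosts()` points to a firewall rule or DNS issue worth investigating before starting the listener.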

Using desktop components in production

Supported platforms

The white-labelled asrelay component runs on Windows Vista and newer (7, 8, 8.1, 10, etc.) and macOS 10.8 and above.

Code-signing

In order to create a trusted white-label build of the desktop component, code-signing certificates are required. Microsoft publish information on these, and Apple provide certificates as part of their developer program.

Establishing browser trust

Browsers use different techniques for applying a basic level of protection around downloads. Modern browsers do not trust unknown files by default. Chrome shows a message on downloads stating “This type of file can harm your computer. Do you want to keep binary.exe anyway?”. Microsoft’s Edge browser shows a “Windows SmartScreen” pane. Browsers use a number of criteria when deciding whether to warn: they look at the domain, the nature of the file and, above all, whether they have seen it enough times before to judge whether it is safe.

If one were to download Reincubate’s consumer application, no warning would be shown, as it is well known to browsers.

In the case of the asrelay client, it is not publicly distributed as-is: it is white-labelled and signed with a client’s code-signing key. Browsers trust Reincubate’s code-signing key and domain, but they tend not to recognise asrelay itself, as it is not distributed to the public.

When asrelay is customised with a client’s name and code-signing, it is treated as a new application (it no longer carries Reincubate’s certificate) and starts with no reputation, so browsers will flag it on download. To become trusted, the new program needs to be downloaded and installed on many computers as soon as possible. Generally, after a significant change to a trust factor, a few days’ worth of downloads and installs is enough to get past these warnings.