FL Tool description for Innohub
This tool enables participants to run the Federated Learning experiments devised under the Federated Learning Service. Hence, before setting up the tool, the Server should create an experiment using the Federated Learning Service. The Service is also where Clients can join existing experiments.
NOTICE
The Dockerized Federated Learning tool was developed and tested on Linux, and we recommend Linux for running it. On Windows, some of the setup steps below may differ, and additional configuration (e.g., firewall rules) may be required.
Prerequisites
The Federated Learning Tool uses software containerization via Docker. A working Docker setup is therefore required on every machine where the Federated Learning experiments will run, both for the FL server and for the FL clients. Here is a tutorial for setting up Docker.
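To verify that both Docker and the Compose plugin are available before continuing (the specific versions are not prescribed here), you can run:
docker --version
docker compose version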
Logging into the Docker registry:
docker login registry.ispatial.survey.ntua.gr -u YOUR_USERNAME
Downloading configuration
To proceed further, the orchestrator of the FL experiment needs an FL configuration file. It can be created and downloaded via the Federated Learning Service on the Innohub website.
Startup files
Both the FL server and the FL clients should download their respective local Docker scripts. The download can be found under the FL Service on Innohub, where the orchestrator created the experiment.
The zip contains a compose.yml file, which starts the actual FL server or client, and a .env file that connects Docker to the local environment.
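For orientation only, a compose file of this kind typically wires together the container image, the variables from the .env file, and the mounted host paths. The sketch below is an assumption about that structure, not the actual file shipped by Innohub; the service name, image name, and container-side paths are illustrative.
# Illustrative sketch only - the real compose.yml provided by Innohub may differ.
services:
  fl-client:                                                  # hypothetical service name
    image: registry.ispatial.survey.ntua.gr/fl-client:latest  # hypothetical image name
    environment:
      - SERVER_ADDRESS=${SERVER_ADDRESS}                      # taken from the .env file
    volumes:
      - ${CERT_FOLDER}:/app/certificates:ro                   # TLS certificates
      - ${DATA_FILE}:/app/data/train.h5:ro                    # preprocessed training data
      - ${OUTPUT_DIR}:/app/output                             # results are written here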
Certificates
For security, TLS certificate files are required to encrypt the communication between the FL Server and the FL clients. You can download them using the button below.
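If you want to sanity-check the downloaded certificates before using them (assuming OpenSSL is installed; the file name ca.crt matches the examples further down), you can inspect the CA certificate's subject and validity dates:
openssl x509 -in ca.crt -noout -subject -issuer -dates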
Environment setup
Fill in the values inside the .env file so Docker knows where the needed files are on your system. The required values are similar for the client and the server:
– The paths to the TLS certificates. (If you use the TLS certificates provided by Innohub, then the names of the files are the same as in the example below.)
– The server's IP address, with port 9092 in the client's .env and port 9093 in the server's .env. (The server's IP address is shared with the clients on the FL Service webpage for all their experiments.) Make sure these ports are reachable from the clients; see the firewall note after this list.
– For the Clients: the path to the output directory and the path to the preprocessed training data.
– For the Server: the config.json you downloaded from Innohub.
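If the clients cannot reach the server, incoming traffic on these ports may be blocked by the server's firewall. As an illustration only, on a Linux host that uses ufw the ports could be opened as follows (adapt this to whatever firewall you actually run):
sudo ufw allow 9092/tcp
sudo ufw allow 9093/tcp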
Client side:
SERVER_ADDRESS=10.75.1.152:9092
CERT_FOLDER=../server/superlink-certificates
CERT_FILE=../superlink-certificates/ca.crt
DATA_FILE=./data/processed_data_VitalDB_client_0_with_50_patients.h5
OUTPUT_DIR=./output
Server side:
CERT_FOLDER=./superlink-certificates
CA_CERT_FILE=../superlink-certificates/ca.crt
SUPERLINK_CERT_FILE=../superlink-certificates/server.pem
SUPERLINK_KEY_FILE=../superlink-certificates/server.key
INNOHUB_CONFIG=./innohub-config.json
OUTPUT_FOLDER=./output
SERVER_ADDRESS=10.75.1.152:9093
Result folder
Docker needs to be able to copy the results into the output directory, so it needs temporary ownership of that directory:
sudo chown -R 49999:49999 /path/to/your/output/library
After the experiment is done, you can switch ownership back (replace user:user with your own user and group):
sudo chown -R user:user /path/to/your/output/library
Startup
Clients
Run in the directory where your compose.yml is located:
sudo docker compose -f compose.yml up
This starts the FL client and attempts to connect to the server.
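Optionally, if you prefer to keep the terminal free, the same command can be run in detached mode and followed through the Compose logs (standard Docker Compose options, not specific to this tool):
sudo docker compose -f compose.yml up -d
sudo docker compose -f compose.yml logs -f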
Server
First establish the connection (same as the clients):
sudo docker compose -f compose.yml up
Afterwards, specify and start the model training (replace MODEL with the model you are training):
sudo docker exec -it MODEL-fl-serverapp bash
./start.sh
By default, the FL Server aggregates and distributes the model; it does not train or save the model itself. If the Server also wants to participate in training and keep the trained model, it should run a local client as well (set up identically to other clients).
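If you are unsure of the exact container name to pass to docker exec (the MODEL prefix depends on the experiment you created), listing the running containers will show it:
sudo docker ps --format "table {{.Names}}\t{{.Status}}"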
Results
While federated training is in progress, log messages are shown on the command line. Once training finishes, every client will find a .pth file (the trained model) in its output directory, plus any additional metrics selected in the experiment configuration. These performance metrics are also sent to the server and are available in its output directory.
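To confirm that the run produced its artifacts, you can simply list the output directory configured in OUTPUT_DIR (client) or OUTPUT_FOLDER (server); the exact file names depend on your experiment configuration:
ls -lh ./output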