First steps with fluke API¶
This tutorial will guide you through the first steps with the fluke
API. We will show how to quickly run an experiment using the API.
Installation via pip¶
If you haven’t installed the package yet, you can do so by running the following command:
pip install fluke-fl
Loading and splitting the dataset¶
First of all, we need to load the dataset. Let's say we want to load the MNIST dataset.
from fluke.data.datasets import Datasets
dataset = Datasets.get("mnist", path="./data")
dataset is a DataContainer, which is a simple data structure containing the dataset as it is loaded from the files. The downloaded files are stored in the directory path. If the dataset is already downloaded, path can be set to the directory containing the files.
After loading the dataset, we need to prepare it for distribution. For simplicity, let's say that we use the test set provided by the dataset as the server-side test set (which will measure the performance of the global model), and the training set as the client-side training set (which will be distributed to the clients). For now, we will use the default data distribution strategy, which is IID, with no client-side test set.
from fluke.data import DataSplitter
splitter = DataSplitter(dataset=dataset,
                        distribution="iid")
A DataSplitter is the class responsible for splitting the dataset into the server-side and client-side datasets.
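For intuition, an IID split simply shuffles the example indices and deals them out evenly across clients, so every client's local data follows the same distribution. The following is a minimal plain-Python sketch of the idea, not fluke's internal implementation:

```python
import random

def iid_split(num_examples: int, num_clients: int, seed: int = 42) -> list[list[int]]:
    """Shuffle example indices and deal them out evenly across clients."""
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)
    # round-robin assignment keeps the client shard sizes balanced
    return [indices[c::num_clients] for c in range(num_clients)]

shards = iid_split(num_examples=60000, num_clients=10)
print(len(shards), len(shards[0]))  # → 10 6000
```

Non-IID strategies differ only in how the indices are assigned (e.g., by label or by a Dirichlet-distributed quantity) rather than uniformly at random.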
Setting up the evaluator¶
The evaluator is the class responsible for evaluating the performance of both the global and local models.
It must be defined in the global settings of fluke as follows.
from fluke.evaluation import ClassificationEval
from fluke import GlobalSettings

evaluator = ClassificationEval(eval_every=1, n_classes=dataset.num_classes)
GlobalSettings().set_evaluator(evaluator)
Here we are using an evaluator for the classification task (currently the only one supported). eval_every is the number of communication rounds after which the models are evaluated.
Instantiate and configure the federated learning algorithm¶
Now, we are ready to instantiate our algorithm. We will go with the standard FedAvg, but many others are available in fluke.
Instantiating a federated learning algorithm requires setting a number of hyper-parameters. fluke divides these parameters into two groups:
client-side: the hyper-parameters of the clients, which include the type of optimizer (and scheduler), the learning rate, the number of local epochs, etc.
server-side: the hyper-parameters of the server, which are typically fewer than the clients' hyper-parameters, e.g., whether the aggregation is weighted or not.
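To illustrate what the weighted option controls: with weighted aggregation, the server averages client updates proportionally to their local dataset sizes, while an unweighted average treats all clients equally. The following is a toy sketch with hypothetical names, not fluke's implementation:

```python
def aggregate(client_params: list[list[float]],
              sizes: list[int],
              weighted: bool = True) -> list[float]:
    """Average per-parameter values across clients, optionally weighted by dataset size."""
    if weighted:
        total = sum(sizes)
        weights = [s / total for s in sizes]
    else:
        weights = [1 / len(client_params)] * len(client_params)
    # combine the clients' parameter vectors into a single global vector
    return [sum(w * p[i] for w, p in zip(weights, client_params))
            for i in range(len(client_params[0]))]

# two clients: one trained on 300 examples, one on 100
print(aggregate([[1.0, 2.0], [3.0, 6.0]], sizes=[300, 100]))  # → [1.5, 3.0]
```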
In the following code, we will set the hyper-parameters of the clients using a DDict, which is a convenient data structure defined in fluke. A simple dictionary can be used as well.
from fluke import DDict
client_hp = DDict(
    batch_size=10,
    local_epochs=5,
    loss="CrossEntropyLoss",
    optimizer=DDict(
        lr=0.01,
        momentum=0.9,
        weight_decay=0.0001),
    scheduler=DDict(
        gamma=1,
        step_size=1)
)

# we put together the hyper-parameters for the algorithm
hyperparams = DDict(client=client_hp,
                    server=DDict(weighted=True),
                    model="MNIST_2NN")
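To give a feel for the convenience a DDict-style structure provides over a plain dictionary, namely attribute-style access to nested keys, here is a rough standalone sketch (not fluke's actual implementation):

```python
class AttrDict(dict):
    """A dict whose keys are also readable as attributes (a DDict-like sketch)."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

# nested hyper-parameters become reachable with dotted access
hp = AttrDict(batch_size=10, optimizer=AttrDict(lr=0.01, momentum=0.9))
print(hp.batch_size, hp.optimizer.lr)  # → 10 0.01
```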
As you may see, we also need to specify the model (i.e., the neural network) that will be used in the federated learning process. In this example, we will use the MNIST_2NN model, which is a simple multi-layer perceptron with two hidden layers. The model is defined in the nets module of the fluke package.
Finally, we are all set to create the federated learning algorithm. The FedAVG class is the implementation of the Federated Averaging algorithm; as shown below, it takes the number of clients, the data splitter, and the hyper-parameters:
from fluke.algorithms.fedavg import FedAVG
algorithm = FedAVG(100, splitter, hyperparams)
Before running the algorithm, we need to make sure to log the results. fluke is designed to allow different types of logging; for this reason, it implements the Observer design pattern. To attach a logger to the algorithm, we create an instance of the logger and attach it to the algorithm.
from fluke.utils.log import Log
logger = Log()
algorithm.set_callbacks(logger)
Log is a simple logger that prints the results to the console, while keeping the history of the results in dictionaries.
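The Observer pattern mentioned above works as follows: the subject keeps a list of registered observers and notifies each of them when an event occurs (e.g., the end of a round). A generic illustration of the pattern, not fluke's actual interface:

```python
class Observable:
    """Subject that notifies registered observers of named events."""
    def __init__(self):
        self._observers = []

    def attach(self, observer) -> None:
        self._observers.append(observer)

    def notify(self, event: str, **data) -> None:
        # broadcast the event to every registered observer
        for obs in self._observers:
            obs.update(event, **data)

class ConsoleLogger:
    """Observer that records and prints every event it receives."""
    def __init__(self):
        self.history = []

    def update(self, event: str, **data) -> None:
        self.history.append((event, data))
        print(f"[{event}] {data}")

algo = Observable()
logger = ConsoleLogger()
algo.attach(logger)
algo.notify("end_round", round=1, accuracy=0.91)
```

This decoupling is what lets fluke swap in different logger types without changing the algorithm's code.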
Ready to go!¶
Finally, we can run the algorithm. The run method of the algorithm requires specifying the number of rounds and the fraction of clients that will participate in each round.
algorithm.run(2, 0.5)
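For intuition on the second argument: at each round, the server samples that fraction of the registered clients to participate. A rough plain-Python sketch of this sampling loop (hypothetical names, not fluke's code):

```python
import random

def sample_clients(n_clients: int, fraction: float, rng: random.Random) -> list[int]:
    """Pick a set of distinct client ids for one round (at least one client)."""
    k = max(1, round(n_clients * fraction))
    return sorted(rng.sample(range(n_clients), k))

rng = random.Random(0)
for rnd in range(2):  # two rounds, as in algorithm.run(2, 0.5)
    participants = sample_clients(n_clients=100, fraction=0.5, rng=rng)
    print(f"round {rnd}: {len(participants)} clients selected")
```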