Android SDK API Reference

This document provides a comprehensive API reference for the NVIDIA FLARE Android SDK, enabling federated learning on Android devices using ExecuTorch.

Note

This API reference assumes familiarity with the mobile development guide and basic Android development concepts.

Overview

The Android SDK provides native Kotlin/Java libraries for implementing federated learning on Android devices. The SDK handles communication with FLARE servers, model training using ExecuTorch, and data management.

Key Components

  • AndroidFlareRunner: Main orchestrator for federated learning.

  • Connection: HTTP/HTTPS communication with FLARE servers.

  • ETTrainer: ExecuTorch-based model training.

  • DataSource: Interface for providing training data.

  • Dataset: Data interface for training examples.

AndroidFlareRunner

The main orchestrator for federated learning on Android devices. Handles job fetching, task execution, result reporting, component resolution, filtering, and event handling.

Constructor

AndroidFlareRunner(
    context: AndroidContext,
    connection: Connection,
    jobName: String,
    dataSource: DataSource,
    deviceInfo: Map<String, String>,
    userInfo: Map<String, String>,
    jobTimeout: Float,
    inFilters: List<Filter>? = null,
    outFilters: List<Filter>? = null,
    resolverRegistry: Map<String, Class<*>>? = null
)

Parameters

  • context: Android application context.

  • connection: Connection instance for server communication.

  • jobName: Name of the FL job to participate in.

  • dataSource: Data source providing training data.

  • deviceInfo: Device metadata (device_id, platform, etc.).

  • userInfo: User metadata (user_id, etc.).

  • jobTimeout: Timeout in seconds for job operations.

  • inFilters: Optional input filters for data processing.

  • outFilters: Optional output filters for result processing.

  • resolverRegistry: Optional component resolver registry.

What is a Resolver?

A Resolver is a component that maps string identifiers to actual class implementations. In the context of FLARE’s edge SDK, resolvers are used to dynamically instantiate training components, filters, and other plugins based on configuration data received from the server.

For example, when the server sends a job configuration that specifies a trainer component, the resolver looks up the string identifier (like “ETTrainerExecutor”) and maps it to the actual class that should be instantiated. This allows for flexible, configuration-driven component loading without hardcoding specific implementations.

The resolverRegistry parameter allows you to register custom resolvers for your own components, enabling the system to dynamically load and instantiate them as needed.
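The lookup described above can be sketched in plain Kotlin. This is a minimal illustration, not the SDK's implementation: the Filter interface, the ScalingFilter class, and the resolve helper are hypothetical stand-ins, and only the Map<String, Class<*>> shape is taken from the resolverRegistry parameter.

```kotlin
// Hypothetical stand-in for an SDK component type; the real Filter interface may differ.
interface Filter {
    fun apply(values: List<Float>): List<Float>
}

// Hypothetical custom component with a no-argument constructor.
class ScalingFilter : Filter {
    override fun apply(values: List<Float>): List<Float> = values.map { it * 0.5f }
}

// Registry in the same shape as the resolverRegistry parameter:
// string identifier -> class to instantiate.
val customRegistry: Map<String, Class<*>> = mapOf(
    "Filter.ScalingFilter" to ScalingFilter::class.java
)

// Hypothetical helper: look up the identifier and create an instance reflectively.
fun resolve(registry: Map<String, Class<*>>, id: String): Any {
    val cls = registry[id]
        ?: throw IllegalArgumentException("No class registered for '$id'")
    return cls.getDeclaredConstructor().newInstance()
}
```

With these definitions, resolve(customRegistry, "Filter.ScalingFilter") returns a new ScalingFilter, while an unregistered identifier raises IllegalArgumentException.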

Properties

val jobName: String
// The name of the federated learning job

Methods

run()

Starts the main federated learning loop. This method runs continuously until the job is complete or stopped.

fun run()

Usage:

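// run() is not a suspend function and blocks until the job ends,
// so launch it off the main thread (requires kotlinx.coroutines.Dispatchers).
lifecycleScope.launch(Dispatchers.IO) {
    flareRunner.run()
}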

stop()

Stops the federated learning process and cleans up resources.

fun stop()

Usage:

override fun onDestroy() {
    super.onDestroy()
    flareRunner.stop()
}

Built-in Component Resolvers

The AndroidFlareRunner includes built-in resolvers for common components:

  • Executor.ETTrainerExecutor: ExecuTorch-based training executor.

  • Trainer.DLTrainer: Deep learning trainer (mapped to ETTrainerExecutor).

  • Filter.NoOpFilter: No-operation filter.

  • EventHandler.NoOpEventHandler: No-operation event handler.

  • Batch.SimpleBatch: Simple batch processing.

Connection

Manages HTTP/HTTPS communication with FLARE servers. Handles authentication, certificate validation, and request/response processing.

Constructor

Connection(context: Context)

Parameters

  • context: Android application context.

Properties

val hostname: MutableLiveData<String>
// Server hostname (observable)

val port: MutableLiveData<Int>
// Server port (observable)

val isValid: Boolean
// Whether the connection configuration is valid

fun getUserInfo(): Map<String, String>
// Get current user information

Methods

setCapabilities(capabilities)

Sets device capabilities for the connection.

fun setCapabilities(capabilities: Map<String, Any>)

Parameters

  • capabilities: Map of device capabilities.

setUserInfo(userInfo)

Sets user information for the connection.

fun setUserInfo(userInfo: Map<String, String>)

Parameters

  • userInfo: Map of user information.

setScheme(scheme)

Sets the HTTP scheme (http/https).

fun setScheme(scheme: String)

Parameters

  • scheme: "http" or "https".

setAllowSelfSignedCerts(allow)

Configures whether to allow self-signed certificates.

fun setAllowSelfSignedCerts(allow: Boolean)

Parameters

  • allow: true to allow self-signed certificates.

Warning

Allowing self-signed certificates creates security vulnerabilities. Only use in development or controlled environments.

getJob(jobName, deviceInfo, userInfo)

Requests a job from the server.

suspend fun getJob(
    jobName: String,
    deviceInfo: Map<String, String>,
    userInfo: Map<String, String>
): JobResponse?

Parameters

  • jobName: Name of the job to request.

  • deviceInfo: Device information.

  • userInfo: User information.

Returns: JobResponse if successful, null otherwise.

getTask(jobId, taskName)

Requests a task from the server.

suspend fun getTask(
    jobId: String,
    taskName: String
): TaskResponse?

Parameters

  • jobId: Job identifier.

  • taskName: Name of the task to request.

Returns: TaskResponse if successful, null otherwise.

reportResult(jobId, taskId, result)

Reports task results to the server.

suspend fun reportResult(
    jobId: String,
    taskId: String,
    result: Map<String, Any>
): ResultResponse?

Parameters

  • jobId: Job identifier.

  • taskId: Task identifier.

  • result: Task execution results.

Returns: ResultResponse if successful, null otherwise.
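The three calls above are typically chained: fetch a job, fetch a task for it, execute the task, then report the result. The sketch below illustrates that order against an in-memory fake. Everything in it is an assumption made for illustration: the FakeConnection class, the jobId, taskId, taskData, and accepted fields, the "train" task name, and the non-suspending signatures (the real methods are suspend functions, and the real response classes may carry different fields).

```kotlin
// Hypothetical, simplified response types; the SDK's JobResponse, TaskResponse
// and ResultResponse may expose different fields.
data class JobResponse(val jobId: String)
data class TaskResponse(val taskId: String, val taskData: Map<String, Any>)
data class ResultResponse(val accepted: Boolean)

// In-memory fake standing in for Connection. The real methods are suspend
// functions that talk to a FLARE server; these return canned values.
class FakeConnection {
    val reported = mutableListOf<Map<String, Any>>()

    fun getJob(
        jobName: String,
        deviceInfo: Map<String, String>,
        userInfo: Map<String, String>
    ): JobResponse? = if (jobName == "xor_job") JobResponse(jobId = "job-1") else null

    fun getTask(jobId: String, taskName: String): TaskResponse? =
        TaskResponse(taskId = "$jobId/$taskName", taskData = mapOf("round" to 0))

    fun reportResult(jobId: String, taskId: String, result: Map<String, Any>): ResultResponse? {
        reported += result
        return ResultResponse(accepted = true)
    }
}

// One pass of the job -> task -> result sequence. A null response ends the
// pass, mirroring the "null otherwise" contract documented above.
fun runOnce(
    conn: FakeConnection,
    jobName: String,
    execute: (TaskResponse) -> Map<String, Any>
): Boolean {
    val job = conn.getJob(jobName, emptyMap(), emptyMap()) ?: return false
    val task = conn.getTask(job.jobId, "train") ?: return false
    val result = execute(task)
    return conn.reportResult(job.jobId, task.taskId, result)?.accepted == true
}
```

In an application, AndroidFlareRunner performs this sequence itself; calling the Connection methods directly is only needed for custom flows.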

ETTrainer

ExecuTorch-based trainer for on-device model training. Implements AutoCloseable for proper resource management.

Constructor

ETTrainer(
    context: android.content.Context,
    meta: Map<String, Any>,
    dataset: Dataset? = null
)

Parameters

  • context: Android application context.

  • meta: Model metadata.

  • dataset: Optional dataset for training.

Methods

train(config, dataset, modelData)

Trains the model using the provided configuration and dataset.

@Throws(Exception::class)
fun train(
    config: TrainingConfig,
    dataset: Dataset,
    modelData: ByteArray
): Map<String, Any>

Parameters

  • config: Training configuration.

  • dataset: Training dataset.

  • modelData: Model data in ExecuTorch format.

Returns: Training results including loss and predictions.

Throws: Exception if training fails.

Usage:

ETTrainer(context, meta, dataset).use { trainer ->
    val result = trainer.train(config, dataset, modelData)
}

close()

Closes the trainer and releases resources.

override fun close()
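Because ETTrainer implements AutoCloseable, Kotlin's use block closes it even when train throws. The sketch below demonstrates that behavior with a hypothetical FakeTrainer stand-in, since constructing a real ETTrainer needs an Android context and model data.

```kotlin
// Hypothetical stand-in for ETTrainer: records whether close() was called.
class FakeTrainer(private val failTraining: Boolean) : AutoCloseable {
    var closed = false
        private set

    fun train(): Map<String, Any> {
        if (failTraining) throw IllegalStateException("training failed")
        return mapOf("loss" to 0.25f)
    }

    override fun close() {
        closed = true
    }
}

// Runs training inside use { } so close() is called on both the success
// and the failure path; returns null when training throws.
fun trainSafely(trainer: FakeTrainer): Map<String, Any>? =
    try {
        trainer.use { it.train() }
    } catch (e: IllegalStateException) {
        null
    }
```

In both cases the trainer's closed flag is true after trainSafely returns, which is the guarantee the use block provides for ETTrainer as well.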

DataSource Interface

Interface for providing training data to the FL system.

Interface Definition

interface DataSource {
    fun getDataset(jobName: String, context: Context): Dataset
}

Methods

getDataset(jobName, context)

Retrieves a dataset for the specified job.

fun getDataset(jobName: String, context: Context): Dataset

Parameters

  • jobName: Name of the federated learning job.

  • context: FLARE context.

Returns: Dataset instance for training.

Example Implementation:

class MyDataSource : DataSource {
    override fun getDataset(jobName: String, context: Context): Dataset {
        return when (jobName) {
            "cifar10_job" -> CIFAR10Dataset(context)
            "xor_job" -> XORDataset("train")
            else -> throw IllegalArgumentException("Unknown job: $jobName")
        }
    }
}

Dataset Interface

Interface for providing training examples to the trainer.

Interface Definition

interface Dataset {
    fun size(): Int
    fun getBatch(batchSize: Int): List<Map<String, Any>>
}

Methods

size()

Returns the total number of examples in the dataset.

fun size(): Int

Returns: Number of examples.

getBatch(batchSize)

Retrieves a batch of training examples.

fun getBatch(batchSize: Int): List<Map<String, Any>>

Parameters

  • batchSize: Number of examples to return.

Returns: List of training examples.

Example Implementation:

class MyDataset : Dataset {
    private val data = mutableListOf<Map<String, Any>>()

    override fun size(): Int = data.size

    override fun getBatch(batchSize: Int): List<Map<String, Any>> {
        return data.shuffled().take(batchSize)
    }
}
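Note that shuffled().take(batchSize) draws each batch independently, so successive batches may repeat some examples and miss others. If each example should be seen once per epoch, one option is to keep a cursor over a shuffled order. The following is a sketch under that assumption; EpochDataset and its random parameter are hypothetical names, and whether the trainer expects this behavior is not specified in this reference.

```kotlin
import kotlin.random.Random

// Interface as defined above, repeated so the sketch is self-contained.
interface Dataset {
    fun size(): Int
    fun getBatch(batchSize: Int): List<Map<String, Any>>
}

// Hypothetical dataset that returns every example exactly once per epoch.
class EpochDataset(
    private val data: List<Map<String, Any>>,
    private val random: Random = Random.Default
) : Dataset {
    private var order = data.indices.shuffled(random)
    private var cursor = 0

    override fun size(): Int = data.size

    override fun getBatch(batchSize: Int): List<Map<String, Any>> {
        require(batchSize > 0) { "batchSize must be positive" }
        if (data.isEmpty()) return emptyList()
        // Start a new epoch with a fresh shuffle once the data is exhausted.
        if (cursor >= order.size) {
            order = data.indices.shuffled(random)
            cursor = 0
        }
        val end = minOf(cursor + batchSize, order.size)
        val batch = order.subList(cursor, end).map { data[it] }
        cursor = end
        return batch
    }
}
```

The last batch of an epoch may be shorter than batchSize; callers that need fixed-size batches would have to pad or drop it.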

TrainingConfig

Configuration class for training parameters.

Properties

val localEpochs: Int
// Number of local training epochs

val localBatchSize: Int
// Batch size for local training

val localLearningRate: Float
// Learning rate for local training

val localMomentum: Float
// Momentum for local training

val inFilters: List<Filter>?
// Input filters

val outFilters: List<Filter>?
// Output filters
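As a worked example of how these values interact, the sketch below estimates the number of optimizer steps in one local round. It assumes the trainer makes localEpochs full passes over the dataset in batches of localBatchSize, with a shorter final batch. LocalTrainingParams is a hypothetical stand-in that mirrors two of the properties above; the real TrainingConfig constructor is not shown in this reference.

```kotlin
// Hypothetical stand-in mirroring two TrainingConfig properties.
data class LocalTrainingParams(val localEpochs: Int, val localBatchSize: Int)

// Batches per epoch is ceil(datasetSize / localBatchSize);
// the total number of steps multiplies that by the number of epochs.
fun stepsPerRound(params: LocalTrainingParams, datasetSize: Int): Int {
    require(params.localBatchSize > 0) { "localBatchSize must be positive" }
    require(params.localEpochs >= 0 && datasetSize >= 0) { "values must be non-negative" }
    val batchesPerEpoch = (datasetSize + params.localBatchSize - 1) / params.localBatchSize
    return params.localEpochs * batchesPerEpoch
}
```

For example, 3 epochs over 50 examples with a batch size of 16 gives ceil(50 / 16) = 4 batches per epoch, or 12 steps in total.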

Usage Examples

Basic Setup

class MainActivity : AppCompatActivity() {
    private lateinit var flareRunner: AndroidFlareRunner

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // Create connection
        val connection = Connection(this)
        connection.setScheme("https")
        connection.setAllowSelfSignedCerts(false) // Use true for development only

        // Create data source
        val dataSource = MyDataSource()

        // Create FlareRunner
        flareRunner = AndroidFlareRunner(
            context = this,
            connection = connection,
            jobName = "my_fl_job",
            dataSource = dataSource,
            deviceInfo = mapOf(
                "device_id" to getDeviceId(),
                "platform" to "android",
                "app_version" to getAppVersion()
            ),
            userInfo = mapOf("user_id" to getUserId()),
            jobTimeout = 30.0f
        )

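        // Start federated learning off the main thread: run() is not a
        // suspend function and blocks until the job completes or is stopped
        // (requires kotlinx.coroutines.Dispatchers)
        lifecycleScope.launch(Dispatchers.IO) {
            flareRunner.run()
        }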
    }
}

Custom Data Source

class CIFAR10DataSource : DataSource {
    override fun getDataset(jobName: String, context: Context): Dataset {
        return CIFAR10Dataset(context)
    }
}

Custom Dataset

class XORDataset(private val split: String) : Dataset {
    private val data = generateXORData()

    override fun size(): Int = data.size

    override fun getBatch(batchSize: Int): List<Map<String, Any>> {
        return data.shuffled().take(batchSize)
    }

    private fun generateXORData(): List<Map<String, Any>> {
        // Generate XOR training data
        return listOf(
            mapOf("input" to floatArrayOf(0f, 0f), "label" to 0f),
            mapOf("input" to floatArrayOf(0f, 1f), "label" to 1f),
            mapOf("input" to floatArrayOf(1f, 0f), "label" to 1f),
            mapOf("input" to floatArrayOf(1f, 1f), "label" to 0f)
        )
    }
}

Error Handling

The Android SDK reports failures through exceptions and Android's standard logging.

Common Exceptions

  • NVFlareError (com.nvidia.nvflare.sdk.core.NVFlareError): Custom base exception for FLARE-related errors.

  • IOException (java.io.IOException): Standard Java exception for network communication errors.

  • RuntimeException (java.lang.RuntimeException): Standard Java exception for general runtime errors.

Exception Hierarchy

The SDK uses a custom exception hierarchy in which NVFlareError extends Exception and defines specific error types. In practice, the Android app gives special treatment only to ServerRequestedStop and handles all other errors generically:

sealed class NVFlareError : Exception() {
    // Network related
    data class JobFetchFailed(override val message: String) : NVFlareError()
    data class TaskFetchFailed(override val message: String) : NVFlareError()
    data class InvalidRequest(override val message: String) : NVFlareError()
    data class AuthError(override val message: String) : NVFlareError()
    data class ServerError(override val message: String) : NVFlareError()
    data class NetworkError(override val message: String) : NVFlareError()

    // Training related
    data class InvalidMetadata(override val message: String) : NVFlareError()
    data class InvalidModelData(override val message: String) : NVFlareError()
    data class TrainingFailed(override val message: String) : NVFlareError()
    object ServerRequestedStop : NVFlareError()
}

Error Handling Best Practices

The Android SDK uses a simplified error handling approach that catches generic exceptions and provides specific handling for NVFlareError.ServerRequestedStop:

try {
    flareRunner.run()  // run() returns no value
} catch (e: Exception) {
    if (e is NVFlareError.ServerRequestedStop) {
        // The server asked this client to stop: shut down gracefully
        Log.i("FLARE", "Server requested stop")
    } else {
        // Handle other errors generically; passing e logs the stack trace
        Log.e("FLARE", "Training failed: ${e.message}", e)
    }
}

Note

The Connection class uses more specific error handling: it converts IOException to NVFlareError.NetworkError and throws the appropriate NVFlareError subtype based on the HTTP status code. The main application code, however, uses the simplified approach shown above.
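A status-code mapping of the kind described in this note could look roughly like the following. This is a sketch, not the Connection implementation: the errorForStatus helper and the specific code-to-error assignments are assumptions, and the sealed class is a trimmed copy of the hierarchy above.

```kotlin
// Trimmed copy of the hierarchy above so the sketch is self-contained.
sealed class NVFlareError : Exception() {
    data class InvalidRequest(override val message: String) : NVFlareError()
    data class AuthError(override val message: String) : NVFlareError()
    data class ServerError(override val message: String) : NVFlareError()
    data class NetworkError(override val message: String) : NVFlareError()
}

// Hypothetical helper: returns null for 2xx responses, otherwise an error subtype.
// The assignments below are illustrative assumptions.
fun errorForStatus(code: Int, body: String = ""): NVFlareError? = when (code) {
    in 200..299 -> null
    400 -> NVFlareError.InvalidRequest("Bad request: $body")
    401, 403 -> NVFlareError.AuthError("Not authorized (HTTP $code)")
    in 500..599 -> NVFlareError.ServerError("Server error (HTTP $code)")
    else -> NVFlareError.NetworkError("Unexpected HTTP status $code")
}
```

Because NVFlareError is sealed, callers that want finer-grained handling than the generic approach can branch on the returned subtype with a when expression.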

Logging

The SDK uses Android’s standard logging system. Enable debug logging to see detailed information:

if (BuildConfig.DEBUG) {
    Log.d("AndroidFlareRunner", "Starting federated learning")
}

Troubleshooting

Common Issues

Build Errors

  • Ensure all dependencies are properly linked.

  • Check ExecuTorch library compatibility.

  • Verify SDK files are correctly copied.

Runtime Errors

  • Check network connectivity.

  • Verify server configuration.

  • Review device logs for specific error messages.

Performance Issues

  • Monitor memory usage during training.

  • Optimize model architecture.

  • Adjust batch sizes and training parameters.

Certificate Errors

  • Use proper certificate validation in production.

  • Consider certificate pinning for enhanced security.

  • Test with self-signed certificates in development only.

Best Practices

  • Resource Management: Always close ETTrainer when training finishes, for example with Kotlin's use { } block (the equivalent of Java's try-with-resources).

  • Error Handling: Implement comprehensive error handling and logging.

  • Security: Use proper certificate validation in production.

  • Performance: Monitor memory usage and optimize model size.

  • Testing: Test with various network conditions and device configurations.

For more information, see the mobile development guide and edge examples.