Android SDK API Reference¶
This document provides a comprehensive API reference for the NVIDIA FLARE Android SDK, enabling federated learning on Android devices using ExecuTorch.
Note
This API reference assumes familiarity with the mobile development guide and basic Android development concepts.
Overview¶
The Android SDK provides native Kotlin/Java libraries for implementing federated learning on Android devices. The SDK handles communication with FLARE servers, model training using ExecuTorch, and data management.
Key Components¶
AndroidFlareRunner: Main orchestrator for federated learning.
Connection: HTTP/HTTPS communication with FLARE servers.
ETTrainer: ExecuTorch-based model training.
DataSource: Interface for providing training data.
Dataset: Data interface for training examples.
AndroidFlareRunner¶
The main orchestrator for federated learning on Android devices. Handles job fetching, task execution, result reporting, component resolution, filtering, and event handling.
Constructor¶
AndroidFlareRunner(
context: AndroidContext,
connection: Connection,
jobName: String,
dataSource: DataSource,
deviceInfo: Map<String, String>,
userInfo: Map<String, String>,
jobTimeout: Float,
inFilters: List<Filter>? = null,
outFilters: List<Filter>? = null,
resolverRegistry: Map<String, Class<*>>? = null
)
Parameters¶
context: Android application context.connection: Connection instance for server communication.jobName: Name of the FL job to participate in.dataSource: Data source providing training data.deviceInfo: Device metadata (device_id,platform, etc.).userInfo: User metadata (user_id, etc.).jobTimeout: Timeout in seconds for job operations.inFilters: Optional input filters for data processing.outFilters: Optional output filters for result processing.resolverRegistry: Optional component resolver registry.
What is a Resolver?¶
A Resolver is a component that maps string identifiers to actual class implementations. In the context of FLARE’s edge SDK, resolvers are used to dynamically instantiate training components, filters, and other plugins based on configuration data received from the server.
For example, when the server sends a job configuration that specifies a trainer component, the resolver looks up the string identifier (like “ETTrainerExecutor”) and maps it to the actual class that should be instantiated. This allows for flexible, configuration-driven component loading without hardcoding specific implementations.
The resolverRegistry parameter allows you to register custom resolvers for your own components, enabling the system to dynamically load and instantiate them as needed.
Properties¶
val jobName: String
// The name of the federated learning job
Methods¶
run()¶
Starts the main federated learning loop. This method runs continuously until the job is complete or stopped.
fun run()
Usage:
lifecycleScope.launch {
flareRunner.run()
}
stop()¶
Stops the federated learning process and cleans up resources.
fun stop()
Usage:
override fun onDestroy() {
super.onDestroy()
flareRunner.stop()
}
Built-in Component Resolvers¶
The AndroidFlareRunner includes built-in resolvers for common components:
Executor.ETTrainerExecutor: ExecuTorch-based training executor.Trainer.DLTrainer: Deep learning trainer (mapped toETTrainerExecutor).Filter.NoOpFilter: No-operation filter.EventHandler.NoOpEventHandler: No-operation event handler.Batch.SimpleBatch: Simple batch processing.
Connection¶
Manages HTTP/HTTPS communication with FLARE servers. Handles authentication, certificate validation, and request/response processing.
Constructor¶
Connection(context: Context)
Parameters¶
context: Android application context
Properties¶
val hostname: MutableLiveData<String>
// Server hostname (observable)
val port: MutableLiveData<Int>
// Server port (observable)
val isValid: Boolean
// Whether the connection configuration is valid
fun getUserInfo(): Map<String, String>
// Get current user information
Methods¶
setCapabilities(capabilities)¶
Sets device capabilities for the connection.
fun setCapabilities(capabilities: Map<String, Any>)
Parameters:
* capabilities: Map of device capabilities.
setUserInfo(userInfo)¶
Sets user information for the connection.
fun setUserInfo(userInfo: Map<String, String>)
Parameters:
* userInfo: Map of user information.
setScheme(scheme)¶
Sets the HTTP scheme (http/https).
fun setScheme(scheme: String)
Parameters:
* scheme: "http" or "https".
setAllowSelfSignedCerts(allow)¶
Configures whether to allow self-signed certificates.
fun setAllowSelfSignedCerts(allow: Boolean)
Parameters:
* allow: true to allow self-signed certificates.
Warning
Allowing self-signed certificates creates security vulnerabilities. Only use in development or controlled environments.
getJob(jobName, deviceInfo, userInfo)¶
Requests a job from the server.
suspend fun getJob(
jobName: String,
deviceInfo: Map<String, String>,
userInfo: Map<String, String>
): JobResponse?
Parameters:
* jobName: Name of the job to request.
* deviceInfo: Device information.
* userInfo: User information.
Returns: JobResponse if successful, null otherwise.
getTask(jobId, taskName)¶
Requests a task from the server.
suspend fun getTask(
jobId: String,
taskName: String
): TaskResponse?
Parameters:
* jobId: Job identifier.
* taskName: Name of the task to request.
Returns: TaskResponse if successful, null otherwise.
reportResult(jobId, taskId, result)¶
Reports task results to the server.
suspend fun reportResult(
jobId: String,
taskId: String,
result: Map<String, Any>
): ResultResponse?
Parameters:
* jobId: Job identifier.
* taskId: Task identifier.
* result: Task execution results.
Returns: ResultResponse if successful, null otherwise.
ETTrainer¶
ExecuTorch-based trainer for on-device model training. Implements AutoCloseable for proper resource management.
Constructor¶
ETTrainer(
context: android.content.Context,
meta: Map<String, Any>,
dataset: Dataset? = null
)
Parameters¶
context: Android application context.meta: Model metadata.dataset: Optional dataset for training.
Methods¶
train(config, dataset, modelData)¶
Trains the model using the provided configuration and dataset.
@Throws(Exception::class)
fun train(
config: TrainingConfig,
dataset: Dataset,
modelData: ByteArray
): Map<String, Any>
Parameters:
* config: Training configuration.
* dataset: Training dataset.
* modelData: Model data in ExecuTorch format.
Returns: Training results including loss and predictions.
Throws: Exception if training fails.
Usage:
ETTrainer(context, meta, dataset).use { trainer ->
val result = trainer.train(config, dataset, modelData)
}
close()¶
Closes the trainer and releases resources.
override fun close()
DataSource Interface¶
Interface for providing training data to the FL system.
Interface Definition¶
interface DataSource {
fun getDataset(jobName: String, context: Context): Dataset
}
Methods¶
getDataset(jobName, context)¶
Retrieves a dataset for the specified job.
fun getDataset(jobName: String, context: Context): Dataset
Parameters:
* jobName: Name of the federated learning job.
* context: FLARE context.
Returns: Dataset instance for training.
Example Implementation:
class MyDataSource : DataSource {
override fun getDataset(jobName: String, context: Context): Dataset {
return when (jobName) {
"cifar10_job" -> CIFAR10Dataset(context)
"xor_job" -> XORDataset("train")
else -> throw IllegalArgumentException("Unknown job: $jobName")
}
}
}
Dataset Interface¶
Interface for providing training examples to the trainer.
Interface Definition¶
interface Dataset {
fun size(): Int
fun getBatch(batchSize: Int): List<Map<String, Any>>
}
Methods¶
size()¶
Returns the total number of examples in the dataset.
fun size(): Int
Returns: Number of examples.
getBatch(batchSize)¶
Retrieves a batch of training examples.
fun getBatch(batchSize: Int): List<Map<String, Any>>
Parameters:
* batchSize: Number of examples to return.
Returns: List of training examples.
Example Implementation:
class MyDataset : Dataset {
private val data = mutableListOf<Map<String, Any>>()
override fun size(): Int = data.size
override fun getBatch(batchSize: Int): List<Map<String, Any>> {
return data.shuffled().take(batchSize)
}
}
TrainingConfig¶
Configuration class for training parameters.
Properties¶
val localEpochs: Int
// Number of local training epochs
val localBatchSize: Int
// Batch size for local training
val localLearningRate: Float
// Learning rate for local training
val localMomentum: Float
// Momentum for local training
val inFilters: List<Filter>?
// Input filters
val outFilters: List<Filter>?
// Output filters
Usage Examples¶
Basic Setup¶
class MainActivity : AppCompatActivity() {
private lateinit var flareRunner: AndroidFlareRunner
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
// Create connection
val connection = Connection(this)
connection.setScheme("https")
connection.setAllowSelfSignedCerts(false) // Use true for development only
// Create data source
val dataSource = MyDataSource()
// Create FlareRunner
flareRunner = AndroidFlareRunner(
context = this,
connection = connection,
jobName = "my_fl_job",
dataSource = dataSource,
deviceInfo = mapOf(
"device_id" to getDeviceId(),
"platform" to "android",
"app_version" to getAppVersion()
),
userInfo = mapOf("user_id" to getUserId()),
jobTimeout = 30.0f
)
// Start federated learning
lifecycleScope.launch {
flareRunner.run()
}
}
}
Custom Data Source¶
class CIFAR10DataSource : DataSource {
override fun getDataset(jobName: String, context: Context): Dataset {
return CIFAR10Dataset(context)
}
}
Custom Dataset¶
class XORDataset(private val split: String) : Dataset {
private val data = generateXORData()
override fun size(): Int = data.size
override fun getBatch(batchSize: Int): List<Map<String, Any>> {
return data.shuffled().take(batchSize)
}
private fun generateXORData(): List<Map<String, Any>> {
// Generate XOR training data
return listOf(
mapOf("input" to floatArrayOf(0f, 0f), "label" to 0f),
mapOf("input" to floatArrayOf(0f, 1f), "label" to 1f),
mapOf("input" to floatArrayOf(1f, 0f), "label" to 1f),
mapOf("input" to floatArrayOf(1f, 1f), "label" to 0f)
)
}
}
Error Handling¶
The Android SDK provides comprehensive error handling through exceptions and logging.
Common Exceptions¶
NVFlareError(com.nvidia.nvflare.sdk.core.NVFlareError): Custom base exception for FLARE-related errors.IOException(java.io.IOException): Standard Java exception for network communication errors.RuntimeException(java.lang.RuntimeException): Standard Java exception for general runtime errors.
Exception Hierarchy¶
The SDK uses a custom exception hierarchy where NVFlareError extends Exception and provides specific error types. In practice, the Android app primarily handles ServerRequestedStop specifically, while other errors are handled generically:
sealed class NVFlareError : Exception() {
// Network related
data class JobFetchFailed(override val message: String) : NVFlareError()
data class TaskFetchFailed(override val message: String) : NVFlareError()
data class InvalidRequest(override val message: String) : NVFlareError()
data class AuthError(override val message: String) : NVFlareError()
data class ServerError(override val message: String) : NVFlareError()
data class NetworkError(override val message: String) : NVFlareError()
// Training related
data class InvalidMetadata(override val message: String) : NVFlareError()
data class InvalidModelData(override val message: String) : NVFlareError()
data class TrainingFailed(override val message: String) : NVFlareError()
object ServerRequestedStop : NVFlareError()
}
Error Handling Best Practices¶
The Android SDK uses a simplified error handling approach that catches generic exceptions and provides specific handling for NVFlareError.ServerRequestedStop:
try {
val result = flareRunner.run()
} catch (e: Exception) {
Log.e("FLARE", "Training failed with error: $e")
// Check for specific NVFlareError types
if (e is NVFlareError.ServerRequestedStop) {
Log.i("FLARE", "Server requested stop")
// Gracefully stop training
} else {
// Handle other errors generically
Log.e("FLARE", "Error: ${e.message}")
}
}
Note
The Connection class does use more specific error handling, converting IOException to NVFlareError.NetworkError and throwing appropriate NVFlareError subtypes based on HTTP status codes. However, the main application code uses the simplified approach shown above.
Logging¶
The SDK uses Android’s standard logging system. Enable debug logging to see detailed information:
if (BuildConfig.DEBUG) {
Log.d("AndroidFlareRunner", "Starting federated learning")
}
Troubleshooting¶
Common Issues¶
Build Errors * Ensure all dependencies are properly linked. * Check ExecuTorch library compatibility. * Verify SDK files are correctly copied.
Runtime Errors * Check network connectivity. * Verify server configuration. * Review device logs for specific error messages.
Performance Issues * Monitor memory usage during training. * Optimize model architecture. * Adjust batch sizes and training parameters.
Certificate Errors * Use proper certificate validation in production. * Consider certificate pinning for enhanced security. * Test with self-signed certificates in development only.
Best Practices¶
Resource Management: Always use try-with-resources or
AutoCloseableforETTrainer.Error Handling: Implement comprehensive error handling and logging.
Security: Use proper certificate validation in production.
Performance: Monitor memory usage and optimize model size.
Testing: Test with various network conditions and device configurations.
For more information, see the mobile development guide and edge examples