Abstract:
This thesis investigates federated learning (FL) techniques for generating synthetic computed tomography (sCT) images from magnetic resonance imaging (MRI) data, aiming to support MRI-only workflows in radiotherapy planning. Traditionally, CT scans are necessary for calculating radiation doses, but these scans expose patients to additional ionizing radiation. An MRI-only approach offers superior soft-tissue contrast, reduced radiation exposure, and streamlined clinical workflows. To achieve this, sCT images must be generated from MRI to provide CT-equivalent information safely and robustly.
Centralized deep learning (DL) has shown promise for sCT generation, but clinical adoption is limited by the scarcity of large, diverse datasets, because sharing data across institutions raises privacy concerns. FL mitigates this by enabling collaborative model training across institutions without centralizing data, thus utilizing diverse datasets while preserving patient privacy.
The primary research question is: How can federated model aggregation be optimized for performance and computational efficiency in MRI-to-sCT translation? The initial hypothesis is that, owing to the complexity of medical image translation and inter-institutional data heterogeneity, more advanced aggregation strategies are needed for robust generalization.
This study leverages MRI and CT datasets from multiple institutions with variations in imaging protocols, scanners, and patient demographics to simulate realistic clinical diversity. A robust pre-processing pipeline standardizes the data, aligning image dimensions, intensity ranges, and anatomical landmarks to reduce inter-institutional variability and support model convergence.
Multiple FL aggregation strategies (FedAvg, FedMedian, FedTrimmedAvg, FedAvgM, optimization-based methods, FedBN, and FedProx) were benchmarked for their ability to manage non-uniform data distributions and improve model generalization. Importantly, data remains on each client, and the global model’s generalization was tested on unseen data from an external institution. Model quality was evaluated using three performance metrics: masked mean absolute error, peak signal-to-noise ratio, and structural similarity index.
The findings show that (i) FedAvg exceeded expectations, outperforming more complex strategies, (ii) FedMedian’s simplistic outlier filtering led to information loss, (iii) FedTrimmedAvg ranked between FedAvg and FedMedian, (iv) FedAvgM provided enhanced stability but converged more slowly, and (v) optimization-based strategies were unstable and were outperformed by simpler methods. The combination of FedAvg with FedProx and FedBN produced the best results, achieving a median masked mean absolute error of 96 HU on 23 unseen test patients.
Contrary to the initial hypothesis, simpler aggregation strategies outperformed more complex methods for MRI-to-sCT translation. This may be attributed to the extensive pre-processing pipeline, which effectively reduced data heterogeneity, allowing FedAvg to perform well.
These findings underscore FL’s potential for enabling MRI-only radiotherapy by facilitating sCT generation across decentralized datasets, preserving privacy while maintaining model performance. By demonstrating effective data harmonization and adaptable FL strategies in a multi-institutional setting, this work contributes to developing secure, generalizable DL applications in medical imaging, paving the way for broader clinical implementation. While this study demonstrates the feasibility of FL for privacy-preserving sCT generation, the main goal was to provide a first comprehensive benchmark analysis of various aggregation strategies for this task.
Future work should explore additional pre-processing techniques and further refine FL approaches, such as combining FedAvg with emerging strategies like FedDG and FedCE. Moreover, practical challenges in real-world FL settings, including privacy-preserving aggregation, communication costs, and device variability, must be addressed to improve the effectiveness and scalability of FL in clinical applications.
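
For readers unfamiliar with the aggregation rules compared in this work, the following is a minimal sketch of how FedAvg, FedMedian, and FedTrimmedAvg combine client updates on the server. It is an illustration only and not the implementation used in the thesis: each client model is assumed to be flattened into a plain list of floats, and the function names, the `num_samples` argument, and the trimming fraction `beta` are hypothetical names chosen for this example.

```python
from statistics import median
from typing import List, Sequence

Weights = List[float]  # assumption: one flattened parameter vector per client


def fed_avg(client_weights: Sequence[Weights], num_samples: Sequence[int]) -> Weights:
    """FedAvg: mean of client parameters, weighted by local dataset size."""
    total = sum(num_samples)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, num_samples)) / total
        for i in range(dim)
    ]


def fed_median(client_weights: Sequence[Weights]) -> Weights:
    """FedMedian: coordinate-wise median, which ignores extreme client values."""
    dim = len(client_weights[0])
    return [median(w[i] for w in client_weights) for i in range(dim)]


def fed_trimmed_avg(client_weights: Sequence[Weights], beta: float = 0.2) -> Weights:
    """FedTrimmedAvg: per coordinate, drop the lowest and highest
    beta-fraction of client values, then average the remainder."""
    if not 0.0 <= beta < 0.5:
        raise ValueError("beta must lie in [0, 0.5)")
    k = int(beta * len(client_weights))  # values trimmed from each end
    dim = len(client_weights[0])
    aggregated = []
    for i in range(dim):
        values = sorted(w[i] for w in client_weights)
        kept = values[k:len(values) - k]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


if __name__ == "__main__":
    # Five toy clients with two parameters each; the last client is an outlier.
    clients = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, -50.0]]
    sizes = [10, 10, 10, 10, 10]
    print("FedAvg:       ", fed_avg(clients, sizes))
    print("FedMedian:    ", fed_median(clients))
    print("FedTrimmedAvg:", fed_trimmed_avg(clients, beta=0.2))
```

In this toy setting the outlying client pulls the FedAvg result away from the other four clients, while the median and the trimmed mean discard it. This mirrors the trade-off reported above: robust rules filter outliers but also discard information, which hurts when client updates have already been harmonized by pre-processing.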