Medical Image Generation with a Conditional LDM for Anatomy-Compliant Synthesis
Project Overview
We implemented a Conditional Latent Diffusion Model (LDM) for medical image generation, using edge maps and semantic maps as conditioning inputs. The project used the M&M’s Challenge dataset and achieved promising reconstruction results while identifying areas for improvement in generation quality.
Examples of Generated Images
(Figures: sample images generated by the conditional LDM.)
Key Technologies
- Edge Detection: Canny edge detection with noise reduction (both conditioning inputs are sketched after this list)
- Semantic Mapping: Four-channel one-hot encoded representation of the segmentation labels
- VQVAE: For efficient image encoding and reconstruction
- UNet: As the backbone for the LDM
- HDF5: For optimized data storage and access
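The two conditioning inputs can be prepared along these lines. This is a minimal sketch assuming OpenCV-style preprocessing; the Gaussian kernel size and Canny thresholds are illustrative placeholders, not the project's actual values:

```python
import cv2
import numpy as np

def make_edge_map(image_u8: np.ndarray) -> np.ndarray:
    # Smooth first to suppress noise, then detect edges with Canny.
    # Kernel size and thresholds are illustrative, not the project's values.
    smoothed = cv2.GaussianBlur(image_u8, (5, 5), sigmaX=1.5)
    return cv2.Canny(smoothed, threshold1=100, threshold2=200)

def make_semantic_map(label_mask: np.ndarray, num_classes: int = 4) -> np.ndarray:
    # One-hot encode an integer label mask (H, W) into (num_classes, H, W).
    one_hot = np.eye(num_classes, dtype=np.float32)[label_mask]  # (H, W, C)
    return one_hot.transpose(2, 0, 1)                            # (C, H, W)
```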
Development Process
Data Preprocessing
- Processed 360 subjects from the M&M’s Challenge dataset
- Split: 350 subjects for training, 10 for validation
- Combined long-axis and short-axis cardiac images
- Resized and normalized all images to 256×256 resolution (preprocessing sketched after this list)
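A minimal preprocessing-and-storage sketch, assuming min-max intensity normalization and h5py for the HDF5 layer; the file name, subject key, and placeholder slice data are hypothetical:

```python
import h5py
import numpy as np
from skimage.transform import resize

def preprocess_slice(img: np.ndarray, size: int = 256) -> np.ndarray:
    # Resize a 2D slice to size x size and min-max normalize to [0, 1].
    img = resize(img.astype(np.float32), (size, size), preserve_range=True)
    lo, hi = float(img.min()), float(img.max())
    return ((img - lo) / (hi - lo + 1e-8)).astype(np.float32)

# Placeholder stand-in for one subject's stacked long-axis + short-axis slices.
slices = [np.random.rand(212, 212) for _ in range(12)]

with h5py.File("mnms_train.h5", "w") as f:
    volume = np.stack([preprocess_slice(s) for s in slices])
    f.create_dataset("subject_000", data=volume, compression="gzip")
```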
Model Architecture
- Latent Space: 32×32×3, with a 16,384-entry codebook (quantization sketched after this list)
- Network Structure:
  - Uniform block depth: 2
  - Down-block channels: [256, 384, 512, 768]
  - Mid-block channels: [768, 512]
  - Self-attention heads: 8
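The listed latent shape and codebook size translate into a standard vector-quantization step. A minimal PyTorch sketch of the nearest-codebook lookup only (commitment losses and the straight-through gradient are omitted for brevity):

```python
import torch

LATENT_C, LATENT_H, LATENT_W = 3, 32, 32   # latent space from the list above
CODEBOOK_SIZE = 16_384

codebook = torch.nn.Embedding(CODEBOOK_SIZE, LATENT_C)

def quantize(z: torch.Tensor) -> torch.Tensor:
    # Map each spatial latent vector in z (B, C, H, W) to its nearest codebook entry.
    b, c, h, w = z.shape
    flat = z.permute(0, 2, 3, 1).reshape(-1, c)   # (B*H*W, C)
    dists = torch.cdist(flat, codebook.weight)    # (B*H*W, K) pairwise distances
    indices = dists.argmin(dim=1)
    return codebook(indices).reshape(b, h, w, c).permute(0, 3, 1, 2)

z = torch.randn(1, LATENT_C, LATENT_H, LATENT_W)  # mock encoder output
print(quantize(z).shape)                          # torch.Size([1, 3, 32, 32])
```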
Training Configuration
- VQVAE learning rate: 8.0e-5
- UNet learning rate: 8.0e-6
- Batch size: 36
- Dropout rate: 0.1
- Diffusion steps: 1,000 (forward-noising step sketched after this list)
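Taken together, these settings might be wired up as below. This is a sketch only: the stand-in modules replace the real VQVAE and UNet, Adam is an assumed optimizer choice, and the DDPM-style linear beta range (1e-4 to 0.02) is the common default rather than a value stated in the report:

```python
import torch

# Stand-in modules; the real VQVAE and UNet follow the architecture above.
vqvae = torch.nn.Conv2d(1, 3, kernel_size=3, padding=1)
unet = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

opt_vqvae = torch.optim.Adam(vqvae.parameters(), lr=8.0e-5)
opt_unet = torch.optim.Adam(unet.parameters(), lr=8.0e-6)

# Forward diffusion over T = 1,000 steps with a linear beta schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0: torch.Tensor, t: torch.Tensor):
    # x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
    eps = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

x0 = torch.randn(36, 3, 32, 32)        # batch size 36, latent shape 3x32x32
t = torch.randint(0, T, (36,))
x_t, eps = add_noise(x0, t)
```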
Performance Evaluation
- FID: 116.09 (lower is better)
- SSIM: 0.44 (higher is better)
- NMSE: 1.14 (lower is better; metric computation sketched after this list)
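SSIM and NMSE can be computed directly from image pairs; below is a minimal sketch using scikit-image and NumPy on placeholder data. FID requires Inception features and is usually computed with a dedicated tool such as pytorch-fid or torchmetrics:

```python
import numpy as np
from skimage.metrics import structural_similarity

def nmse(ref: np.ndarray, gen: np.ndarray) -> float:
    # Normalized MSE: ||ref - gen||^2 / ||ref||^2
    return float(np.sum((ref - gen) ** 2) / np.sum(ref ** 2))

ref = np.random.rand(256, 256).astype(np.float32)   # placeholder ground truth
gen = np.random.rand(256, 256).astype(np.float32)   # placeholder generation

ssim = structural_similarity(ref, gen, data_range=1.0)
print(f"SSIM={ssim:.3f}  NMSE={nmse(ref, gen):.3f}")
```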
Challenges and Insights
- Mixed View Challenge: Combining long-axis and short-axis views introduced spatial complexity
- Edge Map Practicality: Detailed edge maps may not reflect real-world user inputs
- Training Limitations: Hyperparameter exploration and training duration were constrained
Future Directions
- Explore separating long-axis and short-axis image training
- Investigate natural language conditioning inspired by clinical Llama applications
- Further hyperparameter optimization and extended training
- Consider alternative conditioning approaches for practical deployment
Detailed Report
The detailed report can be found here.