# Validation
Evaluation metrics computation including IoU, F1, accuracy, and confusion matrices.
## training.validation
### `calculate_iou_scores(conf_matrix: torch.Tensor, num_classes: int, other_class_index: int = 13) -> tuple[float, dict[int, float]]`
Compute mean IoU (mIoU) and per-class IoU.
Source code in src/training/validation.py
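The per-class IoU arithmetic can be sketched as below. This is an illustrative stand-in (plain Python lists instead of `torch.Tensor`, and `iou_from_confusion` is a hypothetical name, not the module's function): IoU for class *c* is TP / (TP + FP + FN), read off the confusion matrix, and the mean skips `other_class_index`.

```python
def iou_from_confusion(conf, other_class_index=13):
    """Per-class IoU = TP / (TP + FP + FN) from a square confusion matrix.

    conf[i][j] counts pixels of true class i predicted as class j.
    Plain-Python sketch of the torch-based computation; illustrative only.
    """
    n = len(conf)
    per_class = {}
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true c, predicted elsewhere
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true elsewhere
        denom = tp + fp + fn
        per_class[c] = tp / denom if denom else 0.0
    # mIoU averages over all classes except the 'other' class.
    included = [v for c, v in per_class.items() if c != other_class_index]
    miou = sum(included) / len(included) if included else 0.0
    return miou, per_class
```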
### `compute_timing_metrics(inference_times: list[float], batch_sizes: list[int]) -> dict[str, float]`
Compute timing metrics from inference times and batch sizes.
Source code in src/training/validation.py
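A minimal sketch of the kind of aggregation this performs. The key names below are assumptions for illustration, not the module's actual output keys:

```python
def timing_metrics(inference_times, batch_sizes):
    """Aggregate per-batch wall-clock times into summary statistics.

    inference_times: seconds per batch; batch_sizes: samples per batch.
    Key names are illustrative assumptions, not the module's actual keys.
    """
    total_time = sum(inference_times)
    total_samples = sum(batch_sizes)
    return {
        "mean_batch_time_s": total_time / len(inference_times),
        "mean_time_per_sample_s": total_time / total_samples,
        "throughput_samples_per_s": total_samples / total_time,
    }
```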
### `evaluate(model: nn.Module, device: torch.device, data_loader: DataLoader, num_classes: int, other_class_index: int = 13, *, output_size: int | None = None, log_eval_metrics: bool = True, log_confusion_matrix: bool = True, normalize_confusion_matrix: bool = True, sample_ids_to_plot: list[str] | None = None, warmup_runs: int = 10, visualization_labels: dict[str, str] | None = None, class_name_mapping: dict[int, str] | None = None, zone_mosaic_config: dict | None = None, zone_data_loader: DataLoader | None = None) -> dict[str, float]`
Evaluate model and log metrics and plots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Module` | Model to evaluate. | *required* |
| `device` | `device` | Torch device. | *required* |
| `data_loader` | `DataLoader` | Evaluation DataLoader. | *required* |
| `num_classes` | `int` | Number of classes. | *required* |
| `other_class_index` | `int` | Index of 'other' class to exclude from mIoU. | `13` |
| `log_eval_metrics` | `bool` | Whether to log scalar metrics. | `True` |
| `log_confusion_matrix` | `bool` | Whether to log confusion matrix plot and CSV. | `True` |
| `normalize_confusion_matrix` | `bool` | Normalize confusion matrix rows. | `True` |
| `sample_ids_to_plot` | `list[str] \| None` | Optional list of sample ids for individual prediction plots. | `None` |
| `warmup_runs` | `int` | Warmup forward passes (excluded from timing). | `10` |
| `visualization_labels` | `dict[str, str] \| None` | Optional dict overriding plot text labels. | `None` |
| `class_name_mapping` | `dict[int, str] \| None` | Mapping from class index to readable name. | `None` |
| `zone_mosaic_config` | `dict \| None` | Optional config for zone prediction mosaic visualization. Expected keys: 'enabled', 'zone_name', 'grid_size', 'patch_size'. | `None` |
| `zone_data_loader` | `DataLoader \| None` | Optional DataLoader for a specific zone (for mosaic visualization). | `None` |
Source code in src/training/validation.py
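The core of an evaluation pass like this one is accumulating a confusion matrix over batches and deriving scalar metrics from it. A plain-Python sketch of that pattern (the real function runs the model on `device`, accumulates torch tensors, and logs plots; `evaluation_loop` here is a hypothetical helper, not the module's API):

```python
def evaluation_loop(batches, num_classes):
    """Accumulate a confusion matrix over batches of (true, predicted)
    label pairs, then derive overall accuracy from it.

    Sketch of the accumulation pattern only; the actual evaluate() runs
    model forward passes and uses torch/TorchMetrics on the GPU.
    """
    conf = [[0] * num_classes for _ in range(num_classes)]
    for batch in batches:
        for true_label, pred_label in batch:
            conf[true_label][pred_label] += 1
    correct = sum(conf[c][c] for c in range(num_classes))  # matrix trace
    total = sum(sum(row) for row in conf)
    accuracy = correct / total if total else 0.0
    return conf, accuracy
```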
### `get_evaluation_metrics_dict(num_classes: int, device: torch.device, other_class_index: int | None = None) -> dict[str, Metric]`
Initialize TorchMetrics for multiclass classification.
Source code in src/training/validation.py
### `upsample_predictions(outputs: torch.Tensor, target_size: tuple[int, int], output_size: int | None = None) -> torch.Tensor`
Upsample model predictions to match the target mask size.
Uses bilinear interpolation on the logits (before argmax) for smoother boundaries. When a context window is used, the output is center-cropped to `output_size` before upsampling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `outputs` | `Tensor` | Model outputs with shape (B, C, H, W) where C is num_classes. | *required* |
| `target_size` | `tuple[int, int]` | Tuple of (height, width) to upsample to (mask size). | *required* |
| `output_size` | `int \| None` | Expected output spatial size after center-crop. If provided and the model output is larger, center-crops to this size before upsampling. Use sentinel_patch_size when using context window. | `None` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Upsampled logits with shape (B, C, target_size[0], target_size[1]). |
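The center-crop step reduces to simple index arithmetic before the bilinear upsample. A sketch of that arithmetic (`center_crop_bounds` is a hypothetical helper for illustration; the actual function slices the logits with these bounds and then upsamples via bilinear interpolation):

```python
def center_crop_bounds(in_size, out_size):
    """Slice bounds for the center crop applied when the model output
    (context window) is larger than the labeled region.

    Returns (start, stop) so that x[..., start:stop, start:stop] has
    spatial size out_size. Illustrative helper, not the module's API.
    """
    if out_size is None or out_size >= in_size:
        return 0, in_size  # nothing to crop
    start = (in_size - out_size) // 2
    return start, start + out_size
```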