U-TAE++¶
Modernized U-TAE with ConvNeXt blocks and attention mechanisms for temporal segmentation.
models.architectures.utae_pp ¶
U-TAE++ implementation: a modernized U-TAE with ConvNeXt blocks and Flash Attention. Based on U-TAE by Vivien Sainte Fare Garnot (github/VSainteuf).
Improvements:
- ConvNeXt-style blocks with 7x7 depthwise convolutions
- Flash Attention (PyTorch 2.0+)
- Stochastic depth (DropPath)
- CBAM attention in the decoder
- Deep supervision
- Layer scale
CBAM(channels: int, reduction: int = 16, kernel_size: int = 7)
¶
Bases: Module
Convolutional Block Attention Module.
Source code in src/models/architectures/utae_pp.py
ChannelAttention(channels: int, reduction: int = 16)
¶
Bases: Module
Channel attention from CBAM.
Source code in src/models/architectures/utae_pp.py
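A minimal NumPy sketch of the channel-attention computation this module performs (global average- and max-pooled descriptors passed through a shared two-layer MLP, summed, and squashed into a per-channel gate). The function and weight names are illustrative, not the module's API, and batch/torch handling is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """CBAM-style channel attention gate (NumPy sketch).

    x:  feature map (C, H, W)
    w1: (C // r, C) reduction weights of the shared MLP
    w2: (C, C // r) expansion weights of the shared MLP
    Returns the gated feature map (C, H, W).
    """
    avg = x.mean(axis=(1, 2))  # (C,) global average pool
    mx = x.max(axis=(1, 2))    # (C,) global max pool
    # Shared MLP applied to both descriptors, then summed and gated
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return x * gate[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.normal(size=(C, 5, 5))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
y = channel_attention(x, w1, w2)
```

Because the gate lies in (0, 1), the module can only attenuate channels, never amplify them.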
ConvBlock(nkernels: list[int], pad_value: float | None = None, norm: str = 'batch', last_relu: bool = True, padding_mode: str = 'reflect')
¶
Bases: TemporallySharedBlock
Convolutional block with temporal sharing.
Source code in src/models/architectures/utae_pp.py
ConvLayer(nkernels: list[int], norm: Literal['batch', 'group', 'instance', 'layer'] = 'batch', k: int = 3, s: int = 1, p: int = 1, n_groups: int = 4, last_relu: bool = True, padding_mode: str = 'reflect')
¶
Bases: Module
Basic convolution layer with norm and activation.
Source code in src/models/architectures/utae_pp.py
ConvNeXtBlock(dim: int, expansion: int = 4, drop_path: float = 0.0, layer_scale_init: float = 1e-06, kernel_size: int = 7)
¶
Bases: Module
ConvNeXt block with depthwise conv, inverted bottleneck, and layer scale.
Source code in src/models/architectures/utae_pp.py
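The `drop_path` and `layer_scale_init` parameters interact in the block's residual update. A minimal NumPy sketch of that update, assuming the shapes shown; the helper name is illustrative:

```python
import numpy as np

def drop_path_residual(x, residual, gamma, p, training, rng):
    """Residual update with layer scale and stochastic depth (NumPy sketch).

    x:        block input (C, H, W)
    residual: output of the block's depthwise-conv/MLP branch, same shape
    gamma:    per-channel layer-scale parameters (C,), initialized near 1e-6
    p:        drop-path probability
    """
    scaled = gamma[:, None, None] * residual
    if not training or p == 0.0:
        return x + scaled
    # Drop the entire residual branch with probability p; rescale the
    # surviving branch so the expected value is unchanged.
    keep = float(rng.random() >= p)
    return x + scaled * keep / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((2, 3, 3))
res = np.ones((2, 3, 3))
gamma = np.full(2, 0.5)
y_eval = drop_path_residual(x, res, gamma, p=0.5, training=False, rng=rng)
```

The tiny `layer_scale_init` makes each block start close to identity, which stabilizes training of deep stacks.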
CoordinateAttention(channels: int, reduction: int = 16)
¶
Bases: Module
Coordinate Attention module (Hou et al., CVPR 2021).
Unlike CBAM which loses spatial information via global pooling, Coordinate Attention encodes channel relationships while preserving precise positional information via 1D horizontal and vertical pooling.
Reference: https://arxiv.org/abs/2103.02907
Source code in src/models/architectures/utae_pp.py
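The key idea, pooling along each spatial axis separately, can be sketched in NumPy as below. The shared 1x1 conv transform from the paper is deliberately omitted (the raw pooled descriptors are gated directly), so this illustrates only the shape bookkeeping, not the full module:

```python
import numpy as np

def coordinate_attention(x):
    """Directional pooling and gating (simplified NumPy sketch).

    Global pooling collapses (H, W) into one vector and discards position.
    Coordinate Attention instead pools each axis separately, keeping
    position along the other axis.
    x: (C, H, W); returns gated features of the same shape.
    """
    h_desc = x.mean(axis=2)  # pool over W -> (C, H), one descriptor per row
    w_desc = x.mean(axis=1)  # pool over H -> (C, W), one descriptor per column
    a_h = 1.0 / (1.0 + np.exp(-h_desc))  # sigmoid gates (1x1 conv omitted)
    a_w = 1.0 / (1.0 + np.exp(-w_desc))
    return x * a_h[:, :, None] * a_w[:, None, :]

x = np.random.default_rng(0).normal(size=(4, 6, 5))
y = coordinate_attention(x)
```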
DownConvBlock(d_in: int, d_out: int, k: int, s: int, p: int, pad_value: float | None = None, norm: str = 'batch', padding_mode: str = 'reflect', drop_path: float = 0.0, use_convnext: bool = True)
¶
Bases: TemporallySharedBlock
Downsampling block with ConvNeXt-style processing.
Source code in src/models/architectures/utae_pp.py
LTAE2d(in_channels: int = 128, n_head: int = 16, d_k: int = 4, mlp: list[int] | None = None, dropout: float = 0.2, d_model: int = 256, T: int = 1000, return_att: bool = False, positional_encoding: bool = True)
¶
Bases: Module
Lightweight Temporal Attention Encoder for image time series.
Source code in src/models/architectures/utae_pp.py
MultiHeadAttention(n_head: int, d_k: int, d_in: int)
¶
Bases: Module
Multi-Head Attention with learnable query.
Source code in src/models/architectures/utae_pp.py
PositionalEncoder(d: int, T: int = 1000, repeat: int | None = None, offset: int = 0)
¶
Bases: Module
Sinusoidal positional encoding for temporal sequences.
Source code in src/models/architectures/utae_pp.py
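The standard sinusoidal encoding this module computes can be sketched in NumPy (function name and the even-`d` assumption are illustrative):

```python
import numpy as np

def positional_encoding(positions, d, T=1000):
    """Sinusoidal positional encoding (NumPy sketch).

    positions: (N,) temporal positions (e.g. day-of-year of each acquisition)
    d: embedding dimension (assumed even)
    Returns (N, d): sin on even indices, cos on odd indices.
    """
    positions = np.asarray(positions, dtype=float)
    i = np.arange(d // 2)
    freqs = 1.0 / T ** (2 * i / d)                # (d/2,) geometric frequencies
    angles = positions[:, None] * freqs[None, :]  # (N, d/2)
    enc = np.empty((len(positions), d))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

pe = positional_encoding([0, 10, 150], d=4)
```

Note `T` here matches the class default `T=1000` (not the transformer-standard 10000), which suits day-of-year positions.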
ScaledDotProductAttention(temperature: float, attn_dropout: float = 0.1)
¶
Bases: Module
Scaled Dot-Product Attention with Flash Attention support.
Source code in src/models/architectures/utae_pp.py
forward(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, pad_mask: torch.Tensor | None = None, return_comp: bool = False) -> tuple[torch.Tensor, torch.Tensor] | tuple[torch.Tensor, torch.Tensor, torch.Tensor]
¶
Args:
- `q`: Query tensor (N, d_k)
- `k`: Key tensor (N, T, d_k)
- `v`: Value tensor (N, T, d_v)
- `pad_mask`: Padding mask (N, T)
- `return_comp`: Whether to return attention compatibility scores
Source code in src/models/architectures/utae_pp.py
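A NumPy sketch of the masked computation with these shapes: a single query per sequence attends over T timesteps, and padded timesteps (where `pad_mask` is True) receive zero weight. This illustrates the math only; the actual module dispatches to PyTorch's fused attention kernels when available.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, pad_mask=None):
    """Masked scaled dot-product attention (NumPy sketch).

    q: (N, d_k) one query per sequence
    k: (N, T, d_k), v: (N, T, d_v)
    pad_mask: (N, T) boolean, True marks padded timesteps
    Returns (output (N, d_v), attention weights (N, T)).
    """
    scores = np.einsum('nd,ntd->nt', q, k) / np.sqrt(q.shape[-1])
    if pad_mask is not None:
        scores = np.where(pad_mask, -1e9, scores)  # mask before softmax
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    out = np.einsum('nt,ntd->nd', attn, v)
    return out, attn

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 4)), rng.normal(size=(2, 3, 4)), rng.normal(size=(2, 3, 4))
mask = np.array([[False, False, True], [False, False, False]])
out, attn = scaled_dot_product_attention(q, k, v, pad_mask=mask)
```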
SpatialAttention(kernel_size: int = 7)
¶
Bases: Module
Spatial attention from CBAM.
Source code in src/models/architectures/utae_pp.py
TemporalAggregator(mode: Literal['att_group', 'att_mean', 'mean'] = 'mean')
¶
Bases: Module
Collapses the temporal axis of per-timestep feature maps ('att_group', 'att_mean', or 'mean').
Source code in src/models/architectures/utae_pp.py
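The aggregation modes reduce to a (possibly attention-weighted, padding-aware) mean over time. A NumPy sketch of that reduction for a single sequence; the function name and single-head layout are illustrative:

```python
import numpy as np

def temporal_aggregate(feats, attn=None, pad_mask=None):
    """Collapse the time axis of a feature sequence (NumPy sketch).

    feats:    (T, C, H, W) per-timestep feature maps
    attn:     (T, H, W) attention weights; None falls back to 'mean' mode
    pad_mask: (T,) boolean, True marks padded timesteps
    Returns (C, H, W).
    """
    T = feats.shape[0]
    if attn is None:
        attn = np.ones((T,) + feats.shape[2:])  # uniform weights = plain mean
    if pad_mask is not None:
        attn = np.where(pad_mask[:, None, None], 0.0, attn)  # zero out padding
    # Renormalize so weights sum to 1 over the surviving timesteps
    attn = attn / np.clip(attn.sum(axis=0, keepdims=True), 1e-8, None)
    return (feats * attn[:, None]).sum(axis=0)

feats = np.array([2.0, 4.0]).reshape(2, 1, 1, 1)
```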
TemporallySharedBlock(pad_value: float | None = None)
¶
Bases: Module
Base class for blocks whose forward pass is applied identically to every timestep of a sequence.
Source code in src/models/architectures/utae_pp.py
UTAE(input_dim: int, encoder_widths: list[int] | None = None, decoder_widths: list[int] | None = None, out_conv: list[int] | None = None, str_conv_k: int = 4, str_conv_s: int = 2, str_conv_p: int = 1, agg_mode: str = 'att_group', encoder_norm: str = 'group', n_head: int = 16, d_model: int = 256, d_k: int = 4, encoder: bool = False, return_maps: bool = False, pad_value: float = 0, padding_mode: str = 'reflect', use_convnext: bool = True, attention_type: str = 'coord', drop_path_rate: float = 0.1, deep_supervision: bool = False)
¶
Bases: Module
U-TAE++ - Modernized U-TAE with ConvNeXt blocks and Flash Attention.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_dim` | `int` | Number of input channels | *required* |
| `encoder_widths` | `list[int] \| None` | Channel widths for each encoder stage | `None` |
| `decoder_widths` | `list[int] \| None` | Channel widths for each decoder stage | `None` |
| `out_conv` | `list[int] \| None` | Output convolution channels [hidden, n_classes] | `None` |
| `str_conv_k` | `int` | Kernel size for strided convolutions | `4` |
| `str_conv_s` | `int` | Stride for strided convolutions | `2` |
| `str_conv_p` | `int` | Padding for strided convolutions | `1` |
| `agg_mode` | `str` | Temporal aggregation mode ('att_group', 'att_mean', 'mean') | `'att_group'` |
| `encoder_norm` | `str` | Normalization type ('group', 'batch', 'instance') | `'group'` |
| `n_head` | `int` | Number of attention heads in L-TAE | `16` |
| `d_model` | `int` | Model dimension for L-TAE | `256` |
| `d_k` | `int` | Key/query dimension for attention | `4` |
| `encoder` | `bool` | If True, return feature maps instead of predictions | `False` |
| `return_maps` | `bool` | If True, also return intermediate feature maps | `False` |
| `pad_value` | `float` | Padding value for temporal sequences | `0` |
| `padding_mode` | `str` | Padding mode for convolutions | `'reflect'` |
| `use_convnext` | `bool` | Use ConvNeXt-style blocks | `True` |
| `attention_type` | `str` | Decoder attention type ('coord', 'cbam', or 'none') | `'coord'` |
| `drop_path_rate` | `float` | Stochastic depth rate | `0.1` |
| `deep_supervision` | `bool` | Enable auxiliary outputs for deep supervision | `False` |
Source code in src/models/architectures/utae_pp.py
forward(input: torch.Tensor, batch_positions: torch.Tensor | None = None, pad_mask: torch.Tensor | None = None, return_att: bool = False) -> torch.Tensor | tuple
¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `Tensor` | Input tensor (B, T, C, H, W) | *required* |
| `batch_positions` | `Tensor \| None` | Temporal positions (B, T) | `None` |
| `pad_mask` | `Tensor \| None` | Boolean padding mask (B, T) where True indicates padded timesteps. If not provided, computed from input using self.pad_value. | `None` |
| `return_att` | `bool` | Return attention maps | `False` |

Returns:

| Type | Description |
|---|---|
| `Tensor \| tuple` | Segmentation output and optionally attention/auxiliary outputs |
Source code in src/models/architectures/utae_pp.py
UpConvBlock(d_in: int, d_out: int, k: int, s: int, p: int, norm: str = 'batch', d_skip: int | None = None, padding_mode: str = 'reflect', attention_type: str = 'coord', use_convnext: bool = True, drop_path: float = 0.0)
¶
Bases: Module
Upsampling block with configurable attention (CBAM, Coordinate, or none).
Source code in src/models/architectures/utae_pp.py
build_attention(attention_type: str, channels: int, reduction: int = 16) -> nn.Module
¶
Factory function to build attention modules.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `attention_type` | `str` | Type of attention ('cbam', 'coord', or 'none') | *required* |
| `channels` | `int` | Number of input channels | *required* |
| `reduction` | `int` | Channel reduction ratio | `16` |

Returns:

| Type | Description |
|---|---|
| `Module` | Attention module or identity-like fallback |
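The factory shape can be sketched in plain Python. The stand-in classes below are hypothetical placeholders for the real attention modules in src/models/architectures/utae_pp.py; only the string-to-constructor dispatch and the `'none'` fallback are the point:

```python
class _CBAM:  # placeholder for the real CBAM module
    def __init__(self, channels, reduction=16):
        self.channels, self.reduction = channels, reduction

class _CoordinateAttention:  # placeholder for the real CoordinateAttention
    def __init__(self, channels, reduction=16):
        self.channels, self.reduction = channels, reduction

class _Identity:  # 'none' maps to a pass-through module
    def __call__(self, x):
        return x

def build_attention(attention_type, channels, reduction=16):
    """Map a config string to an attention module; 'none' -> identity."""
    registry = {
        'cbam': lambda: _CBAM(channels, reduction),
        'coord': lambda: _CoordinateAttention(channels, reduction),
        'none': lambda: _Identity(),
    }
    try:
        return registry[attention_type]()
    except KeyError:
        raise ValueError(f"Unknown attention_type: {attention_type!r}")
```

Keeping construction behind one factory lets `UTAE` and `UpConvBlock` accept `attention_type` as a plain config string.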