gradslam.datasets

gradslam.datasets.icl

class ICL(basedir: str, trajectories: Optional[Union[tuple, str]] = None, seqlen: int = 4, dilation: Optional[int] = None, stride: Optional[int] = None, start: Optional[int] = None, end: Optional[int] = None, height: int = 480, width: int = 640, channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True)[source]

A torch Dataset for loading the ICL-NUIM dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame’s pose as the reference transformation), and frame names. Uses the TUM RGB-D compatible PNG files and Global_RT_Trajectory_GT from here. Expects the following folder structure for the ICL dataset:

├── ICL
│   ├── living_room_traj0_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom0n.gt.sim
│   ├── living_room_traj1_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom1n.gt.sim
│   ├── living_room_traj2_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom2n.gt.sim
│   ├── living_room_traj3_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom3n.gt.sim
│   ├── living_room_trajX_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoomXn.gt.sim

Example of sequence creation from frames with seqlen=4, dilation=1, stride=3, and start=2:

                                    sequence0
                ┎───────────────┲───────────────┲───────────────┒
                |               |               |               |
frame0  frame1  frame2  frame3  frame4  frame5  frame6  frame7  frame8  frame9  frame10  frame11 ...
                                        |               |               |                |
                                        └───────────────┵───────────────┵────────────────┚
                                                            sequence1
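The extraction scheme above can be sketched in a few lines (`frame_indices` is a hypothetical helper for illustration, not part of gradslam):

```python
def frame_indices(num_frames, seqlen=4, dilation=0, stride=None, start=0):
    """Return the frame indices of each extracted sequence."""
    if stride is None:
        stride = seqlen * (dilation + 1)  # default: non-overlapping sequences
    step = dilation + 1  # keep every (dilation + 1)-th frame
    sequences = []
    first = start
    while first + (seqlen - 1) * step < num_frames:
        sequences.append(list(range(first, first + seqlen * step, step)))
        first += stride
    return sequences

# seqlen=4, dilation=1, stride=3, start=2 on a 12-frame trajectory
# (the example above):
# frame_indices(12, seqlen=4, dilation=1, stride=3, start=2)
# -> [[2, 4, 6, 8], [5, 7, 9, 11]]
```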
Parameters
  • basedir (str) –

Path to the base directory containing the living_room_trajX_frei_png/ directories from ICL-NUIM. Each trajectory subdirectory is assumed to contain depth/, rgb/, associations.txt and livingRoomXn.gt.sim.

    ├── living_room_trajX_frei_png
    │   ├── depth/
    │   ├── rgb/
    │   ├── associations.txt
    │   └── livingRoomXn.gt.sim
    

  • trajectories (str or tuple of str or None) – Trajectories to use from “living_room_traj0_frei_png”, “living_room_traj1_frei_png”, “living_room_traj2_frei_png” or “living_room_traj3_frei_png”. Can be path to a .txt file where each line is a trajectory name (living_room_traj0_frei_png), a tuple of trajectory names, or None to use all trajectories. Default: None

  • seqlen (int) – Number of frames to use for each sequence of frames. Default: 4

  • dilation (int or None) – Number of (original trajectory’s) frames to skip between two consecutive frames in the extracted sequence. See above example if unsure. If None, will set dilation = 0. Default: None

  • stride (int or None) – Number of frames between the first frames of two consecutive extracted sequences. See above example if unsure. If None, will set stride = seqlen * (dilation + 1) (non-overlapping sequences). Default: None

  • start (int or None) – Index of the frame from which to start extracting sequences for every trajectory. If None, will start from the first frame. Default: None

  • end (int or None) – Index of the frame at which to stop extracting sequences for every trajectory. If None, will continue extracting frames until the end of the trajectory. Default: None

  • height (int) – Spatial height to resize frames to. Default: 480

  • width (int) – Spatial width to resize frames to. Default: 640

  • channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False

  • normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False

  • return_depth (bool) – Determines whether to return depths. Default: True

  • return_intrinsics (bool) – Determines whether to return intrinsics. Default: True

  • return_pose (bool) – Determines whether to return poses. Default: True

  • return_transform (bool) – Determines whether to return relative transforms w.r.t. the first frame, whose pose is transformed to be the identity. Default: True

  • return_names (bool) – Determines whether to return sequence names. Default: True

Examples:

>>> from torch.utils import data
>>> dataset = ICL(
    basedir="ICL-data/",
    trajectories=("living_room_traj0_frei_png", "living_room_traj1_frei_png"),
    )
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names = next(iter(loader))

gradslam.datasets.scannet

class Scannet(basedir: str, seqmetadir: str, scenes: Optional[Union[tuple, str]], start: Optional[int] = 0, end: Optional[int] = -1, height: int = 480, width: int = 640, seg_classes: str = 'scannet20', channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True, return_labels: bool = True)[source]

A torch Dataset for loading the ScanNet dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame’s pose as the reference transformation), sequence names, and semantic segmentation labels.

Parameters
  • basedir (str) – Path to the base directory containing the sceneXXXX_XX/ directories from ScanNet. Each scene subdirectory is assumed to contain color/, depth/, intrinsic/, label-filt/ and pose/ directories.

  • seqmetadir (str) – Path to the directory containing sequence associations. The directory is assumed to contain metadata .txt files (one metadata file per sequence): e.g. sceneXXXX_XX-seq_Y.txt .

  • scenes (str or tuple of str) – Scenes to use (e.g. for creating train/val/test splits). Can be the path to a .txt file where each line is a scene name (sceneXXXX_XX), a tuple of scene names, or None to use all scenes.

  • start (int) – Index of the frame from which to start for every sequence. Default: 0

  • end (int) – Index of the frame at which to end for every sequence. Default: -1

  • height (int) – Spatial height to resize frames to. Default: 480

  • width (int) – Spatial width to resize frames to. Default: 640

  • seg_classes (str) – The palette of classes that the network should learn. Either “nyu40” or “scannet20”. Default: “scannet20”

  • channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False

  • normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False

  • return_depth (bool) – Determines whether to return depths. Default: True

  • return_intrinsics (bool) – Determines whether to return intrinsics. Default: True

  • return_pose (bool) – Determines whether to return poses. Default: True

  • return_transform (bool) – Determines whether to return relative transforms w.r.t. the first frame, whose pose is transformed to be the identity. Default: True

  • return_names (bool) – Determines whether to return sequence names. Default: True

  • return_labels (bool) – Determines whether to return segmentation labels. Default: True

Examples:

>>> from torch.utils import data
>>> dataset = Scannet(
    basedir="ScanNet-gradSLAM/extractions/scans/",
    seqmetadir="ScanNet-gradSLAM/extractions/sequence_associations/",
    scenes=("scene0000_00", "scene0001_00"),
    )
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names, labels = next(iter(loader))

gradslam.datasets.tum

class TUM(basedir: str, sequences: Optional[Union[tuple, str]] = None, seqlen: int = 4, dilation: Optional[int] = None, stride: Optional[int] = None, start: Optional[int] = None, end: Optional[int] = None, height: int = 480, width: int = 640, channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True, return_timestamps: bool = True)[source]

A torch Dataset for loading the TUM RGB-D dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame’s pose as the reference transformation), and frame names. Uses extracted .tgz sequences downloaded from here. Expects a folder structure similar to the following for the TUM dataset:

├── TUM
│   ├── rgbd_dataset_freiburg1_rpy
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── accelerometer.txt
│   │   ├── depth.txt
│   │   ├── groundtruth.txt
│   │   └── rgb.txt
│   ├── rgbd_dataset_freiburg1_xyz
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── accelerometer.txt
│   │   ├── depth.txt
│   │   ├── groundtruth.txt
│   │   └── rgb.txt
│   ├── ...

Example of sequence creation from frames with seqlen=4, dilation=1, stride=3, and start=2:

                                    sequence0
                ┎───────────────┲───────────────┲───────────────┒
                |               |               |               |
frame0  frame1  frame2  frame3  frame4  frame5  frame6  frame7  frame8  frame9  frame10  frame11 ...
                                        |               |               |                |
                                        └───────────────┵───────────────┵────────────────┚
                                                            sequence1
Parameters
  • basedir (str) –

    Path to the base directory containing extracted TUM sequences in separate directories. Each sequence subdirectory is assumed to contain depth/, rgb/, accelerometer.txt, depth.txt, groundtruth.txt and rgb.txt. E.g.:

    ├── rgbd_dataset_freiburgX_NAME
    │   ├── depth/
    │   ├── rgb/
    │   ├── accelerometer.txt
    │   ├── depth.txt
    │   ├── groundtruth.txt
    │   └── rgb.txt
    

  • sequences (str or tuple of str or None) – Sequences to use from those available in basedir. Can be path to a .txt file where each line is a sequence name (e.g. rgbd_dataset_freiburg1_rpy), a tuple of sequence names, or None to use all sequences. Default: None

  • seqlen (int) – Number of frames to use for each sequence of frames. Default: 4

  • dilation (int or None) – Number of (original trajectory’s) frames to skip between two consecutive frames in the extracted sequence. See above example if unsure. If None, will set dilation = 0. Default: None

  • stride (int or None) – Number of frames between the first frames of two consecutive extracted sequences. See above example if unsure. If None, will set stride = seqlen * (dilation + 1) (non-overlapping sequences). Default: None

  • start (int or None) – Index of the rgb frame from which to start extracting sequences for every sequence. If None, will start from the first frame. Default: None

  • end (int or None) – Index of the rgb frame at which to stop extracting sequences for every sequence. If None, will continue extracting frames until the end of the sequence. Default: None

  • height (int) – Spatial height to resize frames to. Default: 480

  • width (int) – Spatial width to resize frames to. Default: 640

  • channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False

  • normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False

  • return_depth (bool) – Determines whether to return depths. Default: True

  • return_intrinsics (bool) – Determines whether to return intrinsics. Default: True

  • return_pose (bool) – Determines whether to return poses. Default: True

  • return_transform (bool) – Determines whether to return relative transforms w.r.t. the first frame, whose pose is transformed to be the identity. Default: True

  • return_names (bool) – Determines whether to return sequence names. Default: True

  • return_timestamps (bool) – Determines whether to return rgb, depth and pose timestamps. Default: True

Examples:

>>> from torch.utils import data
>>> dataset = TUM(
    basedir="TUM-data/",
    sequences=("rgbd_dataset_freiburg1_rpy", "rgbd_dataset_freiburg1_xyz"))
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names, timestamps = next(iter(loader))

gradslam.datasets.datautils

normalize_image(rgb: Union[torch.Tensor, numpy.ndarray])[source]

Normalizes RGB image values from \([0, 255]\) range to \([0, 1]\) range.

Parameters

rgb (torch.Tensor or numpy.ndarray) – RGB image in range \([0, 255]\)

Returns

Normalized RGB image in range \([0, 1]\)

Return type

torch.Tensor or numpy.ndarray

Shape:
  • rgb: \((*)\) (any shape)

  • Output: Same shape as input \((*)\)
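A minimal sketch of this normalization, assuming it is a plain division by 255 (which matches the stated ranges; the torch.Tensor path would be analogous, e.g. `rgb.float() / 255.0`; `normalize_image_sketch` is a hypothetical name):

```python
import numpy as np

def normalize_image_sketch(rgb):
    # Maps values from [0, 255] to [0, 1] by plain division;
    # works elementwise for any input shape.
    return np.asarray(rgb, dtype=np.float64) / 255.0

# normalize_image_sketch(np.array([0.0, 127.5, 255.0])) -> [0.0, 0.5, 1.0]
```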

channels_first(rgb: Union[torch.Tensor, numpy.ndarray])[source]

Converts from channels last representation \((*, H, W, C)\) to channels first representation \((*, C, H, W)\)

Parameters

rgb (torch.Tensor or numpy.ndarray) – \((*, H, W, C)\) ordering (*, height, width, channels)

Returns

\((*, C, H, W)\) ordering

Return type

torch.Tensor or numpy.ndarray

Shape:
  • rgb: \((*, H, W, C)\)

  • Output: \((*, C, H, W)\)
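A minimal numpy sketch of this conversion (a hypothetical re-implementation for illustration; `np.moveaxis` leaves any leading batch dimensions untouched):

```python
import numpy as np

def channels_first_sketch(rgb):
    # (*, H, W, C) -> (*, C, H, W): move the last (channel) axis
    # to sit just before the spatial dimensions.
    return np.moveaxis(rgb, -1, -3)

x = np.zeros((2, 4, 480, 640, 3))       # (B, L, H, W, C)
# channels_first_sketch(x).shape -> (2, 4, 3, 480, 640)
```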

scale_intrinsics(intrinsics: Union[numpy.ndarray, torch.Tensor], h_ratio: Union[float, int], w_ratio: Union[float, int])[source]

Scales the intrinsics appropriately for resized frames where \(h_\text{ratio} = h_\text{new} / h_\text{old}\) and \(w_\text{ratio} = w_\text{new} / w_\text{old}\)

Parameters
  • intrinsics (numpy.ndarray or torch.Tensor) – Intrinsics matrix of original frame

  • h_ratio (float or int) – Ratio of new frame’s height to old frame’s height \(h_\text{ratio} = h_\text{new} / h_\text{old}\)

  • w_ratio (float or int) – Ratio of new frame’s width to old frame’s width \(w_\text{ratio} = w_\text{new} / w_\text{old}\)

Returns

Intrinsics matrix scaled appropriately for new frame size

Return type

numpy.ndarray or torch.Tensor

Shape:
  • intrinsics: \((*, 3, 3)\) or \((*, 4, 4)\)

  • Output: Matches intrinsics shape, \((*, 3, 3)\) or \((*, 4, 4)\)
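A minimal sketch of one common convention for this scaling: focal lengths and principal point scale with their respective image dimensions (the library may additionally handle the half-pixel offset of the principal point; `scale_intrinsics_sketch` is a hypothetical name):

```python
import numpy as np

def scale_intrinsics_sketch(intrinsics, h_ratio, w_ratio):
    # fx and cx scale with the width ratio; fy and cy with the height ratio.
    K = np.array(intrinsics, dtype=np.float64)
    K[..., 0, 0] *= w_ratio  # fx
    K[..., 0, 2] *= w_ratio  # cx
    K[..., 1, 1] *= h_ratio  # fy
    K[..., 1, 2] *= h_ratio  # cy
    return K

K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
# Halving both dimensions halves focal lengths and principal point:
# scale_intrinsics_sketch(K, 0.5, 0.5)[0, 0] -> 262.5
```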

pointquaternion_to_homogeneous(pointquaternions: Union[numpy.ndarray, torch.Tensor], eps: float = 1e-12)[source]

Converts 3D points and unit quaternions \((t_x, t_y, t_z, q_x, q_y, q_z, q_w)\) to homogeneous transformations \([R | t]\) where \(R\) denotes the \((3, 3)\) rotation matrix and \(t\) denotes the \((3, 1)\) translation vector:

\[\begin{split}\left[\begin{array}{@{}c:c@{}} R & t \\ \hdashline \begin{array}{@{}ccc@{}} 0 & 0 & 0 \end{array} & 1 \end{array}\right]\end{split}\]
Parameters
  • pointquaternions (numpy.ndarray or torch.Tensor) – 3D point positions and unit quaternions \((t_x, t_y, t_z, q_x, q_y, q_z, q_w)\) where \((t_x, t_y, t_z)\) is the 3D position and \((q_x, q_y, q_z, q_w)\) is the unit quaternion.

  • eps (float) – Small value, to avoid division by zero. Default: 1e-12

Returns

Homogeneous transformation matrices.

Return type

numpy.ndarray or torch.Tensor

Shape:
  • pointquaternions: \((*, 7)\)

  • Output: \((*, 4, 4)\)
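The conversion can be sketched with the standard quaternion-to-rotation-matrix formula (a hypothetical re-implementation for illustration, scalar-last \(q_w\) convention as above):

```python
import numpy as np

def pointquaternion_to_homogeneous_sketch(pq, eps=1e-12):
    # pq: (..., 7) as (tx, ty, tz, qx, qy, qz, qw)
    pq = np.asarray(pq, dtype=np.float64)
    t, q = pq[..., :3], pq[..., 3:]
    # Guard against division by zero when normalizing the quaternion.
    q = q / np.maximum(np.linalg.norm(q, axis=-1, keepdims=True), eps)
    qx, qy, qz, qw = q[..., 0], q[..., 1], q[..., 2], q[..., 3]
    R = np.empty(pq.shape[:-1] + (3, 3))
    R[..., 0, 0] = 1 - 2 * (qy ** 2 + qz ** 2)
    R[..., 0, 1] = 2 * (qx * qy - qz * qw)
    R[..., 0, 2] = 2 * (qx * qz + qy * qw)
    R[..., 1, 0] = 2 * (qx * qy + qz * qw)
    R[..., 1, 1] = 1 - 2 * (qx ** 2 + qz ** 2)
    R[..., 1, 2] = 2 * (qy * qz - qx * qw)
    R[..., 2, 0] = 2 * (qx * qz - qy * qw)
    R[..., 2, 1] = 2 * (qy * qz + qx * qw)
    R[..., 2, 2] = 1 - 2 * (qx ** 2 + qy ** 2)
    T = np.zeros(pq.shape[:-1] + (4, 4))
    T[..., :3, :3] = R
    T[..., :3, 3] = t
    T[..., 3, 3] = 1.0
    return T

# Identity quaternion (0, 0, 0, 1) with translation (1, 2, 3):
# rotation block is the 3x3 identity, last column is (1, 2, 3, 1).
```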

poses_to_transforms(poses: Union[numpy.ndarray, List[numpy.ndarray]])[source]

Converts a sequence of poses to frame-to-frame transformations, with the first frame in the sequence transformed to have identity pose

Parameters

poses (numpy.ndarray or list of numpy.ndarray) – Sequence of poses in numpy.ndarray format.

Returns

Sequence of frame-to-frame transformations where the initial frame is transformed to have identity pose.

Return type

numpy.ndarray or list of numpy.ndarray

Shape:
  • poses: Could be a numpy.ndarray of shape \((N, 4, 4)\), or a list of numpy.ndarray of shape \((4, 4)\)

  • Output: Of same shape as input poses
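A minimal sketch of this conversion, assuming each relative transform is computed as \(T_i = P_{i-1}^{-1} P_i\) with the first entry set to the identity (`poses_to_transforms_sketch` is a hypothetical name):

```python
import numpy as np

def poses_to_transforms_sketch(poses):
    # poses: (N, 4, 4) absolute poses; returns (N, 4, 4) frame-to-frame
    # transforms, with the first frame mapped to the identity.
    transforms = [np.eye(4)]
    for i in range(1, len(poses)):
        transforms.append(np.linalg.inv(poses[i - 1]) @ poses[i])
    return np.stack(transforms)
```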

create_label_image(prediction: numpy.ndarray, color_palette: collections.OrderedDict)[source]

Creates a label image, given a network prediction (each pixel contains class index) and a color palette.

Parameters
  • prediction (numpy.ndarray) – Predicted image where each pixel contains an integer, corresponding to its class label.

  • color_palette (OrderedDict) – Contains RGB colors (uint8) for each class.

Returns

Label image with the given color palette

Return type

numpy.ndarray

Shape:
  • prediction: \((H, W)\)

  • Output: \((H, W)\)
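A minimal sketch, assuming class indices correspond to the palette's insertion order (`create_label_image_sketch` and the example palette are hypothetical):

```python
from collections import OrderedDict

import numpy as np

def create_label_image_sketch(prediction, color_palette):
    # prediction: (H, W) integer class indices; color_palette maps each
    # class name to an RGB triple (uint8), in class-index order.
    label_image = np.zeros(prediction.shape + (3,), dtype=np.uint8)
    for class_idx, color in enumerate(color_palette.values()):
        label_image[prediction == class_idx] = color
    return label_image

palette = OrderedDict([("background", (0, 0, 0)), ("chair", (255, 0, 0))])
pred = np.array([[0, 1], [1, 0]])
# Pixels of class 1 become red; class-0 pixels stay black.
```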