gradslam.datasets

gradslam.datasets.icl

class ICL(basedir: str, trajectories: Optional[Union[tuple, str]] = None, seqlen: int = 4, dilation: Optional[int] = None, stride: Optional[int] = None, start: Optional[int] = None, end: Optional[int] = None, height: int = 480, width: int = 640, channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True)[source]

A torch Dataset for loading the ICL-NUIM dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame’s pose as the reference transformation), and frame names. Uses the TUM RGB-D compatible PNG files and Global_RT_Trajectory_GT from here. Expects the following folder structure for the ICL dataset:

├── ICL
│   ├── living_room_traj0_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom0n.gt.sim
│   ├── living_room_traj1_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom1n.gt.sim
│   ├── living_room_traj2_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom2n.gt.sim
│   ├── living_room_traj3_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom3n.gt.sim
│   ├── living_room_trajX_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoomXn.gt.sim

Example of sequence creation from frames with seqlen=4, dilation=1, stride=3, and start=2:

                                    sequence0
                ┎───────────────┲───────────────┲───────────────┒
                |               |               |               |
frame0  frame1  frame2  frame3  frame4  frame5  frame6  frame7  frame8  frame9  frame10  frame11 ...
                                        |               |               |                |
                                        └───────────────┵───────────────┵────────────────┚
                                                            sequence1
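The extraction scheme above can be sketched in a few lines (`frame_indices` is a hypothetical helper for illustration, not part of gradslam):

```python
def frame_indices(num_frames, seqlen=4, dilation=0, stride=None, start=0):
    """Return the frame indices of each extracted sequence."""
    if stride is None:
        stride = seqlen * (dilation + 1)  # default: non-overlapping sequences
    step = dilation + 1  # keep every (dilation + 1)-th frame
    sequences = []
    first = start
    while first + (seqlen - 1) * step < num_frames:
        sequences.append(list(range(first, first + seqlen * step, step)))
        first += stride
    return sequences

# seqlen=4, dilation=1, stride=3, start=2 on a 12-frame trajectory
# (the example above):
# frame_indices(12, seqlen=4, dilation=1, stride=3, start=2)
# -> [[2, 4, 6, 8], [5, 7, 9, 11]]
```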
Parameters
  • basedir (str) –

Path to the base directory containing the living_room_trajX_frei_png/ directories from ICL-NUIM. Each trajectory subdirectory is assumed to contain depth/, rgb/, associations.txt and livingRoomXn.gt.sim.

    ├── living_room_trajX_frei_png
    │   ├── depth/
    │   ├── rgb/
    │   ├── associations.txt
    │   └── livingRoomXn.gt.sim
    

  • trajectories (str or tuple of str or None) – Trajectories to use from “living_room_traj0_frei_png”, “living_room_traj1_frei_png”, “living_room_traj2_frei_png” or “living_room_traj3_frei_png”. Can be path to a .txt file where each line is a trajectory name (living_room_traj0_frei_png), a tuple of trajectory names, or None to use all trajectories. Default: None

  • seqlen (int) – Number of frames to use for each sequence of frames. Default: 4

  • dilation (int or None) – Number of (original trajectory’s) frames to skip between two consecutive frames in the extracted sequence. See above example if unsure. If None, will set dilation = 0. Default: None

  • stride (int or None) – Number of frames between the first frames of two consecutive extracted sequences. See above example if unsure. If None, will set stride = seqlen * (dilation + 1) (non-overlapping sequences). Default: None

  • start (int or None) – Index of the frame from which to start extracting sequences for every trajectory. If None, will start from the first frame. Default: None

  • end (int or None) – Index of the frame at which to stop extracting sequences for every trajectory. If None, will continue extracting frames until the end of the trajectory. Default: None

  • height (int) – Spatial height to resize frames to. Default: 480

  • width (int) – Spatial width to resize frames to. Default: 640

  • channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False

  • normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False

  • return_depth (bool) – Determines whether to return depths. Default: True

  • return_intrinsics (bool) – Determines whether to return intrinsics. Default: True

  • return_pose (bool) – Determines whether to return poses. Default: True

  • return_transform (bool) – Determines whether to return relative transforms w.r.t. the first frame, whose pose is transformed to be the identity. Default: True

  • return_names (bool) – Determines whether to return sequence names. Default: True

Examples:

>>> from torch.utils import data
>>> dataset = ICL(
    basedir="ICL-data/",
    trajectories=("living_room_traj0_frei_png", "living_room_traj1_frei_png"),
    )
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names = next(iter(loader))

gradslam.datasets.scannet

class Scannet(basedir: str, seqmetadir: str, scenes: Optional[Union[tuple, str]], start: Optional[int] = 0, end: Optional[int] = -1, height: int = 480, width: int = 640, seg_classes: str = 'scannet20', channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True, return_labels: bool = True)[source]

A torch Dataset for loading the ScanNet dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame’s pose as the reference transformation), sequence names, and semantic segmentation labels.

Parameters
  • basedir (str) – Path to the base directory containing the sceneXXXX_XX/ directories from ScanNet. Each scene subdirectory is assumed to contain color/, depth/, intrinsic/, label-filt/ and pose/ directories.

  • seqmetadir (str) – Path to the directory containing sequence associations. The directory is assumed to contain metadata .txt files (one metadata file per sequence): e.g. sceneXXXX_XX-seq_Y.txt .

  • scenes (str or tuple of str) – Scenes to use (e.g. for creating train/val/test splits). Can be the path to a .txt file where each line is a scene name (sceneXXXX_XX), a tuple of scene names, or None to use all scenes.

  • start (int) – Index of the frame from which to start for every sequence. Default: 0

  • end (int) – Index of the frame at which to end for every sequence. Default: -1

  • height (int) – Spatial height to resize frames to. Default: 480

  • width (int) – Spatial width to resize frames to. Default: 640

  • seg_classes (str) – The palette of classes that the network should learn. Either “nyu40” or “scannet20”. Default: “scannet20”

  • channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False

  • normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False

  • return_depth (bool) – Determines whether to return depths. Default: True

  • return_intrinsics (bool) – Determines whether to return intrinsics. Default: True

  • return_pose (bool) – Determines whether to return poses. Default: True

  • return_transform (bool) – Determines whether to return relative transforms w.r.t. the first frame, whose pose is transformed to be the identity. Default: True

  • return_names (bool) – Determines whether to return sequence names. Default: True

  • return_labels (bool) – Determines whether to return segmentation labels. Default: True

Examples:

>>> from torch.utils import data
>>> dataset = Scannet(
    basedir="ScanNet-gradSLAM/extractions/scans/",
    seqmetadir="ScanNet-gradSLAM/extractions/sequence_associations/",
    scenes=("scene0000_00", "scene0001_00"),
    )
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names, labels = next(iter(loader))

gradslam.datasets.tum

class TUM(basedir: str, sequences: Optional[Union[tuple, str]] = None, seqlen: int = 4, dilation: Optional[int] = None, stride: Optional[int] = None, start: Optional[int] = None, end: Optional[int] = None, height: int = 480, width: int = 640, channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True, return_timestamps: bool = True)[source]

A torch Dataset for loading the TUM RGB-D dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame’s pose as the reference transformation), and frame names. Uses extracted .tgz sequences downloaded from here. Expects a folder structure similar to the following for the TUM dataset:

├── TUM
│   ├── rgbd_dataset_freiburg1_rpy
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── accelerometer.txt
│   │   ├── depth.txt
│   │   ├── groundtruth.txt
│   │   └── rgb.txt
│   ├── rgbd_dataset_freiburg1_xyz
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── accelerometer.txt
│   │   ├── depth.txt
│   │   ├── groundtruth.txt
│   │   └── rgb.txt
│   ├── ...

Example of sequence creation from frames with seqlen=4, dilation=1, stride=3, and start=2:

                                    sequence0
                ┎───────────────┲───────────────┲───────────────┒
                |               |               |               |
frame0  frame1  frame2  frame3  frame4  frame5  frame6  frame7  frame8  frame9  frame10  frame11 ...
                                        |               |               |                |
                                        └───────────────┵───────────────┵────────────────┚
                                                            sequence1
Parameters
  • basedir (str) –

    Path to the base directory containing extracted TUM sequences in separate directories. Each sequence subdirectory is assumed to contain depth/, rgb/, accelerometer.txt, depth.txt, groundtruth.txt and rgb.txt. E.g.:

    ├── rgbd_dataset_freiburgX_NAME
    │   ├── depth/
    │   ├── rgb/
    │   ├── accelerometer.txt
    │   ├── depth.txt
    │   ├── groundtruth.txt
    │   └── rgb.txt
    

  • sequences (str or tuple of str or None) – Sequences to use from those available in basedir. Can be path to a .txt file where each line is a sequence name (e.g. rgbd_dataset_freiburg1_rpy), a tuple of sequence names, or None to use all sequences. Default: None

  • seqlen (int) – Number of frames to use for each sequence of frames. Default: 4

  • dilation (int or None) – Number of (original trajectory’s) frames to skip between two consecutive frames in the extracted sequence. See above example if unsure. If None, will set dilation = 0. Default: None

  • stride (int or None) – Number of frames between the first frames of two consecutive extracted sequences. See above example if unsure. If None, will set stride = seqlen * (dilation + 1) (non-overlapping sequences). Default: None

  • start (int or None) – Index of the rgb frame from which to start extracting sequences for every sequence. If None, will start from the first frame. Default: None

  • end (int or None) – Index of the rgb frame at which to stop extracting sequences for every sequence. If None, will continue extracting frames until the end of the sequence. Default: None

  • height (int) – Spatial height to resize frames to. Default: 480

  • width (int) – Spatial width to resize frames to. Default: 640

  • channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False

  • normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False

  • return_depth (bool) – Determines whether to return depths. Default: True

  • return_intrinsics (bool) – Determines whether to return intrinsics. Default: True

  • return_pose (bool) – Determines whether to return poses. Default: True

  • return_transform (bool) – Determines whether to return relative transforms w.r.t. the first frame, whose pose is transformed to be the identity. Default: True

  • return_names (bool) – Determines whether to return sequence names. Default: True

  • return_timestamps (bool) – Determines whether to return rgb, depth and pose timestamps. Default: True

Examples:

>>> from torch.utils import data
>>> dataset = TUM(
    basedir="TUM-data/",
    sequences=("rgbd_dataset_freiburg1_rpy", "rgbd_dataset_freiburg1_xyz"))
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names, timestamps = next(iter(loader))

gradslam.datasets.datautils

normalize_image(rgb: Union[torch.Tensor, numpy.ndarray])[source]

Normalizes RGB image values from \([0, 255]\) range to \([0, 1]\) range.

Parameters

rgb (torch.Tensor or numpy.ndarray) – RGB image in range \([0, 255]\)

Returns

Normalized RGB image in range \([0, 1]\)

Return type

torch.Tensor or numpy.ndarray

Shape:
  • rgb: \((*)\) (any shape)

  • Output: Same shape as input \((*)\)
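A minimal sketch of this normalization, assuming it is a plain division by 255 (which matches the stated ranges; the torch.Tensor path would be analogous, e.g. `rgb.float() / 255.0`; `normalize_image_sketch` is a hypothetical name):

```python
import numpy as np

def normalize_image_sketch(rgb):
    # Maps values from [0, 255] to [0, 1] by plain division;
    # works elementwise for any input shape.
    return np.asarray(rgb, dtype=np.float64) / 255.0

# normalize_image_sketch(np.array([0.0, 127.5, 255.0])) -> [0.0, 0.5, 1.0]
```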

channels_first(rgb: Union[torch.Tensor, numpy.ndarray])[source]

Converts from channels last representation \((*, H, W, C)\) to channels first representation \((*, C, H, W)\)

Parameters

rgb (torch.Tensor or numpy.ndarray) – \((*, H, W, C)\) ordering (*, height, width, channels)

Returns

\((*, C, H, W)\) ordering

Return type

torch.Tensor or numpy.ndarray

Shape:
  • rgb: \((*, H, W, C)\)

  • Output: \((*, C, H, W)\)
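A minimal numpy sketch of this conversion (a hypothetical re-implementation for illustration; `np.moveaxis` leaves any leading batch dimensions untouched):

```python
import numpy as np

def channels_first_sketch(rgb):
    # (*, H, W, C) -> (*, C, H, W): move the last (channel) axis
    # to sit just before the spatial dimensions.
    return np.moveaxis(rgb, -1, -3)

x = np.zeros((2, 4, 480, 640, 3))       # (B, L, H, W, C)
# channels_first_sketch(x).shape -> (2, 4, 3, 480, 640)
```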

scale_intrinsics(intrinsics: Union[numpy.ndarray, torch.Tensor], h_ratio: Union[float, int], w_ratio: Union[float, int])[source]

Scales the intrinsics appropriately for resized frames where \(h_\text{ratio} = h_\text{new} / h_\text{old}\) and \(w_\text{ratio} = w_\text{new} / w_\text{old}\)

Parameters
  • intrinsics (numpy.ndarray or torch.Tensor) – Intrinsics matrix of original frame

  • h_ratio (float or int) – Ratio of new frame’s height to old frame’s height \(h_\text{ratio} = h_\text{new} / h_\text{old}\)

  • w_ratio (float or int) – Ratio of new frame’s width to old frame’s width \(w_\text{ratio} = w_\text{new} / w_\text{old}\)

Returns

Intrinsics matrix scaled appropriately for new frame size

Return type

numpy.ndarray or torch.Tensor

Shape:
  • intrinsics: \((*, 3, 3)\) or \((*, 4, 4)\)

  • Output: Matches intrinsics shape, \((*, 3, 3)\) or \((*, 4, 4)\)
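A minimal sketch of one common convention for this scaling: focal lengths and principal point scale with their respective image dimensions (the library may additionally handle the half-pixel offset of the principal point; `scale_intrinsics_sketch` is a hypothetical name):

```python
import numpy as np

def scale_intrinsics_sketch(intrinsics, h_ratio, w_ratio):
    # fx and cx scale with the width ratio; fy and cy with the height ratio.
    K = np.array(intrinsics, dtype=np.float64)
    K[..., 0, 0] *= w_ratio  # fx
    K[..., 0, 2] *= w_ratio  # cx
    K[..., 1, 1] *= h_ratio  # fy
    K[..., 1, 2] *= h_ratio  # cy
    return K

K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
# Halving both dimensions halves focal lengths and principal point:
# scale_intrinsics_sketch(K, 0.5, 0.5)[0, 0] -> 262.5
```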

pointquaternion_to_homogeneous(pointquaternions: Union[numpy.ndarray, torch.Tensor], eps: float = 1e-12)[source]

Converts 3D points and unit quaternions \((t_x, t_y, t_z, q_x, q_y, q_z, q_w)\) to homogeneous transformations \([R | t]\) where \(R\) denotes the \((3, 3)\) rotation matrix and \(t\) denotes the \((3, 1)\) translation vector:

\[\begin{split}\left[\begin{array}{@{}c:c@{}} R & t \\ \hdashline \begin{array}{@{}ccc@{}} 0 & 0 & 0 \end{array} & 1 \end{array}\right]\end{split}\]
Parameters
  • pointquaternions (numpy.ndarray or torch.Tensor) – 3D point positions and unit quaternions \((t_x, t_y, t_z, q_x, q_y, q_z, q_w)\) where \((t_x, t_y, t_z)\) is the 3D position and \((q_x, q_y, q_z, q_w)\) is the unit quaternion.

  • eps (float) – Small value, to avoid division by zero. Default: 1e-12

Returns

Homogeneous transformation matrices.

Return type

numpy.ndarray or torch.Tensor

Shape:
  • pointquaternions: \((*, 7)\)

  • Output: \((*, 4, 4)\)
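The conversion can be sketched with the standard quaternion-to-rotation-matrix formula (a hypothetical re-implementation for illustration, scalar-last \(q_w\) convention as above):

```python
import numpy as np

def pointquaternion_to_homogeneous_sketch(pq, eps=1e-12):
    # pq: (..., 7) as (tx, ty, tz, qx, qy, qz, qw)
    pq = np.asarray(pq, dtype=np.float64)
    t, q = pq[..., :3], pq[..., 3:]
    # Guard against division by zero when normalizing the quaternion.
    q = q / np.maximum(np.linalg.norm(q, axis=-1, keepdims=True), eps)
    qx, qy, qz, qw = q[..., 0], q[..., 1], q[..., 2], q[..., 3]
    R = np.empty(pq.shape[:-1] + (3, 3))
    R[..., 0, 0] = 1 - 2 * (qy ** 2 + qz ** 2)
    R[..., 0, 1] = 2 * (qx * qy - qz * qw)
    R[..., 0, 2] = 2 * (qx * qz + qy * qw)
    R[..., 1, 0] = 2 * (qx * qy + qz * qw)
    R[..., 1, 1] = 1 - 2 * (qx ** 2 + qz ** 2)
    R[..., 1, 2] = 2 * (qy * qz - qx * qw)
    R[..., 2, 0] = 2 * (qx * qz - qy * qw)
    R[..., 2, 1] = 2 * (qy * qz + qx * qw)
    R[..., 2, 2] = 1 - 2 * (qx ** 2 + qy ** 2)
    T = np.zeros(pq.shape[:-1] + (4, 4))
    T[..., :3, :3] = R
    T[..., :3, 3] = t
    T[..., 3, 3] = 1.0
    return T

# Identity quaternion (0, 0, 0, 1) with translation (1, 2, 3):
# rotation block is the 3x3 identity, last column is (1, 2, 3, 1).
```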

poses_to_transforms(poses: Union[numpy.ndarray, List[numpy.ndarray]])[source]

Converts a sequence of poses to frame-to-frame transformations, with the first frame in the sequence transformed to have identity pose

Parameters

poses (numpy.ndarray or list of numpy.ndarray) – Sequence of poses in numpy.ndarray format.

Returns

Sequence of frame-to-frame transformations where the initial frame is transformed to have identity pose.

Return type

numpy.ndarray or list of numpy.ndarray

Shape:
  • poses: Could be a numpy.ndarray of shape \((N, 4, 4)\), or a list of numpy.ndarray of shape \((4, 4)\)

  • Output: Of same shape as input poses
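A minimal sketch of this conversion, assuming each relative transform is computed as \(T_i = P_{i-1}^{-1} P_i\) with the first entry set to the identity (`poses_to_transforms_sketch` is a hypothetical name):

```python
import numpy as np

def poses_to_transforms_sketch(poses):
    # poses: (N, 4, 4) absolute poses; returns (N, 4, 4) frame-to-frame
    # transforms, with the first frame mapped to the identity.
    transforms = [np.eye(4)]
    for i in range(1, len(poses)):
        transforms.append(np.linalg.inv(poses[i - 1]) @ poses[i])
    return np.stack(transforms)
```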

create_label_image(prediction: numpy.ndarray, color_palette: collections.OrderedDict)[source]

Creates a label image, given a network prediction (each pixel contains class index) and a color palette.

Parameters
  • prediction (numpy.ndarray) – Predicted image where each pixel contains an integer, corresponding to its class label.

  • color_palette (OrderedDict) – Contains RGB colors (uint8) for each class.

Returns

Label image with the given color palette

Return type

numpy.ndarray

Shape:
  • prediction: \((H, W)\)

  • Output: \((H, W)\)
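A minimal sketch, assuming class indices correspond to the palette's insertion order (`create_label_image_sketch` and the example palette are hypothetical):

```python
from collections import OrderedDict

import numpy as np

def create_label_image_sketch(prediction, color_palette):
    # prediction: (H, W) integer class indices; color_palette maps each
    # class name to an RGB triple (uint8), in class-index order.
    label_image = np.zeros(prediction.shape + (3,), dtype=np.uint8)
    for class_idx, color in enumerate(color_palette.values()):
        label_image[prediction == class_idx] = color
    return label_image

palette = OrderedDict([("background", (0, 0, 0)), ("chair", (255, 0, 0))])
pred = np.array([[0, 1], [1, 0]])
# Pixels of class 1 become red; class-0 pixels stay black.
```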