gradslam.datasets¶
gradslam.datasets.icl¶
- class ICL(basedir: str, trajectories: Optional[Union[tuple, str]] = None, seqlen: int = 4, dilation: Optional[int] = None, stride: Optional[int] = None, start: Optional[int] = None, end: Optional[int] = None, height: int = 480, width: int = 640, channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True)[source]¶
A torch Dataset for loading the ICL-NUIM dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame's pose as the reference transformation), and frame names. Uses the "TUM RGB-D Compatible PNGs" files and "Global_RT_Trajectory_GT" from the ICL-NUIM dataset webpage. Expects the following folder structure for the ICL dataset:
├── ICL
│   ├── living_room_traj0_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom0n.gt.sim
│   ├── living_room_traj1_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom1n.gt.sim
│   ├── living_room_traj2_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom2n.gt.sim
│   ├── living_room_traj3_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoom3n.gt.sim
│   ├── living_room_trajX_frei_png
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── associations.txt
│   │   └── livingRoomXn.gt.sim
Example of sequence creation from frames with seqlen=4, dilation=1, stride=3, and start=2:
                                      sequence0
                  ┎───────────────┲───────────────┲───────────────┒
                  |               |               |               |
frame0  frame1  frame2  frame3  frame4  frame5  frame6  frame7  frame8  frame9  frame10  frame11 ...
                                          |               |               |                |
                                          └───────────────┵───────────────┵────────────────┚
                                                              sequence1
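The same indexing can be sketched in plain Python; sequence_frame_indices below is a hypothetical helper (not part of gradslam) that reproduces the documented seqlen/dilation/stride/start semantics:

def sequence_frame_indices(num_frames, seqlen=4, dilation=0, stride=None, start=0):
    """Hypothetical helper: lists the frame indices used by each extracted sequence."""
    if stride is None:
        stride = seqlen * (dilation + 1)  # documented default: non-overlapping sequences
    span = (seqlen - 1) * (dilation + 1) + 1  # frames covered by one sequence
    return [
        [first + i * (dilation + 1) for i in range(seqlen)]
        for first in range(start, num_frames - span + 1, stride)
    ]

# Reproduces the diagram above:
# sequence_frame_indices(12, seqlen=4, dilation=1, stride=3, start=2)
# -> [[2, 4, 6, 8], [5, 7, 9, 11]]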
- Parameters
basedir (str) –
Path to the base directory containing the living_room_trajX_frei_png/ directories from ICL-NUIM. Each trajectory subdirectory is assumed to contain depth/, rgb/, associations.txt and livingRoomXn.gt.sim, e.g.:
├── living_room_trajX_frei_png
│   ├── depth/
│   ├── rgb/
│   ├── associations.txt
│   └── livingRoomXn.gt.sim
trajectories (str or tuple of str or None) – Trajectories to use from “living_room_traj0_frei_png”, “living_room_traj1_frei_png”, “living_room_traj2_frei_png” or “living_room_traj3_frei_png”. Can be path to a .txt file where each line is a trajectory name (living_room_traj0_frei_png), a tuple of trajectory names, or None to use all trajectories. Default: None
seqlen (int) – Number of frames to use for each sequence of frames. Default: 4
dilation (int or None) – Number of (original trajectory’s) frames to skip between two consecutive frames in the extracted sequence. See above example if unsure. If None, will set dilation = 0. Default: None
stride (int or None) – Number of frames between the first frames of two consecutive extracted sequences. See above example if unsure. If None, will set stride = seqlen * (dilation + 1) (non-overlapping sequences). Default: None
start (int or None) – Index of the frame from which to start extracting sequences for every trajectory. If None, will start from the first frame. Default: None
end (int or None) – Index of the frame at which to stop extracting sequences for every trajectory. If None, will continue extracting frames until the end of the trajectory. Default: None
height (int) – Spatial height to resize frames to. Default: 480
width (int) – Spatial width to resize frames to. Default: 640
channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False
normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False
return_depth (bool) – Determines whether to return depths. Default: True
return_intrinsics (bool) – Determines whether to return intrinsics. Default: True
return_pose (bool) – Determines whether to return poses. Default: True
return_transform (bool) – Determines whether to return frame-to-frame transforms, with the first frame's pose transformed to the identity. Default: True
return_names (bool) – Determines whether to return sequence names. Default: True
Examples:
>>> from torch.utils import data
>>> from gradslam.datasets.icl import ICL
>>> dataset = ICL(
...     basedir="ICL-data/",
...     trajectories=("living_room_traj0_frei_png", "living_room_traj1_frei_png"),
... )
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names = next(iter(loader))
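With the defaults above (channels_first=False, seqlen=4) and batch_size=4, the color batch should come back channels-last; a doctest-style sanity check (shapes assume the default 480×640 resize):

>>> colors.shape  # (batch, seqlen, height, width, channels)
torch.Size([4, 4, 480, 640, 3])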
gradslam.datasets.scannet¶
- class Scannet(basedir: str, seqmetadir: str, scenes: Optional[Union[tuple, str]], start: Optional[int] = 0, end: Optional[int] = -1, height: int = 480, width: int = 640, seg_classes: str = 'scannet20', channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True, return_labels: bool = True)[source]¶
A torch Dataset for loading the ScanNet dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame's pose as the reference transformation), sequence names, and semantic segmentation labels.
- Parameters
basedir (str) – Path to the base directory containing the sceneXXXX_XX/ directories from ScanNet. Each scene subdirectory is assumed to contain color/, depth/, intrinsic/, label-filt/ and pose/ directories.
seqmetadir (str) – Path to directory containing sequence associations. Directory is assumed to contain metadata .txt files (one file per sequence), e.g. sceneXXXX_XX-seq_Y.txt.
scenes (str or tuple of str or None) – Scenes to use from sequences (used for creating train/val/test splits). Can be path to a .txt file where each line is a scene name (sceneXXXX_XX), a tuple of scene names, or None to use all scenes.
start (int) – Index of the frame from which to start for every sequence. Default: 0
end (int) – Index of the frame at which to end for every sequence. Default: -1
height (int) – Spatial height to resize frames to. Default: 480
width (int) – Spatial width to resize frames to. Default: 640
seg_classes (str) – The palette of classes that the network should learn. Either “nyu40” or “scannet20”. Default: “scannet20”
channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False
normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False
return_depth (bool) – Determines whether to return depths. Default: True
return_intrinsics (bool) – Determines whether to return intrinsics. Default: True
return_pose (bool) – Determines whether to return poses. Default: True
return_transform (bool) – Determines whether to return frame-to-frame transforms, with the first frame's pose transformed to the identity. Default: True
return_names (bool) – Determines whether to return sequence names. Default: True
return_labels (bool) – Determines whether to return segmentation labels. Default: True
Examples:
>>> from torch.utils import data
>>> from gradslam.datasets.scannet import Scannet
>>> dataset = Scannet(
...     basedir="ScanNet-gradSLAM/extractions/scans/",
...     seqmetadir="ScanNet-gradSLAM/extractions/sequence_associations/",
...     scenes=("scene0000_00", "scene0001_00"),
... )
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names, labels = next(iter(loader))
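Since scenes can also be the path to a .txt split file (one scene name per line), a hypothetical train-split sketch (the file name is made up):

>>> train_set = Scannet(
...     basedir="ScanNet-gradSLAM/extractions/scans/",
...     seqmetadir="ScanNet-gradSLAM/extractions/sequence_associations/",
...     scenes="train_split.txt",  # hypothetical file: one scene name per line
... )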
gradslam.datasets.tum¶
- class TUM(basedir: str, sequences: Optional[Union[tuple, str]] = None, seqlen: int = 4, dilation: Optional[int] = None, stride: Optional[int] = None, start: Optional[int] = None, end: Optional[int] = None, height: int = 480, width: int = 640, channels_first: bool = False, normalize_color: bool = False, *, return_depth: bool = True, return_intrinsics: bool = True, return_pose: bool = True, return_transform: bool = True, return_names: bool = True, return_timestamps: bool = True)[source]¶
A torch Dataset for loading the TUM RGB-D dataset. Fetches sequences of rgb images, depth maps, intrinsics matrices, poses, frame-to-frame relative transformations (with the first frame's pose as the reference transformation), and frame names. Uses extracted .tgz sequences downloaded from the TUM RGB-D benchmark website. Expects a folder structure similar to the following for the TUM dataset:
├── TUM
│   ├── rgbd_dataset_freiburg1_rpy
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── accelerometer.txt
│   │   ├── depth.txt
│   │   ├── groundtruth.txt
│   │   └── rgb.txt
│   ├── rgbd_dataset_freiburg1_xyz
│   │   ├── depth/
│   │   ├── rgb/
│   │   ├── accelerometer.txt
│   │   ├── depth.txt
│   │   ├── groundtruth.txt
│   │   └── rgb.txt
│   ├── ...
Example of sequence creation from frames with seqlen=4, dilation=1, stride=3, and start=2:
                                      sequence0
                  ┎───────────────┲───────────────┲───────────────┒
                  |               |               |               |
frame0  frame1  frame2  frame3  frame4  frame5  frame6  frame7  frame8  frame9  frame10  frame11 ...
                                          |               |               |                |
                                          └───────────────┵───────────────┵────────────────┚
                                                              sequence1
- Parameters
basedir (str) –
Path to the base directory containing extracted TUM sequences in separate directories. Each sequence subdirectory is assumed to contain depth/, rgb/, accelerometer.txt, depth.txt, groundtruth.txt and rgb.txt, e.g.:
├── rgbd_dataset_freiburgX_NAME
│   ├── depth/
│   ├── rgb/
│   ├── accelerometer.txt
│   ├── depth.txt
│   ├── groundtruth.txt
│   └── rgb.txt
sequences (str or tuple of str or None) – Sequences to use from those available in basedir. Can be path to a .txt file where each line is a sequence name (e.g. rgbd_dataset_freiburg1_rpy), a tuple of sequence names, or None to use all sequences. Default: None
seqlen (int) – Number of frames to use for each sequence of frames. Default: 4
dilation (int or None) – Number of (original trajectory’s) frames to skip between two consecutive frames in the extracted sequence. See above example if unsure. If None, will set dilation = 0. Default: None
stride (int or None) – Number of frames between the first frames of two consecutive extracted sequences. See above example if unsure. If None, will set stride = seqlen * (dilation + 1) (non-overlapping sequences). Default: None
start (int or None) – Index of the rgb frame from which to start extracting sequences for every sequence. If None, will start from the first frame. Default: None
end (int or None) – Index of the rgb frame at which to stop extracting sequences for every sequence. If None, will continue extracting frames until the end of the sequence. Default: None
height (int) – Spatial height to resize frames to. Default: 480
width (int) – Spatial width to resize frames to. Default: 640
channels_first (bool) – If True, will use channels first representation \((B, L, C, H, W)\) for images (batchsize, sequencelength, channels, height, width). If False, will use channels last representation \((B, L, H, W, C)\). Default: False
normalize_color (bool) – Normalize color to range \([0, 1]\) or leave it at range \([0, 255]\). Default: False
return_depth (bool) – Determines whether to return depths. Default: True
return_intrinsics (bool) – Determines whether to return intrinsics. Default: True
return_pose (bool) – Determines whether to return poses. Default: True
return_transform (bool) – Determines whether to return frame-to-frame transforms, with the first frame's pose transformed to the identity. Default: True
return_names (bool) – Determines whether to return sequence names. Default: True
return_timestamps (bool) – Determines whether to return rgb, depth and pose timestamps. Default: True
Examples:
>>> from torch.utils import data
>>> from gradslam.datasets.tum import TUM
>>> dataset = TUM(
...     basedir="TUM-data/",
...     sequences=("rgbd_dataset_freiburg1_rpy", "rgbd_dataset_freiburg1_xyz"),
... )
>>> loader = data.DataLoader(dataset=dataset, batch_size=4)
>>> colors, depths, intrinsics, poses, transforms, names, timestamps = next(iter(loader))
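The loader flags compose as documented; for instance, channels_first=True together with normalize_color=True should yield float colors in \([0, 1]\) with shape \((B, L, C, H, W)\) (a sketch reusing the hypothetical basedir above):

>>> dataset = TUM(basedir="TUM-data/", seqlen=8, channels_first=True, normalize_color=True)
>>> colors, *rest = next(iter(data.DataLoader(dataset, batch_size=2)))
>>> colors.shape  # (batch, seqlen, channels, height, width)
torch.Size([2, 8, 3, 480, 640])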
gradslam.datasets.datautils¶
- normalize_image(rgb: Union[torch.Tensor, numpy.ndarray])[source]¶
Normalizes RGB image values from \([0, 255]\) range to \([0, 1]\) range.
- Parameters
rgb (torch.Tensor or numpy.ndarray) – RGB image in range \([0, 255]\)
- Returns
Normalized RGB image in range \([0, 1]\)
- Return type
torch.Tensor or numpy.ndarray
- Shape:
rgb: \((*)\) (any shape)
Output: Same shape as input \((*)\)
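A doctest-style check (assuming a torch tensor input; a numpy array behaves analogously):

>>> import torch
>>> from gradslam.datasets.datautils import normalize_image
>>> rgb = torch.rand(480, 640, 3) * 255  # values in [0, 255]
>>> normalized = normalize_image(rgb)
>>> bool(normalized.min() >= 0) and bool(normalized.max() <= 1)
True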
- channels_first(rgb: Union[torch.Tensor, numpy.ndarray])[source]¶
Converts from channels last representation \((*, H, W, C)\) to channels first representation \((*, C, H, W)\)
- Parameters
rgb (torch.Tensor or numpy.ndarray) – \((*, H, W, C)\) ordering (*, height, width, channels)
- Returns
\((*, C, H, W)\) ordering
- Return type
torch.Tensor or numpy.ndarray
- Shape:
rgb: \((*, H, W, C)\)
Output: \((*, C, H, W)\)
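For example, moving the color axis of a batch of random images in front of the spatial axes (a sketch):

>>> import torch
>>> from gradslam.datasets.datautils import channels_first
>>> rgb = torch.rand(4, 480, 640, 3)   # (*, H, W, C)
>>> channels_first(rgb).shape          # (*, C, H, W)
torch.Size([4, 3, 480, 640])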
- scale_intrinsics(intrinsics: Union[numpy.ndarray, torch.Tensor], h_ratio: Union[float, int], w_ratio: Union[float, int])[source]¶
Scales the intrinsics appropriately for resized frames where \(h_\text{ratio} = h_\text{new} / h_\text{old}\) and \(w_\text{ratio} = w_\text{new} / w_\text{old}\)
- Parameters
intrinsics (numpy.ndarray or torch.Tensor) – Intrinsics matrix of original frame
h_ratio (float or int) – Ratio of new frame’s height to old frame’s height \(h_\text{ratio} = h_\text{new} / h_\text{old}\)
w_ratio (float or int) – Ratio of new frame’s width to old frame’s width \(w_\text{ratio} = w_\text{new} / w_\text{old}\)
- Returns
Intrinsics matrix scaled appropriately for the new frame size
- Return type
numpy.ndarray or torch.Tensor
- Shape:
intrinsics: \((*, 3, 3)\) or \((*, 4, 4)\)
Output: Matches intrinsics shape, \((*, 3, 3)\) or \((*, 4, 4)\)
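For instance, halving a 640×480 frame to 320×240 should halve the focal lengths and principal point (a sketch with hypothetical pinhole values):

>>> import numpy as np
>>> from gradslam.datasets.datautils import scale_intrinsics
>>> K = np.array([[525.0,   0.0, 319.5],
...               [  0.0, 525.0, 239.5],
...               [  0.0,   0.0,   1.0]])
>>> scaled = scale_intrinsics(K, h_ratio=240 / 480, w_ratio=320 / 640)  # fx, fy, cx, cy halved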
- pointquaternion_to_homogeneous(pointquaternions: Union[numpy.ndarray, torch.Tensor], eps: float = 1e-12)[source]¶
Converts 3D points and unit quaternions \((t_x, t_y, t_z, q_x, q_y, q_z, q_w)\) to homogeneous transformations \([R | T]\), where \(R\) denotes the \((3, 3)\) rotation matrix and \(T\) denotes the \((3, 1)\) translation vector:
\[\begin{split}\left[\begin{array}{@{}c:c@{}} R & T \\ \hdashline \begin{array}{@{}ccc@{}} 0 & 0 & 0 \end{array} & 1 \end{array}\right]\end{split}\]
- Parameters
pointquaternions (numpy.ndarray or torch.Tensor) – 3D point positions and unit quaternions \((t_x, t_y, t_z, q_x, q_y, q_z, q_w)\), where \((t_x, t_y, t_z)\) is the 3D position and \((q_x, q_y, q_z, q_w)\) is the unit quaternion.
eps (float) – Small value, to avoid division by zero. Default: 1e-12
- Returns
Homogeneous transformation matrices.
- Return type
numpy.ndarray or torch.Tensor
- Shape:
pointquaternions: \((*, 7)\)
Output: \((*, 4, 4)\)
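With the identity quaternion the rotation block is the identity, so only the translation survives (a sketch):

>>> import numpy as np
>>> from gradslam.datasets.datautils import pointquaternion_to_homogeneous
>>> pq = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 1.0])  # t = (1, 2, 3), identity rotation
>>> T = pointquaternion_to_homogeneous(pq)  # expect identity rotation block, last column (1, 2, 3, 1)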
- poses_to_transforms(poses: Union[numpy.ndarray, List[numpy.ndarray]])[source]¶
Converts poses to frame-to-frame transformations, with the first frame in the sequence transformed to have identity pose.
- Parameters
poses (numpy.ndarray or list of numpy.ndarray) – Sequence of poses in numpy.ndarray format.
- Returns
Sequence of frame-to-frame transformations, where the initial frame is transformed to have identity pose.
- Return type
numpy.ndarray or list of numpy.ndarray
- Shape:
poses: numpy.ndarray of shape \((N, 4, 4)\), or list of numpy.ndarray of shape \((4, 4)\)
Output: Of same shape as input poses
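Reading "frame to frame" literally, one convention consistent with this description is \(T_0 = I\) and \(T_i = P_{i-1}^{-1} P_i\) for consecutive poses \(P_i\); a sketch with pure translations:

>>> import numpy as np
>>> from gradslam.datasets.datautils import poses_to_transforms
>>> poses = [np.eye(4) for _ in range(3)]
>>> poses[1][0, 3], poses[2][0, 3] = 1.0, 3.0   # x-translations of 1 and 3
>>> transforms = poses_to_transforms(poses)
>>> transforms[0]  # identity; under the convention above, transforms[2] translates by 2 in x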
- create_label_image(prediction: numpy.ndarray, color_palette: collections.OrderedDict)[source]¶
Creates a label image, given a network prediction (each pixel contains class index) and a color palette.
- Parameters
prediction (numpy.ndarray) – Predicted image where each pixel contains an integer, corresponding to its class label.
color_palette (OrderedDict) – Contains RGB colors (uint8) for each class.
- Returns
Label image with the given color palette
- Return type
numpy.ndarray
- Shape:
prediction: \((H, W)\)
Output: \((H, W, 3)\)
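A sketch with a hypothetical two-class palette (class names and colors are made up; class index i is assumed to map to the i-th palette entry):

>>> from collections import OrderedDict
>>> import numpy as np
>>> from gradslam.datasets.datautils import create_label_image
>>> palette = OrderedDict([("wall", (174, 199, 232)), ("floor", (152, 223, 138))])
>>> prediction = np.random.randint(0, len(palette), (480, 640))
>>> label_image = create_label_image(prediction, palette)  # uint8 RGB label image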