Plane Tracking: We manually annotate the four corners of the painting in the first frame, then use COTR to find the corresponding four corners in the remaining frames. We visualize the result by overlaying a virtual painting on top, warped with the homography defined by the tracked corners.
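As an illustration of the homography warping step, the following is a minimal sketch, not the code used to produce the videos. It assumes NumPy is available and estimates the homography from four corner correspondences with the standard direct linear transform (DLT). The function names homography_from_4pts and warp_points are hypothetical.

```python
import numpy as np


def homography_from_4pts(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from four point pairs (DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the null vector of A: the last right singular vector.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Normalize so H[2, 2] == 1 (assumes a non-degenerate configuration).
    return H / H[2, 2]


def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]
```

In this sketch, src would be the corners of the virtual painting image and dst the four corners found in the current frame. Warping the full image would additionally require resampling pixels, which is omitted here.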
Two-View Reconstruction: Given two images from a calibrated camera, we triangulate a dense point cloud from the correspondences retrieved by COTR.
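The triangulation step can be sketched as follows. This is a minimal example assuming NumPy and known 3x4 projection matrices P1 and P2 for the two views. It uses standard linear (DLT) triangulation, which may differ from the method used to produce the video. The function names triangulate and project are hypothetical.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear triangulation of N correspondences.

    P1, P2: 3x4 projection matrices of the two views.
    x1, x2: (N, 2) pixel coordinates of matching points.
    Returns an (N, 3) array of 3D points.
    """
    points = []
    for (u1, v1), (u2, v2) in zip(x1, x2):
        # Each view contributes two linear constraints on the homogeneous point X.
        A = np.stack([
            u1 * P1[2] - P1[0],
            v1 * P1[2] - P1[1],
            u2 * P2[2] - P2[0],
            v2 * P2[2] - P2[1],
        ])
        # Least-squares solution: the last right singular vector of A.
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        points.append(X[:3] / X[3])
    return np.array(points)


def project(P, X):
    """Project (N, 3) points with a 3x4 projection matrix; returns (N, 2) pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]
```

With dense correspondences, x1 and x2 would hold one row per matched pixel. Real matches are noisy, so a practical pipeline would typically also discard points with a large reprojection error.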
Facial Landmarks Tracking: We use an off-the-shelf detector to detect the facial landmarks in the first frame, and then find them in the remaining frames with COTR. Note that COTR is trained on static outdoor scenes only and was never trained on deformable objects or faces, yet it still works well.
Correspondence Transformer on Transformers: We annotate points on the grille of Optimus Prime in the first frame, and find them in the remaining frames with COTR. Because the appearance changes drastically, we reset the reference points to those of the previous frame whenever COTR filters out any uncertain correspondences. Again, note that COTR was never trained for these tasks, and yet it works very well.
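The reference-reset strategy described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the actual implementation: match_fn is a hypothetical stand-in for a COTR query that returns the matched points together with a per-point flag saying whether each match passed the uncertainty filter, and re-running the query after a reset is our assumption.

```python
def track_with_reset(frames, init_points, match_fn):
    """Track points through frames, resetting the reference when matches are uncertain.

    match_fn(ref_frame, cur_frame, ref_points) -> (points, confident_flags)
    is a hypothetical matcher interface, not the COTR API.
    Returns (tracks, resets): per-frame point lists and the frame indices
    at which the reference was reset.
    """
    ref_frame, ref_points = frames[0], list(init_points)
    tracks = [list(init_points)]
    resets = []
    for i in range(1, len(frames)):
        points, confident = match_fn(ref_frame, frames[i], ref_points)
        if not all(confident):
            # Some correspondence was filtered out as uncertain: use the
            # previous frame and its tracked points as the new reference.
            ref_frame, ref_points = frames[i - 1], tracks[-1]
            resets.append(i)
            # Assumption: query again against the new reference.
            points, confident = match_fn(ref_frame, frames[i], ref_points)
        tracks.append(list(points))
    return tracks, resets
```

Keeping the first frame as the reference for as long as possible limits drift, while resetting lets the tracker follow large appearance changes.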
The supplementary videos are encoded with FFmpeg using the H.264 codec. If you cannot play a video, please download the VLC player from: http://www.videolan.org/vlc/index.html