@Parskatt Did you check the recent progress in point tracking?
@Parskatt To me they look like a more elegant version of multi-view feature matching, and a much more memory-efficient one.
@jianyuan_wang As in VGGSfM? I think it's quite similar, yes. Why is it more elegant?
@Parskatt Yes, as used in VGGSfM. It is more elegant (in my humble opinion) because 1. it no longer needs to build a graph between one keypoint and all keypoints in the other images
@jianyuan_wang I agree that it's more efficient, at least for long tracks. But: 1. It does not seem possible to update the feature maps. 2. Init comes from duplicating the query coord, plus local corr. Probably this can be fixed though, and I'm all for "latent" things that scale better.
@Parskatt 1. I may be lost here: does this refer to the feature maps of the images? It should be okay to swap the image feature extractor of a pretrained tracking network, because the model mostly cares about correlations (I have not tested this, though). 2. Yes, I agree this is a problem.
@jianyuan_wang Regarding 1: I mean that if you run a Transformer directly on the images and exchange messages, you update the feature maps of the images themselves. From what I understand of point tracking, the underlying feature map is fixed during the iterations (perhaps that's my misunderstanding).
@Parskatt Ah, I get it now. Yes, in tracking people currently do not update the feature maps during the iterations. Instead, they update the features of the tracks at each iteration; an example is below: github.com/facebookresear…
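The scheme discussed in this thread (image feature maps fixed across iterations; the track initialized by duplicating the query coordinate and refined by correlating a track feature against a local window) can be sketched roughly as below. This is a hypothetical simplification for illustration, not the linked facebookresearch code: real trackers also update the track features themselves with a learned network, whereas here the query feature stays fixed and the update is a greedy integer-offset argmax.

```python
import numpy as np

def bilinear_sample(fmap, xy):
    """Sample an (H, W, C) feature map at a continuous (x, y) location."""
    H, W, _ = fmap.shape
    x = float(np.clip(xy[0], 0, W - 1.001))
    y = float(np.clip(xy[1], 0, H - 1.001))
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    return ((1 - dy) * ((1 - dx) * fmap[y0, x0] + dx * fmap[y0, x0 + 1])
            + dy * ((1 - dx) * fmap[y0 + 1, x0] + dx * fmap[y0 + 1, x0 + 1]))

def track_point(fmaps, query_xy, iters=4, radius=2):
    """Track one query point (given in frame 0) through all frames.

    `fmaps` is a list of per-frame (H, W, C) feature maps that stay FIXED;
    only the per-frame coordinate estimates are updated across iterations.
    """
    T = len(fmaps)
    # Track feature: sampled once from the query location in frame 0.
    qfeat = bilinear_sample(fmaps[0], query_xy)
    # Init: duplicate the query coordinate across all frames.
    coords = np.tile(np.asarray(query_xy, dtype=float), (T, 1))
    for _ in range(iters):
        for t in range(1, T):
            # Correlate the track feature against a local grid of offsets
            # around the current estimate; move to the best-scoring offset.
            best, best_off = -np.inf, (0.0, 0.0)
            for oy in range(-radius, radius + 1):
                for ox in range(-radius, radius + 1):
                    cand = coords[t] + (ox, oy)
                    score = float(qfeat @ bilinear_sample(fmaps[t], cand))
                    if score > best:
                        best, best_off = score, (ox, oy)
            coords[t] = coords[t] + best_off
    return coords
```

Since the feature maps are only ever read (never written), the per-iteration state is just the small set of track coordinates and features, which is where the memory advantage over dense all-pairs matching comes from.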