Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences

Jonathan Taylor, Lucas Bordeaux, Bob Corish, Cem Keskin, Toby Sharp, Eduardo Soto, David Sweeney, Julien Valentin, Benjamin Luff, Arran Topalian, Erroll Wood, Sameh Khamis, Pushmeet Kohli, Shahram Izadi, Richard Banks, Andrew Fitzgibbon, Jamie Shotton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

90 Citations (Scopus)

Abstract

Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
Original language: English
Title of host publication: ACM Transactions on Graphics (TOG)
Subtitle of host publication: Proceedings of ACM SIGGRAPH
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Volume: 35
DOIs: https://doi.org/10.1145/2897824.2925965
Publication status: Published - 31 Jul 2016
Externally published: Yes

Publication series

Name: ACM Transactions on Graphics (TOG)
Publisher: ACM
Number: 4
Volume: 35
ISSN (Print): 0730-0301
ISSN (Electronic): 1557-7368

Fingerprint

Cameras
Program processors
Learning systems
Recovery
Costs
Graphics processing unit

Cite this

Taylor, J., Bordeaux, L., Corish, B., Keskin, C., Sharp, T., Soto, E., ... Shotton, J. (2016). Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences. In ACM Transactions on Graphics (TOG): Proceedings of ACM SIGGRAPH (Vol. 35). (ACM Transactions on Graphics (TOG); Vol. 35, No. 4). New York: Association for Computing Machinery (ACM). https://doi.org/10.1145/2897824.2925965
Taylor, Jonathan ; Bordeaux, Lucas ; Corish, Bob ; Keskin, Cem ; Sharp, Toby ; Soto, Eduardo ; Sweeney, David ; Valentin, Julien ; Luff, Benjamin ; Topalian, Arran ; Wood, Erroll ; Khamis, Sameh ; Kohli, Pushmeet ; Izadi, Shahram ; Banks, Richard ; Fitzgibbon, Andrew ; Shotton, Jamie. / Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences. ACM Transactions on Graphics (TOG) : Proceedings of ACM SIGGRAPH. Vol. 35 New York : Association for Computing Machinery (ACM), 2016. (ACM Transactions on Graphics (TOG); 4).
@inproceedings{50ec83ed0f374d37b631a00672ebda60,
title = "Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences",
abstract = "Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.",
author = "Jonathan Taylor and Lucas Bordeaux and Bob Corish and Cem Keskin and Toby Sharp and Eduardo Soto and David Sweeney and Julien Valentin and Benjamin Luff and Arran Topalian and Erroll Wood and Sameh Khamis and Pushmeet Kohli and Shahram Izadi and Richard Banks and Andrew Fitzgibbon and Jamie Shotton",
year = "2016",
month = "7",
day = "31",
doi = "10.1145/2897824.2925965",
language = "English",
volume = "35",
series = "ACM Transactions on Graphics (TOG)",
publisher = "Association for Computing Machinery (ACM)",
number = "4",
booktitle = "ACM Transactions on Graphics (TOG)",
address = "United States",

}

Taylor, J, Bordeaux, L, Corish, B, Keskin, C, Sharp, T, Soto, E, Sweeney, D, Valentin, J, Luff, B, Topalian, A, Wood, E, Khamis, S, Kohli, P, Izadi, S, Banks, R, Fitzgibbon, A & Shotton, J 2016, Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences. in ACM Transactions on Graphics (TOG) : Proceedings of ACM SIGGRAPH. vol. 35, ACM Transactions on Graphics (TOG), no. 4, vol. 35, Association for Computing Machinery (ACM), New York. https://doi.org/10.1145/2897824.2925965

Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences. / Taylor, Jonathan; Bordeaux, Lucas; Corish, Bob; Keskin, Cem; Sharp, Toby; Soto, Eduardo; Sweeney, David; Valentin, Julien; Luff, Benjamin; Topalian, Arran; Wood, Erroll; Khamis, Sameh; Kohli, Pushmeet; Izadi, Shahram; Banks, Richard; Fitzgibbon, Andrew; Shotton, Jamie.

ACM Transactions on Graphics (TOG) : Proceedings of ACM SIGGRAPH. Vol. 35 New York : Association for Computing Machinery (ACM), 2016. (ACM Transactions on Graphics (TOG); Vol. 35, No. 4).


TY  - GEN
T1  - Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences
AU  - Taylor, Jonathan
AU  - Bordeaux, Lucas
AU  - Corish, Bob
AU  - Keskin, Cem
AU  - Sharp, Toby
AU  - Soto, Eduardo
AU  - Sweeney, David
AU  - Valentin, Julien
AU  - Luff, Benjamin
AU  - Topalian, Arran
AU  - Wood, Erroll
AU  - Khamis, Sameh
AU  - Kohli, Pushmeet
AU  - Izadi, Shahram
AU  - Banks, Richard
AU  - Fitzgibbon, Andrew
AU  - Shotton, Jamie
PY  - 2016/7/31
Y1  - 2016/7/31
N2  - Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
U2  - 10.1145/2897824.2925965
DO  - 10.1145/2897824.2925965
M3  - Conference contribution
VL  - 35
T3  - ACM Transactions on Graphics (TOG)
BT  - ACM Transactions on Graphics (TOG)
PB  - Association for Computing Machinery (ACM)
CY  - New York
ER  - 

Taylor J, Bordeaux L, Corish B, Keskin C, Sharp T, Soto E et al. Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences. In ACM Transactions on Graphics (TOG) : Proceedings of ACM SIGGRAPH. Vol. 35. New York: Association for Computing Machinery (ACM). 2016. (ACM Transactions on Graphics (TOG); 4). https://doi.org/10.1145/2897824.2925965