Synthesizing realistic human-object interaction motions is a critical problem for VR/AR and human animation. Unlike the commonly studied scenarios involving a single human or hand interacting with one object, we address a more generic multi-body setting with arbitrary numbers of humans, hands, and objects. This complexity introduces two significant challenges, stemming from the high correlations and mutual influences among bodies, for which we propose corresponding solutions. First, to meet the high demand for synchronization across different body motions, we mathematically derive a new set of alignment scores during training and use maximum likelihood sampling on a dynamic graphical model for explicit synchronization during inference. Second, high-frequency interactions between objects are often overshadowed by large-scale, low-frequency movements. To address this, we introduce frequency decomposition and explicitly represent the high-frequency components in the frequency domain. Extensive experiments across five datasets with various multi-body configurations demonstrate the superiority of our method, SyncDiff, over existing state-of-the-art motion synthesis methods.
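To make the frequency-decomposition idea concrete, here is a minimal sketch that splits a motion trajectory into low- and high-frequency components with a DCT. The function name, cutoff value, and dimensions are illustrative assumptions, not the exact SyncDiff implementation.

```python
# Minimal sketch (assumed, not the exact SyncDiff code): split a motion
# trajectory into low- and high-frequency components via a DCT, so fine-grained
# interaction detail can be modeled separately from large-scale, low-frequency
# movement.
import numpy as np
from scipy.fftpack import dct, idct

def frequency_decompose(motion, cutoff=8):
    """motion: (T, D) array of per-frame pose/object parameters.
    Returns (low, high) components that sum back to `motion`."""
    coeffs = dct(motion, axis=0, norm='ortho')       # DCT coefficients along time
    low_coeffs = np.zeros_like(coeffs)
    low_coeffs[:cutoff] = coeffs[:cutoff]            # keep only the slowest `cutoff` bases
    high_coeffs = coeffs - low_coeffs                # residual high-frequency content
    low = idct(low_coeffs, axis=0, norm='ortho')     # smooth, large-scale motion
    high = idct(high_coeffs, axis=0, norm='ortho')   # fine, high-frequency detail
    return low, high

# Example: a 120-frame trajectory with 24 parameters per frame
motion = np.random.randn(120, 24)
low, high = frequency_decompose(motion)
assert np.allclose(low + high, motion)
```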
We present qualitative results on TACO, CORE4D, OAKINK2, GRAB, and BEHAVE, in that order. The sample counts are as follows:
TACO: 30 samples with comparisons to baselines (MACS, DiffH2O) or ablation results, plus 24 standalone samples of our method (Gallery); 54 samples in total.
CORE4D: 28 samples with comparisons to baselines (OMOMO, CG-HOI) or ablation results, plus 24 standalone samples of our method (Gallery); 52 samples in total.
OAKINK2: 18 samples with comparisons to baselines (MACS, DiffH2O) or ablation results, plus 24 standalone samples of our method (Gallery); 42 samples in total.
GRAB: 8 samples with comparisons to baselines (MACS, DiffH2O), plus 4 samples each showing two different synthesized interactions (Gallery); 12 samples in total.
BEHAVE: 12 samples with comparisons to baselines (OMOMO, CG-HOI).
Removing explicit synchronization (w/o exp sync) or the alignment loss (w/o align loss) leads to contact loss, desynchronization, or abnormal shaking.
The two baseline methods, OMOMO with its stagewise diffusion and CG-HOI with its cross-attention between bodies and contact maps, maintain hand-object alignment reasonably well. However, they still fall short of our synchronization strategies, especially when the timing of cooperation between two individuals must be precisely orchestrated. This is because both baselines lack joint optimization within a single diffusion model at training and inference time, which can yield unfavorable object trajectories and, in turn, ineffective collaboration between the two humans.
Removing frequency decomposition (w/o decompose) induces unnatural walking poses (e.g., feet sliding on the ground) and occasionally unnatural joint rotations.
Synchronization mechanisms still play a significant role in full-body human-object interaction synthesis.
OAKINK2 places higher demands on fine-grained motion control, which requires the combination of our two synchronization mechanisms.
Post-grasp setting. GRAB places slightly lower demands on synchronization. DiffH2O performs better than MACS, but it is still outperformed by our method, particularly in scenarios involving small objects or tricky grasping regions.
Here, SyncDiff needs to synthesize complete motion sequences rather than only post-grasp ones. We sample two different trajectories from different initializations to demonstrate the diversity of our method, as sketched below.
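As a rough illustration of how such diverse samples can be drawn, the sketch below runs the same reverse-diffusion sampler from two different initial noise tensors. The model interface, dimensions, and simplified denoising update are assumptions for illustration, not the actual SyncDiff sampler.

```python
# Illustrative sketch (assumed interface, not the actual SyncDiff sampler):
# two different noise initializations yield two different synthesized motions.
import torch

class ToyDenoiser(torch.nn.Module):
    """Stand-in for the trained motion diffusion model."""
    def __init__(self, motion_dim=24):
        super().__init__()
        self.net = torch.nn.Linear(motion_dim, motion_dim)

    def forward(self, x, t, cond):
        return self.net(x)  # a real model would also condition on t and cond

def sample_motion(model, cond, init_noise, num_steps=50):
    """Toy reverse-diffusion loop; a real sampler follows the trained noise schedule."""
    x = init_noise
    for t in reversed(range(num_steps)):
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        x = x - model(x, t_batch, cond) / num_steps  # simplified denoising update
    return x

model = ToyDenoiser()
cond = torch.randn(1, 128)            # conditioning signal (placeholder)
noise_a = torch.randn(1, 120, 24)     # initialization A
noise_b = torch.randn(1, 120, 24)     # initialization B
traj_a = sample_motion(model, cond, noise_a)
traj_b = sample_motion(model, cond, noise_b)  # different init -> different motion
```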
In general, due to the relatively simple one-human-one-object setting and the limited motion semantics (the samples mostly consist of basic actions such as picking up, putting down, and lateral or rotational movement), the gap between our method and the baselines is not as pronounced as on the other four datasets. In the samples above, the advantages of our method mainly manifest as more extensive motion completion, less interpenetration and contact loss, and more natural human postures.
The code will be made public at an appropriate time (expected July 2025).
Wenkun He: wenkunhe2003@hotmail.com
Li Yi: ericyi0124@gmail.com