Research on human attraction frequently uses single-modality stimuli, such as neutral-expression facial photographs, as proxy indicators of an individual’s attractiveness. However, we know little about how judgments of these single-modality stimuli correspond to judgments of stimuli that incorporate multi-modal cues of face, body and speech. In the present study, ratings of attractiveness judged from videos of participants introducing themselves were independently predicted by judgments of the participant’s facial attractiveness (rated from a neutral-expression facial photograph masked to conceal the hairstyle), body attractiveness (rated from a photograph of the upper body), and speech attractiveness (rated from the soundtrack to the video). We also found that ratings of the face, body and speech were positively related to each other. Our results support the assumption that the single-modality stimuli used in much attractiveness research are valid proxy indicators of overall attractiveness in ecologically valid contexts, and complement literature showing cross-modality concordance of trait attractiveness. They also suggest that research relying on assessments of individual attractiveness should take account of both visual and vocal attractiveness where possible.