Face perception depends on a dynamic interplay between a “holistic” Interactive Feature Processing (IFP) style and a Local Feature Processing (LFP) style. However, it is unclear whether features are processed locally before they are integrated into a holistic percept (Fine-to-Coarse strategy), or whether local feature processing occurs only after a holistic percept is established (Coarse-to-Fine strategy). The present Event-Related Potentials study investigates whether IFP precedes LFP (Coarse-to-Fine) or vice versa (Fine-to-Coarse). Participants matched target features within face pairs (here the eye region), in which distracter features (nose and mouth) called for the same or a different response (congruent and incongruent, respectively). Psychophysical results replicated previous findings. That is, dissimilar target features are processed locally (LFP), which minimizes interference from surrounding incongruent distracters. Conversely, an IFP mode is elicited when similar target features are embedded in dissimilar contexts. In IFP mode, incongruent distracters interfere with the processing of similar target features, thereby impairing task performance. Face inversion, which preserves input properties but disrupts high-level face perception, abolished these incongruency effects. The psychophysical observations were reflected at the neural level. The IFP and LFP modes of face perception elicited distinct time-courses in occipito-temporal cortex. IFP was affected by inversion as early as 176 ms post-stimulus onset (coinciding with the N170 peak). In contrast, the first robust indications of LFP occurred 120 ms later, at 296 ms. Thus, the contribution of IFP to high-level face perception appears to temporally precede that of LFP. Moreover, the IFP and LFP modes operated not only in distinct time intervals but also in different brain areas: activity associated with the IFP mode was right-lateralized, whereas the LFP mode engaged the left hemisphere.
In sum, interactive “holistic” encoding of facial features temporally precedes their local analysis. This agrees with models suggesting a Coarse-to-Fine strategy for face perception, in line with generic descriptions of visual perception in which global scene analysis precedes the examination of local details.