BACKGROUND: Previous studies suggest that auditory evoked potentials (AEPs) may be used to monitor anaesthetic depth. However, during surgery and anaesthesia, artefacts can reduce the quality of AEP recordings. This can affect the interpretation of the data and complicate the use of the method. We assessed differences between expert ratings of the signal quality of perioperatively recorded AEPs.

METHODS: The signal quality of 180 randomly selected AEPs, recorded perioperatively during a European multicentre study, was rated independently by five experts as 'invalid' (0), 'poor' (1), or 'good' (2). The average quality rating (n=5) was calculated for each signal. Differences between the quality ratings of the five experts were calculated for each AEP: inter-rater variability (IRV) was defined as the difference between the best and the worst rating of a signal (range 0 to 2).

RESULTS: The average signal quality was rated 'invalid' for 57% of the AEPs, 'poor' for 39%, and 'good' for only 4%. IRV was 0 for only 6% of the AEPs, 1 for 62%, and 2 for 32%; that is, for 32% of the AEPs one expert rated the signal quality as good, whereas another expert rated the identical signal as invalid.

CONCLUSIONS: There is poor agreement between experts regarding the signal quality of perioperatively recorded AEPs; consequently, results obtained by one expert may not be easily reproduced by another. This limits the use of visual AEP analysis to indicate anaesthetic depth and may affect the comparability of AEP studies in which waveforms were analysed by different experts. An objective, automated method for AEP analysis could solve this problem.
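The two per-signal measures described in the methods, the average of the five expert ratings and the inter-rater variability (IRV, best rating minus worst rating), can be illustrated with the minimal Python sketch below. It is only a sketch under stated assumptions: the ratings matrix is invented for illustration and is not data from the study, and the function names (average_rating, irv, irv_distribution) are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical illustrative ratings (NOT study data): each row is one AEP,
# each column one of five experts; 0 = invalid, 1 = poor, 2 = good.
RATINGS = [
    [0, 0, 1, 0, 0],
    [1, 2, 1, 1, 0],
    [2, 2, 2, 2, 2],
    [0, 1, 1, 1, 1],
]


def average_rating(row):
    """Mean quality rating of one AEP across the five experts."""
    return mean(row)


def irv(row):
    """Inter-rater variability: best rating minus worst rating (0, 1 or 2)."""
    return max(row) - min(row)


def irv_distribution(ratings):
    """Percentage of AEPs with IRV equal to 0, 1 and 2."""
    counts = Counter(irv(row) for row in ratings)
    n = len(ratings)
    return {k: 100.0 * counts.get(k, 0) / n for k in (0, 1, 2)}


if __name__ == "__main__":
    for row in RATINGS:
        print(row, "average:", average_rating(row), "IRV:", irv(row))
    print("IRV distribution (%):", irv_distribution(RATINGS))
```

An IRV of 2 corresponds to the disagreement highlighted in the results: at least one expert rated the signal 'good' (2) while another rated it 'invalid' (0).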