We propose the Hierarchical Detection Network (HDN) for the detection of facial palsy syndrome. This can be the first deep-learning based approach for the facial palsy detection. The proposed HDN consists of three component networks, the first detects faces, the second detects the landmarks on the detected faces, and the third detects the local palsy regions. The first and the third component networks are built on the Darknet framework, but with fewer layers of convolutions for shorter processing speed. The second component network employs the latest 3D face alignment network for locating the landmarks. The first component network employs a Na × Na grid over the overall input image, while the third component network employs a Nb × Nb grid over each detected face, making the HDN capable of efficiently locating the affected palsy regions. As previous approaches were evaluated on proprietary databases, we have collected 32 videos from YouTube and made the first public database for facial palsy study. To enhance the robustness against expression variations, we include the CK+ facial expression database in the training and testing phases. We show that the HDN does not just detect the local palsy regions, but also captures the frequency of the intensity, enabling the video-to-description diagnosis of the syndrome. Experiments show that the proposed approach offers an accurate and efficient solution for facial palsy detection/diagnosis.