3D object detection is a key component of the road environment perception task in autonomous driving. The mainstream framework uses multiple sensing devices to obtain multi-modal data and performs multi-sensor fusion for detection. However, geometric distortions and unequal information priorities in the traditional sensor fusion process limit the 3D object detection performance of fused systems. This paper proposes a multi-sensor fusion 3D object detection algorithm based on the bird's-eye view (BEV). The lift-splat-shoot (LSS) method estimates the latent depth distribution of the image and builds the image feature map in BEV space, while the set abstraction method of PV-RCNN builds the point cloud feature map in the same BEV space. A low-complexity feature encoding network then fuses the multi-modal features in the unified BEV space to perform 3D object detection. Experimental results show that the proposed method improves accuracy by 4.8% over pure-LiDAR methods and, compared with traditional fusion methods, reduces the parameter count by 47% while maintaining comparable accuracy. The proposed method therefore meets the detection requirements of the road environment perception task in autonomous driving systems.
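As a rough illustration of the unified-BEV fusion idea only (not the paper's actual encoder, whose architecture is not detailed here), the sketch below concatenates a camera BEV feature map and a LiDAR BEV feature map along the channel axis and applies a hypothetical low-cost 1x1 convolutional encoder; all shapes and the function name are assumptions for the example.

```python
import numpy as np

def fuse_bev_features(cam_bev, lidar_bev, weight, bias):
    """Fuse camera and LiDAR BEV feature maps with a 1x1 convolution.

    cam_bev:   (C_cam, H, W) image features lifted into BEV space (e.g. via LSS)
    lidar_bev: (C_lid, H, W) point-cloud features on the same BEV grid
    weight:    (C_out, C_cam + C_lid) 1x1-conv kernel (hypothetical encoder)
    bias:      (C_out,)
    Returns a fused (C_out, H, W) BEV feature map.
    """
    # Both modalities share one BEV grid, so fusion is channel concatenation.
    x = np.concatenate([cam_bev, lidar_bev], axis=0)        # (C_in, H, W)
    c_in, h, w = x.shape
    # A 1x1 convolution is a per-cell linear map over channels.
    fused = weight @ x.reshape(c_in, h * w) + bias[:, None]
    return np.maximum(fused, 0.0).reshape(-1, h, w)         # ReLU

# Toy example on a 4x4 BEV grid.
rng = np.random.default_rng(0)
cam = rng.normal(size=(8, 4, 4))     # 8 camera channels
lid = rng.normal(size=(16, 4, 4))    # 16 LiDAR channels
w = rng.normal(size=(32, 24)) * 0.1
b = np.zeros(32)
out = fuse_bev_features(cam, lid, w, b)
print(out.shape)  # (32, 4, 4)
```

Because both feature maps live in one BEV coordinate frame, no cross-modal geometric alignment is needed at fusion time, which is what allows the encoder to stay lightweight.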