Aerial imagery has been increasingly adopted in mission-critical tasks such as traffic surveillance, smart cities, and disaster assistance. However, identifying objects in aerial images faces the following challenges: 1) objects of interest are often too small and too dense relative to the images; 2) objects of interest often differ in relative size; and 3) the number of objects in each category is imbalanced. A novel network structure, Points Estimated Network (PENet), is proposed in this work to answer these challenges. PENet uses a Mask Resampling Module (MRM) to augment the imbalanced datasets, a coarse anchor-free detector (CPEN) to effectively predict the center points of the small object clusters, and a fine anchor-free detector (FPEN) to locate the precise positions of the small objects. An adaptive merge algorithm, Non-maximum Merge (NMM), is implemented in CPEN to address the issue of detecting dense small objects, and a hierarchical loss is defined in FPEN to further improve the classification accuracy. Our extensive experiments on the aerial datasets visDrone [1] and UAVDT [2] showed that PENet achieved higher precision than existing state-of-the-art approaches. Our best model achieved an 8.7% improvement on visDrone and 20.3% on UAVDT.