Occupancy estimation is an important primitive for a wide range of applications, including building energy efficiency, safety, and security. In this paper, we explore the potential of using depth sensors to detect, estimate, identify, and track occupants in buildings. While depth sensors have been widely used for human detection and gesture recognition, the associated computer vision algorithms are typically run on powerful hardware such as an Xbox console or an Intel Core i7 processor. In this work, we develop a prototype system called FORK using off-the-shelf components that performs the entire depth data processing pipeline on a cheap, low-power ARM processor in real time. Because ARM processors are far less capable of running computer vision algorithms, FORK detects and tracks humans efficiently by leveraging a novel lightweight model-based approach instead of traditional approaches based on histogram of oriented gradients (HOG) features. Unlike other camera-based approaches, FORK is much less privacy invasive (even if the sensor is compromised). Based on a complete implementation, a real-world deployment, and extensive evaluation in realistic scenarios, we observe that FORK achieves over 99% accuracy in real time (4-9 FPS) in occupancy estimation.
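To make the contrast with HOG-based pipelines concrete, the following is a minimal sketch (not FORK's actual algorithm; all thresholds and names are illustrative assumptions) of why model-based processing of a top-down depth frame can be cheap: with an overhead sensor, a head shows up as a region whose depth implies a plausible human height, so candidate detection reduces to per-cell minimum and threshold operations rather than gradient-histogram feature extraction.

```python
import numpy as np

# Illustrative constants (assumptions, not from the paper).
CEILING_MM = 2600                      # assumed sensor height above the floor
HEAD_MIN_MM, HEAD_MAX_MM = 1400, 2000  # plausible standing head heights

def head_candidates(depth, cell=8):
    """Return (row, col) of grid cells whose closest point lies at head height.

    depth: 2D array of millimeter distances from an overhead depth sensor.
    """
    h, w = depth.shape
    hits = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            d = depth[r:r+cell, c:c+cell].min()  # closest point in this cell
            height = CEILING_MM - d              # convert depth to height
            if HEAD_MIN_MM <= height <= HEAD_MAX_MM:
                hits.append((r, c))
    return hits

# Synthetic frame: bare floor everywhere except one person-sized blob.
frame = np.full((64, 64), CEILING_MM, dtype=np.int32)  # floor at full depth
frame[20:28, 30:38] = CEILING_MM - 1700                # ~1.7 m tall "head"
print(len(head_candidates(frame)))
```

This kind of per-cell reduction is integer-only and touches each pixel once, which is the sort of workload a low-power ARM core can sustain at several frames per second, whereas HOG requires gradient computation, orientation binning, and a sliding-window classifier over the full frame.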