Project Name: Predicting datacenter hard drive failures using Spark and Python
The goal of this project was to predict and classify which hard drive models have high or low reliability. Working in a team of four, my project partners and I used data provided by Backblaze. We identified the primary drivers of early failure in the hard drives that Backblaze uses to store customer data in its data center. Together we developed a model to predict early failures from the SMART stats features found in the data. We used our findings to make clear recommendations about hard drive reliability based on a given drive's model, manufacturer, and other criteria. Our deliverables included presentation slides, an analysis notebook, and a hard drive model reliability index.
I am eager to showcase my newly acquired skill set and expertise across the full data science pipeline. With 10 years’ experience in government analytics, I bring a well-rounded perspective on a project’s high-level impact on an organization, as well as a strong grasp of the technical details. Finally, I am a highly effective communicator who delivers actionable insights at project completion.
Approaching problem solving with purpose and recognizing errors as one of the pathways to achievement.
San Antonio, Texas
Instinctive, disciplined, and determined
Team environment, engaged work culture, open communication