The value of analytics is clear, right? Maybe not. Value-creation from data is perhaps the biggest challenge working data scientists face. After all, many highlight the inability of analytics to produce tangible business value. Plenty of articles pinpoint how and why projects go wrong due to factors such as the lack of stakeholder buy-in, inadequate talent, poor data quality, and unclear direction. Given this environment, it’s hard to overestimate the value of incorporating minimum viable products into the project life cycle as soon as possible.
A minimum viable product is a fully operational product containing only the necessary functionality to highlight its core usefulness. The product could be an application, report, or predictive model. Minimal viable products often make the most persuasive case for highlighting the value of analytics to an organization. This also forces data scientists to reckon with data-related challenges earlier, which could affect the project’s value proposition. Value, for all parties, should be the project’s true north. And where value exists, investment follows.
However, generating minimum viable products is a challenging venture. Organizational skepticism often translates into tight budgets, strict deadlines, and clear value-add communication. Under such working conditions, engineering creativity is paramount. It’s in this suspenseful stage of the process that I’ve found tremendous value in tools like Docker and cloud products like Azure Container Instances and Logic Apps. These tools help data scientists stay nimble in producing minimum viable products that can handle production environments. I’d like to share one generalizable use case.
A lot of my analyses start in either R Studio or a Jupyter Notebook and usually take some variant of the following pattern:
The current walk-through example followed that pattern:
When the entire machine-learning workflow is housed in a single file, the goal is often to schedule the file on a recurrent basis. While useful options exist in Azure for executing machine learning workflows, many of the use cases shared by Microsoft and others assume large-scale infrastructure requirements, relatively straightforward modeling, and a desire to integrate a full-blown CI/CD pipeline. These options are all legitimate areas of development but are not conducive to a minimum viable product.
Analytics in a Behavioral Health Care Organizations
The value proposition is straightforward: When a clinician needs to create a personalized service plan for an individual seeking services, they’d like to view what treatment options were recommended to individuals in the past with a similar clinical and demographic profile.
A clustering algorithm is employed to group clients together with similar characteristics. The resulting model output and custom metrics are merged with other datasets and staged in a SQL database for use in PowerBI. However, the value proposition requires that the process run daily.
Running model scripts daily
The following setup tackled the challenges of running jobs with database credentials, custom compute, cost control, and highly customized analyses all in a fast and cost-effective manner.
A few notes on the architecture:
Conclusion
Value-creation is the goal to which all work efforts should be aimed, and minimum-viable products help to achieve this purpose. The workflow presented is generalizable to a wide range of use cases faced by working data scientists.
Sources