JDSAT

Description: The team constructed a Machine Learning Operations (MLOps) Pipeline.

Overview: MLOps sits at the intersection of machine learning, data engineering, and DevOps; it is the process of deploying a machine learning model into production.

Goals:

Develop MLOps Pipeline

Deploy tools in a multi-cloud environment

Align to current delivery efforts

Problem Statement: Given aggregated information through year N-1 of an officer’s Naval Medical Corps Career, predict the probability they will leave in year N.

The team performed the following activities:

· Explored BUMIS II data set.

· Removed PII for cloud use through data anonymization.

· Conducted extensive data cleaning to remove variables that were not populated adequately.

· Combined the per-year rows for each individual into a single row summarizing that individual’s career (~10K records).

· Split the data, using 90% as the training set and 10% as the test set.

· Performed model sensitivity analysis by investigating how individual parameter changes affected model output.

· Incorporated Key Performance Indicators (KPIs) in user feedback loops to support approval/disapproval of model deployment in a production environment.
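The aggregation and 90/10 split steps above can be sketched as follows. This is a minimal illustration, not the team's actual code: the field names (`officer_id`, `year`, `rank`, `left`) and both function names are hypothetical, and the real BUMIS II schema is not described in this summary.

```python
import random
from collections import defaultdict


def summarize_careers(yearly_rows):
    """Collapse one-row-per-officer-year records into one row per officer.

    Field names here are hypothetical placeholders, not the BUMIS II schema.
    """
    by_officer = defaultdict(list)
    for row in yearly_rows:
        by_officer[row["officer_id"]].append(row)

    summaries = []
    for officer_id, rows in by_officer.items():
        rows.sort(key=lambda r: r["year"])
        summaries.append({
            "officer_id": officer_id,
            "years_observed": len(rows),
            "last_rank": rows[-1]["rank"],
            # Label: whether the officer left after the last observed year.
            "left": rows[-1]["left"],
        })
    return summaries


def split_train_test(records, test_fraction=0.10, seed=0):
    """Shuffle reproducibly, then hold out `test_fraction` of the records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the random seed keeps the split reproducible between pipeline runs, which makes it easier to compare model versions against the same test set.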

The team used JDSAT’s AWS Dev Sandbox account to train and test the ML model:

· AWS Service: SageMaker, used to train the model

· Model: Classic logistic regression

· Algorithm: XGBoost
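The sensitivity analysis listed among the team's activities can be illustrated with a one-at-a-time perturbation of a logistic-style scoring function. This is a sketch under stated assumptions: the weights, feature names, and function names are hypothetical, the toy model stands in for the trained SageMaker model, and reading "parameter" as an input feature is itself an assumption.

```python
import math


def predict_attrition(features, weights, bias):
    """Toy logistic model: maps a weighted feature sum to a probability."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def one_at_a_time_sensitivity(features, weights, bias, delta=1.0):
    """Change in predicted probability when each input alone moves by `delta`."""
    baseline = predict_attrition(features, weights, bias)
    impacts = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] += delta
        impacts[name] = predict_attrition(perturbed, weights, bias) - baseline
    return impacts


# Hypothetical weights and inputs, chosen only for illustration.
example_weights = {"years_of_service": -0.2, "deployments": 0.5}
example_officer = {"years_of_service": 8.0, "deployments": 2.0}
example_impacts = one_at_a_time_sensitivity(example_officer, example_weights, bias=0.3)
```

Inputs whose perturbation moves the predicted probability the most are the ones the model is most sensitive to for that record.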

Pros of SageMaker:

· Multiple options available for modelling a variety of problems

· Model versioning

· Support for data/model drift scenarios

· Hyperparameter tuning
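To make the drift item above concrete, one common way to quantify data drift on a single numeric feature is the population stability index (PSI). The sketch below is an illustration only; it is not a description of how SageMaker detects drift, and the function name and binning choices are assumptions.

```python
import math


def population_stability_index(expected, actual, n_bins=10):
    """PSI between a baseline sample and a newer sample of one numeric feature.

    Bins are equal-width over the baseline range; values outside that range
    are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # A small floor avoids taking log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI of zero means the two samples fall into the bins in identical proportions; larger values indicate that the newer data has moved away from the training-time distribution and the model may need review or retraining.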

In addition to Amazon SageMaker, the team explored similar capabilities in BigQuery ML, part of the Google Cloud Platform service BigQuery.

Pros of BigQuery ML:

· Serverless data warehouse with SQL-like querying and built-in ML capabilities

· Great for ad-hoc requests

Cons of BigQuery ML:

· Less robust

· Lacks portability

Results: The team successfully created a functional and interactive MLOps pipeline showcasing ML capabilities across multiple cloud platforms (AWS and Google Cloud Platform). In addition, the team evaluated costs for ML services and considered data security when deploying confidential data to the cloud.

Strategic Innovation Group: MLOps Pipeline in Cloud Environment

CONTRACTOR DATA
CAGE: 6ZF65
DUNS: 07-912-0844
SAM UEI: ZEJYK3U8KCB7

Primary NAICS:
541330: Engineering Services
541614: Operations Research
541690: Other Scientific Consulting
541511: Custom Programming
