MLOps & Moving to Scripts

Huanfa Chen - huanfa.chen@ucl.ac.uk

13/12/2025

Intro to MLOps

Image Credit: https://ml-ops.org/content/mlops-principles

Manual process: everything is done by hand, using Rapid Application Development (RAD) tools such as Jupyter Notebooks.
ML pipeline automation: with continuous training (when new data comes in, the model is retrained automatically).
CI/CD pipeline automation: for fast and reliable ML model deployment, usually on a cloud platform (AWS/GCP/Azure).

Prerequisites: using VSCode (or another IDE) instead of Jupyter Notebooks
Why? VSCode is better at managing many files, with features like Go to Definition, Refactor, etc.
VSCode works well with conda, Docker, Podman, Git, etc.
VSCode is an industry-standard IDE; good to get used to it early
VSCode works with LLM plugins: GitHub Copilot, Gemini, etc.

Notebooks have been great so far for developing and testing new ideas. Scripts, by contrast, are:

stateless: each run starts fresh; have to explicitly pass variables to functions and classes
linear: we have to run from top to bottom
modular: we can split code into functions and classes
testable: we can write unit tests and integration tests
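
A minimal script skeleton can illustrate the stateless, linear, modular and testable properties listed above. This is only a sketch: the file name train.py, the function names and the toy data are hypothetical and not taken from the course material.

```python
"""train.py (hypothetical file name): a minimal script skeleton."""


def load_data():
    """Return a small hard-coded dataset; a real script would read a file."""
    return [1.0, 2.0, 3.0, 4.0]


def scale(values, factor):
    """Multiply every value by factor; all inputs are passed in explicitly."""
    return [v * factor for v in values]


def main():
    """Run the steps in order, from top to bottom."""
    values = load_data()
    scaled = scale(values, factor=2.0)
    print(f"mean after scaling: {sum(scaled) / len(scaled):.2f}")
    return scaled


# Every run starts fresh: nothing is left over from an earlier session.
if __name__ == "__main__":
    main()
```

Because scale() depends only on its arguments, it can be imported and tested without running the rest of the script.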

Use configuration files (YAML, JSON, config.py) to manage parameters; avoid hardcoding values in other .py files
Avoid duplicated code; don’t write the same code in multiple places
Use functions and a utils.py module to promote code reuse
Write unit tests to ensure code correctness and reliability
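
The practices above (parameters in a config file, a reusable helper, a unit test) could be combined roughly as follows. This is a sketch under assumptions: the config keys, the helper split_sizes() and the test name are made up for illustration, and the JSON is inlined as a string so the example runs on its own. In a real project it would live in a separate file such as config.json.

```python
import json

# Contents of a hypothetical config.json, inlined to keep the example self-contained.
CONFIG_TEXT = """
{
  "test_size": 0.2,
  "random_state": 42
}
"""


def load_config(text):
    """Parse the JSON configuration into a dictionary."""
    return json.loads(text)


def split_sizes(n_rows, test_size):
    """Helper that could live in utils.py: return (n_train, n_test)."""
    n_test = round(n_rows * test_size)
    return n_rows - n_test, n_test


def test_split_sizes():
    """Unit test in pytest style: the two parts must add up to the total."""
    config = load_config(CONFIG_TEXT)
    n_train, n_test = split_sizes(100, config["test_size"])
    assert n_train + n_test == 100
    assert n_test == 20
```

Changing test_size now means editing one line of configuration rather than searching through several .py files.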

Avoid very long outputs (e.g. print 100 lines of dataframe)
Avoid unnecessary EDA (e.g. df.head(), df.describe()). If a notebook contains data processing, only call describe() ONCE after processing
Make sure plots are properly labelled with titles, axis labels, legends
If some outputs are important, save them to files (images -> jpeg/png, models -> pickle/joblib) instead of relying on notebook state
Restart kernel and rerun all cells before committing changes or sharing; avoid non-linear execution issues
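
Saving an important output to a file, rather than relying on notebook state, might look like the sketch below. The helper names save_artifact() and load_artifact(), the file name model.pkl and the dictionary standing in for a fitted model are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path):
    """Pickle obj to path, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path


def load_artifact(path):
    """Load a pickled object back from disk."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Stand-in for a fitted model; a real notebook would pass the estimator itself.
model = {"coef": [0.5, -1.2], "intercept": 0.1}

# A temporary folder keeps the example tidy; a project might use outputs/ instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_artifact(model, Path(tmp_dir) / "outputs" / "model.pkl")
    restored = load_artifact(saved_path)

print(restored == model)
```

Once the object is on disk, it survives a kernel restart and can be loaded by a script. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.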

sklearn pipelines help organise code, avoid data leakage, and improve reproducibility

Image Credit: scikit-learn.org
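
As a small illustration of the point about sklearn pipelines, the sketch below chains a scaler and a classifier. The synthetic dataset, the step names "scaler" and "clf" and the choice of LogisticRegression are assumptions for the example, not the course's own setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing and model are bundled into one object.
pipe = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression()),
    ]
)

# fit() learns the scaling statistics from the training data only,
# so no information from the test set leaks into preprocessing.
pipe.fit(X_train, y_train)

# score() applies the already-fitted scaler to the test data before predicting.
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because the whole pipeline is a single estimator, it can be saved, reloaded or passed to cross-validation as one unit.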

MLOps ensures reliable and maintainable ML models in production
Moving from notebooks to scripts improves code quality, testability, and deployability
Using IDEs like VSCode enhances productivity and code management
Ensure good coding practice in notebooks