Principal Data Scientist · Dublin, Ireland

Building impactful, scalable solutions

I'm Nagesh Yadav. I spend my time making AI useful for large organisations, from strategy and data foundations all the way through to production systems. PhD researcher, with patents, and ten years of shipping things that matter.

Nagesh Yadav
Location
Dublin, Ireland
Current Role
Principal Data Scientist, Optum / UHG
Education
PhD Computer Science, UCD Ireland
Sectors
Healthcare, Finance, Technology

AI/ML scientist who builds things

I've spent the last decade working with Fortune 100 companies across healthcare, finance, and tech. I hold a PhD in Computer Science from University College Dublin, and I've always been more interested in making things work than writing about theory.

Right now I'm a Principal Data Scientist at Optum (UnitedHealth Group) in Dublin, where I work on AI and Machine Learning projects. That means building data foundations, shipping ML models, and figuring out where AI genuinely helps.

I also build things on the side: clinical transcription tools for GPs, voice translation apps, Android applications, and local LLM setups. I like solving real problems with whatever tools make sense.

Generative AI · LLMs & RAG · Agentic AI · NLP / NER · Knowledge Graphs · Healthcare AI · Deep Learning · PyTorch · Databricks · Snowflake · Kubernetes · Android / Kotlin

Where I've worked

A mix of research labs, global banks, and healthcare companies, always at the intersection of AI, data science, and real business problems.

Apr 2024 - Present
Principal Data Scientist
Optum, UnitedHealth Group, Dublin
Built an Agentic RAG system for sales analytics that handles unstructured data mining and next-best-action recommendations.
Developed disease risk prediction and patient segmentation models using claims and EHR data.
Championed synthetic data generation across the team.
Partner with business and tech leaders to identify AI opportunities and translate them into technical roadmaps.
2022 - 2024
Senior Vice President, Data Science
Bank of New York (BNY), Dublin
Led data science projects across multiple business lines and drove adoption of AI/ML across the organisation.
Built RAG pipelines using LangChain and Llama models to improve operational efficiency.
Engaged with universities for research collaborations and gave external talks.
2016 - 2022
Senior Data Scientist
IBM, Dublin
Led end-to-end AI/NLP solutions across healthcare, policy intelligence, and population health.
Shortlisted for the US-Ireland Research Innovation Award (American Chamber of Commerce / Royal Irish Academy) for NLP work in healthcare claims adjudication.
Built knowledge graphs and NLP pipelines using spaCy, CoreNLP, and custom NER models to extract insights from complex policy documents.
Developed large-scale recommender systems for healthcare decision support using collaborative filtering, PCA, and deep learning.
2014 - 2016
Software Engineer (R&D)
Zinc Software, Dublin
Developed algorithms for motion pattern detection from accelerometer and magnetometer data.
Built a pattern recognition system for exercise characterisation and exposed it via REST endpoints.
2013 - 2014
Postdoctoral Researcher
Dublin Institute of Technology
Developed real-time position and orientation tracking using sensor fusion (inertial + ultrasonic) with extended Kalman filters.

A few things I've built (in my own time)

I like working on things outside of my day job. Here are some of the projects I've been building recently.

Android App

ClinicalScribe

A GP clinical transcription and documentation app that runs entirely on-device. Uses MediaPipe for local LLM inference, Android SpeechRecognizer for transcription, and Room for persistence. No cloud calls, complete patient privacy.

Kotlin · Jetpack Compose · MediaPipe · Material 3 · Room DB

Voice AI

3-Way Voice Translator

Real-time voice translation between English, Hindi, and Slovak. Built on Whisper for speech recognition, Ollama with Qwen 2.5 for translation, and Piper TTS for voice synthesis. Hindi falls back to gTTS when needed.

Python · Whisper · Ollama · Piper TTS · Qwen 2.5

Healthcare AI

MedScan

An Android app for medical document OCR and named entity recognition. Scans physical documents, extracts text, and identifies clinical entities like medications, conditions, and procedures.

Android · ML Kit · NER · OCR

Enterprise AI

Agentic RAG for Sales Analytics

An agentic retrieval-augmented generation system built for unstructured data mining and next-best-action recommendations.

Python · LangChain · Databricks · RAG · Agentic AI

Web App

ClinicalScribe Web

A Raspberry Pi-hosted web version of ClinicalScribe that makes clinical transcription accessible through a browser. Same privacy-first approach, different form factor.

Raspberry Pi · Web · Whisper · Local LLM

Selected Publications

Peer-reviewed work spanning NLP, healthcare informatics, sensor fusion, and AI systems.

2021
Towards Protecting Vital Healthcare Programs by Extracting Actionable Knowledge from Policy
ACL-IJCNLP 2021 Findings
2020
Exploring the Social Drivers of Health During a Pandemic: Leveraging Knowledge Graphs and Population Trends in COVID-19
Studies in Health Technology and Informatics
2020
Mitigating Vocabulary Mismatch on Multi-domain Corpus using Word Embeddings and Thesaurus
NLPinAI 2020
2017
Note Highlights: Surfacing Relevant Concepts from Unstructured Data for Health Professionals
IEEE ICHI 2017
2016
Fast calibration of a 9-DOF IMU using a 3 DOF position tracker and a semi-random motion sequence
Elsevier Measurement
2014
Accurate IMU-Based Orientation Estimation Under Conditions of Magnetic Distortion
Sensors
2012
Hybrid Bayesian Fusion of Range-based and Sourceless Location Estimates Under Varying Observability
IEEE Intelligent Systems
2011
Two Stage Kalman Filtering for Position Estimation using Dual Inertial Measurement Units
IEEE Sensors

Selected Patents

Real-time Analysis of Predictive Audience Feedback During Content Creation
U.S. Patent No. 10,169,713 · Jan 2019
System and Method for Augmenting Questionnaires
U.S. Patent No. 11,033,216 · Jun 2021

Certifications

Level 2 (Expert) Master Certified Data Scientist
The Open Group
Azure Data Scientist Associate
Microsoft
ML with TensorFlow on Google Cloud
Coursera / Google
Build, Train & Deploy ML Pipelines using BERT
Coursera / Amazon SageMaker
Enterprise Design Thinking Practitioner
IBM

Academic background

Ph.D.
Computer Science
University College Dublin, Ireland, 2013
M.Tech
Intelligent Systems, Information Technology
IIIT Allahabad, India
B.Tech
Computer Science
VBS Purvanchal University, India

Let's talk

Whether it's about a collaboration, a role, or just a conversation about something interesting.