Brief about the role: The candidate is expected to productionize and deploy model code (the base code will be developed by Data Science). A significant part of the role is scaling pre-developed model training and inference across multiple GPUs/CPUs. We expect the candidate to have strong Python/SQL, Kubernetes, scaled model deployment, and ML experience.
About the Role:
We are seeking a highly skilled MLOps Engineer to play a pivotal role in bringing cutting-edge AI models to production. You will collaborate closely with our Data Science and Machine Learning teams to optimize, scale, and deploy AI/ML models.
Responsibilities:
- Model Productionization and Deployment:
  - Translate complex machine learning models into robust and scalable production systems.
  - Deploy models to production environments using Kubernetes or other container orchestration tools.
  - Ensure seamless integration of models with existing infrastructure and applications.
- Performance Optimization:
  - Identify and implement strategies to optimize model training and inference performance.
  - Leverage techniques like GPU acceleration, distributed training, and model quantization to improve efficiency.
  - Monitor model performance in production and proactively address any performance bottlenecks.
- Scalability Engineering:
  - Design and implement scalable solutions for handling large-scale data and model workloads.
  - Optimize data pipelines and model serving infrastructure to meet growing demands.
  - Collaborate with infrastructure teams to ensure adequate resources and capacity.
- ML Operations (MLOps):
  - Establish and maintain robust MLOps practices to streamline model development, deployment, and monitoring.
  - Implement automated pipelines for model training, testing, and deployment.
  - Monitor model performance in production and take corrective actions as needed.
Qualifications:
- 5+ years of experience.
- Strong proficiency in Python and SQL.
- In-depth knowledge of machine learning frameworks (e.g., TensorFlow, PyTorch, Scikit-learn).
- Expertise in Kubernetes or other container orchestration tools (good to have).
- Experience deploying and optimizing code.
- Experience with MLOps tools and frameworks (e.g., MLflow, Kubeflow).
- Version control (Azure Blob Storage).
- Experience with cloud platforms (e.g., AWS, GCP, Azure); Azure is preferred.
- Solid understanding of distributed computing and parallel programming.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
Preferred Qualifications (good to have):
- Knowledge of big data technologies (e.g., Hadoop, Spark).
- Experience with model optimization techniques (e.g., pruning, quantization, distillation).