Rishin Rahim

Senior Machine Learning Engineer


2021 - Present
As a Senior Machine Learning Engineer at BlueOptima, I currently lead the Code Authorship Detection (CAD) project. We build models that identify developer signatures from stylistic and structural patterns in code. These models produce unique "signature embeddings" for distinguishing authors, using siamese networks with transformer encoders and pre-trained models such as CodeBERT. Within CAD, we also explore methods to detect AI-generated code and assess its impact on developer productivity and code quality. Our models extract features from code representations that capture style, structure, and naturalness, which lets them identify AI-generated code. We have built dedicated models for each programming language and currently support eight languages, including C, Java and Python.

Previously, I led the beta release of the Pure Coding Time Estimation project, which achieved a correlation of over 0.4 between estimated and actual coding effort. This involved building neural hidden Markov models trained on 1.3 billion data points from 160k developers to extract their generic coding-behaviour patterns. My role also covers managing and maintaining the end-to-end deployment of ML models in production. This includes Kubernetes and Docker for containerized deployment, Prometheus/Grafana for monitoring model performance, and MLflow for model lifecycle management and retraining strategies.
2020 - 2021
I was a Senior Machine Learning Engineer at Unisys, where I developed and implemented ML models for intent identification and conversation triaging within IntelliServe™, a conversational AI tool for customer support. I used NLP techniques such as RNNs and named-entity recognition (NER) for model development. I also contributed significantly to utterance models that used knowledge graphs to create synthetic training data for the IntelliServe™ conversation engine. This enabled automated conversation generation and allowed faster model training and iteration.

I also played a significant role in the development of CloudForte™, a cloud resource optimization platform. There we built time-series forecasting models that led to significant resource savings in cloud environments and helped users make informed decisions about scaling resources up or down. All development was done within the Databricks ecosystem, and the models were deployed on Azure.
2015 - 2020
I joined TCS right after college as a Junior ML Engineer in the TCS Robotics unit. During my tenure, I took part in numerous projects and proofs of concept (POCs). Two noteworthy projects were Contract Digitization and SmartQE. For Contract Digitization, an AI platform for legal documents, I designed and implemented a robust pipeline for collecting, storing, cleaning, and transforming legal documents. I also developed models that extract standard agreement and contract clauses using various NLP techniques. For SmartQE, a test-suite optimization model, I significantly improved test coverage and reduced redundancy. I created a Defect Prediction model to forecast the number of defects in future application releases. I also developed a Comparison model to measure how similar different test-step executions are.
2012 - 2014
Master of Science (MS), specialised in Information Technology, from the Indian Institute of Information Technology & Management Kerala (since upgraded to Digital University Kerala). My interest in ML began during this program. My master's thesis, titled "Threshold logic Object Detection using FPGAs", proposed two novel techniques for object detection on FPGAs. The design and verification were done in Verilog HDL, and the design was then synthesised and mapped onto an FPGA. Thesis advisor: Dr. Alex P James.
2007 - 2011
Bachelor of Technology (B.Tech), specialised in Information Technology, from Cochin University of Science and Technology.
My tech writings

These writings are a curated collection of my observations and insights as I explore machine learning. Much of the content remains relevant, but some entries reflect the state of knowledge at the time they were written and should be read in that context. The primary purpose of sharing these notes is educational; they are not intended to spread misinformation.

Pet projects
Neurosurgical E-log: A secure and private neurosurgical case log where surgeons can post, plan, share, and analyze their cases. The system was developed for the Sri Chithra Thirunal Institute of Medical Science. It covers a detailed range of neurosurgical fields and includes a multipurpose search tool. Developed using Python, the Web.py framework, SQLite3, HTML5, CSS3, JavaScript, jQuery, and Ajax.
Cinemastat: A data visualization experiment using Malayalam movie data. I'm passionate about this project, and it is currently a work in progress. Unfortunately, work commitments have made it hard to complete and release. I'm optimistic about finding time later this year. Fingers crossed!
CS231n-assignments: I am a huge fan of Andrej Karpathy! This website is entirely inspired by his personal site. I was introduced to Karpathy through this course, a high-quality program that he co-created and taught. It explains the foundations of machine learning in an intuitive, simplified way. This GitHub repo contains my course notes and assignment work. Although the course was released in 2016, it remains relevant and highly informative.
HF-experiments: Hugging Face is the GitHub of machine learning. This project collects my experiments with Hugging Face libraries, datasets, and models. It mainly covers the exercises in the book Natural Language Processing with Transformers.
Miscellaneous