Job description for Data Scientist Internship (Student) at Simplism.io Development Pte Ltd
Graduation Assignment: Developing an AI Agent for Automated Lead Generation Workflows | Assignment for two AI students
Background and objective
Organizations use many communication channels to generate and convert leads, including
email, LinkedIn, phone, WhatsApp, paid advertising, webinars, events, website forms,
chatbots and partner referrals. Selecting the right workflow for each lead is complex
because performance depends on company type, industry, job title, country, company size,
lead source, intent signals, previous interactions, historical conversion data, timing,
channel costs and sales capacity.
The goal of Simplism.io is to develop an AI agent that supports sales and marketing teams
by automatically recommending and generating lead generation workflows. Instead of leaving
teams to decide manually which lead to contact, which channel to use, which message to send
and when to follow up, the proposed agent, Simplism.ai, combines predictive AI and generative
AI to create data-driven workflows.
This graduation assignment focuses on designing and validating a proof of concept for a
multi-model AI agent that transforms lead data into actionable, personalized and
performance-driven lead generation workflows.
The main research question is: how can a multi-model AI agent be designed and evaluated to automatically
recommend, generate and optimize lead generation workflows based on CRM data,
communication channel data and historical performance data?
Research areas
The assignment covers four connected research areas. First, the students investigate how
CatBoost can be used for lead scoring and conversion prediction. Relevant outcomes
include the probability that a lead becomes a marketing qualified lead (MQL) or sales
qualified lead (SQL), books a meeting, or converts into an opportunity or customer. The
research should identify which lead features are
most predictive, including industry, company size, country, job title, seniority, buying role,
lead source, intent score and previous engagement.
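Many of these features (industry, country, job title, lead source) are categorical, which is one reason CatBoost fits this use case: it encodes categories natively with ordered target statistics, so a lead's own outcome never leaks into its encoding. The sketch below is a simplified, stdlib-only illustration of that idea, not CatBoost's implementation; the `prior` and smoothing weight `a` are assumptions. In the prototype, the raw categorical columns would simply be declared as categorical features to CatBoost.

```python
import random


def ordered_target_statistics(categories, targets, prior=0.5, a=1.0, seed=0):
    """Encode one categorical lead feature (e.g. industry) as a number.

    Rows are visited in a random permutation. Each row is encoded using only
    the targets of earlier rows with the same category, smoothed towards
    `prior`, so the row's own target is never used for its own encoding.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        cat = categories[i]
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded[i] = (s + a * prior) / (n + a)
        # Only now add this row's target, so later rows can use it.
        sums[cat] = s + targets[i]
        counts[cat] = n + 1
    return encoded


if __name__ == "__main__":
    industries = ["saas", "saas", "retail", "saas", "retail"]
    became_sql = [1, 0, 0, 1, 1]
    print(ordered_target_statistics(industries, became_sql))
```

The first lead seen in each category always receives the prior, and later leads move towards the category's observed conversion rate.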
Second, the students investigate how LightGBM Ranker can rank communication channels
and workflow templates for a lead or segment. Ranking should be based not only on
engagement but also on commercial value, such as expected meeting rate, expected SQL
rate, expected opportunity rate, expected deal value and channel cost.
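One way to bring commercial value into the ranking is to turn it into the training label. The sketch below assumes a simple value definition (predicted opportunity rate times an assumed value per opportunity, minus channel cost) and converts it into integer relevance grades per lead. Such grades could serve as labels for LightGBM's `LGBMRanker`, whose lambdarank objective expects non-negative integer relevance labels grouped per query (here: per lead or segment). The `Candidate` fields and the grading rule are hypothetical; meeting and SQL rates could be added as further value terms.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                # channel or workflow template for one lead/segment
    opportunity_rate: float  # predicted P(opportunity | lead, candidate)
    deal_value: float        # assumed value of one opportunity
    cost: float              # channel or workflow cost per lead


def expected_value(c: Candidate) -> float:
    """Assumed commercial value: expected opportunity value minus cost."""
    return c.opportunity_rate * c.deal_value - c.cost


def rank_with_grades(candidates, max_grade=3):
    """Sort by expected value and map each candidate to a relevance grade.

    Grades scale linearly with the best candidate's value in this group;
    candidates with zero or negative value get grade 0.
    """
    ranked = sorted(candidates, key=expected_value, reverse=True)
    best = expected_value(ranked[0]) if ranked else 0.0
    graded = []
    for c in ranked:
        ev = expected_value(c)
        grade = int(round(max_grade * ev / best)) if ev > 0 and best > 0 else 0
        graded.append((c.name, ev, grade))
    return graded


if __name__ == "__main__":
    for row in rank_with_grades([
        Candidate("email", 0.05, 20000, 50),
        Candidate("linkedin", 0.04, 20000, 120),
        Candidate("phone", 0.08, 20000, 400),
        Candidate("paid_ads", 0.01, 20000, 300),
    ]):
        print(row)
```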
Third, the students explore how Vowpal Wabbit Contextual Bandit can support continuous
optimization. The system should improve over time by learning which next action performs
best in a given context. Candidate next-best-actions include sending a follow-up email,
sending a LinkedIn message, making a phone call, sending a case study, moving a lead to
nurture or notifying sales.
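The core contextual bandit loop is: observe a context, choose an action with some exploration, log the probability of that choice, observe a reward, and update. Vowpal Wabbit learns this from logged (context, action, cost, probability) examples and generalises across contexts through features. The sketch below swaps in a plain tabular epsilon-greedy policy purely to show the loop; it is not VW's algorithm or API, and the action names and context key are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ["follow_up_email", "linkedin_message", "phone_call",
           "send_case_study", "move_to_nurture", "notify_sales"]


class EpsilonGreedyBandit:
    """Tabular epsilon-greedy stand-in for a contextual bandit learner."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, action) -> times updated
        self.values = defaultdict(float)  # (context, action) -> mean reward

    def probabilities(self, context):
        """Exploit the best-known action, spread epsilon over all actions."""
        best = max(self.actions, key=lambda a: self.values[(context, a)])
        k = len(self.actions)
        return {a: self.epsilon / k + (1 - self.epsilon if a == best else 0.0)
                for a in self.actions}

    def choose(self, context):
        """Sample an action and return it with its probability (to be logged)."""
        probs = self.probabilities(context)
        weights = [probs[a] for a in self.actions]
        action = self.rng.choices(self.actions, weights=weights)[0]
        return action, probs[action]

    def update(self, context, action, reward):
        """Incrementally update the mean reward for this context/action."""
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


if __name__ == "__main__":
    bandit = EpsilonGreedyBandit(ACTIONS, epsilon=0.1, seed=42)
    context = ("saas", "EU", "high_intent")  # hypothetical context key
    action, prob = bandit.choose(context)
    reward = 1.0 if action == "phone_call" else 0.0  # simulated outcome
    bandit.update(context, action, reward)
    print(action, prob)
```

Logging the choice probability matters because it is what allows the policy to be evaluated and retrained offline; VW's contextual bandit input format records it alongside the action and cost.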
Fourth, the students investigate how Gemma and/or Qwen3 can be used locally to generate
workflow descriptions, outreach copy, CRM notes, sales handoff summaries and structured
workflow output. The LLM should not make the commercial decisions independently; it
should generate output based on recommendations produced by the predictive models.
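A simple way to keep commercial decisions with the predictive models is to pass their output into the prompt as fixed input and then reject generated output that contradicts it. The sketch below assumes a recommendation dictionary with `ranked_channels` and `next_best_action` keys and a JSON response containing `recommended_channels`; these names and the prompt wording are illustrative assumptions, and the call to Gemma or Qwen3 itself is left out.

```python
import json


def build_workflow_prompt(lead: dict, recommendation: dict) -> str:
    """Build a prompt in which all commercial decisions are fixed inputs."""
    return (
        "You are writing a lead generation workflow for a sales team.\n"
        "The decisions below were made by predictive models. Do NOT change the "
        "channels, their order or the next best action; only describe them "
        "and draft the messages.\n\n"
        f"Lead:\n{json.dumps(lead, indent=2, sort_keys=True)}\n\n"
        f"Model recommendation:\n{json.dumps(recommendation, indent=2, sort_keys=True)}\n\n"
        "Return only JSON with the keys: target_segment, recommended_channels, "
        "sequence, objective, call_to_action, sales_handoff_rule, message_direction."
    )


def decisions_unchanged(llm_output: dict, recommendation: dict) -> bool:
    """Reject LLM output that reorders or replaces the model-ranked channels."""
    return llm_output.get("recommended_channels") == recommendation["ranked_channels"]


if __name__ == "__main__":
    rec = {"lead_score": 0.82, "ranked_channels": ["phone", "email"],
           "next_best_action": "phone_call"}
    print(build_workflow_prompt({"industry": "SaaS", "country": "NL"}, rec))
```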
Scope
The assignment focuses on a proof of concept rather than a full production platform. The students
are expected to design the architecture, prepare or simulate a suitable dataset, train and
evaluate the models, and build a simple demonstration in which a user can enter lead
information and receive an automatically recommended workflow.
The proof of concept should include a data model for leads, accounts, touchpoints,
workflows and outcomes. Data should be based on metadata and metrics provided by the
Simplism.io platform. It should also include a lead scoring component, a channel or
workflow ranking component, a next-best-action optimization concept and a local LLM-based workflow generation component. Full CRM integration, live outreach to real
prospects and production-ready communication channel automation are outside the scope.
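The data model could start as a handful of plain record types. The sketch below is hypothetical: the actual fields depend on the metadata and metrics available from the Simplism.io platform. It also shows one way to derive a binary training target per funnel stage, assuming the stages form a strict order (MQL, SQL, meeting, opportunity, customer), which may not hold for every sales process.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Stage(Enum):
    # Assumed funnel order; definition order is used as the stage order.
    MQL = "mql"
    SQL = "sql"
    MEETING = "meeting"
    OPPORTUNITY = "opportunity"
    CUSTOMER = "customer"


@dataclass
class Account:
    account_id: str
    industry: str
    country: str
    company_size: int        # number of employees


@dataclass
class Lead:
    lead_id: str
    account_id: str
    job_title: str
    seniority: str
    buying_role: str
    lead_source: str
    intent_score: float      # assumed to be normalised to 0..1


@dataclass
class Workflow:
    workflow_id: str
    template: str
    channels: list


@dataclass
class Touchpoint:
    lead_id: str
    workflow_id: str
    channel: str
    timestamp: datetime
    engaged: bool
    cost: float


@dataclass
class Outcome:
    lead_id: str
    workflow_id: str
    stage: Stage
    reached_at: datetime
    deal_value: Optional[float] = None


FUNNEL = list(Stage)


def stage_labels(lead_ids, outcomes, stage):
    """Binary target per lead: 1 if it reached `stage` or any later stage."""
    threshold = FUNNEL.index(stage)
    reached = {o.lead_id for o in outcomes if FUNNEL.index(o.stage) >= threshold}
    return {lead_id: int(lead_id in reached) for lead_id in lead_ids}
```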
Division between the two students
Student 1 focuses on the predictive and optimization layer of Simplism.ai. This includes
CatBoost for lead scoring, LightGBM Ranker for channel and workflow ranking, and Vowpal
Wabbit Contextual Bandit for continuous optimization. The expected output is a model
layer that can produce lead scores, conversion probabilities, ranked channels, ranked
workflows and suggested next-best-actions. The student should also make the output
explainable by showing which features contributed most to the recommendation.
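To illustrate what "which features contributed most" can mean, the sketch below decomposes the score of a simple logistic model into per-feature contributions relative to a baseline lead. For a linear model with independent features, these contributions coincide with SHAP values; for CatBoost itself, the prototype would instead use the SHAP values the library can compute. The feature names, weights and baseline are invented for illustration.

```python
import math


def explain_linear_score(weights, bias, x, baseline):
    """Return the conversion probability and features sorted by |contribution|.

    Contribution of feature f = w_f * (x_f - baseline_f), so the baseline
    logit plus all contributions equals the lead's own logit.
    """
    contributions = {f: weights[f] * (x[f] - baseline[f]) for f in weights}
    base_logit = bias + sum(weights[f] * baseline[f] for f in weights)
    logit = base_logit + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prob, top


if __name__ == "__main__":
    weights = {"intent_score": 2.0, "is_decision_maker": 1.0, "log_company_size": 0.5}
    baseline = {"intent_score": 0.4, "is_decision_maker": 0.3, "log_company_size": 2.5}
    lead = {"intent_score": 0.9, "is_decision_maker": 1.0, "log_company_size": 3.0}
    prob, top = explain_linear_score(weights, -3.0, lead, baseline)
    print(round(prob, 3), top)
```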
Student 2 focuses on the generative and agent layer of Simplism.ai. This includes Gemma
and/or Qwen3 as local LLMs for generating workflow descriptions, outreach messages,
CRM notes and sales handoff summaries. The student investigates which local LLM is most
suitable for the use case, with attention to output quality, structured output, multilingual
capability, hardware requirements and reliability. This student also designs guardrails for
tone of voice, channel restrictions, output validation, compliance checks and prevention of
unsuitable outreach suggestions.
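Several of these guardrails can be expressed as deterministic checks that run after generation and before anything reaches a user. The sketch below covers opt-out status, channel restrictions, per-channel length limits, a tone blocklist and an unsubscribe requirement for email. Every rule value here is a placeholder assumption, not a compliance requirement taken from Simplism.io, and real compliance checks would need to follow the applicable regulations.

```python
import re
from dataclasses import dataclass

ALLOWED_CHANNELS = {"email", "linkedin", "phone"}   # assumption: per-customer setting
MAX_LENGTH = {"linkedin": 300, "email": 1200}       # assumption: characters
BANNED_PATTERNS = [r"\bguarantee(d)?\b", r"\brisk[- ]free\b", r"\blast chance\b"]


@dataclass
class GuardrailResult:
    passed: bool
    violations: list


def check_message(channel: str, text: str, lead_opted_out: bool) -> GuardrailResult:
    """Collect all rule violations for one generated outreach message."""
    violations = []
    if lead_opted_out:
        violations.append("lead has opted out of outreach")
    if channel not in ALLOWED_CHANNELS:
        violations.append(f"channel '{channel}' is not allowed")
    limit = MAX_LENGTH.get(channel)
    if limit is not None and len(text) > limit:
        violations.append(f"message exceeds {limit} characters for {channel}")
    for pattern in BANNED_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            violations.append(f"tone violation: matches '{pattern}'")
    if channel == "email" and "unsubscribe" not in text.lower():
        violations.append("email is missing an unsubscribe line")
    return GuardrailResult(passed=not violations, violations=violations)


if __name__ == "__main__":
    print(check_message("whatsapp", "We guarantee results!", lead_opted_out=False))
```

Returning every violation rather than stopping at the first makes it easier to feed the list back to the LLM for a constrained retry or to show it to a reviewer.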
Functional description of the prototype
The prototype should demonstrate how Simplism.ai could work in practice. A user should
be able to provide information about a lead or lead segment, such as industry, company
size, country, job title, lead source, intent score and previous engagement. Based on this
input, the system calculates a lead score, estimates conversion probability, recommends
relevant communication channels, selects or ranks suitable workflow templates and
suggests the next best action.
The generative component then turns these recommendations into a practical workflow.
The workflow should include the target segment, recommended channels, sequence logic,
timing, objective, call-to-action, sales handoff rule and message direction. The generated
output should be structured enough to be used in a CRM, sales engagement tool or
marketing automation platform in a later product phase.
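"Structured enough" can be made testable by validating the generated output against a fixed shape before it is shown or exported. The sketch below assumes a JSON object with one key per workflow element listed above, with timing expressed as a day offset on each sequence step; the field names and the step shape are assumptions for illustration, not an existing Simplism.io schema.

```python
import json

REQUIRED_FIELDS = {
    "target_segment": str,
    "recommended_channels": list,
    "sequence": list,          # steps carrying sequence logic and timing
    "objective": str,
    "call_to_action": str,
    "sales_handoff_rule": str,
    "message_direction": str,
}
STEP_FIELDS = {"day": int, "channel": str, "action": str}


def validate_workflow(raw: str):
    """Parse LLM output; return (workflow, errors). workflow is None if invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(f"field '{name}' must be of type {expected.__name__}")
    steps = data.get("sequence")
    channels = data.get("recommended_channels")
    if isinstance(steps, list):
        for i, step in enumerate(steps):
            if not isinstance(step, dict):
                errors.append(f"sequence[{i}] must be an object")
                continue
            for name, expected in STEP_FIELDS.items():
                if not isinstance(step.get(name), expected):
                    errors.append(f"sequence[{i}].{name} must be of type {expected.__name__}")
            if isinstance(channels, list) and step.get("channel") not in channels:
                errors.append(f"sequence[{i}] uses a channel that was not recommended")
    return (data if not errors else None), errors
```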
Expected deliverables and evaluation
The students are expected to deliver a research report, a working proof of concept and a
final presentation. The research report should describe the problem, the model choices, the
data structure, the architecture, the experiments, the evaluation results, limitations and
recommendations for further development.
The proof of concept should demonstrate the end-to-end flow from lead input to workflow
recommendation. It should show how CatBoost, LightGBM Ranker, Vowpal Wabbit and
Gemma or Qwen3 can work together as part of one AI agent architecture. The prototype
does not need to be production-ready, but it should be technically sound and suitable for
demonstrating the feasibility of Simplism.ai.
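The end-to-end flow can be kept testable by wiring the four components through small function interfaces, so each model can be swapped for a stub during development. The sketch below is a hypothetical orchestration skeleton; the function names and signatures are assumptions, and the stubs in the usage example stand in for the trained CatBoost, LightGBM Ranker and Vowpal Wabbit models and the local LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Recommendation:
    lead_score: float
    ranked_channels: list
    next_best_action: str


def run_agent(
    lead: dict,
    score_fn: Callable[[dict], float],
    rank_fn: Callable[[dict, float], list],
    nba_fn: Callable[[dict, float, list], str],
    generate_fn: Callable[[dict, Recommendation], str],
):
    """Run one lead through the predictive layer, then the generative layer."""
    score = score_fn(lead)                   # lead scoring (CatBoost)
    channels = rank_fn(lead, score)          # channel/workflow ranking (LightGBM Ranker)
    action = nba_fn(lead, score, channels)   # next best action (contextual bandit)
    rec = Recommendation(score, channels, action)
    # The LLM step runs last and only receives the finished recommendation.
    return rec, generate_fn(lead, rec)


if __name__ == "__main__":
    rec, text = run_agent(
        {"industry": "SaaS", "intent_score": 0.9},
        score_fn=lambda lead: lead["intent_score"] * 0.8,
        rank_fn=lambda lead, s: ["phone", "email"] if s > 0.5 else ["email"],
        nba_fn=lambda lead, s, ch: "notify_sales" if s > 0.7 else "follow_up_email",
        generate_fn=lambda lead, r: f"Start with {r.ranked_channels[0]}, then {r.next_best_action}.",
    )
    print(rec)
    print(text)
```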
For the predictive models, relevant metrics may include AUC, precision, recall, calibration,
lift, ranking quality and expected business value. For the LLM component, relevant
evaluation criteria may include output completeness, relevance, consistency, structured
output validity, tone quality and usefulness for sales or marketing users. The students
should also address limitations such as data quality, data availability, bias, privacy
requirements, compliance risks and the reliability of LLM-generated output.
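Two of the predictive metrics are easy to state precisely. AUC is the probability that a randomly chosen converted lead receives a higher score than a randomly chosen non-converted lead (ties count as half), and lift at a fraction is the conversion rate among the top-scored leads divided by the overall conversion rate. The stdlib sketch below computes both; the pairwise AUC loop is quadratic and meant for small evaluation sets, and a library implementation would normally be used in the prototype.

```python
def roc_auc(y_true, y_score):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def lift_at(y_true, y_score, fraction=0.1):
    """Conversion rate in the top `fraction` of leads divided by the base rate."""
    n = len(y_true)
    k = max(1, int(round(n * fraction)))
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / n
    return top_rate / base_rate


if __name__ == "__main__":
    converted = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print(roc_auc(converted, scores), lift_at(converted, scores, fraction=0.2))
```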
Desired student profile and final summary
This assignment is suitable for two AI, data science or machine learning students with an
interest in applied AI, sales technology, marketing automation and local language models.
Relevant knowledge areas include supervised learning, ranking models, recommendation
systems, contextual bandits, prompt engineering, local LLM deployment, evaluation metrics
and basic software prototyping.
The end result should be a working proof of concept of Simplism.ai, an AI agent for
automated lead generation workflow creation, recommendation and optimization.
CatBoost predicts lead quality and conversion probability, LightGBM Ranker ranks
channels and workflows, Vowpal Wabbit explores continuous next-best-action
optimization, and Gemma or Qwen3 generates workflow descriptions, outreach messages,
CRM notes and structured output. Together, these models demonstrate how predictive AI
and generative AI can collaborate inside one agent system to help sales and marketing
teams create more effective, data-driven workflows.
