Data Scientist Internship (Student)

Simplism.io Development Pte Ltd

SGD1,250 - 1,500

Computer & Software Backend Developer

Freelance · On-site

Job Requirements

On-site

Job description for Data Scientist Internship (Student) at Simplism.io Development Pte Ltd

Graduation Assignment: Developing an AI Agent for Automated Lead Generation Workflows | Assignment for two AI students

Background and objective

Organizations use many communication channels to generate and convert leads, including

email, LinkedIn, phone, WhatsApp, paid advertising, webinars, events, website forms,

chatbots and partner referrals. Selecting the right workflow for each lead is complex

because performance depends on company type, industry, job title, country, company size,

lead source, intent signals, previous interactions, historical conversion data, timing,

channel costs and sales capacity.

The goal of Simplism.io is to develop an AI agent that supports sales and marketing teams

by automatically recommending and generating lead generation workflows. Instead of

manually deciding which lead to contact, which channel to use, which message to send and

when to follow up, Simplism.ai combines predictive AI and generative AI to create data-driven workflows.

This graduation assignment focuses on designing and validating a proof of concept for a

multi-model AI agent that transforms lead data into actionable, personalized and

performance-driven lead generation workflows.

How can a multi-model AI agent be designed and evaluated to automatically

recommend, generate and optimize lead generation workflows based on CRM data,

communication channel data and historical performance data?

Research areas

The assignment covers four connected research areas. First, the students investigate how

CatBoost can be used for lead scoring and conversion prediction. Relevant outcomes

include the probability of becoming a marketing qualified lead, sales qualified lead, booked

meeting, opportunity or customer. The research should identify which lead features are

most predictive, including industry, company size, country, job title, seniority, buying role,

lead source, intent score and previous engagement.

Second, the students investigate how LightGBM Ranker can rank communication channels

and workflow templates for a lead or segment. Ranking should not only be based on

engagement, but also on commercial value such as expected meeting rate, expected SQL

rate, expected opportunity rate, expected deal value and channel cost.
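
To make the idea of ranking on commercial value concrete, the sketch below orders channels by a hand-written net expected value. It is only a baseline that a learned ranker such as LightGBM Ranker could be compared against, not the ranker itself. The class name, field names, formula and all numbers are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelEstimate:
    """Hypothetical per-channel estimates for one lead or segment."""
    channel: str
    opportunity_rate: float  # estimated chance that one contact leads to an opportunity
    deal_value: float        # expected revenue per opportunity, in currency units
    channel_cost: float      # cost of one contact attempt, in currency units


def net_expected_value(est: ChannelEstimate) -> float:
    """Expected commercial value of one contact attempt minus its cost."""
    return est.opportunity_rate * est.deal_value - est.channel_cost


def rank_channels(estimates: list[ChannelEstimate]) -> list[ChannelEstimate]:
    """Return channels ordered from highest to lowest net expected value."""
    return sorted(estimates, key=net_expected_value, reverse=True)


# Illustrative numbers only; real values would come from historical data.
estimates = [
    ChannelEstimate("email", opportunity_rate=0.010, deal_value=5000.0, channel_cost=0.5),
    ChannelEstimate("linkedin", opportunity_rate=0.020, deal_value=5000.0, channel_cost=3.0),
    ChannelEstimate("phone", opportunity_rate=0.030, deal_value=5000.0, channel_cost=60.0),
]
ranked = [e.channel for e in rank_channels(estimates)]
print(ranked)
```

In this example the phone channel has the highest opportunity rate but does not rank first, because its cost per attempt outweighs the gain over LinkedIn. The same net value could also serve as a relevance label when training a learned ranker.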

Third, the students explore how Vowpal Wabbit Contextual Bandit can support continuous

optimization. The system should improve over time by learning which next-best-action

performs best in a specific context, such as sending a follow-up email, sending a LinkedIn

message, making a phone call, sending a case study, moving a lead to nurture or notifying

sales.
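
The learning loop behind a contextual bandit can be sketched in plain Python. The example below is a small tabular epsilon-greedy bandit used as a stand-in to show the mechanics; it does not use the Vowpal Wabbit API, and the contexts, actions and rewards are invented for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Tabular epsilon-greedy contextual bandit (illustrative stand-in only)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (context, action) -> [number of observations, mean reward]
        self.stats = defaultdict(lambda: [0, 0.0])

    def choose(self, context):
        """Explore with probability epsilon, otherwise pick the best-known action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.stats[(context, a)][1])

    def update(self, context, action, reward):
        """Incrementally update the mean reward for a (context, action) pair."""
        entry = self.stats[(context, action)]
        entry[0] += 1
        entry[1] += (reward - entry[1]) / entry[0]


actions = ["follow_up_email", "linkedin_message", "phone_call"]
ctx_a = ("saas", "opened_email")            # hypothetical context features
ctx_b = ("manufacturing", "no_response")

# Replay a tiny, made-up interaction log: reward 1.0 means a meeting was booked.
log = [
    (ctx_a, "follow_up_email", 0.0),
    (ctx_a, "linkedin_message", 1.0),
    (ctx_a, "linkedin_message", 0.0),
    (ctx_a, "phone_call", 0.0),
    (ctx_b, "phone_call", 1.0),
    (ctx_b, "follow_up_email", 0.0),
]
bandit = EpsilonGreedyBandit(actions, epsilon=0.0)  # epsilon=0: pure exploitation
for context, action, reward in log:
    bandit.update(context, action, reward)

print(bandit.choose(ctx_a), bandit.choose(ctx_b))
```

With `epsilon` set above zero the bandit would occasionally try other actions, which is what allows it to keep learning once it is running. A production-grade library would additionally generalize across contexts through features instead of treating each context as a separate table row.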

Fourth, the students investigate how Gemma and/or Qwen3 can be used locally to generate

workflow descriptions, outreach copy, CRM notes, sales handoff summaries and structured

workflow output. The LLM should not make the commercial decisions independently; it

should generate output based on recommendations produced by the predictive models.
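
One possible way to keep the commercial decisions out of the LLM's hands is to pass the model outputs into the prompt as fixed facts and ask only for wording. The sketch below shows such a prompt builder. The function name, dictionary keys and prompt text are assumptions for illustration, and the call to a local Gemma or Qwen3 model is deliberately left out.

```python
import json


def build_prompt(recommendation: dict) -> str:
    """Embed upstream model decisions as fixed facts; ask the LLM only for wording."""
    decisions = json.dumps(recommendation, indent=2, sort_keys=True)
    return (
        "You are drafting outreach for a sales team.\n"
        "The decisions below were made by upstream models and are final.\n"
        "Do not change the channel, the next action, or the lead score.\n\n"
        f"DECISIONS:\n{decisions}\n\n"
        "Write a short message for the recommended channel and a one-line CRM note."
    )


# Hypothetical output of the predictive layer for one lead.
recommendation = {
    "lead_score": 0.72,
    "recommended_channel": "linkedin",
    "next_best_action": "send_case_study",
}
prompt = build_prompt(recommendation)
print(prompt)
```

The resulting string would be sent to the local model. Whether the model actually respects the fixed decisions still has to be checked on its output, since a prompt instruction alone is not a guarantee.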

Scope

The assignment focuses on a proof of concept, not a full production platform. The students

are expected to design the architecture, prepare or simulate a suitable dataset, train and

evaluate the models, and build a simple demonstration in which a user can enter lead

information and receive an automatically recommended workflow.

The proof of concept should include a data model for leads, accounts, touchpoints,

workflows and outcomes. Data should be based on metadata and metrics provided by the

Simplism.io platform. It should also include a lead scoring component, a channel or

workflow ranking component, a next-best-action optimization concept and a local LLM-based workflow generation component. Full CRM integration, live outreach to real

prospects and production-ready communication channel automation are outside the scope.
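
As a starting point for the data model, the five entities named above could be sketched as Python dataclasses. Every field name here is an assumption; the actual schema would depend on the metadata and metrics the Simplism.io platform provides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Account:
    account_id: str
    industry: str
    company_size: int
    country: str


@dataclass
class Lead:
    lead_id: str
    account_id: str      # links the lead to its account
    job_title: str
    lead_source: str
    intent_score: float


@dataclass
class Touchpoint:
    lead_id: str
    channel: str
    occurred_at: datetime
    workflow_id: Optional[str] = None  # set when the touchpoint belongs to a workflow


@dataclass
class Workflow:
    workflow_id: str
    name: str
    channels: list[str] = field(default_factory=list)


@dataclass
class Outcome:
    lead_id: str
    stage: str           # e.g. "MQL", "SQL", "meeting", "opportunity", "customer"
    reached_at: datetime


def touchpoints_before(outcome, touchpoints):
    """Touchpoints for the same lead that happened before the outcome was reached."""
    return [t for t in touchpoints
            if t.lead_id == outcome.lead_id and t.occurred_at < outcome.reached_at]


# Made-up records for one lead.
touchpoints = [
    Touchpoint("lead-1", "email", datetime(2025, 1, 10)),
    Touchpoint("lead-1", "linkedin", datetime(2025, 1, 14)),
    Touchpoint("lead-1", "phone", datetime(2025, 2, 1)),
]
outcome = Outcome("lead-1", "meeting", datetime(2025, 1, 20))
prior = touchpoints_before(outcome, touchpoints)
print([t.channel for t in prior])
```

Keeping outcomes separate from touchpoints makes it possible to build training labels from only those interactions that preceded an outcome, which helps avoid leaking future information into the models.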

Division between the two students

Student 1 focuses on the predictive and optimization layer of Simplism.ai. This includes

CatBoost for lead scoring, LightGBM Ranker for channel and workflow ranking, and Vowpal

Wabbit Contextual Bandit for continuous optimization. The expected output is a model

layer that can produce lead scores, conversion probabilities, ranked channels, ranked

workflows and suggested next-best-actions. The student should also make the output

explainable by showing which features contributed most to the recommendation.

Student 2 focuses on the generative and agent layer of Simplism.ai. This includes Gemma

and/or Qwen3 as local LLMs for generating workflow descriptions, outreach messages,

CRM notes and sales handoff summaries. The student investigates which local LLM is most

suitable for the use case, with attention to output quality, structured output, multilingual

capability, hardware requirements and reliability. This student also designs guardrails for

tone of voice, channel restrictions, output validation, compliance checks and prevention of

unsuitable outreach suggestions.
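
A simple rule-based check gives a first impression of what such guardrails could look like. The sketch below covers only channel restrictions, message length and banned phrases; the function name, parameters and example rules are hypothetical, and real compliance checks would need considerably more than this.

```python
def check_outreach(message: str, channel: str, allowed_channels: set[str],
                   banned_phrases: list[str], max_chars: int) -> list[str]:
    """Return a list of guardrail violations; an empty list means the draft passes."""
    problems = []
    if channel not in allowed_channels:
        problems.append(f"channel not allowed: {channel}")
    if len(message) > max_chars:
        problems.append(f"message too long: {len(message)} > {max_chars}")
    lowered = message.lower()
    for phrase in banned_phrases:
        if phrase.lower() in lowered:
            problems.append(f"banned phrase: {phrase}")
    return problems


# Hypothetical draft produced by the LLM, checked against example rules.
draft = "Act now! Guaranteed results for your team."
violations = check_outreach(
    message=draft,
    channel="whatsapp",
    allowed_channels={"email", "linkedin"},
    banned_phrases=["guaranteed results"],
    max_chars=300,
)
print(violations)
```

Returning all violations at once, instead of stopping at the first, makes it easier to feed the complete list back into a regeneration prompt or to log it for evaluation.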

Functional description of the prototype

The prototype should demonstrate how Simplism.ai could work in practice. A user should

be able to provide information about a lead or lead segment, such as industry, company

size, country, job title, lead source, intent score and previous engagement. Based on this

input, the system calculates a lead score, estimates conversion probability, recommends

relevant communication channels, selects or ranks suitable workflow templates and

suggests the next best action.

The generative component then turns these recommendations into a practical workflow.

The workflow should include the target segment, recommended channels, sequence logic,

timing, objective, call-to-action, sales handoff rule and message direction. The generated

output should be structured enough to be used in a CRM, sales engagement tool or

marketing automation platform in a later product phase.
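
Structured output can be checked mechanically before it is passed on. The sketch below parses an LLM response as JSON and verifies that the workflow elements listed above are present with the expected basic types. The field names and types are an assumed encoding of that list, not a schema defined by the assignment.

```python
import json

# Assumed JSON field names for the workflow elements described in the text.
REQUIRED_FIELDS = {
    "target_segment": str,
    "recommended_channels": list,
    "sequence": list,
    "timing": str,
    "objective": str,
    "call_to_action": str,
    "sales_handoff_rule": str,
    "message_direction": str,
}


def validate_workflow(raw: str):
    """Parse LLM output and return (workflow, errors); workflow is None on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return (data if not errors else None), errors


# A made-up, complete workflow and an incomplete variant of it.
complete = {
    "target_segment": "Heads of Sales at software companies with 50-500 staff",
    "recommended_channels": ["linkedin", "email"],
    "sequence": ["linkedin_message", "wait_3_days", "follow_up_email"],
    "timing": "Tuesday to Thursday mornings",
    "objective": "book a discovery meeting",
    "call_to_action": "propose two time slots",
    "sales_handoff_rule": "notify sales after a positive reply",
    "message_direction": "reference the webinar the lead attended",
}
incomplete = {k: v for k, v in complete.items() if k != "timing"}

ok_workflow, ok_errors = validate_workflow(json.dumps(complete))
bad_workflow, bad_errors = validate_workflow(json.dumps(incomplete))
print(ok_errors, bad_errors)
```

The share of responses that pass such a check is one direct way to measure structured output validity when comparing local models.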

Expected deliverables and evaluation

The students are expected to deliver a research report, a working proof of concept and a

final presentation. The research report should describe the problem, the model choices, the

data structure, the architecture, the experiments, the evaluation results, limitations and

recommendations for further development.

The proof of concept should demonstrate the end-to-end flow from lead input to workflow

recommendation. It should show how CatBoost, LightGBM Ranker, Vowpal Wabbit and

Gemma or Qwen3 can work together as part of one AI agent architecture. The prototype

does not need to be production-ready, but it should be technically sound and suitable for

demonstrating the feasibility of Simplism.ai.

For the predictive models, relevant metrics may include AUC, precision, recall, calibration,

lift, ranking quality and expected business value. For the LLM component, relevant

evaluation criteria may include output completeness, relevance, consistency, structured

output validity, tone quality and usefulness for sales or marketing users. The students

should also address limitations such as data quality, data availability, bias, privacy

requirements, compliance risks and the reliability of LLM-generated output.
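
Several of the predictive metrics mentioned above are simple enough to compute by hand, which can help when sanity-checking library results. The sketch below implements precision, recall, AUC and lift at k in plain Python on a small made-up sample. The labels, scores, threshold and k are illustrative assumptions.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = converted, 0 = not converted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at_k(y_true, scores, k):
    """Conversion rate among the k highest-scored leads divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(t for _, t in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Eight made-up leads: true conversion labels and model scores.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
lift = lift_at_k(y_true, scores, k=4)
print(precision, recall, round(auc, 3), lift)
```

On this sample the four highest-scored leads convert twice as often as the average lead, which is what a lift of 2.0 expresses. The pairwise AUC shown here is quadratic in the number of leads and is meant for checking small cases, not for large evaluation sets.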

Desired student profile and final summary

This assignment is suitable for two AI, data science or machine learning students with an

interest in applied AI, sales technology, marketing automation and local language models.

Relevant knowledge areas include supervised learning, ranking models, recommendation

systems, contextual bandits, prompt engineering, local LLM deployment, evaluation metrics

and basic software prototyping.

The end result should be a working proof of concept for Simplism.io, an AI agent for

automated lead generation workflow creation, recommendation and optimization.

CatBoost predicts lead quality and conversion probability, LightGBM Ranker ranks

channels and workflows, Vowpal Wabbit explores continuous next-best-action

optimization, and Gemma or Qwen3 generates workflow descriptions, outreach messages,

CRM notes and structured output. Together, these models demonstrate how predictive AI

and generative AI can collaborate inside one agent system to help sales and marketing

teams create more effective, data-driven workflows.
