Summary
Forto, a logistics tech company, is hiring a Data Scientist to own production ML systems for document intelligence. The role requires 2+ years of experience in ML engineering, Python, LLMs, and classical data science to drive accuracy improvements and build end-to-end ML pipelines.
- Location: Berlin
- Type: Full-time
- Level: Mid-Level
About Us
What if your work could drive change in a globally established industry, shaping processes that touch every corner of the world? At Forto, we are at the forefront of change, harnessing the power of AI to revolutionise logistics. We want to reinvent digital supply chains to be transparent, frictionless and sustainable. From day one, our mission has been to simplify global trade – creating a seamless and efficient logistics process.
Your Role & Mission
As a Data Scientist at Forto, you will take ownership of production ML systems that extract structured intelligence from unstructured logistics data. You will work closely with the Engineering Manager across three core workstreams: document data extraction (FlashDoc), vocabulary mapping, and classical ML. Your immediate priority is ensuring continuity of existing production systems and setting up evaluation pipelines, but equally important is driving step-change improvements in accuracy through disruptive methods and new technologies when the opportunity arises. Beyond document automation, the team's roadmap extends into traditional data science territory: demand forecasting, churn prediction, route optimization, and predictive analytics for logistics operations. You will bring both the ML engineering depth to maintain and innovate on current systems and the classical data science foundation to tackle these broader challenges as the team grows.
What You Will Do
Design, build, and maintain end-to-end ML pipelines for document extraction, classification, and data enrichment in production.
Build prompt evaluation frameworks and feedback-based optimization loops to systematically improve extraction accuracy.
Train custom in-house models using human-in-the-loop (HITL) data to move from assisted to fully automated extraction.
Build and maintain semantic similarity models that map free text to the standardized TMS vocabulary across ports, terminals, container types, legal entities, and line items.
Improve pipeline reliability through redesign, testing, monitoring, and alerting for non-deterministic ML systems.
Evaluate and introduce disruptive approaches (new model architectures, fine-tuning strategies, novel evaluation methods) to achieve step-change accuracy improvements when incremental optimization plateaus.
Partner with Product Managers to identify where DS can solve real user pain points, proactively surface opportunities from the data, and shape product roadmaps with a data-informed perspective.
Collaborate closely with Engineering teams on integration, infrastructure, and API design to ensure DS outputs are consumed reliably by downstream systems.
Manage stakeholder expectations: communicate what is feasible given capacity, set realistic timelines, flag risks early, and negotiate prioritization trade-offs across teams.
Required Skills and Experience
2+ years of professional experience in data science or machine learning engineering
Ability to design, deploy, and maintain ML systems in production. This goes beyond model development: it includes pipeline architecture, monitoring, reliability, and handling non-deterministic outputs at scale.
Ability to get up to speed quickly with new tools, technologies, and problem spaces
Proficient use of agentic coding tools
Strong proficiency in Python
Hands-on experience with LLMs (prompting, fine-tuning, evaluation) and understanding of their limitations in production environments.
Strong foundation in classical data science and statistics: regression, classification, time series analysis, data leakage, experimental design, and hypothesis testing.
Strong analytical and problem-solving skills with the ability to work independently on ambiguous, research-oriented problems.
Demonstrated ability to evaluate when existing approaches are insufficient and propose disruptive alternatives — not just incremental tuning.
Strong stakeholder management skills: ability to identify problems and opportunities proactively, manage expectations on timelines and feasibility, and negotiate prioritization across competing demands.
Ability to commit fully to a direction after healthy debate, even when it wasn't your preferred approach
Don’t fit all of our criteria? That’s okay! We know that you might be hesitant to apply if you don’t meet all our requirements, but here at Forto, we pride ourselves on embracing diverse perspectives and celebrating potential. If you are passionate about this position and the Forto values, please apply anyway. There could be a place for you in this role - or another one that’s a perfect fit!
Why work with us?
Our team is hard-working and constantly seeks to maximise the impact of its work, but we put our people first, always winning with care. We value efficient systems and swift, direct communication. We want everyone to have time to speak, so that diverse perspectives can help drive us towards solutions.
Data Scientist
Forto · Berlin