Software Engineering (Data Platform)

San Francisco, United States

Be a founding engineer building production-grade AI systems for drug toxicity prediction, with the explicit goal of replacing lab and animal experiments. The output of your work will influence real drug programs, real capital allocation, and real patient outcomes. This is not a research toy.


The challenge


Drug toxicity is responsible for ~50% of drug program failures. Most “AI for drug safety” platforms fail because their data foundations are brittle, poorly curated, and non-reproducible.

Our client is building the opposite: deterministic, auditable, scalable data and ML systems that scientists and regulators can trust.

This role exists because this problem is too important to outsource to mediocre infrastructure.


The role


This is not a “data engineer who wires tools together” role.

You will own the end-to-end data and inference systems that convert raw chemical, biological, and clinical data into:

  • ML-ready training datasets
  • Large-scale LLM-powered literature and clinical data pipelines
  • Customer-facing insights used to make irreversible decisions


You will work directly with ML researchers, lab scientists, and product, but you are the technical authority on systems correctness, scalability, and reliability.

If you’ve never built infrastructure that had to be right, this will be uncomfortable.

 

What you’ll actually build


  • Design and operate high-throughput ingestion, processing, and serving systems for chemical, biological, and clinical data
  • Build distributed systems to run large-scale LLM and ML inference jobs (not notebooks, not demos)
  • Architect LLM-driven data curation pipelines with observability, evaluation, and failure handling designed in from the start
  • Create simple, composable APIs so scientists can access complex datasets without breaking things
  • Implement data validation, testing, lineage, and monitoring so errors are caught before they propagate into models or decisions


The level of expectation


You’ve likely done several of the following:


  • Built or led large-scale data platforms used by multiple teams or external customers
  • Designed systems that process terabytes to petabytes of data with predictable performance
  • Written distributed systems from scratch, not just configured managed services
  • Deployed LLM- or ML-powered systems into production research or decision-making workflows
  • Owned infrastructure at an early-stage company where you couldn’t hide behind process


If your experience tops out at dashboards, pipelines glued together with fragile scripts, or academic prototypes, this is not the role.


Technical requirements (non-negotiable)


  • Expert-level Python and deep familiarity with the Python data ecosystem
  • Hands-on experience with distributed compute frameworks (e.g. Ray, Spark, Dask, Slurm, Kubernetes-based systems, or equivalents)
  • Strong systems intuition: you understand memory, latency, throughput, failure modes, and cost tradeoffs
  • Real DevOps competence: CI/CD, cloud infrastructure (AWS/GCP/Azure), Terraform, and compute provisioning
  • Comfort debugging issues that span data, infrastructure, ML models, and user behaviour


Who thrives here


  • Engineers who get bored in big tech because the problems aren’t existential enough
  • People who obsess over correctness, elegance, and leverage
  • Builders who want ownership, not tickets
  • Those who enjoy working at the boundary of software, ML, and real science
  • People who want their work to matter, and accept that this makes the job harder, not easier


Final reality check


This will be:

  • Technically brutal
  • Ambiguous
  • High-responsibility


But if you’re the kind of engineer who wants to look back and say “I built the system that made this possible”, this is the right room.


Your consultant


As a Senior Recruitment Consultant at Aspire Life Sciences, Julien Funes works at the nexus of technology and life sciences. He recruits top machine learning and data talent for biotech and life sciences startups across Europe and North America, matching candidates with opportunities where technological innovation meets life science excellence.

