Freelance Agent Evaluation Engineer


Job Overview

This position offers experienced software developers and engineers a unique opportunity to work on AI projects as a Freelance Agent Evaluation Engineer at Mindrift, a company that connects specialists with project-based AI work for leading tech companies, focused on testing, evaluating, and improving AI systems. The role is open to candidates in Brazil and is structured as freelance, project-based work rather than permanent employment.

Responsibilities

As a Freelance Agent Evaluation Engineer, your main responsibility will be to build a dataset for evaluating AI coding agents: creating practical tasks and assessing how well these models perform real-world developer work. Your tasks and responsibilities will involve:

  • Creating challenging tasks within simulated environments. This includes building virtual companies following a high-level plan. You will need to define the codebase, infrastructure, and context required for a realistic environment that reflects development history.

  • Assembling and calibrating tasks using intermediate states of the virtual company. You'll craft prompts, define evaluation criteria, and ensure that the tasks are both solvable and fair in terms of evaluation.

  • Designing tasks that simulate isolated environments like a developer's workstation. This will include setting up a Linux machine, development tools, and an entire web application codebase.

  • Writing tests to ensure accuracy, accepting all correct solutions while rejecting incorrect ones. It is crucial that tests are neither overly strict nor excessively lenient.

  • Reviewing code written by AI agents, analyzing the reasons for their success or failure, and designing various edge cases and adversarial scenarios to thoroughly evaluate the AI agents.

  • Iterating on feedback received from expert QA reviewers who will score your work according to established quality criteria.
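To make the test-writing responsibility above concrete, here is a minimal pytest-style sketch (not from the posting; the `slugify` function and its expected outputs are hypothetical) of a test that asserts on observable behavior rather than implementation details, so that any correct solution passes and incorrect ones fail:

```python
import re

# Hypothetical task function: one of many valid implementations
# that the test below should accept.
def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def test_slugify_behavior():
    # Assert only input/output behavior, not internals, so the test
    # is neither overly strict nor excessively lenient.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"
    assert slugify("already-slugged") == "already-slugged"
```

A test like this would typically be run with `pytest`; the key design choice is pinning down behavior precisely enough to reject wrong answers while leaving every correct implementation free to pass.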

What This Position Is NOT

It’s important to clarify that this role does not involve data labeling, prompt engineering, or writing code from scratch, as the AI agent is responsible for most of the coding. Instead, your role will focus on guiding and evaluating the AI's output.

Ideal Candidate Profile

The ideal candidate for this position has a solid foundation in software development and an interest in AI testing. Here are the key qualifications we seek:

  • A degree in Computer Science, Software Engineering, or related fields.

  • At least 5 years of experience in software development, particularly with Python, including frameworks such as FastAPI and tools like pytest.

  • A background in full-stack development, including experience with React-based interfaces and robust backend systems.

  • Experience writing tests (functional and integration), not merely running them, is essential.

  • Familiarity with Docker containers and infrastructure tools such as Postgres, Kafka, and Redis, as well as a good understanding of Continuous Integration and Continuous Deployment (CI/CD) processes, preferably with GitHub Actions.

  • Proficiency in English at the B2 level is required to communicate effectively and understand documentation.

Challenges of the Role

Creating tasks that genuinely challenge frontier models is not a trivial endeavor. The difficulty lies in understanding where AI models typically fail and in designing scenarios that expose the difference between valid and invalid solutions. Writing tests that accept every valid solution while rejecting incorrect ones is another demanding aspect of this role.
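One common way to accept multiple valid solutions, sketched here with a hypothetical `top_three` task (not from the posting), is to assert properties of the result rather than one fixed answer:

```python
import heapq

# Hypothetical task function: one of several equally correct
# implementations (sorting and slicing would also be valid).
def top_three(values):
    return heapq.nlargest(3, values)

def test_top_three_accepts_any_ordering():
    result = top_three([5, 1, 9, 3, 7])
    # Compare as a sorted multiset: any correct implementation may
    # return the three largest values in any order and still pass.
    assert sorted(result) == [5, 7, 9]
```

Property-style assertions like this keep the evaluation fair: they constrain what the answer must contain without privileging one particular implementation's output order.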

How the Process Works

The application process is straightforward:

  • Apply for the position.

  • Go through qualification checks, if necessary.

  • Join a project once selected.

  • Complete assigned tasks.

  • Get paid upon successful submission.

Time Commitment

Each task is estimated to take around 20 hours, although this can vary with its complexity. Importantly, freelancers can choose when and how to work; however, all tasks must meet the set deadlines and acceptance criteria to be considered valid and compensated.

Compensation Details

Compensation for this role is competitive, reaching up to $17 per hour depending on the individual's experience and pace of contributions. The compensation structure may vary based on the scope and complexity of specific projects, as well as the required level of expertise.

Overall, this opportunity with Mindrift offers experienced professionals in the tech field the chance to not only contribute to cutting-edge AI projects but also to enjoy the flexibility that comes with freelance work while being compensated fairly for their expertise.



This job offer was originally published on himalayas.app

Mindrift

Brazil

Software development

Freelance

April 28, 2026

This job offer summary has been generated using automated technology. While we strive for accuracy, it may not always fully capture the nuances and details of the original job posting. We recommend reviewing the complete job listing before making any decisions or applications.