Quality Assurance Engineer
See yourself being part of a large, transformational change? This could be the role for you!
Role purpose
The Quality Assurance Engineer is responsible for ensuring the stability, functionality, and effectiveness of the Hermes suite of products, an advanced institutional equity trading system for South African stockbrokers.
While the core function of this role is Quality Assurance, it operates with the breadth and depth of a Systems Analyst. This role is much broader than pure testing; it requires an analytical thinker who can look beyond individual bugs to understand system-wide behavior, complex interactions, and the broader financial infrastructure.
This is a high-impact role operating in a live trading environment. The Quality Assurance Engineer will act as the primary point of contact for investigating complex production issues, often under significant time constraints. The ability to maintain composure and think clearly under pressure during critical incidents is essential to success in this position.
Operational Context & Technical Complexity
This position requires a candidate capable of navigating a sophisticated, high-performance trading environment. The Hermes suite is built on a Delphi-based client/server architecture, utilizing proprietary TCP/IP protocols for communication between multiple distributed server components and the frontend.
Deep Diagnostic Capability: This role demands more than functional UI testing. The candidate will be required to analyze application logs and FIX messages to diagnose communication breakdowns and data inconsistencies between distributed components.
Pattern Recognition in Complex Failures: The candidate must be capable of identifying patterns in intermittent or "non-deterministic" issues (often caused by race conditions or network timing) by correlating events across multiple log files.
Performance Awareness: In institutional trading, speed matters. The candidate is expected to recognize performance degradation and help investigate whether delays are due to network lag, database execution, or application processing.
Strategic Manual Validation: In the absence of a pre-existing automation framework, the candidate must design rigorous manual validation strategies. This requires the ability to think abstractly about "what could go wrong" beyond the happy path.
Critical Operational Standards: Operating within a live institutional trading environment means that system failure carries significant financial and regulatory risk. Precision, thoroughness, and a "zero-defect" mindset are required to manage these operational risks effectively.
Accountabilities and deliverables
Pre-Market Operational Support: During onboarding, conduct daily pre-market system health checks (07:00 AM) to master system topology. Post-onboarding, provide standby escalation support to the Support Team to resolve critical issues before the market opens.
Internal Toolsmithing: Identify inefficiencies in the testing process and build bespoke scripts or small utility applications (e.g., in Python or PowerShell) to automate log parsing, data generation, or message injection.
Broad Systems Analysis: Apply a "Systems Analyst" mindset to assess system behavior, identify trends, and recommend improvements to enhance performance and maintainability.
Deep Domain Expertise: Develop expert knowledge in financial markets, including institutional trading and equity trading concepts.
Expert FIX Protocol Analysis: Master the FIX protocol to an expert level, including the ability to read, interpret, and follow FIX message flows to diagnose order routing and execution behavior.
Production Investigation: Troubleshoot system issues as the first point of contact for production incidents, using knowledge of networking, infrastructure, application logs, functional workflows, and data flows to identify root causes.
Documentation Excellence: Create and maintain detailed technical documentation to a professional standard, including product specifications, data flow diagrams, comprehensive user guides, and detailed release notes for clients.
Release Safety: Test and document the upgrade and rollback processes. Execute detailed mapping of version dependencies and support deployments after market close to ensure minimal disruption.
Testing Strategy: Design, execute, and refine comprehensive manual test suites to validate both new and existing functionality. Note: As the core application is built in Delphi using proprietary protocols, the focus is on rigorous manual, functional, and backend validation rather than UI automation.
Client Onboarding: Support the technical client onboarding process, including FIX certification testing and client workflow configuration.
Training: Conduct and develop remote training sessions (via Zoom/Teams) for internal support teams on system functionality and new features.
Professional skills and competencies
First-Principles Thinking: The ability to deconstruct complex problems into their fundamental parts. The company seeks individuals who want to understand the "why" behind a failure rather than simply applying patches.
Resilience & Clarity Under Pressure: Demonstrates the ability to stay calm and think clearly during high-stakes production incidents. The candidate must be able to effectively prioritize tasks and communicate accurately when time is critical and pressure is high.
FIX Forensics: Ability to manually parse raw log files, debug session-level issues (sequence gaps, resend requests), and interpret complex custom tags (User-Defined Fields). Reliance on GUI tools alone will be insufficient for this role.
Technical Documentation: Develops clear and comprehensive documentation, including system architectures, workflows, user guides, and release notes.
Autonomy & Work Ethic: A highly self-motivated individual who takes ownership of tasks and can work independently without needing micromanagement, consistently delivering high-quality work.
Technical Communication: Clearly conveys complex technical concepts to both technical and non-technical stakeholders.
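As an illustration of the manual parsing described under FIX Forensics, the sketch below splits raw FIX messages into tag/value pairs and flags jumps in MsgSeqNum (tag 34). It is a minimal example only, not part of the Hermes tooling: the function names, the sample messages, and the CLIENT/SERVER identifiers are hypothetical, and the sample lines are abbreviated (no BodyLength or CheckSum) and use "|" in place of the standard SOH delimiter, as log viewers often display it. Because each message is stored as a simple dictionary, repeating groups would need separate handling.

```python
SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw, delimiter=SOH):
    """Split one raw FIX message into a tag -> value dictionary."""
    fields = {}
    for pair in raw.strip().strip(delimiter).split(delimiter):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            fields[tag] = value
    return fields


def find_sequence_gaps(messages, delimiter=SOH):
    """Return (expected, received) pairs where MsgSeqNum (tag 34) jumps ahead.

    Messages with a lower-than-expected sequence number (for example,
    resent messages) are ignored rather than reported.
    """
    gaps = []
    expected = None
    for raw in messages:
        fields = parse_fix(raw, delimiter)
        if "34" not in fields:
            continue  # not a parsable FIX line; skip it
        seq = int(fields["34"])
        if expected is None or seq >= expected:
            if expected is not None and seq > expected:
                gaps.append((expected, seq))
            expected = seq + 1
    return gaps


# Hypothetical, abbreviated messages from one sender: Logon, Heartbeat,
# then a New Order Single whose sequence number skips 3 and 4.
sample = [
    "8=FIX.4.4|35=A|34=1|49=CLIENT|56=SERVER|",
    "8=FIX.4.4|35=0|34=2|49=CLIENT|56=SERVER|",
    "8=FIX.4.4|35=D|34=5|49=CLIENT|56=SERVER|",
]

print(find_sequence_gaps(sample, delimiter="|"))
```

In practice a gap like this would normally be followed by a Resend Request (35=2) from the receiving side, which is the kind of session-level flow the candidate is expected to trace by hand.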
Technology and Tools Experience
Windows Command Line / PowerShell.
Log analysis and monitoring tools (e.g., Splunk, ELK Stack, Grafana, Geneos).
Network and infrastructure tools for debugging connectivity and latency issues.
Database querying for data analysis (SQL Server).
Financial market protocols and integration (FIX messaging), specifically reading and analyzing raw message logs.
Scripting languages (Python, PowerShell, Batch) for support tools and analysis.
Confluence, SharePoint, or similar documentation platforms.
Qualifications and Certifications
Bachelor’s degree in Computer Science, Engineering, or a related field (required).
5+ years of experience in financial markets, trading systems, or enterprise financial technology.
Expert-level knowledge of financial workflows and trading systems, with a deep, practical ability to read, interpret, and trace FIX message flows for diagnostics.
Distributed Systems Aptitude: Experience troubleshooting complex client/server architectures where communication relies on custom/proprietary TCP/IP protocols.
Legacy System Aptitude: Demonstrated ability to navigate and understand complex, monolithic legacy architectures (e.g., Delphi, C++, or older Java).
System Resource Monitoring: Ability to use Windows administrative tools (e.g., Resource Monitor, Event Viewer, Task Manager) to observe system health, identify high CPU/Memory usage, and correlate these metrics with application behavior.
Windows Server Proficiency: Strong practical experience with Windows Server environments, including service management and file system navigation.
Scripting & Data Manipulation: Proficiency in scripting languages (e.g., Python, PowerShell) or in advanced Excel or text-manipulation tools to parse logs and prepare data is highly advantageous.
Experience troubleshooting networking, infrastructure, and system integration issues.
Familiarity with SQL databases and querying data for analysis.
Experience with professional-standard technical documentation and process mapping, specifically writing user guides and release notes.
Key Relationships
Product Managers
Other Engineers (Cross-team collaboration)
Networks & Infrastructure Teams
Site Reliability Engineering (SRE) Team
Information Security Team
External Vendors & Exchange Connectivity Partners
Client-facing Support Teams
Working Hours and Flexibility
Onboarding Immersion: During the initial training period, the successful candidate will be required to start at 07:00 AM to shadow and learn the critical pre-market system health checks.
Ongoing Availability: Once onboarding is complete, the candidate is not required to perform these daily checks personally but must remain reachable by phone or Zoom from 07:00 AM to assist the Support Team if critical pre-market issues arise.
After-Hours Upgrades: System upgrades and deployments are performed strictly after market close. The candidate must be available to support these maintenance windows as required.
Interview Process
Please Note: As part of the interview process, shortlisted candidates will be required to complete a comprehensive practical assessment.