AI Research Scientist, Audio-Visual Understanding, FAIR

Menlo Park, CA; New York, NY; San Francisco, CA
March 13, 2026
Meta Careers

Meta is seeking a Research Scientist to join Fundamental AI Research (FAIR), a research organization focused on making significant advances in AI. Our organization is driven by advancing the science of intelligence and developing technology toward achieving superintelligence. We are seeking researchers with experience in computer vision, speech and multimodal learning to join our team and help build the perceptual foundations for real-time embodied conversational agents. This role offers the opportunity to collaborate with a highly interdisciplinary team of scientists, engineers, and cross-functional partners, with access to cutting-edge technology, resources, and research facilities.

Responsibilities

  • Develop joint audio-visual understanding systems that integrate visual and auditory signals for advanced perception
  • Build and evaluate audiovisual language models for social interactions and understanding, including predicting social intent, semantic function, and reasoning from human-centric inputs
  • Contribute to benchmarks and evaluation frameworks for visual social understanding and interactions
  • Train and optimize state-of-the-art machine learning and neural network methodologies
  • Conduct and collaborate on research projects within a globally distributed team

Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • A PhD in AI, computer science, data science, or related technical fields
  • Experience holding an industry, postdoctoral, faculty, or government researcher position
  • Research background in machine learning, artificial intelligence, computational statistics, applied mathematics, or related areas
  • Research publications reflecting experience in theoretical or empirical research
  • Experience in developing and debugging in Python or similar programming languages
  • Experience in analyzing and collecting data from various sources
  • Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment
  • Demonstrated research and software engineering experience via an internship, work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)
  • Experience with audio-visual learning or multimodal fusion techniques
  • Familiarity with human action recognition, social signal processing, or human-centric video understanding
  • Experience with long-form video understanding, video-language models, or streaming perception systems
  • Experience with vision-language models (VLMs) such as LLaVA, GPT-4V, Gemini, or similar architectures
  • Experience with temporal modeling, video transformers, or recurrent architectures for sequential data

How to Get Hired at Meta

  • Target one specific role on metacareers.com rather than carpet-bombing multiple postings — Meta's ATS tracks cross-applications, and a focused application with a tailored resume outperforms a scattered approach every time
  • Reverse-engineer the job description into your resume by incorporating its exact technical terms, team references, and skill requirements as naturally integrated keywords