Extracting an OTP code from an email sounds simple. In practice, OTP emails come in hundreds of formats. Here is what a robust OTP extraction system has to handle.

The Variety Problem

Different services phrase OTP emails differently:

  • "Your verification code is: 847291"
  • "Use 847291 to complete your login"
  • "847291 is your one-time password"
  • "Code: 847291 (expires in 10 minutes)"
  • Just the code alone on its own line: "847291"

A single regex pattern catches only some of these phrasings, maybe 60%. Reliable extraction needs a set of patterns, each targeting one phrasing.
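
As a rough illustration, a set of patterns covering the five phrasings above might look like the sketch below. The patterns, the 4 to 8 digit length range, and the name find_candidates are assumptions made for this example. They are not taken from any particular service or product, and a production set would be tuned on real emails.

```python
import re

# Illustrative patterns only (assumed for this example).
OTP_PATTERNS = [
    # "Your verification code is: 847291" and "Code: 847291"
    re.compile(r"\bcode(?:\s+is)?\s*:?\s*(\d{4,8})\b", re.IGNORECASE),
    # "Use 847291 to complete your login"
    re.compile(r"\buse\s+(\d{4,8})\s+to\b", re.IGNORECASE),
    # "847291 is your one-time password"
    re.compile(r"\b(\d{4,8})\s+is\s+your\b", re.IGNORECASE),
    # The code alone on its own line
    re.compile(r"^\s*(\d{4,8})\s*$", re.MULTILINE),
]


def find_candidates(text: str) -> list[str]:
    """Return candidate codes in pattern order, skipping repeats."""
    found: list[str] = []
    for pattern in OTP_PATTERNS:
        for match in pattern.finditer(text):
            code = match.group(1)
            if code not in found:
                found.append(code)
    return found
```

Codes are kept as strings so that a leading zero survives. Each pattern anchors the digits to nearby words, which is what lets it ignore unrelated numbers such as the "10" in "expires in 10 minutes".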

The False Positive Problem

Not every run of digits is an OTP. Prices ($84.72), dates (032024), phone numbers, and reference codes can all be mistaken for codes. Good extraction filters out numbers that are clearly not codes. For services that send 4-digit codes, years such as 2024 and 2025 are common false positives and need explicit exclusion.
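
One way to sketch such a filter is to look at the characters right next to each digit run. The rules, the year range, and the function names below are assumptions chosen for illustration. Real filters are usually more conservative and use more context.

```python
import re


def looks_like_false_positive(text: str, start: int, end: int) -> bool:
    """Heuristically reject the digit run at text[start:end]."""
    digits = text[start:end]
    before = text[max(0, start - 1):start]
    after = text[end:end + 1]
    # Part of a price or decimal: "$8472", "1299.50", "1,299"
    if before in {"$", "€", "£", ".", ","}:
        return True
    if after in {".", ","} and text[end + 1:end + 2].isdigit():
        return True
    # Part of a phone number or a separated date: "555-1234567", "03/2024"
    if before in {"-", "/", "+"} or after in {"-", "/"}:
        return True
    # Bare years (assumed range 1900 to 2099)
    if len(digits) == 4 and 1900 <= int(digits) <= 2099:
        return True
    return False


def filter_codes(text: str) -> list[str]:
    """Return 4 to 8 digit runs that survive the checks above."""
    results = []
    for m in re.finditer(r"(?<!\d)\d{4,8}(?!\d)", text):
        if not looks_like_false_positive(text, m.start(), m.end()):
            results.append(m.group())
    return results
```

The year rule has a cost: a genuine 4-digit code such as 2024 would be rejected too. An unseparated date like 032024 is not caught here, because without surrounding words it is indistinguishable from a real 6-digit code.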

HTML vs Text

OTP emails usually include both an HTML part and a plain-text part. In the HTML part, the code often sits inside styled elements such as <b>, <strong>, or a custom styled <div>, so markup can separate the code from the words around it. Extracting from plain text is more reliable, so HTML should be stripped to text before running regex patterns.
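
A minimal sketch of the stripping step, using only Python's standard html.parser module. The list of block-level tags and the name html_to_text are assumptions for this example. Real email HTML is messy, and a dedicated library may handle it better.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    # Tags treated as line breaks (an assumed, incomplete list)
    BLOCK_TAGS = {"p", "div", "br", "tr", "td", "li", "h1", "h2", "h3"}
    # Tags whose contents are never visible text
    SKIP_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        raw = "".join(self._chunks)
        # Collapse whitespace inside each line and drop empty lines
        lines = [" ".join(line.split()) for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Convert an HTML email body to plain text, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Inline tags such as <b> and <strong> are dropped without a line break, so "Your code is <strong>847291</strong>" becomes one line of text that a phrase-based pattern can match.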

How AgentMailr Does It

AgentMailr runs six regex patterns against the email text. Each pattern targets a different phrasing common in production OTP emails. Results are deduplicated and filtered against false positive patterns. The extracted codes are returned as a structured JSON array in the parsed.otp_codes field.
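
The deduplication and output steps can be pictured with a short sketch. This is not AgentMailr's implementation. The function name is hypothetical, and the JSON shape is only inferred from the parsed.otp_codes field name, with codes assumed to be strings.

```python
import json


def build_parsed_payload(candidates: list[str]) -> str:
    """Deduplicate candidate codes and serialize them as JSON."""
    # dict.fromkeys keeps first-seen order while dropping repeats
    unique = list(dict.fromkeys(candidates))
    # Assumed shape: a "parsed" object holding an "otp_codes" array
    return json.dumps({"parsed": {"otp_codes": unique}})
```

Deduplication matters because several patterns can match the same code, and the HTML and text parts of one email usually repeat it.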

You never write a regex. You call the API and read the result.

When There Is No OTP

If the email does not contain an OTP, parsed.otp_codes is an empty array. Your code should check for an empty array and handle the no-code case. The parsed.category field also tells you what kind of email it is, so you can route it appropriately.
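
On the client side, the check might look like the sketch below. It assumes the API response has already been decoded into a Python dict with a "parsed" object inside it. The function names and the returned tuples are hypothetical, and only the otp_codes and category field names come from the description above.

```python
from typing import Any, Optional


def first_otp(message: dict[str, Any]) -> Optional[str]:
    """Return the first extracted code, or None if there is none."""
    parsed = message.get("parsed") or {}
    codes = parsed.get("otp_codes") or []
    return codes[0] if codes else None


def handle_message(message: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Route a decoded message: use the code, or fall back to the category."""
    code = first_otp(message)
    if code is not None:
        return ("otp", code)
    # No code found: hand the category to whatever routing logic you have
    category = (message.get("parsed") or {}).get("category")
    return ("no_otp", category)
```

Treating a missing field the same as an empty array keeps the caller from crashing on an unexpected response, at the cost of hiding a malformed payload. Stricter code could raise instead.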