How to validate email with PHP

“How to validate email with PHP” is conceptually part of a bigger problem: accurately validating web forms, including emails, without sacrificing good UX. I’ll blog about that some other time.

This post presents and dissects PHP code to validate email. The explanations and my reasoning should allow you to adjust the code to your needs if necessary.

First, let me give you some context.

Why I need to validate email with PHP

nodemcu-build.com has been up for a little over a year now, and maintaining it has been a smooth ride so far. Even though the cloud service churns out well over a thousand NodeMCU firmware builds per week, manual intervention is rarely needed. It’s all fully automated and extremely reliable.

The only things that bug me are those “Delivery Status Notification” messages the mail server sends me when it can’t deliver build status emails to users. That happens a few times every day. Since the site does not require registration, it’s important that users enter valid, existing email addresses. My strategies to fend off bot-triggered form spam seem to work really well. However, I don’t do anything against users who deliberately enter syntactically correct but non-existing email addresses. I’m clueless as to why someone would actually do that, but I can confirm it does happen.

Hence, I’ve been thinking and reading a lot about how to validate email with PHP to filter out users who enter non-existing email addresses. Even though I’m really satisfied with the PHP code presented below, I won’t integrate it into nodemcu-build.com for the reasons explained in the last section.

Characteristics of a valid email address

When is an email address valid? Let’s try… It’s valid when

  • it contains exactly one @
  • it is made up of two parts separated by that @
  • the substring on the right side of the @ represents a hostname
  • yadayada…

That won’t get us anywhere. And don’t add a regular expression to the list as it’s by definition impossible to verify email addresses with regular expressions 100% accurately.

It’s a lot simpler: an email address is valid if you can send an email to it and it does not bounce back.

How to validate email with PHP

As we learned, you need to send an email to an address to be 100% sure it’s valid. The second-best option is to connect to the responsible mail server (MX host, MX = mail exchange) but to stop short of actually sending the message. Of course that won’t give you 100% accuracy, but sometimes that’s good enough. A telnet session with the required commands to verify the email address bing@bong.com looks something like this:

/tmp > nslookup -q=mx bong.com
Server:		xxx
Address:	xxx#53

Non-authoritative answer:
bong.com	mail exchanger = 20 cluster3a.eu.messagelabs.com.
bong.com	mail exchanger = 10 cluster3.eu.messagelabs.com.
/tmp > telnet cluster3.eu.messagelabs.com. 25
Trying 194.106.220.35...
Connected to cluster3.eu.messagelabs.com.
Escape character is '^]'.
220 server-12.tower-139.messagelabs.com ESMTP
EHLO mx1.validemail.com
250-server-12.tower-139.messagelabs.com
250-STARTTLS
250-PIPELINING
250 8BITMIME
MAIL FROM:<>
250 OK
RCPT TO:<bing@bong.com>
250 OK
RSET
250 flushed
QUIT
221 server-12.tower-139.messagelabs.com

Response code 250 for the RCPT TO command means the recipient address is accepted but that doesn’t necessarily mean it’s also valid. For explanations please refer to e.g. https://www.port25.com/how-to-check-an-smtp-connection-with-a-manual-telnet-session-2/.

Without further ado let’s dive into code.

Extract the FQDN

The first step is to extract the fully qualified domain name (FQDN) from the email address.

function extractFullyQualifiedDomainFromEmail($email)
{
    $mailSegments = explode("@", $email);
    $domain = $mailSegments[1];
    // http://stackoverflow.com/q/14065946/131929
    // fully qualified domain names should end with a '.', DNS resolution may otherwise take a very long time
    if (substr($domain, -1) != ".") {
        return $domain . ".";
    }
    return $domain;
}
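One thing to keep in mind when adapting this function: $mailSegments[1] does not exist when the input contains no @, and PHP complains about the undefined index. The following is a minimal sketch of a more defensive variant. The name extractDomainOrNull is hypothetical and not part of the code in this post. It splits at the last @ (a quoted local part may itself contain one) and returns null when there is no domain part.

```php
<?php
// Hypothetical defensive variant of extractFullyQualifiedDomainFromEmail().
// Returns the domain with a trailing '.', or null if there is no domain part.
function extractDomainOrNull($email)
{
    // Split at the LAST '@'; a quoted local part may itself contain one.
    $atPosition = strrpos($email, "@");
    if ($atPosition === false) {
        return null;
    }
    $domain = substr($email, $atPosition + 1);
    // substr() yields "" (PHP 8) or false (PHP 7) when nothing follows the '@'.
    if ($domain === false || $domain === "") {
        return null;
    }
    // Normalize to exactly one trailing dot.
    return rtrim($domain, ".") . ".";
}

var_dump(extractDomainOrNull("bing@bong.com"));  // string(9) "bong.com."
var_dump(extractDomainOrNull("bing@bong.com.")); // string(9) "bong.com."
var_dump(extractDomainOrNull("no-at-sign"));     // NULL
```

A caller would then treat null the same way as a failed MX lookup and skip the SMTP probe.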

Find the preferred MX host

Using the extracted FQDN you can now look up the registered MX hosts for that domain. DNS will give you a list of hosts, each with a preference value (called weight in PHP’s getmxrr). They’re copied to an associative array and then sorted by weight in ascending order. Only the host with the lowest value, which is the one with the highest priority (i.e. the primary MX host), is used.

function findPreferredMxHostForDomain($domain)
{
    $mxRecordsAvailable = getmxrr($domain, $mxRecords, $mxWeight);
    if ($mxRecordsAvailable) {
        // copy mx records and weight into array $mxHosts
        $mxHosts = array();
        for ($i = 0; $i < count($mxRecords); $i++) {
            $mxHosts[$mxRecords[$i]] = $mxWeight[$i];
        }
        asort($mxHosts, SORT_NUMERIC);
        reset($mxHosts);
        return array_keys($mxHosts)[0];
    } else {
        return null;
    }
}
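The selection step can also be written without the intermediate associative array. The sketch below is only an illustration: pickPreferredMxHost is a hypothetical helper, and it works on sample data copied from the nslookup output above instead of calling getmxrr, so it runs without a DNS lookup.

```php
<?php
// Hypothetical helper: pick the host with the lowest MX preference value.
// $hosts and $preferences are parallel arrays, as filled in by getmxrr().
function pickPreferredMxHost(array $hosts, array $preferences)
{
    if (count($hosts) === 0 || count($hosts) !== count($preferences)) {
        return null;
    }
    // The lowest preference value marks the most preferred mail exchanger.
    $bestIndex = array_search(min($preferences), $preferences);
    return $hosts[$bestIndex];
}

// Sample data copied from the nslookup output shown earlier.
$hosts = array("cluster3a.eu.messagelabs.com", "cluster3.eu.messagelabs.com");
$preferences = array(20, 10);
echo pickPreferredMxHost($hosts, $preferences) . "\n"; // cluster3.eu.messagelabs.com
```

If several hosts share the lowest value, this returns the first one in the list, which is good enough for a single probe.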

Open socket to MX host and send commands

The only tricky part is sending the SMTP commands EHLO, MAIL FROM, RCPT TO and reading the server response correctly. You can send a command terminated by carriage return and line feed and the server will, or might, respond. It’s just that, when you read line by line from the socket, nothing tells your client that the server is done with its response, and that’s a bit of a problem.

If you read just a single response line and continue sending the next command you’ll completely garble the logic. Since the remaining lines of the previous response have not yet been read you’d interpret the 2nd line of the 1st response as the 1st line of the 2nd response. The output of my initial example would look like so:

EHLO mx1.validemail.com
250-server-12.tower-139.messagelabs.com
MAIL FROM:<>
250-STARTTLS
RCPT TO:<bing@bong.com>
250-PIPELINING
RSET
250 8BITMIME

See how the “response” to the RCPT TO command is actually the 3rd line of the response to the initial EHLO command. And since it happens to start with 250, you’d interpret this as “all good, address accepted”.

Ok, reading a single line is not a good idea. What if you attempted to read all response lines in a loop until there’s no more data? That won’t work either as it will block once all responses have been read. Your client is simply waiting for more and doesn’t know the server is already done. Therefore, you need to set a timeout on the socket after which the client should stop waiting for a server response. The reading is interrupted and your client can continue.

// reads all lines with a timeout of 1s
stream_set_timeout($mxSocket, 1);
while ($line = fgets($mxSocket)) {
  $response .= $line;
}

There’s one serious problem with that approach, though. It may happen that the server takes a little longer than the timeout you defined. In that case you’d return an empty response – only to read the actual but delayed response after the next command was sent!

The only feasible approach that worked for all my test cases was to run the whole socket reading while-loop nested inside another loop and to check whether the response after the timeout was empty.

fwrite($mxSocket, $command . "\r\n");
$response = "";
stream_set_timeout($mxSocket, 1);
// Wait at most 10 * timeout for the server to respond.
// In most cases the response arrives within timeout time and, therefore, there's no need to wait any longer in such
// cases (i.e. keep timeout short). However, there are cases when a server is slow to respond and that for-loop will
// give it some slack. See http://stackoverflow.com/q/36961265/131929 for the whole story.
for ($i = 0; $i < 10; $i++) {
    while ($line = fgets($mxSocket)) {
        $response .= $line;
    }
    // Only continue the for-loop if the server hasn't sent anything yet.
    if ($response != "") {
        break;
    }
}

So, if the server takes 5s to respond, the client waits 5 x 1s (timeout) until it completes. If the server hasn’t returned anything in 10 x 1s, it simply returns an empty response. You don’t want to increase the stream timeout to 10s because then you’d wait 10s after every command, regardless of whether the server sent all data within a few hundred milliseconds or within several seconds.
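For comparison, SMTP itself marks where a reply ends: per RFC 5321, every line of a multi-line reply except the last has a hyphen after the three-digit code (250-STARTTLS), while the last line has a space (250 8BITMIME). A reader can therefore stop at the first line that is not a continuation line instead of waiting for a timeout. The sketch below is an alternative to the loop above, not the code used in this post. readSmtpReply is a hypothetical helper name, and the demo feeds it a canned in-memory stream rather than a real socket.

```php
<?php
// Hypothetical helper: reads exactly one complete SMTP reply from a stream.
// Continuation lines look like "250-...", the final line like "250 ...".
function readSmtpReply($stream)
{
    $lines = array();
    while (($line = fgets($stream)) !== false) {
        $trimmed = rtrim($line, "\r\n");
        $lines[] = $trimmed;
        // A space (or nothing at all) after the 3-digit code ends the reply.
        if (preg_match('/^\d{3}( |$)/', $trimmed)) {
            break;
        }
    }
    return $lines;
}

// Demo on canned data: an EHLO reply followed by a MAIL FROM reply.
$stream = fopen("php://memory", "w+");
fwrite($stream, "250-server.example\r\n250-STARTTLS\r\n250 8BITMIME\r\n250 OK\r\n");
rewind($stream);

$first = readSmtpReply($stream);
$second = readSmtpReply($stream);
echo count($first) . " lines, last: " . end($first) . "\n"; // 3 lines, last: 250 8BITMIME
echo $second[0] . "\n";                                      // 250 OK
fclose($stream);
```

On a real socket you would still want a stream timeout as a safety net for servers that never answer, and an empty array then means the same as an empty response in the loop above.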

RFC 822

Since this is about how to validate email with PHP, there’s one more thing we should talk about. PHP has a built-in email syntax filter:

filter_var($email, FILTER_VALIDATE_EMAIL)

That statement verifies the email address against the syntax defined in RFC 822 and returns the address if it passes or false if it doesn’t. RFC 822 is from 1982, defines the Internet email message format and has since been obsoleted by RFC 2822 and RFC 5322. I don’t use filter_var because it validates only the format and because it validates against a rather old format definition.
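To make the behavior concrete, here is a small sketch. filter_var hands back the filtered address on success and false on failure, so the hypothetical wrapper hasValidEmailSyntax (not part of the code in this post) only turns that mixed return value into a strict boolean. It says nothing about whether the mailbox exists.

```php
<?php
// Hypothetical wrapper: true if the address passes PHP's syntax filter.
function hasValidEmailSyntax($email)
{
    // filter_var() returns the address itself on success, false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(filter_var("bing@bong.com", FILTER_VALIDATE_EMAIL)); // string(13) "bing@bong.com"
var_dump(hasValidEmailSyntax("bing@bong.com"));               // bool(true)
var_dump(hasValidEmailSyntax("no-at-sign"));                  // bool(false)
```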

Full code


<?php
// inspired by a note at http://php.net/manual/en/function.getmxrr.php
// further inspiration from https://github.com/webdigi/SMTP-Based-Email-Validation
function validateEmail($email)
{
    $emailValid = false;
    $domain = extractFullyQualifiedDomainFromEmail($email);
    $mxHost = findPreferredMxHostForDomain($domain);
    if ($mxHost) {
        // echo $mxHost . "<br>";
        $mxSocket = @fsockopen($mxHost, 25, $errno, $errstr, 2);
        if ($mxSocket) {
            $response = "";
            // say HELO to mailserver
            $response .= sendCommand($mxSocket, "EHLO mx1.validemail.com");
            // initialize sending mail
            $response .= sendCommand($mxSocket, "MAIL FROM:<info@validemail.com>");
            // try recipient address, will return 250 when ok..
            $rcptText = sendCommand($mxSocket, "RCPT TO:<" . $email . ">");
            $response .= $rcptText;
            if (substr($rcptText, 0, 3) == "250") {
                $emailValid = true;
            }
            // quit mail server connection
            sendCommand($mxSocket, "QUIT");
            fclose($mxSocket);
        }
    }
    return $emailValid;
}

function extractFullyQualifiedDomainFromEmail($email)
{
    $mailSegments = explode("@", $email);
    $domain = $mailSegments[1];
    // http://stackoverflow.com/q/14065946/131929
    // fully qualified domain names should end with a '.', DNS resolution may otherwise take a very long time
    if (substr($domain, -1) != ".") {
        return $domain . ".";
    }
    return $domain;
}

function findPreferredMxHostForDomain($domain)
{
    $mxRecordsAvailable = getmxrr($domain, $mxRecords, $mxWeight);
    if ($mxRecordsAvailable) {
        // copy mx records and weight into array $mxHosts
        $mxHosts = array();
        for ($i = 0; $i < count($mxRecords); $i++) {
            $mxHosts[$mxRecords[$i]] = $mxWeight[$i];
        }
        asort($mxHosts, SORT_NUMERIC);
        reset($mxHosts);
        return array_keys($mxHosts)[0];
    } else {
        return null;
    }
}

function sendCommand($mxSocket, $command)
{
    // print htmlspecialchars($command) . "<br>";
    fwrite($mxSocket, $command . "\r\n");
    $response = "";
    stream_set_timeout($mxSocket, 1);
    // Wait at most 10 * timeout for the server to respond.
    // In most cases the response arrives within timeout time and, therefore, there's no need to wait any longer in such
    // cases (i.e. keep timeout short). However, there are cases when a server is slow to respond and that for-loop will
    // give it some slack. See http://stackoverflow.com/q/36961265/131929 for the whole story.
    for ($i = 0; $i < 10; $i++) {
        while ($line = fgets($mxSocket)) {
            $response .= $line;
        }
        // Only continue the for-loop if the server hasn't sent anything yet.
        if ($response != "") {
            break;
        }
    }
    // print nl2br($response);
    return $response;
}
?>

A number of caveats

I put the code above through a number of thorough test runs. My disappointing conclusion is that I won’t validate email with PHP after all, at least not for nodemcu-build.com. Here’s why this is still not reliable enough:

  • Mail servers may simply accept all recipient addresses; they’d return 250 to the RCPT TO command even for invalid addresses.
  • If you run the script on a host connected to an ISP that has had trouble with spammers in the past, the mail server may refuse the connection:
    /tmp > telnet mxzhh.bluewin.ch. 25
    Trying 195.186.227.50...
    Connected to mxzhh.bluewin.ch.
    Escape character is '^]'.
    554 Service unavailable from IP: xxxxx. Please refer to https://www.spamhaus.org/query/ip/xxxxx if you feel this is in error
    Connection closed by foreign host.
    

    Or the server rejects you for some other reason:

    /tmp > telnet cluster3.eu.messagelabs.com. 25
    Trying 85.158.136.3...
    Connected to cluster3.eu.messagelabs.com.
    Escape character is '^]'.
    450 Requested action aborted [7.2] 12303, please visit www.messagelabs.com/support for more details about this error message.
    Connection closed by foreign host.
  • The mail server may allow you to connect but then it could reject the RCPT TO command:
    /tmp > telnet cluster3.eu.messagelabs.com. 25
    Trying 85.158.136.3...
    ...
    RCPT TO:<bing@bong.com>
    553-mail rejected because your IP is in the PBL. See
    553 http://www.spamhaus.org/pbl
  • A number of other “errors” may occur.

If you simply assume email addresses to be invalid until proven otherwise, you may turn down perfectly legit users. Needless to say, this is the worst thing you can do.
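One way to avoid that trap is to stop thinking in terms of valid/invalid and allow a third outcome. The sketch below is only an illustration under stated assumptions: the function name classifyRcptReply and its three labels are hypothetical, and it treats only a 550 reply (mailbox unavailable in RFC 5321) as a rejection, even though some servers also use 550 for policy reasons. Everything else, including an empty response and replies like the 553 shown above, is reported as unknown so that the user is not turned away.

```php
<?php
// Hypothetical helper: maps the raw RCPT TO response to one of three labels.
function classifyRcptReply($rcptText)
{
    // The reply code is the first three characters of the response.
    $code = (string) substr(ltrim($rcptText), 0, 3);
    if ($code === "250") {
        return "accepted"; // the server takes the recipient (may be a catch-all)
    }
    if ($code === "550") {
        return "rejected"; // assumption: mailbox unavailable, i.e. unknown user
    }
    return "unknown";      // blocklists, greylisting, timeouts, anything else
}

echo classifyRcptReply("250 OK\r\n") . "\n";                       // accepted
echo classifyRcptReply("550 No such user here\r\n") . "\n";        // rejected
echo classifyRcptReply("553-mail rejected\r\n553 see PBL\r\n") . "\n"; // unknown
echo classifyRcptReply("") . "\n";                                 // unknown
```

Only the rejected case would justify asking the user to double-check the address; accepted and unknown should both let the form submission through.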
