MAIL FROM Implications to Source Server Environment
To send the RCPT TO command, which has been our goal from the very beginning, we first need to send the MAIL FROM command to the server and receive an OK response (code 250). However, this action has significant consequences: this is where many mail servers apply another set of anti-spam measures.
The MAIL FROM command requires an argument that specifies the sender of the email we want to send. We will refer to the domain part of the sender email address as the source domain. Let’s assume that our source domain is example.com and our sender email address is verifier@example.com, and see how our SMTP session continues:
C: MAIL FROM:<verifier@example.com>
S: 250 2.1.0 OK 91si19491992ioi.66 - gsmtp
This does not look very complicated, so what’s the catch? Once we have used the MAIL FROM command, we have given away our source domain. This is what many spam-fighting techniques are waiting for. Once they have your source domain, they start to check it, and some of them perform very complex checks. Note that some mail servers may perform some of these checks (#5, #6, #7) even before the MAIL FROM command is received.
Source Check #1 – Is There Mail Server?
We have identified our sender as verifier@example.com, so the target mail server may want to know whether there are MX records for the domain example.com. It queries DNS for the MX records of example.com. If there are none, it may consider the incoming message spam or fake and refuse to communicate with our client any further.
Source Check #2 – Is It Alive?
Some servers do not consider the presence of MX records to be good enough. They attempt to connect to those MX servers to find out whether they are alive. If none of the MX servers is running, they may again refuse to talk to you.
Source Check #3 – Is There Postmaster Account?
The SMTP RFC prescribes that every domain that accepts email via SMTP must have a Postmaster account. In our example, this means that the mailbox postmaster@example.com must exist. Some SMTP servers require the source servers that attempt to send them email to comply with the RFC on this point. This is why they perform an email verification process to check that the mailbox postmaster@example.com really exists.
Source Check #4 – Is Sender Address Valid?
Another anti-spam technique is to verify the sender email address. This is done similarly to how the Postmaster account is checked. If the sender’s mailbox does not exist on the source domain’s mail server, the target server replies with an error status to your MAIL FROM command.
Source Check #5 – Is Client Blacklisted?
A very common method of fighting spam is to use one or more blacklists. The client’s IP address (198.51.100.123 in our example) and/or the source domain’s IP addresses are checked, and if they are blacklisted, further communication is disallowed. For a manual blacklist lookup, you can try an online blacklist checker.
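Many blacklists are published over DNS: the octets of the IPv4 address are reversed, the blacklist's zone is appended, and the resulting name is resolved. An answer means the address is listed. The following is a minimal sketch of that lookup, not the method of any particular server. The zone dnsbl.example.org is a placeholder and the function names are our own.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNS blacklist."""
    # Validate the address first; this sketch only handles IPv4.
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_blacklisted(client_ip: str, zone: str) -> bool:
    """Return True if the blacklist zone resolves a name for the client IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        # The name did not resolve: treated here as "not listed".
        return False


# Placeholder zone, for illustration only.
print(dnsbl_query_name("198.51.100.123", "dnsbl.example.org"))
```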
Source Check #6 – Is There Good Reverse Record?
This method checks the client’s IP address (198.51.100.123 in our example). The mail server attempts to obtain its reverse DNS (rDNS) record. Trustworthy mail servers operate on IP addresses with a valid reverse DNS record; if it is missing, the incoming message is more likely to be spam. Some mail servers go even further, and the mere presence of an rDNS record is not enough for them: they require the reverse domain name to start with mail, mx, or smtp. In our example, mail.example.com would be a good domain name.
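The reverse record check can be sketched with the Python standard library as follows. This is only an illustration of the idea under our own assumptions: the function names are hypothetical, and a real server may apply stricter or looser rules.

```python
import socket

TRUSTED_PREFIXES = ("mail", "mx", "smtp")


def has_mail_like_name(hostname: str) -> bool:
    """Check whether the first label of a host name starts with mail, mx, or smtp."""
    first_label = hostname.lower().split(".")[0]
    return first_label.startswith(TRUSTED_PREFIXES)


def reverse_record_ok(client_ip: str) -> bool:
    """Return True if the IP has a reverse record whose name looks like a mail host."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(client_ip)
    except OSError:
        # No reverse record, or the lookup failed: treat the check as failed.
        return False
    return has_mail_like_name(hostname)
```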
Source Check #7 – Is There Good SPF Record?
A very common technique is checking SPF records. SPF stands for Sender Policy Framework, which defines who is allowed to send emails on behalf of a given domain. SPF records are DNS TXT records with a special syntax that define the list of mail servers allowed to send mail, and also how non-listed servers should be treated if they attempt to send emails on behalf of that domain. In our example, we run our client from IP address 198.51.100.123. If this IP address is not found among the servers allowed to send emails on behalf of the source domain example.com, our attempt may be rejected, depending on that policy.
Different mail servers implement different subsets of these checks, so if you do not have everything configured properly, you may succeed in verifying or sending emails on one server and fail on another. If you want to pass all these checks, you will have to:
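To make the SPF idea concrete, here is a deliberately simplified evaluator. It is a sketch under stated assumptions: it understands only the ip4, ip6, and all mechanisms, skips everything that would need further DNS lookups (a, mx, include, and so on), and takes the record text as an argument instead of fetching it from DNS. The function name and the sample records are our own.

```python
import ipaddress


def spf_result(record: str, client_ip: str) -> str:
    """Evaluate only the ip4/ip6/all mechanisms of an SPF record.

    Returns "pass", "fail", "softfail", "neutral", or "none".
    """
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is written
        if term[0] in qualifiers:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return qualifiers[qualifier]
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return qualifiers[qualifier]
        # Other mechanisms need DNS lookups and are skipped in this sketch.
    return "neutral"
```

With the hypothetical record "v=spf1 ip4:198.51.100.0/24 -all", our client at 198.51.100.123 passes, while a client at 203.0.113.9 fails.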
- Choose a sender email address such that its domain exists.
- Make sure there is an MX record for that domain.
- Make sure the MX record points to a running SMTP server.
- Have the SMTP server operating properly. It must accept emails for the Postmaster mailbox and for the sender email address.
- Make sure the server from which you are running your verification software is not blacklisted.
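- Make sure its IP address has a valid reverse record that matches the sender email address domain and that its FQDN starts with mail, mx, or smtp.
- Publish an SPF record for the sender domain that lists that IP address as allowed to send emails on its behalf.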
If the target mail server implements any of these checks, it may or may not let you continue the SMTP session. Some mail servers may let you continue and just silently discard any emails you send. For email verification purposes, this would not be a problem. Other mail servers will respond to the MAIL FROM command with an error. In any case, if you send MAIL FROM and receive status code 250, you can finally send RCPT TO.
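The order of commands described so far can be sketched as a small function. It is written against any object that offers ehlo, mail, and rcpt methods returning a (code, message) pair, which is how the standard library's smtplib.SMTP class behaves. The function name is hypothetical and error handling is left out.

```python
def probe_recipient(session, helo_name, sender, recipient):
    """Run EHLO, MAIL FROM, and RCPT TO; return the RCPT TO reply code.

    Returns None if the server did not answer MAIL FROM with code 250.
    """
    session.ehlo(helo_name)
    code, _message = session.mail(sender)
    if code != 250:
        return None  # sender rejected, so RCPT TO is not attempted
    code, _message = session.rcpt(recipient)
    return code
```

With smtplib, the session could be created as smtplib.SMTP(mx_host, 25, timeout=30), where mx_host is a mail server of the target domain, and closed with its quit method once the reply code has been read.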
Handling Greylisting
Now we are finally allowed to send our RCPT TO command, and there are several responses we can get. As we mentioned already, codes 250 and 251 mean that the recipient’s address was accepted. In such a case, we can mark the email address as valid. We can also receive a permanent error in the form of a 5xx code. In this case, we can mark the email address as invalid. However, we can also receive a temporary error in the form of a 4xx code. What does this actually mean? It means that at this very moment, the server is telling us that it is unable to deliver our message for whatever reason. It is thus unlikely that the recipient’s mailbox is invalid, because in that case the server could have sent us a permanent error. So it is likely that the server recognized the mailbox as valid but also detected a problem that prevents it from continuing. We use the terms likely and unlikely on purpose, to express the uncertainty of these conclusions: we have no guarantee that the server validated the mailbox before it ran into the problem it reports. If greylisting did not exist, our verification work would be finished at this point, and we would have to produce a verdict on the email address right after receiving a temporary error. In that case, we would say that the email is probably valid.
Greylisting is a popular anti-spam method that exploits temporary error responses to the RCPT TO command. Greylisting creates an artificial delivery problem to verify that the sender is a real message transfer agent (MTA) that complies with the SMTP protocol, and not just a simple spam-sending program. When a proper MTA receives a temporary error, it does not give up entirely. It waits several minutes or hours and then tries again. This is something that spammers can rarely afford to do. A mail server that implements greylisting maintains a database of triplets consisting of the client’s IP address, the sender’s address, and the recipient’s address. When a specific triplet is seen for the first time, a temporary error is returned. If the client tries again after some time, it may be allowed, depending on how much time has elapsed since the first attempt. Some servers are configured to allow attempts after a couple of minutes, while others require longer periods of time, up to several hours.
If we want to support verification of emails managed by servers with greylisting, we have to react to temporary error responses to RCPT TO by disconnecting from the server, waiting a certain period of time, and then trying again. Note that trying another MX server, if one is available, will not help us here. Greylisting is usually implemented so that if one mail server has it enabled, all the domain’s mail servers have it. The database of triplets is shared among all the target servers, which means there is no benefit in trying another server. We really need to wait and try again later. How long should we wait? A couple of minutes is reasonable before we try for the second time. But since some servers require a delay of up to several hours, it is a good idea to set a limit for ourselves after which we simply give up and finish with the uncertain result of “email is probably valid”.
Many servers, when they return a temporary error due to greylisting, indicate that greylisting was performed, and some even say when they will accept the next attempt. However, this additional data may or may not be present, and most importantly, its format differs from one server software to another. So it is not very beneficial for us to try to parse it.
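The mapping from RCPT TO reply codes to verdicts described above is small enough to write down directly. This is a sketch with a hypothetical function name, and the "unknown" fallback for codes outside the ranges discussed is our own addition.

```python
def classify_rcpt_code(code: int) -> str:
    """Map an RCPT TO reply code to a verification verdict."""
    if code in (250, 251):
        return "valid"
    if 500 <= code <= 599:
        return "invalid"  # permanent error
    if 400 <= code <= 499:
        return "probably valid"  # temporary error
    return "unknown"  # anything else is not covered by the rules above
```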
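To illustrate the server side of greylisting, here is a minimal in-memory sketch of the triplet database. It does not reproduce any particular server's implementation: the class name, the 300-second minimum delay, and the choice of reply code 451 are assumptions made for the example.

```python
import time


class Greylist:
    """Server-side sketch: temporarily reject triplets seen for the first time."""

    def __init__(self, min_delay_seconds=300, clock=time.monotonic):
        self.min_delay_seconds = min_delay_seconds  # assumed value, varies per server
        self.clock = clock
        self.first_seen = {}  # (client_ip, sender, recipient) -> time of first attempt

    def check(self, client_ip, sender, recipient):
        """Return the reply code for an RCPT TO coming from this triplet."""
        triplet = (client_ip, sender, recipient)
        now = self.clock()
        # Remember when the triplet was first seen; keep the old time otherwise.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay_seconds:
            return 451  # temporary error: the client should try again later
        return 250
```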
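The wait-and-retry strategy could look roughly like this on the verifier's side. It is a sketch: attempt stands for a caller-supplied function that opens a fresh SMTP session and returns the RCPT TO reply code as an integer, and the delays of 2 minutes, 10 minutes, and 1 hour are assumed values, not recommendations taken from any server.

```python
import time


def verify_with_retries(attempt, delays=(120, 600, 3600), sleep=time.sleep):
    """Call attempt() and retry after each delay while it returns a 4xx code."""
    code = attempt()
    for delay in delays:
        if not 400 <= code <= 499:
            break  # a definite answer, no need to retry
        sleep(delay)  # we are disconnected here; wait, then reconnect and retry
        code = attempt()
    if code in (250, 251):
        return "valid"
    if 500 <= code <= 599:
        return "invalid"
    if 400 <= code <= 499:
        return "probably valid"  # retry limit reached, give up with an uncertain result
    return "unknown"
```

Passing the sleep function as a parameter keeps the waiting logic easy to exercise without actually waiting.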
Here are some examples with RCPT TO commands:
C: RCPT TO:<test@gmail.com>
S: 550-5.1.1 The email account that you tried to reach does not exist. Please try
S: 550-5.1.1 double-checking the recipient's email address for typos or
S: 550-5.1.1 unnecessary spaces. Learn more at
S: 550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 q17si5372562igi.1 - gsmtp
C: RCPT TO:<goodaddress@gmail.com>
S: 250 2.1.5 OK q17si5372562igi.1 - gsmtp
In this example, we can see that test@gmail.com is not a valid email address, since permanent error code 550 is returned. Then we try goodaddress@gmail.com, which is confirmed to be a valid email address.
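Note that the 550 reply above spans several lines: every line but the last has a hyphen right after the code, and the last line has a space. A client that reads raw lines from the socket has to collect them into one reply. Here is a minimal sketch of that step, with a hypothetical function name and no handling of malformed input beyond basic checks.

```python
def parse_reply(lines):
    """Parse the lines of one SMTP reply into (code, text).

    Continuation lines look like "550-text"; the last line looks like "550 text".
    """
    code = None
    texts = []
    for line in lines:
        line_code, separator, text = int(line[:3]), line[3:4], line[4:]
        if code is None:
            code = line_code
        elif line_code != code:
            raise ValueError("reply code changed within one reply")
        texts.append(text)
        if separator != "-":
            break  # no hyphen after the code marks the last line
    if code is None:
        raise ValueError("empty reply")
    return code, "\n".join(texts)
```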
The second example shows the response of a mail server that implements greylisting:
C: RCPT TO:<xxx@censored.pl>
S: 451 Temporary local problem - please try later
C: QUIT
... Connecting to server again after 2 minutes, sending EHLO, MAIL FROM ...
C: RCPT TO:<xxx@censored.pl>
S: 250 Accepted
In this case, an undisclosed Polish server implemented greylisting and reported temporary error 451 to our RCPT TO command. Then after 2 minutes the very same command was accepted.
Disposable Email Services and Handling Catch-all Address
If we determined that the given email address is valid, we might be interested in obtaining a little more information about it. Our verifier reports whether the target email is hosted by a disposable email service (also called a 10-minute mail service). This is done simply by looking the domain up in a database of domains that these services use. Although this will never be 100% accurate, as there will always be disposable email domains that we are not aware of, it is possible to cover the vast majority of these services.
Finally, we might want to recognize and report domains with a catch-all address enabled. This is done simply by sending an extra RCPT TO command with the recipient address set to an email address that would not be valid unless the catch-all mechanism is enabled. We simply generate a random string, long enough to avoid collision with any existing address. We use strings of 12 characters.
Here is an example of testing a normal email address followed by the test for a catch-all address on mailinator.com, which is a disposable email service with a catch-all address enabled:
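The disposable-domain lookup amounts to a set membership test on the domain part of the address. In this sketch the set contains a single illustrative entry, whereas a real database would hold many more domains. The names are our own.

```python
# Illustrative only: a real list of disposable domains is much longer.
DISPOSABLE_DOMAINS = frozenset({"mailinator.com"})


def is_disposable(address: str, domains=DISPOSABLE_DOMAINS) -> bool:
    """Return True if the address's domain is in the set of disposable domains."""
    # Everything after the last "@" is the domain; domains are case-insensitive.
    domain = address.rpartition("@")[2].strip().lower()
    return domain in domains
```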
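Generating such a probe address takes only a few lines. This is a sketch with a hypothetical function name. The alphabet of ASCII letters and digits is our assumption, and the length of 12 follows the text above.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_probe_address(domain: str, length: int = 12) -> str:
    """Build an address with a random local part that is very unlikely to exist."""
    local_part = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return local_part + "@" + domain
```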
S: 220 mail.mailinator.com ESMTP Postfix
C: EHLO mail.example.com
S: 250-mail.mailinator.com
S: 250-8BITMIME
S: 250-SIZE 150000
S: 250 Ok
C: MAIL FROM:<verifier@example.com>
S: 250 Ok
C: RCPT TO:<john@mailinator.com>
S: 250 Ok
C: RCPT TO:<ze6V7y6rcZEV@mailinator.com>
S: 250 Ok
C: QUIT
S: 221 Bye
At first, the server confirmed that john@mailinator.com is a valid email address. Then we checked whether a catch-all address is enabled by verifying the random email address ze6V7y6rcZEV@mailinator.com. Since this address was also accepted, we conclude that a catch-all address is enabled, and thus we cannot be sure whether john@mailinator.com was accepted because that mailbox exists or just because of the catch-all address.
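Combining the two reply codes into a final verdict can be sketched as follows. The function name and the verdict strings are hypothetical, chosen only for this example.

```python
def catch_all_verdict(target_code: int, probe_code: int) -> str:
    """Combine the RCPT TO codes for the tested address and the random address."""
    accepted = (250, 251)
    if target_code not in accepted:
        return "not accepted"
    if probe_code in accepted:
        # The random address was accepted too, so acceptance proves nothing.
        return "accepted, catch-all"
    return "valid"
```

For the session above, both codes are 250, so the result is "accepted, catch-all".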
Reference
The techniques described in this article were implemented in Bulk Email Verifier – a free email verification tool with API.