confused@rfc5322.com


Quick links: Source code | Email address validators head-to-head

Do you think this is a valid email address?

“”@example.com

No, me neither. In one sense it clearly isn’t since there is no mailbox of that name at example.com’s mail host. Send mail to that address and you won’t find anybody reading it.

In another sense, however, it is a perfectly good email address. That is, if you believe the RFCs that define how email addresses should be constructed. Here’s the BNF from RFC5322 for example:

addr-spec       =       local-part "@" domain
local-part      =       dot-atom / quoted-string / obs-local-part
quoted-string   =       [CFWS]
                        DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                        [CFWS]

Don’t worry if you can’t follow the BNF syntax – I couldn’t either until this week. Focus on the asterisk before ([FWS] qcontent). That asterisk means the contents of the parentheses can occur zero or more times.
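To see what that asterisk does, here is a hypothetical Python sketch that transcribes the quoted-string rule into a regular expression. It simplifies heavily: comments (CFWS), line folding and the obsolete forms are left out, and [FWS] is approximated as optional spaces or tabs. The function name is made up for illustration.

```python
import re

# Simplified transcription of RFC 5322's quoted-string, assuming no comments
# (CFWS), no line folding and no obsolete syntax. [FWS] becomes optional spaces/tabs.
QTEXT = r"[\x21\x23-\x5b\x5d-\x7e]"    # printable ASCII except the double quote and backslash
QUOTED_PAIR = r"\\[\x21-\x7e \t]"      # a backslash followed by a visible char, space or tab
QCONTENT = rf"(?:{QTEXT}|{QUOTED_PAIR})"

# The outer * mirrors the asterisk in *([FWS] qcontent): zero or more repetitions,
# so having nothing at all between the two DQUOTEs is allowed.
QUOTED_STRING = re.compile(rf'"(?:[ \t]*{QCONTENT})*[ \t]*"')


def is_quoted_string(text: str) -> bool:
    """True if the whole of text matches the simplified quoted-string rule."""
    return QUOTED_STRING.fullmatch(text) is not None


for candidate in ['"john.smith"', '"a b"', '""', '"unterminated']:
    print(candidate, is_quoted_string(candidate))
```

Under these assumptions `""` passes for the same reason `"john.smith"` does: the repeated group is simply taken zero times.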

So tracing it through the specification:

  1. An email address is a local-part followed by an @ sign followed by a domain
  2. A local-part can be a quoted-string
  3. A quoted-string is a pair of double quotes surrounding zero or more characters.

“”@example.com follows these rules perfectly. And yet common sense suggests this is a preposterous address. Where should the receiving host deliver it?

I have had some discussions about this with Cal Henderson and we are somewhat at a loss. Both of us have functions that validate email addresses and we cannot decide whether to follow the RFCs or common sense.
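One way to avoid choosing outright is to make the choice a switch: apply the grammar first, then optionally layer a business rule on top that flags RFC-valid but implausible addresses instead of silently rejecting them. The Python below is a hypothetical sketch, not the validator Cal Henderson or the author actually wrote. It uses a simplified grammar (dot-atom or quoted-string local parts, dot-atom domains only, no comments or folding whitespace), and the names `check_address`, `Verdict` and `common_sense` are invented for the example.

```python
import re
from enum import Enum

# Simplified RFC 5322 grammar: no comments (CFWS), no line folding,
# no obsolete forms, and dot-atom domains only (no [IP] literals).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"
QCONTENT = r"(?:[\x21\x23-\x5b\x5d-\x7e]|\\[\x21-\x7e \t])"
QUOTED_STRING = rf'"(?:[ \t]*{QCONTENT})*[ \t]*"'
ADDR_SPEC = re.compile(rf"(?P<local>{DOT_ATOM}|{QUOTED_STRING})@(?P<domain>{DOT_ATOM})")


class Verdict(Enum):
    VALID = "valid"
    WARNING = "valid, but suspicious"
    INVALID = "invalid"


def check_address(address: str, common_sense: bool = False) -> tuple[Verdict, str]:
    """Return a verdict and a reason. With common_sense=False only the
    simplified grammar is applied; with common_sense=True an extra rule
    flags RFC-valid addresses that are more likely to be typos."""
    match = ADDR_SPEC.fullmatch(address)
    if match is None:
        return Verdict.INVALID, "does not match the simplified addr-spec grammar"
    if common_sense and match.group("local") == '""':
        return Verdict.WARNING, "empty quoted local part; is this a typo?"
    return Verdict.VALID, "matches the simplified addr-spec grammar"


for addr in ['""@example.com', "user+foo@example.com", "no-at-sign.example.com"]:
    print(addr, check_address(addr), check_address(addr, common_sense=True))
```

Returning a verdict with a reason, rather than a bare yes or no, lets a registration form accept `""@example.com` while still asking the user to double-check it.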

I could suggest that the IETF add an erratum to RFC 5322, but this format is also documented in RFC 5321. No amount of published errata will correct the impression given by two published RFCs.

Perhaps we are wrong – can you think of a valid purpose for an email address of this format?



27 Responses to “confused@rfc5322.com”


  1. Andrew February 1, 2010 at 23:12

    If someone were to control the MX records for a domain name, could they not create an account with an empty local part?

    Or if not, presumably it would be a way to ensure an email was sent to the default account: the account that catches all unrouted mail for that domain.

    For example, let us say that I have the domain andrewsurname.com and have it hosted on some popular hosting company’s server(s). The default email account would be something like andrew123. This account would catch all mail sent to FakeAccount@andrewsurname.com.

    Personally, I think that if you’re writing a validator that allows for commented obsolete quoted string local part email addresses with IPv6 address domain literals and folding white spaces, then you shouldn’t then decide not to allow for empty quoted strings because it’s not very practical.

    • Dominic Sayers February 1, 2010 at 23:22

      Well, it’s not the MX records, it’s the software running on the mailbox storage server. The Robustness Principle says that all the SMTP servers on the way should pass the message along to the appropriate host and let that host deal with it. If the software lets you create a mailbox in violation of the RFCs, then the internet should do its best to deliver to it.

      I completely agree with your last paragraph. My intention was to write an RFC-compliant validator, but some days I just let common sense get the better of me. I should have stuck to one or the other. I hope to do a new version (when I have time) that lets you switch between absolute compliance and the real world.

      • Andrew February 4, 2010 at 21:41

        As an added note, I’ve recently discovered that the maximum length of a domain name is 253 characters, not 255: “On the wire and in the internal binary storage format it can be at most 255 octets as per RFC 1034 section 3.1 … [which] can be represented in traditional dot notation as 253 characters”.

        And as the maximum length of an email address is 254 characters, as you yourself have mentioned in an erratum to RFC 3696, a 253-character domain name plus the @ symbol reaches the limit of 254. Thus, apparently, it cannot have email addresses associated with it. Unless, of course, an empty quoted string is used as the local-part (the double quotes are semantically invisible).

      • Dominic Sayers February 5, 2010 at 09:21

        Hi Andrew. Where’s the quote in your first paragraph from? I’d like to read the rest of the article.

      • Andrew February 6, 2010 at 18:04

        The quote is from the Wikipedia article on domain names. The reference is to http://www.ops.ietf.org/lists/namedroppers/namedroppers.2003/msg00964.html which received two responses, both confirming that 253 is the limit of a domain name.

        The 255 CHARACTER limit (as opposed to OCTET) is for HOSTnames, which are different to DOMAIN names (so some hostnames cannot be stored on the DNS). Unless the RFCs are not using the term “domain name” correctly when giving the format of email addresses, 253 is indeed the maximum length of the text after the @ symbol.

        This does, however, seem to conflict with domain names being able to have 128 labels. 128 labels of a single character, plus the 127 separator dots, makes 255. Which is too long. The only solution to this problem is if one of these labels is the (empty) root domain. Which makes the TLD the SECOND label, not the first. There could then be 127 (explicit) labels in a domain name, with 126 separator dots, bringing the total to 253; the apparent limit.

  2. Stu June 12, 2009 at 12:51

    I hope your validator (and your friend’s) works with ‘+’, as it’s annoying when I can’t use it.

    One school of thought says you shouldn’t try to validate at all: just send an email, and if it doesn’t work, the address is invalid.

  3. james woodyatt June 12, 2009 at 00:07

    FWIW, I can think of a legitimate reason to use the “”@[domain] email address.

    I know some people who deliberately use email addresses [not this one, but different ones] that are designed to exploit obscure corners of the address grammar precisely because they don’t want address harvesters written by poorly-trained skript-kiddies using brain-dead regular expression processors to see them so easily.

    I’m not sure you care about giving those people the hand and forcing them to use a more readily scanned and identifiable email address on your form, but it’s a consideration.

    • Dominic Sayers June 12, 2009 at 07:54

      Interesting point. I agree with you in general (that an obscure address might have some security advantages) but I think we could afford to lose this particular backwater of the address namespace :-)

      Also the security advantages of an obscure email address might well be outweighed by the inconvenience of having it rejected by all those brain-dead validators.

      If you’re worried about people harvesting your address then just use a public one like I do (dominic_sayers@hotmail.com). I get buckets of spam to this address but I only use it for website registrations, so who cares?

  4. james woodyatt June 11, 2009 at 19:09

    It seems to me that this is a question of how to properly apply Postel’s Robustness Principle to the problem at hand.

    You need to ascertain whether accepting “”@ as a valid email address in your application would encourage or discourage the use of non-compliant and possibly non-interoperable implementations of email elsewhere in the network.

    Would it? Maybe, if it has any effect at all, it will help discourage non-compliant implementations rather than encourage them. If so, then you really aren’t doing anyone any favors by preventing the use of an otherwise RFC-compliant email address; you are simply introducing a failure mode that otherwise wouldn’t be there.

    On the other hand, are there deficiencies in the software of popular email implementations that would be exacerbated by allowing an obscure but legal form of email address? If so, then it may be sensible to discourage, if not completely forbid, the use of “”@ as an email address.

  5. "" June 11, 2009 at 18:06

    “Common sense is the collection of prejudices acquired by age eighteen.”
    –Albert Einstein

    It seems pretty clear that the RFC intends that within a domain, mailboxes are identified by strings. The empty string is a perfectly valid string. Your common sense tells you that you shouldn’t use the empty string to name stuff, but someone else’s common sense may have no problem with that. The Windows registry, for example, contains tons of values whose names are the empty string.

    If an RFC disagrees with “common sense”, you should assume that your common sense is wrong. It might not be, but that should be the starting point. I’m surprised you’d even consider saying that this address is invalid, given that you’re currently suffering under someone else’s misconception about what is or isn’t a valid address.

    p.s., How ironic: your blog refuses to accept “”@ as my email address, even though that address works fine. (I just tested it.)

    • Dominic Sayers June 11, 2009 at 19:48

      Hello “” and thanks for your thoughts.

      I agree that my common sense may not anticipate all possible reasons for choosing an abnormal address. This is the mistake people make when they filter out addresses with a “+” sign in them (still all too common).

      I did start with the assumption that the RFC was right and my common sense was wrong, but now I’ve studied the evolution of this standard from Jon Postel’s original specification through RFCs 822, 2822 and 5322 I’m afraid I disagree with the RFC on this particular point. RFCs are occasionally wrong (check the errata if you don’t believe me). I think the quoted empty string is only there now for backwards compatibility, and it was only there originally because of a drafting error.

      I have had an erratum accepted by the IETF during my research into this issue: http://www.rfc-editor.org/errata_search.php?eid=1690

    • Dominic Sayers June 11, 2009 at 19:51

      Regarding the irony of WordPress not accepting “”@ as your email address, I would say two things: firstly WordPress code is out of my hands and it’s quite a big job to move my blog just for this reason. Secondly “”@ isn’t a valid address because it doesn’t have a domain part. Here’s the EBNF extracted from RFC5322 if you want to check it: http://www.dominicsayers.com/isemail/isemail/RFC5322BNF.html

      • "" June 11, 2009 at 21:40

        WordPress, presumably in an attempt to prevent XSS, ate part of my post. The email address I tried to use was “”@example.com, with example.com replaced by my own domain name.

  6. Mike Tomasello June 11, 2009 at 17:44

    It depends on what the situation warrants:

    If you are doing the validation to make sure that you can send e-mail to the address given, then you might want to forget the RFC and check what addresses your mail tools can handle.

    You can also do ‘SMTP validation’: connect to the given server and try a RCPT TO to see if the address is accepted. This at least ensures it’s a real address that the receiving server can handle. This approach is not reliable enough to check whether an address is *not* going to be accepted, but it can be an acceptable guarantee that a given address *will* be accepted.

    You imply that the purpose of this validation is to alert users when you think they have made mistakes. In that case this is a no-brainer: accept all valid addresses, but warn users if they enter something that doesn’t meet the usual standard for an e-mail address. Tell them something as simple as “this e-mail address is weird, are you sure it’s correct?” It’s a little more work for the developer/UI designer, but it’s the best approach.

    The thing to take away is that validators for data input can often incorrectly be seen as things that take some data and either say ‘YES’ or ‘NO’. In fact, there are shades of grey where you still want to be able to let data through whilst letting the user know that you’re suspicious of its correctness.

    • Dominic Sayers June 11, 2009 at 19:42

      Hi Mike and thanks for these thoughtful comments. The validator I eventually wrote does indeed do a DNS lookup on the MX record for the domain (it’s an option to the function) and the existence of the mailbox is checked by sending a verification email to it in the wider registration application. The validator is described here: http://www.dominicsayers.com/isemail.

      I fully agree with your other comments. If I get the time I plan to revise the validation function so it does an absolutely canonical RFC check, then optionally adds some business rules to check for common sense issues that are probably typos in the real world.

  7. David Magda June 11, 2009 at 16:06

    > “”@example.com follows these rules perfectly. And yet
    > common sense suggests this is a preposterous address.
    > Where should the receiving host deliver it?

    What the receiving host does is none of your concern (beyond idle curiosity). As long as it passes the BNF, just accept it.

    There are many places where I try to enter addresses of the form “user+foo@domain.com”. Many, many, many web forms do not accept the “+foo” form even though it’s been valid and used since the original e-mail standard back in the ’80s (RFC 822).

    It is not your job to try to guess what the user is doing. It’s your job to validate the input to see if it passes the BNF and that it won’t hose your own systems, but beyond that leave it alone.

    Another obscure, yet valid and sometimes useful, character for e-mail addresses is the percent symbol (‘%’), but I’m guessing it wouldn’t get past most filters either.

    In this case, the domain-part (example.com) is invalid because it’s reserved, but beyond that you shouldn’t care.

    • Dominic Sayers June 11, 2009 at 16:24

      Hi David and welcome to the conversation.

      You are absolutely correct as far as the RFCs are concerned, and I wrote about some of the validation mistakes people make here: http://www.dominicsayers.com/isemail

      My point in this article was that in the real world, what is the point of an address like this? I wrote my address validator to serve a real-world purpose which was to detect invalid email addresses entered during a website registration process. I did not want to reject valid addresses so I went for full RFC compliance.

      As I examined the BNF it became clear there were valid addresses that were vanishingly unlikely to be used in the real world. If somebody entered “”@example.com as their address it’s much more likely they made a typo than this is their real address.

      So my complaint is that the RFCs have their shortcomings. You can have RFC compliance or you can have common sense in your validator.

      If I get time to do another version I will introduce a switch that allows you to choose full compliance (the default) or add some business rules which reject addresses that are more likely to be typos.

      Thanks again for your contribution.

      • David Magda June 11, 2009 at 19:09

        > My point in this article was that in the real world, what is
        > the point of an address like this?

        To keep spammers at bay for one.

        Very few people use ‘+’ either, but if it’s not covered by a filter it’s very annoying to those of us who do (e.g., for mailing list filtering), and Gmail, for example, can use such addresses, though Exchange (to take one example) ignores them. Ditto for characters like ‘^’ and ‘%’, which are useful for spam filtering and for tracking who is giving away or trading my e-mail address.

        Just because you don’t use it doesn’t mean other people don’t. Follow the BNF (and the reserved domain RFCs), and don’t try to think you know best. :)

        As has been said by Doug Gwyn: “UNIX was not designed to stop its users from doing stupid things, as that would also stop them from doing clever things.” The people who use these strange characters are probably thinking in clever (sometimes too clever) ways.

        As for typos, that’s why a lot of places ask you to type your address twice (just like password fields usually do).

      • Dominic Sayers June 11, 2009 at 19:55

        Good advice, thanks David

  8. John June 11, 2009 at 14:19

    I suppose the right thing to do is to send it to the mail server at example.com and let that server figure it out. If the remote mail server accepts the mail, then you assume it knows how to deliver it. You also assume that the remote server will give an error, e.g., 550, if it cannot deliver to that address.

    Unfortunately, with SMTP, it is impractical to write a client-side function to catch all possible invalid e-mail addresses. In this case, it is better to be safe than sorry and allow the user to enter a potentially made-up e-mail address (since users use fake addresses all the time anyway).

    • Dominic Sayers June 11, 2009 at 14:30

      Hi John and thanks for your comment too.

      You make two very good points: firstly, it’s always good to try the address with its mail server if you can. That’s the definitive test of whether the address is valid. Secondly, you shouldn’t reject valid email addresses. My web host’s web mail doesn’t let me put “+” signs in an email address, for instance. This is just infuriating.

      By client-side function do you mean a browser-based function in, for example, JavaScript? I don’t believe this would be impossible – it should be possible to translate my PHP function into JavaScript without problems (I may do this myself if I have time).

      The point is, it only needs doing once. If the solution is open source then nobody ever needs to write that function again (until the RFCs change of course!).

      • John June 11, 2009 at 17:38

        When I refer to client side, I mean validating the user input without the ability to connect to the destination mail server and ask it to check the address. So it could be JavaScript, PHP, etc.

        As far as an open source implementation of address validation goes, there is something like:

        http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

        I’m not sure if it does the kind of validation you need.

        From a practicality point of view – assuming there is an open source function that tells you whether an e-mail has an invalid format, it will not get rid of fake e-mails, e.g., “a@test.com”. Usually people (paying clients) are looking for a way to cut down on the number of “obviously fake” e-mails, instead of eliminating syntactically invalid e-mails.

      • Dominic Sayers June 11, 2009 at 19:36

        When I was looking for a correct validator I saw the one you mention. It is one of the reasons I wrote my own: http://www.dominicsayers.com/isemail

        The clue is in the name: RFC822 was superseded by RFC2822 which was then obsoleted by RFC5322. And in any case, I don’t believe a regex, however complicated, can deal with the complex issues of nested comments embedded within the email address. And a regex this complicated is to all intents and purposes completely unmaintainable.

        The original purpose of my validator was to do an initial check on addresses entered into a website registration page. The issue of fake addresses was solved by sending a verification email to the address before the account was fully enabled.

  9. Corey Connor June 11, 2009 at 14:02

    It’s not a valid email address.

    See RFC2606
    http://www.rfc-editor.org/rfc/rfc2606.txt

    • Dominic Sayers June 11, 2009 at 14:16

      Hi Corey and thanks for your comment.

      You are right, of course, in that example.com doesn’t have any usable mailboxes at all.

      I presume you are being droll here, but for the benefit of people who might not get your joke it’s worth pointing out that I used example.com because that’s the convention when discussing internet hosts and domains.

      There are no real mailboxes at example.com. The point of my post was to discuss whether an empty quoted string was a valid address at any mail host.
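The length arithmetic in Andrew's comments earlier in the thread can be checked with a short Python sketch. The limits are the ones quoted in the thread (255 octets for a domain name in DNS wire format, 254 characters for a whole address); the helper names `wire_length` and `address_fits` are hypothetical, and the code only counts characters, it does not validate syntax.

```python
# Limits as quoted in the thread; these constants are assumptions of this sketch.
MAX_ADDRESS_CHARS = 254   # whole address, per the RFC 3696 erratum mentioned in the thread
MAX_DOMAIN_OCTETS = 255   # domain name in DNS wire format, per RFC 1034 section 3.1


def wire_length(domain: str) -> int:
    """Octets needed to encode a dotted domain name in DNS wire format:
    one length byte plus the characters for each label, then one zero
    byte for the (empty) root label."""
    stripped = domain.rstrip(".")
    if not stripped:
        return 1  # the root domain alone
    return sum(1 + len(label) for label in stripped.split(".")) + 1


def address_fits(local: str, domain: str) -> bool:
    """True if local + "@" + domain stays within MAX_ADDRESS_CHARS characters."""
    return len(local) + 1 + len(domain) <= MAX_ADDRESS_CHARS


# 127 single-character labels and 126 dots: 253 characters, 255 octets on the wire.
longest = ".".join(["a"] * 127)
print(len(longest), wire_length(longest), wire_length(longest) <= MAX_DOMAIN_OCTETS)

# 128 labels overflow the wire-format limit.
print(wire_length(".".join(["a"] * 128)) <= MAX_DOMAIN_OCTETS)

# With a 253-character domain, neither a one-character nor a quoted empty local part fits.
print(address_fits("x", longest), address_fits('""', longest))
```

Counted literally, as this sketch does, the two quote characters in `""` bring the total to 256 characters.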

