<?xml version="1.0" encoding="US-ASCII"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
  <!ENTITY RFC2119 PUBLIC "" "https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
  <!ENTITY RFC4686 PUBLIC "" "https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4686.xml">
  <!ENTITY RFC5234 PUBLIC "" "https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5234.xml">
  <!ENTITY RFC5321 PUBLIC "" "https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5321.xml">
  <!ENTITY RFC5598 PUBLIC "" "https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5598.xml">
  <!ENTITY RFC6376 PUBLIC "" "https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6376.xml">
  <!ENTITY RFC8174 PUBLIC "" "https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
]>

<rfc ipr="trust200902" category="exp"
        docName="draft-kucherawy-dkim-anti-replay-01">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc tocdepth="4" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc compact="yes" ?>
<?rfc subcompact="no"?>

<front>
	<title abbrev="DKIM Anti-Replay Canonicalization">
		Replay-Resistant DomainKeys Identified Mail (DKIM)
		Signatures
	</title>

	<author initials="M. S." surname="Kucherawy"
	        fullname="Murray S. Kucherawy"
		role="editor">

		<address>
			<email>superuser@gmail.com</email>
		</address>
	</author>

	<date year="2022"/>

	<area>Applications and Real-Time</area>

	<keyword>dkim</keyword>
	<keyword>email</keyword>
	<keyword>tag</keyword>

	<abstract>
		<t> DomainKeys Identified Mail (DKIM) provides a digital
		    signature mechanism for Internet messages, allowing a
		    domain name owner to affix its domain name to a message
		    in a way that can be cryptographically validated. </t>

		<t> A DKIM signature protects the integrity of the message
		    header and body only.  By design, DKIM decoupled
		    itself from the transport and storage mechanisms used to
		    handle messages.  This gives rise to a possible replay
		    attack, for which the original DKIM specification did not
		    provide a mitigation strategy.  This document presents
		    an optional method for binding a signature to a specific
		    recipient or set of recipients so that broader replay
		    attacks can be mitigated. </t>
	</abstract>

</front>

<middle>
	<section anchor="intro" title="Introduction">
		<t> DomainKeys Identified Mail (DKIM) provides a digital
		    signature mechanism for Internet messages, allowing a
		    domain name owner to affix its domain name to a message
		    in a way that can be cryptographically validated. </t>

		<t> <xref target="RFC4686"/> presents the original threat
		    model DKIM was meant to address, and the environment in
		    which it was expected to work.  Notably, DKIM decoupled
		    itself from the transport of the message.  The theory
		    suggests it should be possible to validate a signature
		    whether a message is in situ (i.e., in an inbox on disk),
		    in transit between mail servers, or being retrieved through
		    a mailbox access protocol. </t>

		<t> In particular, a DKIM signature can validate
		    irrespective of what is in the SMTP
		    <xref target="RFC5321"/> envelope containing it, or even
		    when there is no envelope to consider.  As a result, a message
		    and its signature can be re-sent to anyone simply by
		    changing the set of recipients in the envelope and
		    passing the message back to a Message Transfer Agent (MTA)
		    or Message Submission Agent (MSA).
		    As the message itself is unaltered, any DKIM signature(s)
		    on it will continue to validate.  This is a form of replay
		    attack, and it relies for its success on the perceived
		    value (i.e., reputation) of the domain(s) named in the
		    signature(s). </t>
		
		<t> This document describes a mechanism by which a signature
		    and a message can be coupled such that successful replays to
		    other recipient sets are not possible, as the signature
		    will no longer validate.  </t>
	</section>

	<section anchor="definitions" title="Definitions">
		<section anchor="read_first" title="Recommended Reading">
			<t> Several terms used in this document are based on
			    their definitions in <xref target="RFC5598"/>. </t>

			<t> The term "envelope recipient" is, using the
			    notation proposed in that document, an
			    RFC5321.RcptTo address. </t>
		</section>

		<section anchor="keywords" title="Requirements Language">
			<t> The key words "MUST", "MUST NOT", "REQUIRED",
			    "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
			    "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
			    "OPTIONAL" in this document are to be interpreted
			    as described in BCP 14 <xref target="RFC2119"/>
			    <xref target="RFC8174"/> when, and only when, they
			    appear in all capitals, as shown here. </t>
		</section>
	</section>

	<section anchor="tag" title="The 'e' Tag">
		<section anchor="tag_syntax" title="Syntax">
			<t> Using ABNF <xref target="RFC5234"/>, the syntax
			    for the new tag is:
			    <figure> <artwork>
    sig-e-tag = %x65 [FWS] "=" [FWS] %x79
			    </artwork> </figure> </t>
		</section>

		<section anchor="tag_general" title="General Definition">
			<t> This section introduces the "e" (for "envelope")
			    tag, a new DKIM signature tag that can be used
			    by a signer to indicate that the signature will only
			    validate for a specific envelope recipient set,
			    namely the one associated with the
			    message at the time it was signed. </t>

			<t> DKIM signers and verifiers to date have had no reason
			    to be interested in any aspect of the envelope
			    used to transport a message.  Verification of a
			    signature bearing this tag is not possible without that
			    context being available, which may prove to be a
			    challenge in some operating environments.  Also,
			    this will make it impossible to validate a DKIM
			    signature using this algorithm in a context where
			    no envelope exists, such as when retrieving a
			    message from a mailbox. </t>

			<t> The expected value of the tag is simply the
			    character "y", though other values may be
			    introduced by future work.  The value has no
			    particular meaning; the presence of the tag
			    is the important signal. </t>
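
			<t> As an illustration only, a verifier might detect the
			    tag along the following lines.  This is a minimal
			    Python sketch, not part of the specification: the
			    function names "parse_tag_list" and
			    "uses_envelope_binding" are hypothetical, and the
			    parsing is simplified to splitting on semicolons and
			    trimming surrounding whitespace. </t>

			<figure><artwork>
```python
def parse_tag_list(value: str) -> dict:
    """Split a DKIM-Signature tag-list into a name-to-value mapping."""
    tags = {}
    for spec in value.split(";"):
        if "=" not in spec:
            continue  # e.g. the empty element after a trailing ";"
        name, _, tag_value = spec.partition("=")
        # Simplification: trim only the whitespace around name and value.
        tags[name.strip()] = tag_value.strip()
    return tags


def uses_envelope_binding(value: str) -> bool:
    """Return True when the "e" tag is present, whatever its value."""
    return "e" in parse_tag_list(value)
```
			</artwork></figure>

			<t> Under these assumptions, a signature whose tag-list
			    contains "e=y" is reported as envelope-bound, and so
			    is one carrying any other value for "e", since only
			    the presence of the tag is significant. </t>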

			<t> [FOR DISCUSSION] Maybe this should be "r",
			    indicating "recipients", to allow later extensions
			    to cover other parts of the envelope that
			    might be helpful to include. </t>

			<t> The presence of this tag in a DKIM signature
			    indicates that the signer executed a modified
			    version of the algorithm described in Section 3.7
			    of <xref target="RFC6376"/>, and the verifier
			    MUST do the same.  The modification inserts
			    the envelope recipients available at signing
			    or verification time into the data fed to the
			    hash algorithm to either produce or verify the
			    DKIM signature. </t>

			<section anchor="tag_mods" title="Modified Algorithm">
				<t> This section specifies the modified version
				    of the algorithm defined in Section 3.7
				    of <xref target="RFC6376"/>. </t>

				<t> The pseudo-code of "data-hash" is replaced
				    as follows:
				    <figure> <artwork>
  OLD:

    data-hash = hash-alg (h-headers, D-SIG, body-hash)

  NEW:

    data-hash = hash-alg (recipients, h-headers, D-SIG, body-hash)
				    </artwork> </figure> </t>

				<t> The definition of "data-hash" is replaced
				    as follows:

				    <figure> <artwork>
  OLD:

    data-hash: is the output from using the hash-alg algorithm, to hash
               the header including the DKIM-Signature header, and the
               body hash.

  NEW:

    data-hash: is the output from using the hash-alg algorithm to hash
               the recipients, the header including the DKIM-Signature header,
               and the body hash.
				    </artwork> </figure> </t>

				<t> "recipients" is determined as follows:
				    <list style="numbers">
					<t> Collect all envelope recipients
					    into a list. </t>

					<t> Sort them in ascending lexical
					    order, comparing the ASCII value
					    of each character in turn. </t>

					<t> Format the list by concatenating
					    them all in this sorted order,
					    separated by CRLF strings (ASCII
					    13 followed by ASCII 10), and with
					    the last one terminated by a
					    CRLF. </t>
				    </list> </t>
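
				<t> The three steps above can be sketched as
				    follows.  This is a non-normative Python
				    illustration; the function name
				    "canonical_recipients" is hypothetical, and
				    the sketch assumes each envelope recipient
				    is supplied as an ASCII string without
				    angle brackets. </t>

				<figure><artwork>
```python
def canonical_recipients(envelope_recipients):
    """Build the "recipients" octet string from the envelope recipients."""
    # Step 1: collect the recipients into a list of ASCII octet strings.
    collected = [rcpt.encode("ascii") for rcpt in envelope_recipients]
    # Step 2: sort; Python orders bytes objects by octet value.
    collected.sort()
    # Step 3: concatenate, ending every entry (the last included) with CRLF.
    return b"".join(rcpt + b"\r\n" for rcpt in collected)
```
				</artwork></figure>

				<t> Given the recipients "bob@example.com" and
				    "alice@example.com" in either order, this
				    sketch yields the address for "alice"
				    followed by the address for "bob", each
				    terminated by CRLF. </t>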

				<t> The signing and verifying processes
				    defined for DKIM are otherwise
				    unmodified. </t>
			</section>
		</section>

		<section anchor="tag_example" title="Example">
			<t> Consider the following SMTP transaction, wherein
			    "C" denotes something sent by an SMTP client,
			    "S" denotes something sent by an SMTP server,
			    and terminating CRLFs in both directions are
			    omitted:

			    <figure><artwork>
  C: MAIL FROM:&lt;msk@example.net>
  S: 250 Sender OK
  C: RCPT TO:&lt;bob@example.com>
  S: 250 Recipient OK
  C: RCPT TO:&lt;alice@example.com>
  S: 250 Recipient OK
  C: DATA
  S: 354 Go ahead
  [message header omitted]

  [message body omitted]
  .
  S: 250 Message delivered
			    </artwork></figure></t>

			<t> When this tag is present, signing and verification
			    of this message proceed exactly as in the standard
			    DKIM process, except that the content
			    fed to the hash algorithm is preceded by:

			    <figure><artwork>
  alice@example.com&lt;CR>&lt;LF>
  bob@example.com&lt;CR>&lt;LF>
			    </artwork></figure></t>
		</section>
	</section>

	<section anchor="discussion" title="Discussion">
		<t> Use of this tag guarantees that a signature
		    will not verify unless the message is sent to exactly the
		    same set of envelope recipients as was present in the
		    envelope when the message was prepared for signing.  Because
		    the recipient set is sorted, verifiers tolerate any
		    reordering of the envelope recipients that may be done in
		    transit.  However, if any original recipient is removed,
		    or any new recipient added, the signature will not validate
		    because the content passed to the hash step at the verifier
		    will differ from the content hashed by the signer.  Thus, in
		    the replay scenario described in <xref target="intro"/>,
		    the signature no longer validates. </t>
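
		<t> The effect can be illustrated with the following
		    non-normative Python sketch.  It assumes SHA-256 as the
		    hash algorithm, uses a placeholder octet string in place
		    of the canonicalized header fields and DKIM-Signature
		    field, and introduces the hypothetical names
		    "canonical_recipients" and "data_hash" and a hypothetical
		    replay recipient; none of these come from the
		    specification. </t>

		<figure><artwork>
```python
import hashlib


def canonical_recipients(envelope_recipients):
    """Sorted recipients, each terminated by CRLF."""
    ordered = sorted(rcpt.encode("ascii") for rcpt in envelope_recipients)
    return b"".join(rcpt + b"\r\n" for rcpt in ordered)


def data_hash(envelope_recipients, signed_content):
    """Hash the recipients, then the content standard DKIM would hash."""
    digest = hashlib.sha256()
    digest.update(canonical_recipients(envelope_recipients))
    digest.update(signed_content)
    return digest.digest()


# Placeholder for the canonicalized header fields and DKIM-Signature field.
content = b"from:msk@example.net\r\n"

signed = data_hash(["bob@example.com", "alice@example.com"], content)
reordered = data_hash(["alice@example.com", "bob@example.com"], content)
replayed = data_hash(["mallory@example.org"], content)  # hypothetical replay
```
		</artwork></figure>

		<t> In this sketch "signed" and "reordered" are equal, because
		    sorting removes any dependence on envelope order, while
		    "replayed" differs because the recipient set changed. </t>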

		<t> Anecdotal evidence suggests that the bulk of Internet
		    message traffic is single-recipient traffic already,
		    which would limit the operational impact of this proposal.
		    However, since the messaging standards both permit and
		    even encourage the "common factoring" of traffic
		    (delivering one copy of a message to several recipients
		    in a single envelope), and this evidence has not been
		    broadly verified, it is appropriate to consider all
		    possibilities. </t>

		<t> In the absence of an SMTP envelope in the verification
		    environment, the DKIM implementation SHOULD indicate that
		    the signature cannot be verified, as distinct from
		    considering such validation to have failed. </t>
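
		<t> One way an implementation might keep these outcomes
		    distinct is sketched below in Python.  This is an
		    assumption-laden illustration rather than a required
		    design: the "Outcome" type, the "evaluate" function, and
		    the "check_signature" callback (standing in for the
		    cryptographic check) are all hypothetical names. </t>

		<figure><artwork>
```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    CANNOT_VERIFY = "cannot verify"


def evaluate(has_e_tag, envelope_recipients, check_signature):
    """Classify one signature.

    envelope_recipients is None when no SMTP envelope is available.
    check_signature is a caller-supplied callable that performs the
    actual cryptographic check and returns True or False.
    """
    if has_e_tag and envelope_recipients is None:
        # No envelope context: report this separately from a failure.
        return Outcome.CANNOT_VERIFY
    recipients = envelope_recipients if has_e_tag else None
    return Outcome.PASS if check_signature(recipients) else Outcome.FAIL
```
		</artwork></figure>

		<t> A signature without the tag is evaluated as usual even
		    when no envelope is present; only a signature that carries
		    the tag yields the "cannot verify" outcome in that
		    situation. </t>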

		<t> If the ability to validate a signature from storage
		    (without an envelope) needs to be preserved, the signer
		    can still add a second signature not using this tag, which
		    therefore does not need the envelope context to verify.
		    This, however, requires the verifier to understand when it
		    is appropriate to use which signature. </t>

		<t> <xref target="RFC6376"/> stipulates that unknown tags
		    are to be ignored, so a verifier that does not implement
		    this tag will apply the unmodified algorithm and compute
		    a different hash than the signer did.  There will be a
		    possibly substantial time period during which the tag is
		    unknown to receivers.  Operators should expect these
		    signatures to fail broadly during any early deployment
		    period, even for non-replay messages, and it may be some
		    time before meaningful signal begins to appear. </t>

		<t> Note that this mechanism is fragile in the modern Internet
		    message ecosystem.  Some scenarios that will yield false
		    negatives with this method are described below. </t>

		<section anchor="discuss_rewrite" title="Recipient Mutations">
			<t> If a receiving MTA notes that one of the envelope
			    recipients refers to a mailbox in a domain for
			    which it has administrative authority, but is known
			    to be an alias, it may rewrite that envelope recipient into
			    its canonical form.  For instance, if a receiving
			    MTA is officially known as the mail server for
			    "example.com", but also accepts mail for its users
			    when addressed to "example.net", it may alter that
			    latter address in the envelope to refer to its
			    canonical name.  This alters the recipient list,
			    and thus alters the content passed to the hash
			    algorithm when validating the signature,
			    leading to a failure. </t>

			<t> Since hostnames are generally case-insensitive on
			    the Internet, a relay MTA might (improperly) fold
			    a hostname to lowercase.  This too would invalidate
			    a signature making use of this protocol. </t>
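
			<t> Both mutations can be seen in the following
			    non-normative Python sketch, which compares the
			    octets that would be fed to the hash algorithm
			    before and after each rewrite.  The helper name
			    "canonical_recipients" and the mailbox "bob" at
			    the domains used above are illustrative
			    assumptions. </t>

			<figure><artwork>
```python
def canonical_recipients(envelope_recipients):
    """Sorted recipients, each terminated by CRLF."""
    ordered = sorted(rcpt.encode("ascii") for rcpt in envelope_recipients)
    return b"".join(rcpt + b"\r\n" for rcpt in ordered)


# Alias rewriting: signed for the alias, verified after canonicalization.
signed_alias = canonical_recipients(["bob@example.net"])
rewritten_alias = canonical_recipients(["bob@example.com"])

# Case folding: signed with mixed case, verified after lowercasing.
signed_mixed_case = canonical_recipients(["bob@Example.COM"])
folded = canonical_recipients(["bob@Example.COM".lower()])
```
			</artwork></figure>

			<t> In each pair the two octet strings differ, so the
			    hash computed by the verifier cannot match the
			    one computed by the signer. </t>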
		</section>

		<section anchor="discuss_split" title="Envelope Splitting">
			<t> If an envelope contains recipients at
			    domains served by separate MTAs,
			    <xref target="RFC5321"/> compels the handling MTA
			    to split the message, creating two envelopes
			    containing identical content.  The first of these
			    will be addressed to one recipient and sent on
			    its way; the second will be addressed to the other
			    and sent via its own route. </t>

			<t> Upon arrival at either DKIM verifier, the 
			    recipient list has effectively been altered since
			    signing.  This alters the content passed to the hash
			    algorithm when validating the signature,
			    leading to a failure. </t>
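
			<t> The following non-normative Python sketch
			    illustrates the effect.  It assumes, purely for
			    illustration, that splitting is done per recipient
			    domain; the names "canonical_recipients" and
			    "split_by_domain" and the second recipient domain
			    are hypothetical. </t>

			<figure><artwork>
```python
from collections import defaultdict


def canonical_recipients(envelope_recipients):
    """Sorted recipients, each terminated by CRLF."""
    ordered = sorted(rcpt.encode("ascii") for rcpt in envelope_recipients)
    return b"".join(rcpt + b"\r\n" for rcpt in ordered)


def split_by_domain(envelope_recipients):
    """Group recipients by domain, one new envelope per domain."""
    envelopes = defaultdict(list)
    for rcpt in envelope_recipients:
        envelopes[rcpt.rpartition("@")[2].lower()].append(rcpt)
    return dict(envelopes)


original = ["bob@example.com", "alice@example.org"]
signed = canonical_recipients(original)
per_envelope = {
    domain: canonical_recipients(rcpts)
    for domain, rcpts in split_by_domain(original).items()
}
```
			</artwork></figure>

			<t> Neither value in "per_envelope" equals "signed",
			    so neither receiving verifier can reproduce the
			    hash input used by the signer. </t>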

			<t> This can be avoided by arranging that no envelope
			    ever has more than a single recipient, but doing so
			    renders useless an important "common factoring"
			    feature of SMTP.  In the case of a mailing list
			    server that may need to distribute a single message
			    to a very large number of recipients, sending one
			    copy per recipient can impose significant compute
			    or storage costs. </t>
		</section>
	</section>

	<section anchor="iana" title="IANA Considerations">
		<t> IANA is asked to make the following entry in the
		    "DKIM-Signature Tag Specifications" sub-registry of
		    the "DKIM Parameters" registry group:

		    <list style="hanging">
			<t hangText="Type:"> e </t>
			<t hangText="Reference:"> [this document] </t>
			<t hangText="Status:"> active </t>
		    </list> </t>
	</section>

	<section anchor="security" title="Security Considerations">
		<t> All of the security considerations of
		    <xref target="RFC6376"/> apply when using the
		    modification described here. </t>

		<t> A signer that is forced to generate independently signed
		    messages for each recipient in a situation where large
		    recipient lists are common could be exploited to cause
		    a denial-of-service attack, simply because the work
		    being done is amplified. </t>

		<t> The loss of the ability to verify messages signed using
		    this tag when extracted from their mailboxes will have
		    unknown security impact.  Although DKIM intentionally
		    supports this capability, it is not known whether it
		    is widely used. </t>
	</section>
</middle>

<back>
	<references title="Normative References">
		&RFC2119;
		&RFC5234;
		&RFC5321;
		&RFC6376;
		&RFC8174;
	</references>

	<references title="Informative References">
		&RFC4686;
		&RFC5598;
	</references>

	<section anchor="acks" title="Acknowledgments">
		<t> The author wishes to thank
		Dave Crocker
		for his contributions to this work. </t>
	</section>
</back>

</rfc>
