A short story about the IP ID Field
The IP ID Field
Today I want to tell you something about the Identification field of the IP header, often called the IP ID. This post is longer than I had announced. I have quoted a lot of RFC text here, which you don't need to read because I summarize it in this post, but I wanted to include it to give a complete picture.
When I started with network administration and analysis around the year 2000, we used a high-speed internet connection with the amazing speed of 5.5 kbit/s (I mean per second, not per millisecond), and fragmentation was common because not every device supported “Path MTU Discovery” in those days. By the way, you could be pleased if a device supported IP at all.
But back to the main topic: when IP fragmentation is in use, the IP ID field is required, because the IP stack uses the IP ID together with the Fragment Offset field to work out how the packets should be reassembled. But then “Path MTU Discovery” (PMTUD) became more and more popular, and I almost forgot about the IP ID field.
But in recent years the IP ID has become more and more important again, because nearly all implementations fill this field and increment it in one way or another, so the IP ID is usually an easy way to identify duplicated or missing frames.
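The duplicate and gap check described above can be sketched in a few lines of Python. This is only an illustration under a strong assumption: the sender increments the 16-bit IP ID by exactly one per packet, which many stacks do but no RFC guarantees. The function name `classify_ip_ids` is my own, and reordered packets would show up as large gaps.

```python
def classify_ip_ids(ip_ids):
    """Label each IP ID seen from one sender, in capture order.

    Assumption: the sender increments the 16-bit ID by one per packet.
    """
    findings = []
    previous = None
    for ip_id in ip_ids:
        if previous is None:
            findings.append((ip_id, "first"))
        else:
            # The ID is 16 bits wide, so the counter wraps from 65535 to 0.
            step = (ip_id - previous) % 65536
            if step == 0:
                findings.append((ip_id, "duplicate"))
            elif step == 1:
                findings.append((ip_id, "ok"))
            else:
                findings.append((ip_id, f"gap of {step - 1}"))
        previous = ip_id
    return findings


if __name__ == "__main__":
    for ip_id, verdict in classify_ip_ids([65534, 65535, 0, 0, 3]):
        print(ip_id, verdict)
```

In a real trace you would first group the packets by source address, because each sender keeps its own counter (or several).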
So far so good. Not long ago I analyzed an IPv6 trace and wanted to do my preliminary checks, but then I realized that the IP ID is gone, because in IPv6 fragmentation is handled by a separate extension header. That showed me how important this field has become nowadays, beyond its main purpose. So I began to take an interest in this tiny, unimposing field which seems to be there for free. Always?
First I read RFC 791, “Internet Protocol”, from 1981:
IP-Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Page 12
Identification Field: 16 bits
An identifying value assigned by the sender to aid in assembling the fragments of a datagram.
...
Page 8f
Fragmentation
Fragmentation of an internet datagram is necessary when it originates in a local net that allows a large packet size and must traverse a local net that limits packets to a smaller size to reach its destination.
...
The internet fragmentation and reassembly procedure needs to be able to break a datagram into an almost arbitrary number of pieces that can be later reassembled. The receiver of the fragments uses the identification field to ensure that fragments of different datagrams are not mixed.
...
The identification field is used to distinguish the fragments of one datagram from those of another. The originating protocol module of an internet datagram sets the identification field to a value that must be unique for that source-destination pair and protocol for the time the datagram will be active in the internet system. The originating protocol module of a complete datagram sets the more-fragments flag to zero and the fragment offset to zero.
...
To assemble the fragments of an internet datagram, an internet protocol module (for example at a destination host) combines internet datagrams that all have the same value for the four fields: identification, source, destination, and protocol. The combination is done by placing the data portion of each fragment in the relative position indicated by the fragment offset in that fragment's internet header. The first fragment will have the fragment offset zero, and the last fragment will have the more-fragments flag reset to zero. ... Page24 Fragmentation and Reassembly. The internet identification field (ID) is used together with the source and destination address, and the protocol fields, to identify datagram fragments for reassembly. ... The fields which may be affected by fragmentation include: (1) options field (2) more fragments flag (3) fragment offset (4) internet header length field (5) total length field (6) header checksum if the Don't Fragment flag (DF) bit is set, then internet fragmentation of this datagram is NOT permitted, although it may be discarded. This can be used to prohibit fragmentation in cases where the receiving host does not have sufficient resources to reassemble internet fragments. One example of use of the Don't Fragment feature is to down line load a small host. A small host could have a boot strap program that accepts a datagram stores it in memory and then executes it. Page 27 An Example Reassembly Procedure For each datagram the buffer identifier is computed as the concatenation of the source, destination, protocol, and identification fields. If this is a whole datagram (that is both the fragment offset and the more fragments fields are zero), then any reassembly resources associated with this buffer identifier are released and the datagram is forwarded to the next step in datagram processing. If no other fragment with this buffer identifier is on hand then reassembly resources are allocated. 
The reassembly resources consist of a data buffer, a header buffer, a fragment block bit table, a total data length field, and a timer. The data from the fragment is placed in the data buffer according to its fragment offset and length, and bits are set in the fragment block bit table corresponding to the fragment blocks received. ... Page28 Procedure: (1) BUFID <- source|destination|protocol|identification; Page29 Identification The choice of the Identifier for a datagram is based on the need to provide a way to uniquely identify the fragments of a particular datagram. The protocol module assembling fragments judges fragments to belong to the same datagram if they have the same source, destination, protocol, and Identifier. Thus, the sender must choose the Identifier to be unique for this source, destination pair and protocol for the time the datagram (or any fragment of it) could be alive in the internet. It seems then that a sending protocol module needs to keep a table of Identifiers, one entry for each destination it has communicated with in the last maximum packet lifetime for the internet. However, since the Identifier field allows 65,536 different values, some host may be able to simply use unique identifiers independent of destination. It is appropriate for some higher level protocols to choose the identifier. For example, TCP protocol modules may retransmit an identical TCP segment, and the probability for correct reception would be enhanced if the retransmission carried the same identifier as the original transmission since fragments of either datagram could be used to construct a correct TCP segment.
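To make the quoted reassembly procedure more concrete, here is a minimal Python sketch of the BUFID idea: fragments are collected per (source, destination, protocol, identification) key and joined once no hole is left. It is not the RFC's reference algorithm. The class name `Reassembler` and its method signature are my own, the reassembly timer and the fragment block bit table are left out, and overlapping fragments are not handled.

```python
from collections import defaultdict


class Reassembler:
    """Collect fragments per buffer identifier and join them when complete."""

    def __init__(self):
        self.buffers = defaultdict(dict)  # bufid -> {byte offset: payload}
        self.total_len = {}               # bufid -> length, known once MF=0 arrives

    def add(self, src, dst, proto, ident, frag_offset, more_fragments, payload):
        # BUFID <- source|destination|protocol|identification
        bufid = (src, dst, proto, ident)
        if frag_offset == 0 and not more_fragments:
            # A whole datagram: release any stale state and pass it on.
            self.buffers.pop(bufid, None)
            self.total_len.pop(bufid, None)
            return payload
        start = frag_offset * 8  # the header field counts 8-byte units
        self.buffers[bufid][start] = payload
        if not more_fragments:
            self.total_len[bufid] = start + len(payload)
        return self._try_complete(bufid)

    def _try_complete(self, bufid):
        total = self.total_len.get(bufid)
        if total is None:
            return None  # last fragment not seen yet
        data = bytearray()
        for start in sorted(self.buffers[bufid]):
            if start != len(data):
                return None  # hole in the data (overlaps are not handled)
            data += self.buffers[bufid][start]
        if len(data) != total:
            return None
        del self.buffers[bufid]
        del self.total_len[bufid]
        return bytes(data)
```

The sketch also shows why the ID has to be unique per tuple: two different datagrams with the same key that are in flight at the same time would land in the same buffer and be mixed.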
OK, what have we learned so far?
The IP ID field is needed for the fragmentation process, and the tuple source|destination|protocol|identification MUST be unique.
My interpretation of RFC 791 is:
The IP ID is only mandatory if FRAGMENTATION is ALLOWED. But then the combination source|destination|protocol|identification MUST be unique for the maximum time the datagram can be alive in the network. It is not allowed to generate the IP ID based on a Layer 4 session.
We have also seen so far that no device is allowed to ignore the “Don't Fragment” (DF) bit. Knowing that, a unique IP ID is not mandatory in cases where “Don't Fragment” is set.
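RFC 791 mentions a table of identifiers with one entry per destination. A minimal sketch of that idea, assuming a plain counter per (source, destination, protocol) tuple, could look like this in Python. The class name `IdAllocator` is my own invention, and real stacks choose their IDs in many different ways.

```python
from collections import defaultdict


class IdAllocator:
    """Hand out 16-bit IP IDs from one counter per (src, dst, proto) tuple."""

    def __init__(self):
        self._counters = defaultdict(int)

    def next_id(self, src, dst, proto):
        key = (src, dst, proto)
        ident = self._counters[key]
        # Only 65,536 values exist, so the counter wraps. IDs stay unique
        # only while fewer than 65,536 datagrams per tuple are sent within
        # one maximum datagram lifetime.
        self._counters[key] = (ident + 1) % 65536
        return ident
```

Note that the key contains no port numbers: two TCP sessions between the same pair of hosts share one counter, which is what I mean by "not per Layer 4 session".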
Let's see what happened after 1981.
In 1989 we got RFC 1122, “Requirements for Internet Hosts -- Communication Layers”:
3.2.1.5 Identification: RFC-791 Section 3.2 When sending an identical copy of an earlier datagram, a host MAY optionally retain the same Identification field in the copy. DISCUSSION: Some Internet protocol experts have maintained that when a host sends an identical copy of an earlier datagram, the new copy should contain the same Identification value as the original. There are two suggested advantages: (1) if the datagrams are fragmented and some of the fragments are lost, the receiver may be able to reconstruct a complete datagram from fragments of the original and the copies; (2) a congested gateway might use the IP Identification field (and Fragment Offset) to discard duplicate datagrams from the queue. However, the observed patterns of datagram loss in the Internet do not favor the probability of retransmitted fragments filling reassembly gaps, while other mechanisms (e.g., TCP repacketizing upon retransmission) tend to prevent retransmission of an identical datagram [IP:9]. Therefore, we believe that retransmitting the same Identification field is not useful. Also, a connectionless transport protocol like UDP would require the cooperation of the application programs to retain the same Identification value in identical datagrams.
Here they repeat the behaviour from RFC 791 that a datagram may be sent again with the same IP ID, but they no longer recommend it.
And now we have RFC 6864 from 2013, with the meaningful name “Updated Specification of the IPv4 ID Field”. Wow, an RFC of its own for the IP ID field.
I have quoted here just the parts which I think are most important for everybody who is interested in the IP protocol, but I really recommend reading RFC 6864 in full. It is not long, but it is full of current information about actual behaviour.
IP-Header Updated Specification of the IPv4 ID Field Abstract The IPv4 Identification (ID) field enables fragmentation and reassembly and, as currently specified, is required to be unique within the maximum lifetime for all datagrams with a given source address/destination address/protocol tuple. If enforced, this uniqueness requirement would limit all connections to 6.4 Mbps for typical datagram sizes. Because individual connections commonly exceed this speed, it is clear that existing systems violate the current specification. This document updates the specification of the IPv4 ID field in RFCs 791, 1122, and 2003 to more closely reflect current practice and to more closely match IPv6 so that the field's value is defined only when a datagram is actually fragmented. It also discusses the impact of these changes on how datagrams are used. Introduction In IPv4, the Identification (ID) field is a 16-bit value that is unique for every datagram for a given source address, destination address, and protocol, such that it does not repeat within the maximum datagram lifetime (MDL) [RFC791] [RFC1122]. As currently specified, all datagrams between a source and destination of a given protocol must have unique IPv4 ID values over a period of this MDL, which is typically interpreted as two minutes and is related to the recommended reassembly timeout [RFC1122]. This uniqueness is currently specified as for all datagrams, regardless of fragmentation settings. Uniqueness of the IPv4 ID is commonly violated by high-speed devices; if strictly enforced, it would limit the speed of a single protocol between two IP endpoints to 6.4 Mbps for typical MTUs of 1500 bytes (assuming a 2-minute MDL, using the analysis presented in [RFC4963]). It is common for a single connection to operate far in excess of these rates, which strongly indicates that the uniqueness of the IPv4 ID as specified is already moot. 
Further, some sources have been generating non-varying IPv4 IDs for many years (e.g., cellphones), which resulted in support for such in RObust Header Compression (ROHC) [RFC5225]. This document updates the specification of the IPv4 ID field to more closely reflect current practice and to include considerations taken into account during the specification of the similar field in IPv6. ... Page3f 3. The IPv4 ID Field IP supports datagram fragmentation, where large datagrams are split into smaller components to traverse links with limited maximum transmission units (MTUs). Fragments are indicated in different ways in IPv4 and IPv6: o In IPv4, fragments are indicated using four fields of the basic header: Identification (ID), Fragment Offset, a "Don't Fragment" (DF) flag, and a "More Fragments" (MF) flag [RFC791]. o In IPv6, fragments are indicated in an extension header that includes an ID, Fragment Offset, and an M (more fragments) flag similar to their counterparts in IPv4 [RFC2460]. IPv6 fragmentation differs from IPv4 fragmentation in a few important ways. IPv6 fragmentation occurs only at the source, so a DF bit is not needed to prevent downstream devices from initiating fragmentation (i.e., IPv6 always acts as if DF=1). The IPv6 fragment header is present only when a datagram has been fragmented, or when the source has received a "packet too big" ICMPv6 error message indicating that the path cannot support the required minimum 1280-byte IPv6 MTU and is thus subject to translation [RFC2460] [RFC4443]. The latter case is relevant only for IPv6 datagrams sent to IPv4 destinations to support subsequent fragmentation after translation to IPv4. With the exception of these two cases, the ID field is not present for non-fragmented datagrams; thus, it is meaningful only for datagrams that are already fragmented or datagrams intended to be fragmented as part of IPv4 translation. 
Finally, the IPv6 ID field is 32 bits and required unique per source/destination address pair for IPv6, whereas for IPv4 it is only 16 bits and required unique per source address/destination address/protocol tuple. This document focuses on the IPv4 ID field issues, because in IPv6 the field is larger and present only in fragments. 3.1. Uses of the IPv4 ID Field The IPv4 ID field was originally intended for fragmentation and reassembly [RFC791]. Within a given source address, destination address, and protocol, fragments of an original datagram are matched based on their IPv4 ID. This requires that IDs be unique within the source address/destination address/protocol tuple when fragmentation is possible (e.g., DF=0) or when it has already occurred (e.g., frag_offset>0 or MF=1). Other uses have been envisioned for the IPv4 ID field. The field has been proposed as a way to detect and remove duplicate datagrams, e.g., at congested routers (noted in Section 3.2.1.5 of [RFC1122]) or in network accelerators. It has similarly been proposed for use at end hosts to reduce the impact of duplication on higher-layer protocols (e.g., additional processing in TCP or the need for application-layer duplicate suppression in UDP). This is discussed further in Section 5.1. The IPv4 ID field is used in some diagnostic tools to correlate datagrams measured at various locations along a network path. This is already insufficient in IPv6 because unfragmented datagrams lack an ID, so these tools are already being updated to avoid such reliance on the ID field. This is also discussed further in Section 5.1. The ID clearly needs to be unique (within the MDL, within the source address/destination address/protocol tuple) to support fragmentation and reassembly, but not all datagrams are fragmented or allow fragmentation. 
This document deprecates non-fragmentation uses, allowing the ID to be repeated (within the MDL, within the source address/destination address/protocol tuple) in those cases. ... Page6ff 4.0 Updates to the IPv4 ID Specification This document updates the specification of the IPv4 ID field in three distinct ways, as discussed in subsequent subsections: o Using the IPv4 ID field only for fragmentation o Encouraging safe operation when the IPv4 ID field is used o Avoiding a performance impact when the IPv4 ID field is used There are two kinds of datagrams, which are defined below and used in the following discussion: o Atomic datagrams are datagrams not yet fragmented and for which further fragmentation has been inhibited. o Non-atomic datagrams are datagrams either that already have been fragmented or for which fragmentation remains possible. This same definition can be expressed in pseudo code, using common logical operators (equals is ==, logical 'and' is &&, logical 'or' is ||, greater than is >, and the parenthesis function is used typically) as follows: o Atomic datagrams: (DF==1)&&(MF==0)&&(frag_offset==0) o Non-atomic datagrams: (DF==0)||(MF==1)||(frag_offset>0) 4.1 IPv4 ID Used Only for Fragmentation Although RFC 1122 suggests that the IPv4 ID field has other uses, including datagram de-duplication, such uses are already not interoperable with known implementations of sources that do not vary their ID. This document thus defines this field's value only for fragmentation and reassembly: >> The IPv4 ID field MUST NOT be used for purposes other than fragmentation and reassembly. Datagram de-duplication can still be accomplished using hash-based duplicate detection for cases where the ID field is absent (IPv6 unfragmented datagrams), which can also be applied to IPv4 atomic datagrams without utilizing the ID field [RFC6621]. 
In atomic datagrams, the IPv4 ID field has no meaning; thus, it can be set to an arbitrary value, i.e., the requirement for non-repeating IDs within the source address/destination address/protocol tuple is no longer required for atomic datagrams: >> Originating sources MAY set the IPv4 ID field of atomic datagrams to any value. Second, all network nodes, whether at intermediate routers, destination hosts, or other devices (e.g., NATs and other address- sharing mechanisms, firewalls, tunnel egresses), cannot rely on the field of atomic datagrams: >> All devices that examine IPv4 headers MUST ignore the IPv4 ID field of atomic datagrams. The IPv4 ID field is thus meaningful only for non-atomic datagrams -- either those datagrams that have already been fragmented or those for which fragmentation remains permitted. Atomic datagrams are detected by their DF, MF, and fragmentation offset fields as explained in Section 4, because such a test is completely backward compatible; thus, this document does not reserve any IPv4 ID values, including 0, as distinguished. Deprecating the use of the IPv4 ID field for non-reassembly uses should have little -- if any -- impact. IPv4 IDs are already frequently repeated, e.g., over even moderately fast connections and from some sources that do not vary the ID at all, and no adverse impact has been observed. Duplicate suppression was suggested [RFC1122] and has been implemented in some protocol accelerators, but no impacts of IPv4 ID reuse have been noted to date. Routers are not required to issue ICMPs on any particular timescale, and so IPv4 ID repetition should not have been used for validation purposes; this scenario has not been observed. Besides, repetition already occurs and would have been noticed [RFC1812]. ICMP relaying at tunnel ingresses is specified to use soft state rather than a datagram cache; for similar reasons, if the latter is used, this should have been noticed [RFC2003]. 
These and other legacy issues are discussed further in Section 5.1. ... Page 14ff Updates to Existing Standards The following sections address the specific changes to existing protocols indicated by this document. 6.1. Updates to RFC 791 RFC 791 states that: The originating protocol module of an internet datagram sets the identification field to a value that must be unique for that source-destination pair and protocol for the time the datagram will be active in the internet system. It later states that: Thus, the sender must choose the Identifier to be unique for this source, destination pair and protocol for the time the datagram (or any fragment of it) could be alive in the internet. It seems then that a sending protocol module needs to keep a table of Identifiers, one entry for each destination it has communicated with in the last maximum datagram lifetime for the internet. However, since the Identifier field allows 65,536 different values, some host may be able to simply use unique identifiers independent of destination. It is appropriate for some higher level protocols to choose the identifier. For example, TCP protocol modules may retransmit an identical TCP segment, and the probability for correct reception would be enhanced if the retransmission carried the same identifier as the original transmission since fragments of either datagram could be used to construct a correct TCP segment. This document changes RFC 791 as follows: o IPv4 ID uniqueness applies to only non-atomic datagrams. o Retransmitted non-atomic IPv4 datagrams are no longer permitted to reuse the ID value. 6.2. Updates to RFC 1122 RFC 1122 states in Section 3.2.1.5 ("Identification: RFC 791 Section 3.2") that: When sending an identical copy of an earlier datagram, a host MAY optionally retain the same Identification field in the copy. 
DISCUSSION: Some Internet protocol experts have maintained that when a host sends an identical copy of an earlier datagram, the new copy should contain the same Identification value as the original. There are two suggested advantages: (1) if the datagrams are fragmented and some of the fragments are lost, the receiver may be able to reconstruct a complete datagram from fragments of the original and the copies; (2) a congested gateway might use the IP Identification field (and Fragment Offset) to discard duplicate datagrams from the queue. This document changes RFC 1122 as follows: o The IPv4 ID field is no longer permitted to be used for duplicate detection. This applies to both atomic and non-atomic datagrams. o Retransmitted non-atomic IPv4 datagrams are no longer permitted to reuse the ID value. 6.3. Updates to RFC 2003 This document updates how IPv4-in-IPv4 tunnels create IPv4 ID values for the IPv4 outer header [RFC2003], but only in the same way as for any other IPv4 datagram source. Specifically, RFC 2003 states the following, where  refers to RFC 791: Identification, Flags, Fragment Offset These three fields are set as specified in ... This document changes RFC 2003 as follows: o The IPv4 ID field is set as permitted by RFC 6864. 7. Security Considerations When the IPv4 ID is ignored on receipt (e.g., for atomic datagrams), its value becomes unconstrained; therefore, that field can more easily be used as a covert channel. For some atomic datagrams it is now possible, and may be desirable, to rewrite the IPv4 ID field to avoid its use as such a channel. Rewriting would be prohibited for datagrams protected by the IPsec Authentication Header (AH), although we do not recommend use of the AH to achieve this result [RFC4302]. The IPv4 ID also now adds much less to the entropy of the header of a datagram. 
Such entropy might be used as input to cryptographic algorithms or pseudorandom generators, although IDs have never been assured sufficient entropy for such purposes. The IPv4 ID had previously been unique (for a given source/address pair, and protocol field) within one MDL, although this requirement was not enforced and clearly is typically ignored. The IPv4 ID of atomic datagrams is not required unique and so contributes no entropy to the header. The deprecation of the IPv4 ID field's uniqueness for atomic datagrams can defeat the ability to count devices behind a NAT/ASM/rewriter [Be02]. This is not intended as a security feature, however.
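The 6.4 Mbps limit mentioned in the abstract of RFC 6864 can be sanity-checked with a little arithmetic: with 65,536 ID values and a 2-minute MDL, a sender can emit at most 65,536 datagrams per tuple every 120 seconds. The Python sketch below is my own back-of-the-envelope version, not the analysis from RFC 4963. That 1460 payload bytes per datagram gets close to the RFC's figure is only my guess at the assumption behind it.

```python
ID_VALUES = 2 ** 16  # the ID field is 16 bits wide
MDL_SECONDS = 120    # 2-minute maximum datagram lifetime, as in RFC 6864


def max_rate_mbps(datagram_bytes):
    """Highest rate at which no ID repeats within one MDL, in Mbit/s."""
    bits_per_mdl = ID_VALUES * datagram_bytes * 8
    return bits_per_mdl / MDL_SECONDS / 1e6


print(round(max_rate_mbps(1500), 2))  # 6.55, counting full 1500-byte datagrams
print(round(max_rate_mbps(1460), 2))  # 6.38, counting 1460 payload bytes only
```

Either way the result is a single-digit Mbit/s value, which explains why fast links cannot keep the ID unique.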
So what I wanted to show with this short story is that you do not need to be surprised if you see a packet with strange IP ID behaviour. As long as the “Don't Fragment” (DF) bit is SET and the datagram is not already a fragment, the IP ID is not needed. But if the DF bit is NOT SET, then reusing an IP ID is no longer allowed for retransmitted datagrams.
And we have seen that this is the behaviour I expected when discussing RFC 791.
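If you want to apply this to a trace, the atomic test from Section 4 of RFC 6864, (DF==1)&&(MF==0)&&(frag_offset==0), can be sketched in Python as follows. The function name `is_atomic` is my own, and the sketch assumes you pass in the raw IPv4 header bytes starting at the version/IHL byte.

```python
import struct


def is_atomic(ipv4_header: bytes) -> bool:
    """Return True if the IP ID of this datagram carries no meaning."""
    # Bytes 6-7 hold the 3 flag bits (reserved, DF, MF) followed by
    # the 13-bit fragment offset, in network byte order.
    (flags_and_offset,) = struct.unpack_from("!H", ipv4_header, 6)
    df = bool(flags_and_offset & 0x4000)
    mf = bool(flags_and_offset & 0x2000)
    frag_offset = flags_and_offset & 0x1FFF
    return df and not mf and frag_offset == 0
```

For packets where this returns True, a repeated or constant IP ID is permitted by RFC 6864 and should not be read as a sign of duplicated frames.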