doc/users-guide/users-guide-packet.adoc


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536

== Packet Processing
ODP applications are designed to process packets, which are the basic unit of
data of interest in the data plane. To assist in processing packets, ODP
provides a set of APIs that enable applications to examine and manipulate
packet data and metadata. Packets are referenced by an abstract *odp_packet_t*
handle defined by each implementation.

Packet objects are normally created at ingress when they arrive at a source
*odp_pktio_t* and are received by an application either directly or (more
typically) via a scheduled receive queue. They MAY be implicitly freed when
they are transmitted to an output *odp_pktio_t* via an associated transmit
queue, or freed directly via the `odp_packet_free()` API.

Occasionally an application may originate a packet itself, either directly or
by deriving it from an existing packet, and APIs are provided to assist in
these cases as well. Application-created packets can be recycled back through
a _loopback interface_ to reparse and reclassify them, or the application can
do its own parsing as desired.

Various attributes associated with a packet, such as parse results, are
stored as metadata and APIs are provided to permit applications to examine
and/or modify this information.

=== Packet Structure and Concepts
A _packet_ consists of a sequence of octets conforming to an architected
format, such as Ethernet, that can be received and transmitted via the ODP
*pktio* abstraction. Packets have a _length_, which is the number of bytes in
the packet. Packet data in ODP is referenced via _offsets_ since these reflect
the logical contents and structure of a packet independent of how particular
ODP implementations store that data.

These concepts are shown in the following diagram:

.ODP Packet Structure
image::packet.svg[align="center"]

Packet data consists of zero or more _headers_ followed by 0 or more bytes of
_payload_, followed by zero or more _trailers_.  Shown here are various APIs
that permit applications to examine and navigate various parts of a packet and
to manipulate its structure.

To support packet manipulation, predefined _headroom_ and _tailroom_
areas are logically associated with a packet. Packets can be adjusted by
_pulling_ and _pushing_ these areas. Typical packet processing might consist
of stripping headers from a packet via `odp_packet_pull_head()` calls as part of
receive processing and then replacing them with new headers via
`odp_packet_push_head()` calls as the packet is being prepared for transmit.
Note that while headroom and tailroom represent reserved areas of memory, these
areas are not addressable or directly usable by ODP applications until they are
made part of the packet via associated push operations. Similarly, bytes
removed via pull operations become part of a packet's headroom or tailroom
and are again no longer accessible to the application.

=== Packet Segments and Addressing
ODP platforms use various methods and techniques to store and process packets
efficiently. These vary considerably from platform to platform, so to ensure
portability across them ODP adopts certain conventions for referencing
packets.

ODP APIs use a handle of type *odp_packet_t* to refer to packet objects.
Associated with packets are various bits of system metadata that describe the
packet. By referring to the metadata, ODP applications accelerate packet
processing by minimizing the need to examine packet data. This is because the
metadata is populated by parsing and classification functions that are coupled
to ingress processing that occur prior to a packet being presented to the
application via the ODP scheduler.

When an ODP application needs to examine the contents of a packet, it requests
addressability to it via an API call that makes the packet (or a contiguously
addressable _segment_ of it) available for coherent access by the application.
To ensure portability, ODP applications assume that the underlying
implementation stores packets in _segments_ of implementation-defined
and managed size. These represent the contiguously addressable portions of a
packet that the application may refer to via normal memory accesses. ODP
provides APIs that allow applications to operate on packet segments in an
efficient and portable manner as needed. By combining these with the metadata
provided by packets, ODP applications can operate in a fully
platform-independent manner while still achieving optimal performance across
the range of platforms that support ODP.

The use of segments for packet addressing and their relationship to metadata
is shown in this diagram:

.ODP Packet Segmentation
image::segment.svg[align="center"]

The packet metadata is set during parsing and identifies the starting offsets
of the various headers in the packet. The packet itself is physically stored
as a sequence of segments that area managed by the ODP implementation.
Segment 0 is the first segment of the packet and is where the packet's headroom
and headers typically reside. Depending on the length of the packet,
additional segments may be part of the packet and contain the remaining packet
payload and tailroom. The application need not concern itself with segments
except that when the application requires addressability to a packet it
understands that addressability is provided on a per-segment basis. So, for
example, if the application makes a call like `odp_packet_l4_ptr()` to obtain
addressability to the packet's Layer 4 header, the returned length from that
call is the number of bytes from the start of the Layer 4 header that are
contiguously addressable to the application from the returned pointer address.
This is because the following byte occupies a different segment and may be
stored elsewhere. To obtain access to those bytes, the application simply
requests addressability to that offset and it will be able to address the
packet bytes that occupy the next segment, etc. Note that the returned
length for any packet addressability call is always the lesser of the remaining
packet length or size of its containing segment.  So a mapping for segment 2
in the above figure, for example, would return a length that extends only to
the end of the packet since the remaining bytes are part of the tailroom
reserved for the packet and are not usable by the application until made
available to it by an appropriate API call.

While the push/pull APIs permit applications to perform efficient manipulation
of packets within the current segment structure, ODP also provides APIs that
permit segments to be added or removed. The `odp_packet_extend_head()` and
`odp_packet_trunc_head()` APIs permit segments to be added or removed from
the beginning of a packet, while `odp_packet_extend_tail()` and
`odp_packet_trunc_tail()` permit segments to be added or removed from the end
of a packet. Extending a packet adds one or more segments to permit packets to
grow up to implementation-defined limits. Truncating a packet removes one or
more segments to shrink the size of a packet beyond its initial or final
segment.

=== Metadata Processing
As noted, packet metadata is normally set by the parser as part of
classification that occurs during packet receive processing. It is important
to note that this metadata may be changed by the application to reflect
changes in the packet contents and/or structure as part of its processing of
the packet. While changing this metadata may effect some ODP APIs, changing
metadata is designed to _document_ application changes to the packet but
does not in itself _cause_ those changes to be made. For example, if an
application changes the Layer 3 offset by using the `odp_packet_l3_offset_set()`
API, the subsequent calls to `odp_packet_l3_ptr()` will return an address
starting from that changed offset, changing an attribute like
`odp_packet_has_udp_set()` will not, by itself, turn a non-UDP packet into
a valid UDP packet. Applications are expected to exercise appropriate care
when changing packet metadata to ensure that the resulting metadata changes
reflect the actual changed packet structure that the application has made.

=== Packet Manipulation
ODP Packet manipulation APIs can be divided into two categories: Those
that do not change a packet's segment structure, and those that potentially do
change this structure. We've already seen one example of this. The push/pull
APIs permit manipulation of packet headroom/tailroom that does not result in
changes to packet segmentation, while the corresponding extend/trunc APIs
provide the same functionality but with the potential that segments may be
added to or removed from the packet as part of the operation.

The reason for having two different types of APIs that perform similar
functions is that it is expected that on most implementations operations that
do not change packet segment structure will be more efficient than those that
do. To account for this, APIs that potentially involve a change in packet
segmentation always take an output *odp_packet_t* parameter or return
value. Applications are expected to use this new handle for the resulting
packet instead of the old (input) handle as the implementation may have
returned a new handle that now represents the transformed packet.

To enable applications that manipulate packets this way to operate most
efficiently the return codes from these APIs follow a standard convention. As
usual, return codes less than zero indicate error and result in no change to
the input packet. A return code of zero indicates success, but also indicates
that any cached addressability to the packet is still valid. Return codes
greater than zero also indicate success but with a potential change to packet
addressability. For example, if an application had previously obtained
addressability to a packet's Layer 3 header via the `odp_packet_l3_ptr()` API,
a return code of zero would mean that the application may continue to use that
pointer for access to the L3 header, while a return code greater than zero
would mean that the application should reissue that call to re-obtain
addressability as the packet segmentation may have changed and hence the old
pointer may no longer be valid.

==== Packet Copying
One of the simplest manipulations that can be done is to make a copy of all or
part of a packet. The `odp_packet_copy()` and `odp_packet_copy_part()` APIs
are used to return a new packet that contains either the entirety or a
selected part of an existing packet. Note that these operations also specify
the packet pool from which the new packet is to be drawn.

==== Packet Data Copying and Moving
ODP provides several APIs to enable portions of a packet to be copied
either to or from a memory area, another packet, or within a single packet, as
illustrated below:

.ODP Packet Data Copying and Moving Operations
image::packet-copyops.svg[align="center"]

These APIs provide bounds checking when the source or destination is an ODP
packet. This means that data must be in the offset range
`0`..`odp_packet_len()-1`. For operations involving memory areas,
the caller takes responsibility for ensuring that memory areas
referenced by `odp_packet_copy_to/from_mem()` are valid.

When manipulating data within a single packet, two similar APIs are provided:
`odp_packet_copy_data()` and `odp_packet_move_data()`. Of these, the move
operation is more general and may be used even when the source and destination
data areas overlap. The copy operation must only be used if the caller knows
that the two areas do not overlap, and may result in more efficient operation.
When dealing with overlapping memory areas, `odp_packet_move_data()` operates
as if the source area was first copied to a non-overlapping separate memory
area and then copied from that area to the destination area.

==== Adding and Removing Packet Data
The various copy/move operations discussed so far only affect the data
contained in a packet do not change its length. Data can also be added to
or removed from a packet via the `odp_packet_add_data()` and
`odp_packet_rem_data()` APIs as shown below:

.Adding Data to a Packet
image::packet-adddata.svg[align="center"]

Adding data simply creates the requested amount of "space" within the packet
at the specified offset. The length of the packet is increased by the number
of added bytes. The contents of this space upon successful completion
of the operation is unspecified. It is the application's responsibility to then
fill this space with meaningful data, _e.g.,_ via a subsequent
`odp_packet_copy_from_mem()` or `odp_packet_copy_from_pkt()` call.

.Removing Data from a Packet
image::packet-remdata.svg[align="center"]

Removing data from a packet has the opposite effect. The specified number of
bytes at the designated offset are removed from the packet and the resulting
"hole" is collapsed so that the remainder of the packet immediately follows
the removal point. The resulting packet length is decreased by the number of
removed bytes.

Note that adding or removing data from a packet may affect packet segmentation,
so the application must use the returned packet handle and abide by the
return code results of the operation.  Whether or not segmentation is
changed by these operations, the amount of available packet headroom and/or
tailroom may also be changed by these operations, so again applications should
not attempt to cache the results of prior `odp_packet_headroom()` or
`odp_packet_tailroom()` calls across these APIs.

==== Packet Splitting and Concatenation
Another type of manipulation is to split a packet into two packets as shown
below:

.Splitting a Packet
image::packet-split.svg[align="center"]

The `odp_packet_split()` API indicates the split point by specifying the
resulting desired length of the original packet.  Upon return, the original
packet ends at the specified split point and the new "tail" is returned as
its own separate packet. Note that this new packet will always be from the same
packet pool as the original packet.

The opposite operation is performed by the `odp_packet_concat()` API. This API
takes a destination and source packet as arguments and the result is that
the source packet is concatenated to the destination packet and ceases to
have any separate identity. Note that it is legal to concatenate a packet to
itself, in which case the result is a packet with double the length of the
original packet.

==== Packet Realignment
As previously discussed, packets are divided into implementation-defined
segments that normally don't concern applications since contiguous
addressability extents are returned as part of APIs such as
`odp_packet_offset()`. However, if the application has performed a lot of
manipulation or processing on a packet, this can sometimes result in segment
boundaries appearing at inconvenient locations, such as in the middle of
headers or individual fields, or for headers to become misaligned with respect
to their addresses in memory. This can make subsequent processing of the
packet inefficient.

To address these issues, ODP provides a means of realigning a packet to allow
for more efficient processing as shown below:

.Packet Realignment
image::packet-align.svg[align="center"]

Input to `odp_packet_align()` specifies the number of contiguous bytes that
are needed at a given packet offset as well as the memory alignment required
for that offset. A value of zero may be specified for either as a "don't care"
value. If these criteria are already satisfied then the call is an effective
no-op and will result in a return code of zero to tell the caller that all is
well. Otherwise, the packet will be logically "shifted" within its containing
segment(s) to achieve the requested addressability and alignment constraints,
if possible, and a return code greater than zero will result.

The requested operation may fail for a number of reasons. For example, if the
caller is requesting contiguous addressability to a portion of the packet
larger than the underlying segment size. The call may also fail if the
requested alignment is too high. Alignment limits will vary among different ODP
implementations, however ODP requires that all implementations support
requested alignments of at least 32 bytes.

=== Packet References
To support efficient multicast, retransmit, and related processing, ODP
supports two additional types of packet manipulation: static and dynamic
_references_. A reference is a lightweight mechanism for
creating aliases to packets as well as to create packets that share data bytes
with other packets to avoid unnecessary data copying.

==== Static References
The simplest type of reference is the _static reference_. A static reference is
created by the call:

[source,c]
-----
ref_pkt = odp_packet_ref_static(pkt);
-----

If the reference fails, `ODP_PACKET_INVALID` is returned and `pkt`
remains unchanged.

The effect of this call is shown below:

.Static Packet Reference
image::refstatic.svg[align="center"]

A static reference provides a simple and efficient means of creating an alias
for a packet handle that prevents the packet itself from being freed until all
references to it have been released via `odp_packet_free()` calls. This is
useful, for example, to support retransmission processing, since as part of
packet TX processing, `odp_pktout_send()` or `odp_tm_enq()` will free
the packet after it has been transmitted.

`odp_packet_ref_static()` might be used in a transmit routine wrapper
function like:

[source,c]
-----
int xmit_pkt(odp_pktout_queue_t queue, odp_packet_t pkt)
{
	odp_packet_t ref = odp_packet_ref_static(pkt);
	return ref == ODP_PACKET_INVALID ? -1 : odp_pktout_send(queue, ref, 1);
}
-----

This transmits a reference to `pkt` so that `pkt` is retained by the caller,
which means that the caller is free to retransmit it if needed at a later
time. When a higher level protocol (_e.g.,_ receipt of a TCP ACK packet)
confirms that the transmission was successful, `pkt` can then be discarded via
an `odp_packet_free()` call.

The key characteristic of a static reference is that because there are
multiple independent handles that refer to the same packet, the caller should
treat the packet as read only following the creation of a static reference
until all other references to it are freed. This is because all static
references are simply aliases of the same packet, so if multiple threads were
independently manipulating the packet this would lead to unpredictable race
conditions.

To assist in determining whether there are other references to a packet, ODP
provides the API:

[source,c]
-----
int odp_packet_has_ref(odp_packet_t pkt);
-----

that indicates whether other packets exist that share bytes with this
packet. If this routine returns 0 then the caller can be assured that it is
safe to modify it as this handle is the only reference to the packet.

==== Dynamic References
While static references are convenient and efficient, they are limited by the
need to be treated as read only. For example, consider an application that
needs to _multicast_ a packet. Here the same packet needs to be sent to two or
more different destinations. While the packet payload may be the same, each
sent copy of the packet requires its own unique header to specify the
destination that is to receive the packet.

To address this need, ODP provides _dynamic references_. These are created
by the call:

[source,c]
-----
ref_pkt = odp_packet_ref(pkt, offset);
-----

The `offset` parameter specifies the byte offset into `pkt` at which the
reference is to begin. This must be in the range
0..`odp_packet_len(pkt)`-1. As before, if the reference is unable to be
created `ODP_PACKET_INVALID` is returned and `pkt` is unchanged, otherwise the
result is as shown below:

.Dynamic Packet Reference
image::ref.svg[align="center"]

Following a successful reference creation, the bytes of `pkt` beginning at
offset `offset` are shared with the created reference. These bytes should be
treated as read only since multiple references point to them. Each reference,
however still retains its own individual headroom and metadata that is not
shared with any other reference. This allows unique headers to be created by
calling `odp_packet_push_head()` or `odp_packet_extend_head()` on either
handle. This allows multiple references to the same packet to prefix unique
headers onto common shared data it so that they can be properly multicast
using code such as:

[source,c]
-----
int pkt_fanout(odp_packet_t payload, odp_queue_t fanout_queue[], int num_queues)
{
	int i;

	for (i = 0, i < num_queues, i++)
		odp_queue_enq(fanout_queue[i], odp_packet_ref(payload, 0));
}
-----

Receiver worker threads can then operate on each reference to the packet in
parallel to prefix a unique transmit header onto it and send it out.

==== Dynamic References with Headers
The dynamic references discussed so far have one drawback in that the headers
needed to make each reference unique must be constructed individually after
the reference is created. To address this problem, ODP allows these headers
to be created in advance and then simply prefixed to a base packet as part
of reference creation:

[source,c]
-----
ref_pkt = odp_packet_ref_pkt(pkt, offset, hdr_pkt);
-----

Here rather than creating a reference with a null header, a _header packet_
is supplied that is prefixed onto the reference. The result looks like this:

.Packet Reference using a Header Packet
image::refpktsingle.svg[align="center"]

So now multicasting can be more efficient using code such as:

[source,c]
-----
int pkt_fanout_hdr(odp_packet_t payload, odp_queue_q fanout_queue[],
		   odp_packet_t hdr[], int num_queues)
{
	int i;

	for (i = 0; i < num_queues, i++)
		odp_queue_enq(fanout_queue[i],
			      odp_packet_ref_pkt(payload, 0, hdr[i]));
}
-----

Now each individual reference has its own header already prefixed to
it ready for transmission.

Note that when multiple references like this are made they can each have
their own offset. So if the following code is executed:

[source,c]
-----
ref_pkt1 = odp_packet_ref_pkt(pkt, offset1, hdr_pkt1);
ref_pkt2 = odp_packet_ref_pkt(pkt, offset2, hdr_pkt2);
-----

the result will look like:

image::refpkt1.svg[align="center"]
image::refpktmulti.svg[align="center"]
.Multiple Packet References with Different Offsets
image::refpkt2.svg[align="center"]

Here two separate header packets are prefixed onto the same shared packet, each
at their own specified offset, which may or may not be the same. The result is
three packets visible to the application:

* The original `pkt`, which can still be accessed and manipulated directly.
* The first reference, which consists of `hdr_pkt1` followed by bytes
contained in `pkt` starting at `offset1`.
* The second reference, which consists of `hdr_pkt2` followed by bytes
contained in `pkt` starting at `offset2`.

Only a single copy of the bytes in `pkt` that are common to the
references exist.

===== Data Sharing with References
Because a `pkt` is a shared object when referenced, applications must observe
certain disciplines when working with them. For best portability and
reliability, the shared data contained in any packet referred to by references
should be treated as read only once it has been successfully referenced until
it is known that all references to it have been freed.

To assist applications in working with references, ODP provides the additional
API:

[source,c]
-----
int odp_packet_has_ref(odp_packet_t pkt);
-----
The `odp_packet_has_ref()` API says whether any other packets
exist that share any bytes with this packet.

===== Compound References
Note that architecturally ODP does not limit referencing and so it is possible
that a reference may be used as a basis for creating another reference. The
result is a _compound reference_ that should still behave as any other
reference.

As noted earlier, the intent behind references is that they are lightweight
objects that can be implemented without requiring data copies. The existence
of compound references may complicate this goal for some implementations. As a
result, implementations are always free to perform partial or full copies of
packets as part of any reference creation call.

Note also that a packet may not reference itself, nor may circular reference
relationships be formed, _e.g.,_ packet A is used as a header for a reference
to packet B and B is used as a header for a reference to packet A.  Results
are undefined if such circular references are attempted.

=== Packet Parsing, Checksum Processing, and Overrides
Packet parsing is normally triggered automatically as part of packet RX
processing. However, the application can trigger parsing explicitly via the
API:
[source,c]
-----
int odp_packet_parse(odp_packet_t pkt, uint32_t offset,
		     const odp_packet_parse_param_t *param);
-----
This is typically done following packet decapsulation or other preprocessing
that would prevent RX parsing from "seeing" the relevant portion of the
packet. The `odp_packet_parse_param_t` struct that is passed to control the
depth of the desired parse, as well as whether checksum validation should be
performed as part of the parse, and if so which checksums require this
processing.

Packets containing Layer 3 (IPv4) and Layer 4 (TCP, UDP, SCTP) checksums
can have these validated (on RX) and generated (on TX) automatically.
This is normally controlled by the settings on the PktIOs that
receive/transmit them, however they can also be controlled on an
individual packet basis.

Packets have associated `odp_packet_chksum_status_t` metadata that indicates
the state any checksums contained in that packet. These can be queried via
the APIs `odp_packet_l3_chksum_status()` and `odp_packet_l4_chksum_status()`,
respectively. Checksums can either be known good, known bad, or unknown, where
unknown means that checksum validation processing has not occurred or the
attempt to validate the checksum failed.

Similarly, the `odp_packet_l3_chksum_insert()` and
`odp_packet_l4_chksum_insert()` APIs may be used to override default checksum
processing for individual packets prior to transmission. If no explicit
checksum processing is specified for a packet, then any checksum generation
is controlled by the PktIO configuration of the interface used to transmit it.