Kinesis Data Streams Destinations

Service Name:	Kinesis Data Streams
ARN Format:	`arn:aws:kinesis:{region}:{account}:stream/{name of stream}`
Style:	Async (Write)
Actions (required Permissions):	`kinesis:PutRecords`
Payload Format:	JSON Packet, one record per packet
Arguments:	`PartitionKeyExpression` (binary range expression, optional), `PartitionKeyFormatter` (string, default: `"hex"`)

Proxylity's integration with Kinesis Data Streams writes batches of packets arriving at your listener as records to your stream. Kinesis Data Streams provides scalable, real-time data ingestion for streaming analytics, machine learning pipelines, and event-driven architectures. UDP Gateway enables you to stream UDP traffic directly into Kinesis without building custom collection infrastructure.

Technical Details

UDP Gateway uses the Kinesis PutRecords API (IAM action kinesis:PutRecords) to write packets to your stream. This batch API accepts up to 500 records per request, with a 5 MB total size limit. UDP Gateway automatically splits larger batches into sub-batches of at most 500 records and sends them concurrently, maximizing throughput while respecting the API limits.
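The chunk-and-send-concurrently pattern described above can be sketched in a few lines. This is an illustration of the general technique only, not UDP Gateway's implementation; the names `chunk`, `send_batch`, and the injected `put_records` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence

MAX_RECORDS_PER_CALL = 500  # PutRecords per-request record limit cited above


def chunk(records: Sequence[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_batch(records: Sequence[dict],
               put_records: Callable[[Sequence[dict]], dict]) -> list:
    """Send each sub-batch concurrently; responses come back in chunk order.

    `put_records` is a hypothetical callable wrapping one PutRecords request.
    """
    chunks = list(chunk(records))
    if not chunks:
        return []
    # The worker cap of 8 is an arbitrary choice for this sketch.
    with ThreadPoolExecutor(max_workers=min(len(chunks), 8)) as pool:
        return list(pool.map(put_records, chunks))
```

With boto3, `put_records` could be something like `lambda c: client.put_records(StreamName="device-data", Records=list(c))`, where each record is a dict with `Data` and `PartitionKey` entries. This sketch does not enforce the 5 MB request size limit.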

Important: While UDP Gateway batches packets for efficient transmission to Kinesis, each Kinesis record contains exactly one packet. When your stream consumers read from the stream, each record data blob will be a single JSON packet object, not an array. This design ensures that individual packets can be processed independently and distributed across shard consumers for parallel processing.

The record data contains the complete request packet with all metadata including source/destination addresses, arrival time, and the UDP payload data. Your stream consumers can process these records using the Kinesis Client Library (KCL), Lambda event source mappings, Kinesis Data Analytics, or any other Kinesis-compatible processing framework.
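As a minimal sketch of the consumer side, the following Lambda handler decodes each Kinesis record into a single packet object. The event shape (`Records[].kinesis.data`, base64-encoded) is the standard Lambda event source mapping format for Kinesis. The packet's field names are not listed in this section, so the sketch only parses the JSON and prints whatever keys are present.

```python
import base64
import json


def decode_packets(event: dict) -> list:
    """Decode every Kinesis record in a Lambda event into one packet object each."""
    packets = []
    for record in event.get("Records", []):
        # Lambda delivers the Kinesis data blob base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Each blob is a single JSON object (one packet), not an array.
        packets.append(json.loads(raw))
    return packets


def handler(event, context):
    packets = decode_packets(event)
    for packet in packets:
        # Field names depend on the packet schema; print the keys to inspect them.
        print(sorted(packet.keys()))
    return {"processed": len(packets)}
```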

Partition Keys

Kinesis Data Streams uses partition keys to distribute records across shards. The partition key determines which shard receives each record, directly impacting throughput and ordering guarantees. UDP Gateway provides two approaches for partition key assignment:

Default Behavior: Random Distribution

If you don't specify a PartitionKeyExpression, UDP Gateway generates a random GUID for each packet. This distributes records evenly across all shards, maximizing throughput and parallelism when packet ordering isn't required. This is ideal for high-volume, order-independent workloads like metrics aggregation or statistical sampling.

Dynamic Partition Keys: Controlled Distribution

Set Arguments.PartitionKeyExpression to a binary range expression (e.g., "[0:4]") that extracts bytes from each packet's payload. Records with the same partition key are always routed to the same shard, so a consumer reading that shard processes them together and in the order they were stored. This enables ordered processing per device, session, or transaction while maintaining parallel processing across different partition keys.

Optionally specify Arguments.PartitionKeyFormatter to control how the extracted bytes are converted to a partition key string. Available formatters:

hex (default) - Hexadecimal encoding (e.g., [0xA1, 0xB2] → A1B2)
utf8 - UTF-8 string decoding for text-based identifiers
ascii - ASCII string decoding for 7-bit text
base64 - Base64 encoding for compact binary representation

Choose the formatter that matches your protocol's identifier format. Hexadecimal is typically best for binary protocols, while UTF-8 works well for text-based protocols with string identifiers.
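To make the four formatters concrete, here is a rough sketch of how extracted bytes might be turned into a partition key string. The function name `format_partition_key` is hypothetical. The uppercase hex output follows the A1B2 example above. How the gateway handles bytes that are not valid UTF-8 or ASCII is not stated here, so this sketch simply lets the decode error propagate.

```python
import base64


def format_partition_key(raw: bytes, formatter: str = "hex") -> str:
    """Convert extracted payload bytes into a partition key string."""
    if formatter == "hex":
        return raw.hex().upper()          # b"\xa1\xb2" -> "A1B2"
    if formatter == "utf8":
        return raw.decode("utf-8")        # raises UnicodeDecodeError on invalid input
    if formatter == "ascii":
        return raw.decode("ascii")        # raises UnicodeDecodeError on bytes > 0x7F
    if formatter == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unknown formatter: {formatter!r}")
```

For the same two bytes, hex produces four characters while base64 produces four characters including padding; the size advantage of base64 grows with longer keys.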

Partition Key Example Configuration (JSON)

{
  "DestinationArn": "arn:aws:kinesis:us-east-1:123456789012:stream/device-data",
  "Arguments": {
    "PartitionKeyExpression": "[0:4]",
    "PartitionKeyFormatter": "hex"
  }
}

This configuration extracts bytes 0-4 from each packet payload as the partition key, formatted as hexadecimal. All packets from the same device (assuming the device ID is in those bytes) flow to the same shard, maintaining per-device ordering while enabling parallel processing across different devices.
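To see why equal partition keys always land on the same shard, it helps to know that Kinesis maps each partition key to a 128-bit integer with MD5 and assigns the record to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space evenly, which may not hold after resharding; `shard_index` is a hypothetical helper, not an AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to its 128-bit hash value (big-endian MD5 digest)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard that would receive this key.

    Assumption: shards cover equal, contiguous slices of the hash space.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

Because the mapping is deterministic, every packet whose first bytes format to the same key string yields the same index, while different device IDs spread across the available shards.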

Batching and Throughput

UDP Gateway batches records before sending them to Kinesis Data Streams, optimizing for throughput and cost efficiency. The PutRecords API supports up to 500 records per request with a 5 MB total size limit.

You control batching behavior using the destination's batching configuration:

Batching:
  Count: 1000           # Collect up to 1000 packets
  SizeInMB: 4.0         # Or 4MB of data
  TimeoutInSeconds: 5.0 # Or wait 5 seconds

When any threshold is met, UDP Gateway processes the batch by chunking into 500-record groups and sending concurrently to Kinesis. This approach balances latency (small batches = lower latency) with throughput (larger batches = better efficiency).
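The "whichever threshold is met first" rule can be expressed as a small predicate. This is a sketch under stated assumptions: the class and function names are hypothetical, 1 MB is taken as 1024 * 1024 bytes, and the timeout is measured from the first packet in the batch. None of these details are confirmed by this page.

```python
import math
from dataclasses import dataclass


@dataclass
class BatchingConfig:
    count: int = 1000                 # Batching.Count
    size_in_mb: float = 4.0           # Batching.SizeInMB
    timeout_in_seconds: float = 5.0   # Batching.TimeoutInSeconds


def should_flush(cfg: BatchingConfig, packet_count: int,
                 total_bytes: int, seconds_waiting: float) -> bool:
    """True as soon as any one of the three thresholds is reached."""
    return (packet_count >= cfg.count
            or total_bytes >= cfg.size_in_mb * 1024 * 1024  # assumed MB definition
            or seconds_waiting >= cfg.timeout_in_seconds)


def put_records_calls(packet_count: int, max_per_call: int = 500) -> int:
    """Number of PutRecords requests needed for one flushed batch."""
    return math.ceil(packet_count / max_per_call)
```

With the configuration shown above, a full batch of 1000 packets results in two concurrent PutRecords requests of 500 records each.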

Error Handling and Logging

UDP Gateway monitors Kinesis API responses and logs detailed error information when records fail to be written. The PutRecords API returns individual success/failure status for each record in a batch, and UDP Gateway reports the count of failed records along with HTTP status codes.
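For readers who call PutRecords themselves, the per-record status mentioned above appears in the response as a `FailedRecordCount` field plus a `Records` list, where failed entries carry `ErrorCode` and `ErrorMessage` instead of a sequence number. The following sketch summarizes such a response; `summarize_failures` is a hypothetical helper and does not reflect the gateway's actual log format.

```python
from collections import Counter


def summarize_failures(response: dict) -> Counter:
    """Count failed entries in a PutRecords response, grouped by ErrorCode."""
    return Counter(
        entry["ErrorCode"]
        for entry in response.get("Records", [])
        if "ErrorCode" in entry  # successful entries have SequenceNumber/ShardId instead
    )
```

A `ProvisionedThroughputExceededException` code on many records usually points to a hot shard or an under-provisioned stream rather than a problem with individual packets.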

To capture these errors in CloudWatch Logs, configure the LogGroupName property on your destination. UDP Gateway will write error and warning messages to this log group in your account, making them available for CloudWatch Insights queries, alarms, and dashboards:

LogGroupName: /proxylity/destinations/kinesis-stream

Your destination's IAM role must include logs:CreateLogStream and logs:PutLogEvents permissions for the specified log group.
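As an illustration, the permissions named on this page (kinesis:PutRecords for the stream, plus the two logs actions for the log group) could be combined into a single policy document like the one built below. The function name and the example ARNs are hypothetical, and your role may need additional statements depending on the rest of your configuration.

```python
import json


def destination_policy(stream_arn: str, log_group_arn: str) -> dict:
    """Build an IAM policy document covering the actions named in this page."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kinesis:PutRecords"],
                "Resource": stream_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                # The ":*" suffix covers the log streams inside the group.
                "Resource": f"{log_group_arn}:*",
            },
        ],
    }


if __name__ == "__main__":
    # Example account, region, and names are placeholders.
    policy = destination_policy(
        "arn:aws:kinesis:us-east-1:123456789012:stream/device-data",
        "arn:aws:logs:us-east-1:123456789012:log-group:/proxylity/destinations/kinesis-stream",
    )
    print(json.dumps(policy, indent=2))
```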

Failed records are logged but not automatically retried. UDP is a best-effort protocol, and retrying at the gateway level can cause ordering violations or duplicate processing. For critical workloads requiring guaranteed delivery, consider using SQS FIFO queues as a staging layer with Kinesis as a downstream processor.

Best Practices

Batching Configuration: Larger batch sizes improve throughput and reduce per-message overhead. Configure batch size based on your traffic patterns and latency requirements. For high-volume streams, consider Count: 500 to maximize the efficiency of each PutRecords call.
Partition Key Strategy: Use random partitioning (default) for maximum throughput when ordering doesn't matter. Use dynamic partition keys extracted from payloads when you need ordered processing per device, session, or logical group.
Shard Capacity Planning: Each shard supports writes of up to 1,000 records/second and up to 1 MB/second, whichever limit is reached first. Monitor your stream's write throughput and add shards as needed to handle increasing traffic.
Formatter Selection: Choose hex for binary device IDs, utf8 for text-based identifiers, or base64 for compact representation of longer binary keys.
Permissions: Ensure your IAM role has the kinesis:PutRecords permission for the target stream.
Observability: Enable MetricsEnabled: true to track ingress traffic and configure LogGroupName to capture delivery errors for troubleshooting.
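The shard capacity figures above can be turned into a rough sizing estimate. The sketch below assumes records are spread evenly across shards (as with the default random partition keys) and treats 1 MB as 1,000,000 bytes, which errs on the side of more shards. Byte rates should be measured on the JSON records written to the stream, which are larger than the raw UDP payloads. `shards_needed` is a hypothetical helper.

```python
import math

RECORDS_PER_SHARD_PER_SECOND = 1000
BYTES_PER_SHARD_PER_SECOND = 1_000_000  # conservative reading of "1 MB/second"


def shards_needed(records_per_second: float, bytes_per_second: float) -> int:
    """Minimum shard count that satisfies both per-shard write limits."""
    by_count = math.ceil(records_per_second / RECORDS_PER_SHARD_PER_SECOND)
    by_bytes = math.ceil(bytes_per_second / BYTES_PER_SHARD_PER_SECOND)
    return max(1, by_count, by_bytes)
```

For example, 2,500 small packets per second is bound by the record limit and needs three shards, while 500 large records per second totaling 3.5 MB/second is bound by the byte limit and needs four. With dynamic partition keys, a few very active devices can overload a single shard even when the total is within these estimates.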

Example Use Cases

Gaming Telemetry: Stream player events for real-time matchmaking, anti-cheat detection, or leaderboards. Partition by player ID to maintain event ordering per player.
IoT Sensor Networks: Process sensor readings for anomaly detection or time-series analysis. Partition by device ID to guarantee ordered processing per sensor.
Network Monitoring: Ingest NetFlow, sFlow, or IPFIX records for traffic analysis. Use random partitioning for maximum throughput when order doesn't matter.
Financial Market Data: Stream market data feeds to trading or reporting systems. Partition by instrument ID to maintain quote and trade ordering.
Syslog Aggregation: Centralize syslog messages from distributed systems. Partition by hostname to maintain log ordering per system.

Downstream Consumers

Kinesis Data Streams serves as a pipeline to numerous AWS services and custom applications:

Kinesis Data Analytics: Run SQL queries on streaming data for real-time aggregations, filtering, or transformations
AWS Lambda: Trigger serverless functions to process each record or micro-batch
Kinesis Data Firehose: Automatically deliver data to S3, Redshift, OpenSearch, or third-party destinations
Amazon EMR: Process streams with Spark, Flink, or other big data frameworks
Custom Applications: Build consumers using the Kinesis Client Library (KCL) or AWS SDKs in any language

Example Code

A complete working example demonstrating both random and partition-based distribution is available in the Kinesis example in our GitHub examples repository. The example includes CloudFormation templates showing how to configure streams with different partition key strategies.