A robust quasi-automated "Agent to HQ" communication system
Abstract
Via a series of "text to packetized binary stream" pre-processors (all
contained within a Java applet), it is possible to securely encode short
messages (of normalized single-page text density) for shortwave or
Earth-Moon-Earth transmission. The final coded output must be an audio
file or, if the operator so chooses, text suitable for Morse code or
higher-speed radioteletype (RTTY) transmission.
It is assumed that HQ would have a similar Java-based decoding system,
coupled with agent management utilities, in a unified Java applet on a
series of dedicated systems. A separately linked testing environment
[also written in Java, and using the same codebase] must also exist so
that the system can be optimized over time. These details are left to
the implementor. Ultimately, this is meant to be a low-complexity
system -- steps must be taken at each point to reduce complexity.
Long transmission paths with high bit loss are assumed as the default
transmission environment. The message must have many internal levels of
redundancy, so that when HQ intercepts the weak signals they can be
decoded with reasonable computing power once the bitstream is resolved.
Such encoded messages should be able to survive damaging bit and symbol
loss over long shortwave paths or Earth-Moon-Earth links, and yet not be
overtly or covertly complex for the end user.
System Overview
Statistically, about 95% of Agent to HQ reports are going to be under
6 KB. During WW II (1939-1945) most Agent messages were under 1 KB per
message (based on 5-bit representations of {0...9 and A...Z}). The 1 KB
limit was equally true in Western Europe and in Asia, due to intercept
avoidance requirements. Historically, almost any Agent to HQ message
will be under 2 KB -- except under rare circumstances where long
information requests need to be met by the agent.
RULE : Agent to HQ messages are always going to be reasonably short,
but contain dense information.
Ergo, any transmission system for Agent to HQ messaging has to be robust
enough to survive a bare minimum of 20% data loss. However, the message
should also be compact enough to be transmitted in a short time frame.
CODEBOOK
COMPRESSION is practically obligatory, but using traditional agent
codebooks should be avoided. The traditional codebook adds a layer of
complexity that the Agent must cope with, and may
[in the end] not increase message security significantly.
Any encoding and transmission scheme must allow the agent to fully
encode their message without overt worries about space requirements.
Forcing agents to compress their reports may lead to important
information being omitted, a practice that should be avoided.
- Asian languages, when encoded in HTML 4 or HTML 5 (using UTF-8 or
UTF-16), don't have byte usage statistics that are beyond 0.25 standard
deviations from any Western or Eastern European language.
- From a design perspective, language coding agnosticism is a
requirement. Unicode is key.
Message encoding and decoding issues
You have to assume that HQ will always have better antennas and signal
processing than the agent.
With modern computers and staffing
requirements this should only be assumed to be a 2x to 4x signal
processing advantage to the HQ.
As long as HQ has access to 3+ separate
antenna fields or ancillary reception systems (via Embassies, for
example) then
this transmission system will not be maximally stressed.
The message encoding path
- Check file size. Don't encode messages over 65 KB!
- Create header based on agent ID and file information to be
transmitted.
- Accept HTML or TXT file as input. If images or files or file
systems must be transmitted to HQ, they must go via a different path.
- If HTML, filter out all extraneous HTML coding. Most HTML word
processing applications produce about 20% disposable code. About 80% of
HTML syntax is disposable, except where core coding of text is needed.
Permitting underline, italics, bold; superscript, subscript; acronym;
... and UTF-8 or UTF-16 encoding for Asian languages is OK.
- HTML ultimately must be rendered to Text to keep the
pre-processing pipeline clean.
- For Text, filter out all but printable chars + space + cr/lf ...
but allow for HTML rendering compatibility.
- Compress text via the ZIP algorithm, using any of the older, less
optimal ZIP encodings. Optimal ZIP or other encodings don't actually
help compression much for small messages, so don't use them. Only 3
levels of lesser ZIP compression are needed.
- You now have a compressed ZIP file, ready for encryption. Use all
of the ZIP file core facilities and none of the frills. The ZIP file
can be a security asset or a burden. It is an open technology.
- Encrypt the ZIP file using 2 separate encryption algorithms and keys.
For diplomats (or even amateur radio people) one encryption step is
good enough. I prefer that AES128 be used, as it is more than good
enough. However, using DES40 before or after the hard encryption step
is a good design compromise. There is not going to be a lot of
intercept material, so don't obsess over it.
- The ZIP file should be converted to a uniform SREC data stream.
It should exist briefly as an ".S19" file.
- Packetize the message. I recommend using 56 byte CCSDS Packets
(and 256 byte Frames), as the standard is well known. The CCSDS packet
parameters (56/256) are approximately the lowest the standard will go.
You will have to add the standard CCSDS synchronization groups to the
data stream, but this only increases the message size a small amount. I
recommend using 32 bit synchronization words. CCSDS packets also have
sequence numbers, so I recommend using them but with custom (agent
specific) sequences -- but the rule of always increasing or decreasing
the sequence number must be obeyed. Error correction must now be added,
as CCSDS supports some of the best known Error Correction Codes.
- CCSDS Voyager Code (concatenated Reed-Solomon-Viterbi)
- CCSDS Cassini Code (concatenated Reed-Solomon-Viterbi)
- CCSDS Modified Galileo Code (concatenated Reed-Solomon-Viterbi)
- CCSDS Turbo Codes (for future use)
- The ZIP file should now be saved as an encoded CCSDS binary
file.
- Optional packetization : If CCSDS packetization is not deemed
adequate or suitable, then consider using
- Modified XMODEM packets windowed with YMODEM or ZMODEM
- Custom packet formats, but here I recommend using modified
forms of existing packet systems like XMODEM
- Convert the message to an audio file, possibly inserting a tuning
signal of at least 10 seconds. Typical audio file parameters should be
- Input Audio (Audio Level : 4, CW Tone : 3500 Hz [-/+ 300 Hz])
- Filter (250 Hz around CW Tone)
- Audio File (Coding : Mono (single audio channel); Sample rate : 22050 Hz, Bits : 8)
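The first steps of the encoding path (size check, text filtering, ZIP compression) can be sketched roughly as follows. This is only an illustration under assumptions: the class name `MessagePreprocessor`, the entry name `message.txt`, and the reading of "65 KB" as 65 * 1024 bytes are hypothetical choices, not part of the design above. The standard `java.util.zip` classes are used, with a low compression level passed in by the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class MessagePreprocessor {
    // Assumption: "65 KB" is read as 65 * 1024 bytes.
    static final int MAX_MESSAGE_BYTES = 65 * 1024;

    // Keep CR/LF and every character that is not an ISO control character.
    static String filterText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> cp == '\r' || cp == '\n' || !Character.isISOControl(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    // Filter, size-check, and store the text as a single plain ZIP entry.
    static byte[] zip(String text, int level) {
        byte[] raw = filterText(text).getBytes(StandardCharsets.UTF_8);
        if (raw.length > MAX_MESSAGE_BYTES) {
            throw new IllegalArgumentException("message too large: " + raw.length);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.setLevel(level);                            // 0..9, low levels suffice
            zos.putNextEntry(new ZipEntry("message.txt"));  // hypothetical entry name
            zos.write(raw);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Inverse step for the HQ side or the test environment: read the first entry.
    static String unzip(byte[] zipped) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            zis.getNextEntry();
            return new String(zis.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `MessagePreprocessor.zip(report, 3)` would produce the byte array that is handed to the encryption step.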
What the encoding application needs to know (draft, fairly long for real world application)
- UTC Day, Day of Year
- Message Size, Message Type (HTML, TEXT)
- The Agent Day Key (either from a data file, or derived via a
Hash function)
- The Data Whitener Key (generally changed 4x per year, no
user input)
- CCSDS Packet and Frame encoding parameters (fixed, based on
data size)
- 2 or 3 encryption algorithms
- how to write the final coded message in up to 3 audio file
formats
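As an illustration of deriving the Agent Day Key via a hash function, the sketch below hashes an agent secret together with the UTC year and day of year. Everything here is an assumption made for the example: the class name `DayKey`, the use of SHA-256, the "yyyy-ddd" date string, and truncation to 128 bits (to suit AES128) are not specified by this document. A production design would more likely use a keyed construction such as HMAC.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

final class DayKey {
    // Derive a 128-bit key from a per-agent secret and the UTC date.
    static byte[] derive(byte[] agentSecret, int year, int dayOfYear) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(agentSecret);
            // Hypothetical date encoding, e.g. "2010-225" for 13 August 2010.
            String date = String.format(Locale.ROOT, "%04d-%03d", year, dayOfYear);
            md.update(date.getBytes(StandardCharsets.US_ASCII));
            return Arrays.copyOf(md.digest(), 16); // keep the first 16 bytes
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

Both ends compute the same key from the same secret and date, so no user input is needed on the day of transmission.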
Codegroup encoding-decoding algorithm
For decades spies, military departments like navies and armies, as well
as security intelligence services have written their encoded messages in
groups of five letters.
Codegroup encodes
any file into this form, allowing it to be transmitted through any
medium, and decodes files containing codegroups into the original input.
Encoded files contain
a 16-bit cyclical redundancy check (CRC) and file size to verify, when
decoded, that the message is complete and correct. Files being decoded
may contain other information before and after the codegroups, allowing
in-the-clear annotations to be included.
Codegroup makes no
attempt, on its own, to prevent your message from being read.
Cryptographic security
should be delegated outside the low level Morse Code or RTTY coding of
the message.
Codegroup can then
be applied to the encrypted binary output, transforming it into easily
transmitted text.
Text created
by codegroup uses
only upper case ASCII letters and spaces. Unlike files encoded with
uuencode or pgp's “ASCII armour” facility, the output of
codegroup can be easily (albeit tediously) read over the telephone,
broadcast by shortwave radio to agents in the field, or sent by
telegram, telex, or Morse code.
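The general idea of turning arbitrary bytes into five-letter groups can be sketched as below. This is not the actual Codegroup file format (it has no header, CRC, or size field); it is a simplified illustration that assumes one letter from A to P per 4 bits, and the class name `FiveLetterGroups` is hypothetical.

```java
import java.io.ByteArrayOutputStream;

final class FiveLetterGroups {
    // Map each 4-bit nibble to a letter A..P and emit groups of five letters.
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int letters = 0;
        for (byte b : data) {
            int[] nibbles = {(b >> 4) & 0x0F, b & 0x0F};
            for (int nibble : nibbles) {
                if (letters > 0 && letters % 5 == 0) {
                    sb.append(' ');
                }
                sb.append((char) ('A' + nibble));
                letters++;
            }
        }
        return sb.toString();
    }

    // Ignore everything except A..P, so spaces and line breaks are harmless.
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 if none
        for (char c : text.toCharArray()) {
            if (c < 'A' || c > 'P') {
                continue;
            }
            int nibble = c - 'A';
            if (high < 0) {
                high = nibble;
            } else {
                out.write((high << 4) | nibble);
                high = -1;
            }
        }
        return out.toByteArray();
    }
}
```

For example, the three bytes 0x00, 0xFF, 0x12 encode to `AAPPB C`. The price of the all-letters alphabet is a doubling of the message length, since each letter carries only 4 bits.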
To illustrate the
difference, here are the first few lines of a binary file encoded by:
- base64:
- H4sICFJ9MzYAA2EudGFyAOxba3faSNKer+lf0SezO3YmgLnY2I6TyQIGgwOGBTtOYjuJEMJo
DJJGF1+ys//9rarulpqLHRi/mdk9G84JIKGuqq579eNkNn745q9sNru9tcXhs5gtFPAzm83l
xad88WyxmNssbhe3sps8m8ttZ/M/8K1vL9oPP0RBaPggypU1vrad+59zosj0HqAj9xF//pe8
WsaVNbTH1rfkAfoobm7ea//cZn4rtv/mNtq/kM9t/cCz31Io9foftz9nnW77oMdfcdMdWJe+
- uuencode:
- begin 644 data.bin
M'XL("&7._RVUO;V/9U+FN2XSF3G6H5OA1(?HOB<=/<7__X7TN<PJ[L&
M=?-&1;I+)B80;P?_Z'?WY_-=7Q"T_JSZ_6)X9?&"$OU9[N'A[A%^L^6=
M?^M[OOV+:9=UM9J^]MAS_;X0O]U];(Z?<WWE9_[/]ZMMOO[CG'^2MM
M_G(+,US/LWKZE1#C^YO?D_;O#G[7][2R^+0>XJ^&PI/[?7-7U]KU=]SSWQ?
- pgp:
- -----BEGIN PGP MESSAGE-----
Version: 2.6.2i
hIwCCb8iTku3pBUBA/9oSDlfk/On9bwjmTnB98Eejr6agkPSi3n6hd8JkAtJd33f
kzFq18Jo0xzRUWZ7Di6Jq/FXpeI1yztVDqispbcYOP0aDv4JZOSF1kRsmJ9xK9Bo
Cv4a967IXPkkRsjIAkx0B39dYxCzf8kHUn4THmyV/b2qLUZ0cc+mr8hxFfFpuYSM
- codegroup:
- ZZZZZ YBPIL AIAIG FMOPP CPAAA DGNGP GPGPA ADNJN ELJKO ELIMO
GEOHF KIFGP IFBCB PKCPI YJMHE PHBHP PPOBH NCOHD AKLLL AGHFP
DEGEF LKELC EAIJI ABAGP AHPPO IHHPH OHPDF YNFPB ALEPO KMPKP
NGCHI GFPBI CBDML PFGHL LIHPC BOOBB HOLDO FJNHP OLHLL OPNIL
Only
Codegroup conforms
to the telegraphic convention of all upper case letters, and passes the
“telephone test” of being readable without any modifiers such as
“capital” and “lower-case”.
Avoiding punctuation
marks and lower case letters makes the output
of codegroup much
easier to transmit over a voice or traditional telegraphic link.
Known defects
Codegroup's current
major defect is that there is no extensible Java encode-decode library
for it. Theoretically one could abandon CCSDS ECC packet coding if the
Codegroup mechanism supported it.
Codegroup does not
support a fully declared status header or an optimal end-of-file
mechanism.
Codegroup may not fully
support UTF-8 encoding of the file name, up to 200 chars long.
Codegroup may not fully
support encoding the file mode, per POSIX : '000' ... '777'
Codegroup really should
use a stronger checksum like CRC-32K (Koopman) :
x^32 + x^30 + x^29 + x^28 + x^26 + x^20 + x^19 + x^17 + x^16 + x^15 + x^11 + x^10 + x^7 + x^6 + x^4 + x^2 + x + 1
Codegroup should also use a strong hash function like MD5 or SHA-1.
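To make the CRC-32K suggestion concrete, here is a minimal bitwise sketch. The polynomial listed above corresponds to the 32-bit constant 0x741B8CD7 (the x^32 term is implicit). The processing conventions are assumptions for illustration only: bits are handled most-significant first with no reflection, and the initial value and final XOR are left as parameters because this document does not fix them. The class name `Crc32K` is hypothetical.

```java
final class Crc32K {
    // Koopman CRC-32K polynomial, x^32 term implicit.
    static final int POLY = 0x741B8CD7;

    // Bitwise, MSB-first CRC. init and xorOut are caller-chosen conventions.
    static int compute(byte[] data, int init, int xorOut) {
        int crc = init;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 24;          // feed the next byte into the top bits
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x80000000) != 0) {
                    crc = (crc << 1) ^ POLY;  // top bit set: shift and reduce
                } else {
                    crc <<= 1;
                }
            }
        }
        return crc ^ xorOut;
    }
}
```

With `init` and `xorOut` both zero, the single byte 0x01 yields the polynomial itself, which is a convenient sanity check. A table-driven variant would be faster, but for messages of a few kilobytes the bitwise form is adequate.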
Technical references
Compression path
- HTML or TEXT originating content.
- Compress into a ZIP file (compression takes place here; no
extra ZIP file
format features should be used)
- BIN encode the file (encryption takes place here, no change
in byte
size vs ZIP file)
- SREC packetize the file (limited scale fixed format
packetization takes place
here, partly an error correction step)
- CCSDS encode the file (fixed format packetization, then
".ccsds" file
format)
- Encode with "Codegroup"; for RTTY or Morse Code
transmission OTHERWISE save as text
file
- For Morse / RTTY transmission : Render
to audio file, at or beyond 95 WPM for Morse Code. Render to RTTY audio
file using MT63 in the following modes (500 Hz @ High Interleave, 1 kHz @
Low Interleave, 2 kHz @ Low Interleave). Alternate ancillary RTTY
recommendations are under consideration.
- Play the rendered audio file back over a shortwave path or
alternately a VHF/UHF link or a microwave Earth-Moon-Earth link.
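The SREC step in the path above can be illustrated with a small record writer. The sketch below builds a single S1 record (16-bit address, as used in ".S19" files): a byte count, the address, the data, and a checksum that is the ones' complement of the low byte of the sum of the count, address, and data bytes. The class name `SRecord` is hypothetical, and how the ZIP file is split across records is left open.

```java
final class SRecord {
    // Build one S1 record for up to 252 data bytes at a 16-bit address.
    static String s1(int address, byte[] data) {
        if (data.length > 252) {
            // The count byte covers 2 address bytes + data + 1 checksum byte.
            throw new IllegalArgumentException("too much data for one record");
        }
        int count = data.length + 3;
        int sum = count + ((address >> 8) & 0xFF) + (address & 0xFF);
        StringBuilder sb = new StringBuilder("S1");
        sb.append(String.format("%02X%04X", count, address & 0xFFFF));
        for (byte b : data) {
            sum += b & 0xFF;
            sb.append(String.format("%02X", b & 0xFF));
        }
        // Checksum: ones' complement of the low byte of the running sum.
        sb.append(String.format("%02X", ~sum & 0xFF));
        return sb.toString();
    }
}
```

For instance, `SRecord.s1(0x0000, new byte[] {0x01})` gives `S104000001FA`. Each record carries its own checksum, which is where the extra per-record integrity information comes from before CCSDS encoding.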
Modified CCSDS ECC Packet Frame Overhead
Message length | Words {4.5 Chars/Word} | 56 Byte Packets | 1000 Byte Frames | 503 Byte Frames | 391 Byte Frames
512 bytes | 113 | 512/56 = 10 | 1 (not advised) | 2 (not advised) | 2
1 KB | 227 | 1024/56 = 18 | 2 (not advised) | 3 | 3
2 KB | 455 | 2048/56 = 36 | 3 (not advised) | 5 | 6
4 KB | 910 | 4096/56 = 73 | 5 | 9 | 11
8 KB | 1820 | 8192/56 = 146 | 9 | 17 | 21
16 KB | 3640 | 16384/56 = 292 | 16 | 33 | Do not use
The table suggests that the Frame sizes that were initially suggested
are too large. The view must be taken that Frames should be (391, 503,
1000) bytes, for fixed 56 byte packets.
This gives a Frame size range from ~7x to ~11x, but on a per message
basis it is fixed. It is assumed that terminating frames will have a
minor syntax change to indicate that their length will not be the usual
fixed length.
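The frame counts in the table can be approximated with a ceiling division of the message length by the frame size. The sketch below is a rough check, not the method used to build the table: it ignores header, synchronization, and error correction overhead, and the class name `FrameMath` is hypothetical. It reproduces most table entries; a few (for example 16 KB in 1000 byte frames, and several packet counts) appear to have been rounded down instead.

```java
final class FrameMath {
    // Number of fixed-size units (packets or frames) needed to carry a message.
    static int unitsNeeded(int messageBytes, int unitBytes) {
        if (messageBytes < 0 || unitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (messageBytes + unitBytes - 1) / unitBytes; // ceiling division
    }

    public static void main(String[] args) {
        int[] frameSizes = {1000, 503, 391};
        for (int kb = 1; kb <= 16; kb *= 2) {
            int bytes = kb * 1024;
            StringBuilder row = new StringBuilder(kb + " KB:");
            for (int size : frameSizes) {
                row.append(' ').append(unitsNeeded(bytes, size));
            }
            System.out.println(row);
        }
    }
}
```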
For messages under 128 bytes, none of this packetization makes sense --
so an alternate short message service encoding format must be
considered, unless it is deemed that null characters or groups should be
used to bloat the message to keep its relative size fixed.
It is a 'best practice' for the sending agent to provide 1 KB of
material minimum in any message anyway. The implication of the 1 KB of
material recommendation is 50 KB of traffic per year per agent.
CCSDS Framed Packets recommendation
CCSDS is unique in that it separates Data Packets from Error Correction
Packets in the time domain. CCSDS Packets (but not Frames) offer packet
by packet choice on error correction scheme. CCSDS Frames have their
own limited error correction and detection system, allowing for packet
errors to be localized and corrected in most cases. The current CCSDS
packet system allows for 3 or
4 different kinds of error correction to be used -- based on traffic
type, urgency and need of error correction.
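As a rough picture of what fixed 56 byte packetization with synchronization words and sequence numbers involves, consider the sketch below. It is deliberately simplified and is not a conforming CCSDS Space Packet or Transfer Frame: the layout (32 bit sync word, 16 bit sequence number, 1 byte payload length, 56 byte zero-padded payload) and the class name `SimplePacketizer` are assumptions for illustration, and no error correction is added. The sync value 0x1ACFFC1D is the commonly used 32 bit CCSDS attached sync marker.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class SimplePacketizer {
    static final int SYNC = 0x1ACFFC1D; // 32 bit synchronization word
    static final int PAYLOAD = 56;      // fixed payload size per packet
    static final int PACKET_LEN = 4 + 2 + 1 + PAYLOAD;

    // Split data into fixed-size packets, numbering them upward from firstSeq
    // (the agent-specific starting value).
    static List<byte[]> packetize(byte[] data, int firstSeq) {
        List<byte[]> packets = new ArrayList<>();
        int seq = firstSeq;
        for (int off = 0; off < data.length; off += PAYLOAD) {
            int len = Math.min(PAYLOAD, data.length - off);
            ByteBuffer p = ByteBuffer.allocate(PACKET_LEN); // big-endian, zero-filled
            p.putInt(SYNC);
            p.putShort((short) seq++);   // always increasing (wraps at 16 bits)
            p.put((byte) len);           // real payload length in this packet
            p.put(data, off, len);       // the rest remains zero padding
            packets.add(p.array());
        }
        return packets;
    }
}
```

The explicit length byte lets the decoder strip the padding in the final packet, which is one way to handle the terminating case without changing the fixed packet size.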
If any other packet protocol is used, like XMODEM / YMODEM / ZMODEM,
Kermit, SEAlink etc., then all the error correction information must be
sent after the binary file is sent. This requires several garbage
packets (at least 2) to be sent as spacers in the traffic, to reduce
confusion about whether a packet carries data or error correction
content. This limits the flexibility of choosing error correction
schemes, but for some users this may be acceptable.
Technical references -- onward research
Physical link
Compressed files
File re-encoding for the binary ZIP file -- no compression used here
- SREC (file format) [has ".S19" file format]
- ZIP to S19 allows for a lot of extra checksums to be added
before CCSDS encoding. CCSDS encoding may only require using Voyager
Codes as the extra checksums could prove useful in the message decoding
process. Cassini or Turbo Codes should only be used during the worst of
predicted ionospheric conditions.
Error correction
Message encoding
Cryptography
Java encoding application
Created by : Max Power
Concept : 15 June 2008
Document created : 17 March 2010
Document last revised : 13 August 2010 (content)