A robust quasi-automated
"Agent to HQ" communication system
Abstract
Via a series of (text or binary) to "packetized binary stream"
pre-processors (all contained within a Java applet) it is possible to
securely encode short messages (of normalized single page text density)
for Shortwave or possibly even Earth Moon Earth transmission.
The final coded output file must be in an audio file format, or if the
operator so chooses -- text suitable for Morse Code transmission or
higher speed Radioteletype (RTTY).
It is assumed that the HQ would
have a similar Java based decoding system, coupled with agent
management utilities in a unified Java applet on a series of dedicated
systems. A separately linked testing environment for the system [also
written in Java, and using the same codebase] must also exist so that
the system can be optimized over time. These details are left to the
implementer. Ultimately, this is a low complexity system -- steps must
be taken at each point to reduce complexity.
Long transmission paths with high bit loss are assumed as the default
transmission environment. The message must have many internal levels of
redundancy, so that when the HQ intercepts the weak signals they can be
decoded with reasonable computing power once the bitstream is resolved.
Such encoded messages should be able to survive damaging bit and symbol
loss over long shortwave paths or Earth Moon Earth links and yet not be
overtly or covertly complex for the end user.
System Overview
Statistically, about 95% of Agent to HQ reports are going to be under
6 KB.
During World War II (1939-1945) most Agent messages were under 1 KB per
message (based on 5 bit representations of {0...9 and A....Z}).
The 1 KB limit held equally in Western Europe and in Asia, due to
intercept avoidance requirements. Historically, any Agent to HQ message
will be under 4 KB -- except under rare circumstances where long
information requests may need to be met by the agent.
RULE : Agent to HQ messages are always going to be reasonably short, but
contain dense information.
Ergo, any transmission system for Agent
to HQ messaging has to be robust enough to survive a bare minimum of 20%
data loss. However the message should also be compact enough to be
transmitted in a short time frame.
CODEBOOK COMPRESSION is practically
obligatory, but using traditional agent codebooks should be avoided. The
"Traditional codebook" adds a layer of complexity that the Agent must
cope with, and may [in the end] not increase message security
significantly.
Any encoding and transmission scheme must allow the agent to fully encode
their message without overt worries about space requirements. Forcing
agents to compress their reports may lead to important information being
omitted, a practice that should be avoided.
- Asian languages when encoded in HTML 4 or HTML 5 (using UTF-8 or
UTF-16) don't have byte usage statistics that are beyond 0.25 standard
deviations from any Western or Eastern European language.
- From a design perspective, language coding agnosticism is a
requirement. Unicode is key.
Message encoding and decoding issues
You have to assume that HQ will always have better antennas and signal
processing than the agent.
With modern computers and staffing requirements this should only be
assumed to be a 2x to 4x signal processing advantage to the HQ.
As long as HQ has access to 3+ separate antenna fields or ancillary
reception systems (via Embassies, for example) then this transmission
system will not be maximally stressed.
The message encoding path
- Check file size. Don't encode messages over 65 KB!
- Create header based on agent ID and file information to be
transmitted.
- Accept HTML or TXT file as input. If images or files or file systems
must be transmitted to HQ, they must go via a different path.
- If HTML, filter out all extraneous HTML coding. Most HTML word
processing applications produce about 20% disposable code. About 80% of
HTML syntax is disposable, except where core coding of text is needed.
Permitting underline, italics, bold; superscript, subscript; acronym;
... and UTF-8 or UTF-16 encoding for Asian languages is OK.
- HTML ultimately must be rendered to Text to keep the pre-processing
pipeline clean.
- For Text, filter out all but printable chars + space + cr/lf ... but
allow for HTML rendering compatibility.
- Compress text via ZIP algorithm, using any of the older less optimal
zip encodings. Optimal zip or other encodings don't actually help
compression much for small messages, so don't use them. Only 3 levels of
lesser ZIP compression are needed.
- You now have a compressed ZIP file, ready for encryption. Use all of
the ZIP file core facilities and none of the frills. The ZIP file can be
a security asset or a burden. It is an open technology.
- Encrypt the ZIP file using 2 separate encryption algorithms and keys.
For diplomats (or even amateur radio people) one encryption step is good
enough. I prefer AES128 be used, as it is more than good enough.
However, using DES40 before or after the hard encryption step is a good
design compromise. There is not going to be a lot of intercept material,
so don't obsess over it.
- The ZIP file should be converted to a uniform SREC data stream. It
should exist briefly as an ".S19" file.
- Packetize the message. I recommend using 56 byte CCSDS Packets (and
256 byte Frames), as the standard is well known. The CCSDS packet
parameters (56/256) are approximately the lowest the standard will go.
You will have to add the standard CCSDS synchronization groups to the
data stream, but this only increases the message size a small amount. I
recommend using 32 bit synchronization words. CCSDS packets also have
sequence numbers, so I recommend using them but with custom (agent
specific) sequences -- but the rule of always increasing or always
decreasing the sequence number must be obeyed. Error correction must now
be added, as CCSDS supports some of the best known Error Correction Codes:
- CCSDS Voyager Code (concatenated Reed-Solomon-Viterbi)
- CCSDS Cassini Code (concatenated Reed-Solomon-Viterbi)
- CCSDS Modified Galileo Code (concatenated Reed-Solomon-Viterbi)
- CCSDS Turbo Codes (for future use)
- The ZIP file should now be saved as an encoded CCSDS binary file.
- Optional packetization : If CCSDS packetization is not deemed adequate
or suitable, then consider using
- Modified XMODEM packets windowed with YMODEM or ZMODEM
- Custom packet formats, but here I recommend using modified forms of
existing packet systems like XMODEM
- Convert the message to an audio file, possibly inserting a tuning
signal of at least 10 seconds.
- Typical audio file parameters should be
- Input Audio (Audio Level : 4, CW Tone : 3500 Hz [-/+ 300 Hz])
- Filter (250 Hz around CW Tone)
- Audio File (Coding : Mono (single audio channel); Sample rate :
22050 Hz, Bits : 8)
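The first steps of the encoding path above (size check, text filtering, ZIP compression, one encryption pass) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the document's implementation: the class and method names, the entry name `message.txt`, and the choice of AES-128 in CTR mode (picked because it keeps the ciphertext the same size as the ZIP file) are all assumptions, and the second encryption pass is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class EncodePathSketch {
    static final int MAX_MESSAGE_BYTES = 65 * 1024; // "Don't encode messages over 65 KB"

    /** Keeps printable characters, space, CR and LF; drops all other control characters. */
    static String filterText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> cp == '\r' || cp == '\n' || !Character.isISOControl(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    /** Compresses the message into a single-entry ZIP archive at a fast, non-optimal level. */
    static byte[] zip(byte[] message) {
        if (message.length > MAX_MESSAGE_BYTES) {
            throw new IllegalArgumentException("message exceeds the 65 KB limit");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            zos.setLevel(Deflater.BEST_SPEED);
            zos.putNextEntry(new ZipEntry("message.txt")); // hypothetical entry name
            zos.write(message);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads back the first entry of a ZIP archive (the HQ side of the step above). */
    static byte[] unzip(byte[] archive) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            if (zis.getNextEntry() == null) {
                throw new IllegalArgumentException("empty archive");
            }
            return zis.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** AES in CTR mode; mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE. Output length equals input length. */
    static byte[] aesCtr(int mode, byte[] key16, byte[] iv16, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key16, "AES"), new IvParameterSpec(iv16));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With CTR mode the same key and IV pair must never be reused for two messages, so in practice the IV would have to be tied to something that changes per message, such as the day key and a message counter.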
What the encoding application needs to know (draft, fairly long for real
world application)
- UTC Day, Day of Year
- Message Size, Message Type (HTML, TEXT)
- The Agent Day Key (either from a data file, or derived via a Hash
function)
- The Data Whitener Key (generally changed 4x per year, no user input)
- CCSDS Packet and Frame encoding parameters (fixed, based on data
size)
- 2 or 3 encryption algorithms
- how to write the final coded message in up to 3 audio file formats
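One way the "derived via a Hash function" option for the Agent Day Key could look is sketched below. It is only an illustration: the use of HMAC-SHA-256, the input layout (UTC year followed by day of year), the truncation to 16 bytes and all names are assumptions that do not come from the list above.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class DayKeySketch {
    /** Derives a 16-byte day key from a long-term agent secret and the UTC date. */
    static byte[] dayKey(byte[] agentSecret, int utcYear, int dayOfYear) {
        if (dayOfYear < 1 || dayOfYear > 366) {
            throw new IllegalArgumentException("day of year must be 1..366");
        }
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(agentSecret, "HmacSHA256"));
            // Hypothetical input layout: 4-byte year followed by 4-byte day of year.
            mac.update(ByteBuffer.allocate(8).putInt(utcYear).putInt(dayOfYear).array());
            return Arrays.copyOf(mac.doFinal(), 16); // truncate to an AES-128 sized key
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The day of year can be taken from `java.time.LocalDate.now(java.time.ZoneOffset.UTC).getDayOfYear()`, so agent and HQ agree on the key as long as both clocks are on the same UTC day.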
Codegroup encoding-decoding algorithm
For decades spies, military departments like Navies and Armies, as well
as Security Intelligence services, have written their encoded messages in
groups of five letters.
Codegroup encodes any file into
this form, allowing it to be transmitted through any medium, and decodes
files containing codegroups into the original input.
Encoded files contain a 16-bit
cyclical redundancy check (CRC) and file size to verify, when decoded,
that the message is complete and correct. Files being decoded may contain
other information before and after the codegroups, allowing in-the-clear
annotations to be included.
Codegroup makes no attempt, on
its own, to prevent your message from being read.
Cryptographic security should be
delegated outside the low level Morse Code or RTTY coding of the message.
Codegroup can then be applied to
the encrypted binary output, transforming it into easily transmitted text.
Text created
by codegroup uses only upper case ASCII letters and spaces.
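The general idea of turning arbitrary bytes into five-letter groups can be sketched as below. This is not the actual Codegroup file format (it has no start marker, CRC or length field). The mapping of each 4-bit nibble to a letter A..P and the filler letter Q are assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;

final class FiveLetterGroups {
    private static final char PAD = 'Q'; // hypothetical filler, outside the A..P data alphabet

    /** Encodes each byte as two letters A..P and emits space-separated groups of five. */
    static String encode(byte[] data) {
        StringBuilder letters = new StringBuilder(data.length * 2);
        for (byte b : data) {
            letters.append((char) ('A' + ((b >> 4) & 0x0F))); // high nibble
            letters.append((char) ('A' + (b & 0x0F)));        // low nibble
        }
        while (letters.length() % 5 != 0) {
            letters.append(PAD); // fill the last group to five letters
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < letters.length(); i += 5) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(letters, i, i + 5);
        }
        return out.toString();
    }

    /** Decodes groups produced by encode(); spaces and filler letters are skipped. */
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1;
        for (char c : text.toCharArray()) {
            if (c < 'A' || c > 'P') {
                continue; // not a data letter
            }
            int nibble = c - 'A';
            if (high < 0) {
                high = nibble;
            } else {
                out.write((high << 4) | nibble);
                high = -1;
            }
        }
        if (high >= 0) {
            throw new IllegalArgumentException("odd number of data letters");
        }
        return out.toByteArray();
    }
}
```

For example, the bytes 00 FF 1A become `AAPPB KQQQQ`. Because this decoder skips every character outside A..P, it cannot tolerate in-the-clear annotations that contain upper case letters; a real format needs explicit start and end markers for that.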
Unlike files encoded with uuencode or pgp's “ASCII armour”
facility, the output of codegroup can be easily (albeit tediously) read
over the telephone, broadcast by shortwave radio to agents in the field,
or sent by telegram, telex, or Morse code.
To illustrate the difference, here are the first few lines of a binary file
encoded by:
- base64:
- H4sICFJ9MzYAA2EudGFyAOxba3faSNKer+lf0SezO3YmgLnY2I6TyQIGgwOGBTtOYjuJEMJo
DJJGF1+ys//9rarulpqLHRi/mdk9G84JIKGuqq579eNkNn745q9sNru9tcXhs5gtFPAzm83l
xad88WyxmNssbhe3sps8m8ttZ/M/8K1vL9oPP0RBaPggypU1vrad+59zosj0HqAj9xF//pe8
WsaVNbTH1rfkAfoobm7ea//cZn4rtv/mNtq/kM9t/cCz31Io9foftz9nnW77oMdfcdMdWJe+
- uuencode:
- begin 644 data.bin
M'XL("&7._RVUO;V/9U+FN2XSF3G6H5OA1(?HOB<=/<7__X7TN<PJ[L&
M=?-&1;I+)B80;P?_Z'?WY_-=7Q"T_JSZ_6)X9?&"$OU9[N'A[A%^L^6=
M?^M[OOV+:9=UM9J^]MAS_;X0O]U];(Z?<WWE9_[/]ZMMOO[CG'^2MM
M_G(+,US/LWKZE1#C^YO?D_;O#G[7][2R^+0>XJ^&PI/[?7-7U]KU=]SSWQ?
- pgp:
- -----BEGIN PGP MESSAGE-----
Version: 2.6.2i
hIwCCb8iTku3pBUBA/9oSDlfk/On9bwjmTnB98Eejr6agkPSi3n6hd8JkAtJd33f
kzFq18Jo0xzRUWZ7Di6Jq/FXpeI1yztVDqispbcYOP0aDv4JZOSF1kRsmJ9xK9Bo
Cv4a967IXPkkRsjIAkx0B39dYxCzf8kHUn4THmyV/b2qLUZ0cc+mr8hxFfFpuYSM
- codegroup:
- ZZZZZ YBPIL AIAIG FMOPP CPAAA DGNGP GPGPA ADNJN ELJKO ELIMO
GEOHF KIFGP IFBCB PKCPI YJMHE PHBHP PPOBH NCOHD AKLLL AGHFP
DEGEF LKELC EAIJI ABAGP AHPPO IHHPH OHPDF YNFPB ALEPO KMPKP
NGCHI GFPBI CBDML PFGHL LIHPC BOOBB HOLDO FJNHP OLHLL OPNIL
Only Codegroup conforms to
the telegraphic convention of all upper case letters, and passes the
“telephone test” of being readable without any modifiers such as “capital”
and “lower-case”.
Avoiding punctuation marks and
lower case letters makes the output of codegroup much easier to
transmit over a voice or traditional telegraphic link.
Known defects
Codegroup's current major defect
is that there is no extensible Java encode-decode library for it.
Theoretically one could abandon CCSDS ECC packet coding if the Codegroup
mechanism supported it.
Codegroup does not support a fully
declared status header or an optimal end-of-file mechanism.
Codegroup may not fully support
UTF-8 encoding of the file name, up to 200 chars long.
Codegroup may not fully support
encoding File Mode, per POSIX : '000' ... '777'
Codegroup really should use a
stronger checksum like CRC-32K (Koopman) :
x^{32} + x^{30} + x^{29} + x^{28} + x^{26} + x^{20} + x^{19} + x^{17} + x^{16} + x^{15} + x^{11} + x^{10} + x^{7} + x^{6} + x^{4} + x^{2} + x + 1
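A bit-by-bit CRC using the Koopman polynomial listed above can be sketched as follows. The constant `0x741B8CD7` is that polynomial written in hexadecimal with the leading x^32 term left implicit. The remaining parameters (zero initial value, most significant bit first, no final XOR) and the names are assumptions chosen for simplicity, not a published CRC-32K parameter set.

```java
import java.nio.ByteBuffer;

final class Crc32Koopman {
    /** Koopman polynomial from the text, x^32 term implicit. */
    static final int POLY = 0x741B8CD7;

    /** MSB-first CRC with zero initial value and no final XOR (assumed parameters). */
    static int crc(byte[] data) {
        int reg = 0;
        for (byte b : data) {
            reg ^= (b & 0xFF) << 24;            // feed the next byte into the top of the register
            for (int i = 0; i < 8; i++) {
                // A negative int means the top bit is set, so the polynomial is subtracted (XOR).
                reg = (reg < 0) ? (reg << 1) ^ POLY : reg << 1;
            }
        }
        return reg;
    }

    /** Returns data followed by its CRC as four big-endian bytes. */
    static byte[] appendCrc(byte[] data) {
        return ByteBuffer.allocate(data.length + 4).put(data).putInt(crc(data)).array();
    }
}
```

With these parameters, running `crc` over a message that already has its CRC appended gives 0, which is a convenient receive-side check. A zero initial value cannot detect extra leading zero bytes, so a deployed version would probably use a non-zero initial value.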
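Codegroup should also use a strong hash function. MD5 and SHA-1 are
widely available, but both are now known to be vulnerable to collision
attacks, so SHA-256 is the safer choice.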
Technical references
Compression path
- HTML or TEXT originating content.
- Compress into a ZIP file (compression takes place here; no extra ZIP
file format features should be used)
- BIN encode the file (encryption takes place here, no change in byte
size vs ZIP file)
- SREC packetize the file (limited scale fixed format packetization
takes place here, partly an error correction step)
- CCSDS encode the file (fixed format packetization, then ".ccsds" file
format)
- Encode with "Codegroup"; for RTTY or Morse Code transmission OTHERWISE
save as text file
- For Morse / RTTY transmission : Render to audio file, at or
beyond 95 WPM for Morse Code. Render to RTTY audio file using MT63 in
the following modes (500 Hz @ High Interleave, 1 kHz @ Low Interleave,
2 kHz @ Low Interleave). Alternate ancillary RTTY recommendations are
under consideration.
- Play the rendered audio file back over a shortwave path or alternately
a VHF/UHF link or a microwave Earth Moon Earth link.
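The audio rendering step is not specified in detail, so the sketch below covers only the simplest part: writing a tuning tone as a mono, 8-bit, 22050 Hz WAV file, matching the audio file parameters given earlier. It does not implement Morse or MT63 modulation. The class name, amplitude and hand-built 44-byte header are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class TuningToneWav {
    static final int SAMPLE_RATE = 22050; // Hz, mono, 8 bits per sample

    /** Builds a complete WAV file in memory containing a sine tone. */
    static byte[] tone(double frequencyHz, double seconds) {
        int n = (int) Math.round(seconds * SAMPLE_RATE);
        n += n & 1; // keep the data chunk an even number of bytes, as RIFF expects
        ByteBuffer wav = ByteBuffer.allocate(44 + n).order(ByteOrder.LITTLE_ENDIAN);
        wav.put(ascii("RIFF")).putInt(36 + n).put(ascii("WAVE"));
        wav.put(ascii("fmt ")).putInt(16)
           .putShort((short) 1)      // format tag 1 = PCM
           .putShort((short) 1)      // one channel
           .putInt(SAMPLE_RATE)      // samples per second
           .putInt(SAMPLE_RATE)      // bytes per second (1 byte per sample)
           .putShort((short) 1)      // block align
           .putShort((short) 8);     // bits per sample
        wav.put(ascii("data")).putInt(n);
        for (int i = 0; i < n; i++) {
            double s = Math.sin(2 * Math.PI * frequencyHz * i / SAMPLE_RATE);
            wav.put((byte) Math.round(128 + 100 * s)); // 8-bit WAV is unsigned; 128 is silence
        }
        return wav.array();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

`TuningToneWav.tone(3500, 10)` gives the 10 second tuning signal at the 3500 Hz tone mentioned above; the result can be written to disk with `java.nio.file.Files.write`.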
Modified CCSDS ECC Packet Frame Overhead
| Message length | Words {4.5 Chars/Word} | 56 Byte Packets | 1000 Byte Frames | 503 Byte Frames | 391 Byte Frames |
|---|---|---|---|---|---|
| 512 bytes | 113 | 512/56 ≈ 10 | 1 (not advised) | 2 (not advised) | 2 |
| 1 KB | 227 | 1024/56 ≈ 18 | 2 (not advised) | 3 | 3 |
| 2 KB | 455 | 2048/56 ≈ 36 | 3 (not advised) | 5 | 6 |
| 4 KB | 910 | 4096/56 ≈ 73 | 5 | 9 | 11 |
| 8 KB | 1820 | 8192/56 ≈ 146 | 9 | 17 | 21 |
| 16 KB | 3640 | 16384/56 ≈ 292 | 16 | 33 | Do not use |
The table suggests that the Frame sizes that were initially suggested
are too large. The view must be taken that Frames should be 391, 503 or
1000 bytes, for fixed 56 byte packets.
This gives a Frame size range from ~7x to
~11x, but on a per message basis it is fixed. It is assumed that
terminating frames will have a minor syntax change to indicate their
length will not be the usual fixed length.
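The frame counts in the table can be approximated with plain ceiling division of message size by frame size, ignoring header and error correction overhead. The sketch below makes that assumption explicit (names are illustrative). It reproduces the table entries except the 16 KB row's 1000 byte column, where ceiling division gives 17 rather than 16.

```java
final class FrameOverhead {
    static final int PACKET_BYTES = 56;

    /** Integer division rounded up. */
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /** Frames needed to carry the message, ignoring per-frame overhead (an assumption). */
    static int frames(int messageBytes, int frameBytes) {
        return ceilDiv(messageBytes, frameBytes);
    }

    /** Approximate word count at 4.5 characters per word, rounded down as in the table. */
    static int words(int messageBytes) {
        return (int) (messageBytes / 4.5);
    }

    public static void main(String[] args) {
        int[] sizes = {512, 1024, 2048, 4096, 8192, 16384};
        for (int size : sizes) {
            System.out.printf("%6d bytes: %4d words, %3d packets, frames 1000/503/391 = %d/%d/%d%n",
                    size, words(size), ceilDiv(size, PACKET_BYTES),
                    frames(size, 1000), frames(size, 503), frames(size, 391));
        }
    }
}
```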
For messages under 128 bytes, none of this packetization makes sense --
so an alternate short message service encoding format must be considered,
unless it is deemed that null characters or groups should be used to
bloat the message to keep its relative size fixed. It is a 'best
practice' for the sending agent to provide 1 KB of material minimum in any
message anyway. The implication of the 1 KB of material recommendation is
50 KB of traffic per year per agent.
CCSDS Framed Packets recommendation
CCSDS is unique in that it separates Data Packets from Error Correction
Packets in the time domain.
CCSDS Packets (but not Frames) offer packet by packet choice on error
correction scheme. CCSDS Frames have their own limited error correction and
detection system, allowing for packet errors to be localized and corrected
in most cases. The current CCSDS packet system allows for 3 or 4 different
kinds of error correction to be used -- based on traffic type, urgency and
need of error correction.
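The packetization recommended earlier (56 byte packets, a 32 bit synchronization word, agent specific sequence numbers) can be sketched as below. The layout is a simplified illustration and is not a conformant CCSDS packet or frame: it has no CCSDS primary header and no error correction. `0x1ACFFC1D` is the 32 bit attached sync marker that CCSDS uses for telemetry frames. The 16 bit sequence field, the one byte length field and all names are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class PacketizerSketch {
    static final int SYNC_WORD = 0x1ACFFC1D; // CCSDS 32-bit attached sync marker
    static final int PAYLOAD_BYTES = 56;

    /** Splits data into packets of: sync (4) + sequence (2) + length (1) + payload (up to 56). */
    static List<byte[]> packetize(byte[] data, int firstSequence) {
        List<byte[]> packets = new ArrayList<>();
        int seq = firstSequence & 0xFFFF;
        for (int off = 0; off < data.length; off += PAYLOAD_BYTES) {
            int len = Math.min(PAYLOAD_BYTES, data.length - off);
            ByteBuffer p = ByteBuffer.allocate(7 + len);
            p.putInt(SYNC_WORD).putShort((short) seq).put((byte) len).put(data, off, len);
            packets.add(p.array());
            seq = (seq + 1) & 0xFFFF; // always increasing, wrapping at 16 bits
        }
        return packets;
    }

    /** Reads the 16-bit sequence number of a packet. */
    static int sequenceOf(byte[] packet) {
        return ByteBuffer.wrap(packet).getShort(4) & 0xFFFF;
    }

    /** Concatenates payloads in list order after checking each sync word. */
    static byte[] reassemble(List<byte[]> packets) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] raw : packets) {
            ByteBuffer p = ByteBuffer.wrap(raw);
            if (p.getInt() != SYNC_WORD) {
                throw new IllegalArgumentException("missing sync word");
            }
            p.getShort();              // sequence number; ordering check omitted in this sketch
            int len = p.get() & 0xFF;
            out.write(raw, p.position(), len);
        }
        return out.toByteArray();
    }
}
```

The agent specific starting value is passed as `firstSequence`. The final packet is simply left short here, which matches the note that terminating units may not have the usual fixed length.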
If any other packet protocol is used, like XMODEM / YMODEM / ZMODEM, Kermit,
SEAlink etc ... then all the error correction information must be sent after
the binary file is sent. This requires several garbage packets (at least 2)
to be sent as spacers in the traffic, so that it is clear whether a given
packet carries data or error correction content. This limits the
flexibility of choosing error correction schemes, but for some users this
may be just as acceptable.
Technical references -- onward research
Physical link
Compressed files
- ZIP (file format) (uses internal ZIP-64 encoding), but the 7zip and
gzip (gz) compression formats should also be available for use.
- Rzip (optional if ZIP file not satisfactory)
- Gzip (optional if ZIP file not satisfactory)
- After the compressed file is encrypted outside the encryption standard
of the file format, it should be error corrected and that.
File re-encoding for the binary ZIP file -- no compression used here
- SREC (file format) [has ".S19" file format]
- ZIP to S19 allows for a lot of extra checksums to be added before
CCSDS encoding. CCSDS encoding may only require using Voyager Codes as
the extra checksums could prove useful in the message decoding process.
Cassini or Turbo Codes should only be used during the worst of predicted
ionospheric conditions.
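The extra checksums mentioned above come from the S-record layout itself: every record carries its own one byte checksum. A minimal sketch of building a single S1 data record (16 bit address) follows. A complete ".S19" file also needs an S0 header record and an S9 termination record, which are omitted here, and the class and method names are illustrative.

```java
final class SRecordSketch {
    /** Formats one S1 record: "S1" + byte count + 16-bit address + data + checksum, all in hex. */
    static String s1Record(int address, byte[] data) {
        if (address < 0 || address > 0xFFFF) {
            throw new IllegalArgumentException("S1 records use a 16-bit address");
        }
        if (data.length > 252) {
            throw new IllegalArgumentException("at most 252 data bytes fit in one record");
        }
        int count = data.length + 3; // 2 address bytes + data + 1 checksum byte
        int sum = count + (address >> 8) + (address & 0xFF);
        StringBuilder sb = new StringBuilder(String.format("S1%02X%04X", count, address));
        for (byte b : data) {
            sum += b & 0xFF;
            sb.append(String.format("%02X", b & 0xFF));
        }
        // Checksum: ones' complement of the low byte of the sum of count, address and data bytes.
        sb.append(String.format("%02X", ~sum & 0xFF));
        return sb.toString();
    }
}
```

A decoder can re-add every byte of a record including the checksum and expect the low byte of the total to be 0xFF, which localizes damage to a single record before the heavier error correction is applied.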
Asian language issues
- Chinese Telegraph Code (CTC) renders Chinese ideograms to telegraph
friendly numbers.
- Big-5 or Big5 (HKSCS [Hong Kong Supplementary Character Set] variant)
is a Chinese character encoding method used in Taiwan, Hong Kong, and
Macau for Traditional Chinese characters.
- Mainland China, which uses Simplified Chinese Characters, uses the GB
character set instead.
- UTF (Unicode), namely UTF-8 and UTF-16, are better solutions for modern
computing in comparison to Big5 or GB.
- Unicode’s Universal Character Set (UCS) has a potential capacity to
support over 1 million characters. Each UCS character is mapped to a
code point, which is an integer between 0 and 1,114,111, used to
represent each character within the internal logic of text processing
software (1,114,112 = 2^{20} + 2^{16} or 17 × 2^{16}, or hexadecimal
110000 code points).
- As of Unicode 6.2, released in September 2012, 249,764 (22.4%) of
these code points are assigned, including 110,182 (9.9%) encoded
characters, 137,468 (12.3%) reserved for private use, 2,048 for
surrogates, and 66 designated noncharacters, leaving 864,348 (77.6%)
unassigned.
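The code space arithmetic above, and the byte cost of Asian text in UTF-8 versus UTF-16, can be checked with a few lines of Java. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeSizes {
    /** Number of Unicode code points: 0 through Character.MAX_CODE_POINT inclusive. */
    static int codeSpaceSize() {
        return Character.MAX_CODE_POINT + 1;
    }

    /** Bytes needed to store the text in UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed to store the text in UTF-16 (big-endian, no byte order mark). */
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(codeSpaceSize() == 17 * (1 << 16));   // 17 planes of 2^16 code points
        System.out.println(utf8Length("\u4E2D"));                // one CJK ideograph in UTF-8
        System.out.println(utf16Length("\u4E2D"));               // the same ideograph in UTF-16
    }
}
```

A CJK ideograph in the Basic Multilingual Plane takes 3 bytes in UTF-8 and 2 bytes in UTF-16, so for mostly Asian text UTF-16 is the more compact input to the compression step, while UTF-8 is more compact for mostly Latin text.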
Error correction
Message encoding
Cryptography
Java encoding application
| Created by | Initial idea | Document created | Latest revision | Version | Document Revision State |
|---|---|---|---|---|---|
| Max Power | 15 June 2008 | 17 March 2010 | 22 May 2014 (appearance) | 0.46a | Developmental |