A Secure Communication System

using Factorization via Elliptic Curve Methods






Abstract
A secure communication system using factorization can make possible the sending of low-complexity secure messages over difficult communication links with no loss in overall security. This Elliptic Curve coding system is designed with spies, spy ring supervisors and the diplomatic corps in mind. This is a general specification, not a fixed recommendation.

The output is Morse Code Compatible {"A"..."Z" + "0" ... "9"}, making it suitable for Shortwave (HF) transmission and Earth Moon Earth (EME) transmission modes.

This research is theoretical, as there are many kinks and weak points in this kind of information and telecom technology.

The core requirement is that all of the cryptography run on a 100 MHz to 500 MHz PC with a Java language application.
  • An outgoing plaintext message pre-processor stage will be needed to make message encoding optimal for the narrow bandwidth conditions of potential transmission systems.
  • An outgoing "initial" encrypted message pre-processor (to process the message for transmission) is needed too, so as to allow for better message propagation over bad conditions.
  • The agent must have a simple and memorable key or keys.
  • The application should be able to disguise itself as an ECM factoring application, so something obscure and terse.





Mechanical details, for illustration only...



Letter, Command,                       Suggested
or Codebook Group                      Value           Symbol

A                                      11 / 27         01
B                                      12              02
C                                      13              03
D                                      14              04
E                                      15 / 26         05
F                                      22              06
G                                      23              07
H                                      24              08
I                                      25 / 28         09
J                                      35              10
K                                      36              11
L                                      37 / 10         12
M                                      38              13
N                                      39              14
O                                      41 / 17         15
P                                      42              16
Q                                      43              17
R                                      44              18
S                                      45              19
T                                      46              20
U                                      56 / 81         21
V                                      57              22
W                                      58              23
X                                      59              24
Y                                      60              25
Z                                      70              26
Macro / Terminate Macro                71 / 16 / 61    27
Codebook Code Group                    89 / 33 / 62    28
1                                      82 / 90         29
2                                      83              30
3                                      84              31
4                                      85              32
5                                      86              33
6                                      87              34
7                                      88              35
8                                      94              36
9                                      95              37
0                                      96 / 80         38
SPACE (a quasi null)                   97 / 63         39
NULL (optional)                        99 / 72 / 67    40
Unicode 3 Glyph / Glyph Terminator     74 / 49         41



The above table is optimized for Roman-based character sets. Due to the limitation of 2 digits, Cyrillic and similar alphabets mixed with Roman might start to use up the available symbol space.

Roman typically uses 36 core coding symbols, including digits.
  • Cyrillic uses 32 core letters and the usual 10 numbers (42 symbols in total), so it will need an expanded table.
  • Asian languages that have phonetic symbols will need a custom symbol set that may also include the Roman symbols.
  • Languages that use an expanded Roman character set (Vietnamese, Czech, Slovak, Finnish, German, Swedish, etc.) should use a custom table.
  • Unicode 3 Glyphs should be encoded as decimal numbers, but base16 or base32 encoding should be allowed.

Ideally, there should be 90 symbols used (with more duplication and nulls than this table) so as to decrease the quality of each intercept -- if the intercept is worked backward to a stream of 2 digit numbers embedded into a larger number.

Symbol representations should be unevenly distributed for cryptographic security reasons: the most common symbols must be given more representations, so that each individual representation appears less often in the ciphertext.

The most common characters should each have 3 representations (12 characters x 3 representations = 36 representations; 36 representations / 90 symbols = a 40% symbol 'tax', so to speak). For messages under 2 KB this is very practical, but for messages reaching into the 20 KB range it will start to backfire.

It is okay for at least one symbol to have only one representation, providing it is not a space or a macro or a common character.

Ideally, this table should change automatically in the software about 9 times per year or about once per 40 days from 01 JAN.

Use of the "Null" symbol should be optional, but if used 5% to 10% of the message should be Nulls.

It is assumed that the message pre-processor will randomly allocate symbols, using a random symbol-representation picker. Macros are not affected by this requirement, as they are special irregular symbol groups.
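The random symbol-representation picker described above can be sketched as below. This is a minimal illustration in Python (the spec itself targets Java) using only a handful of the table's rows; `TABLE`, `encode` and `decode` are hypothetical names, and `SystemRandom` stands in for whatever approved randomness source the final design would specify.

```python
import random

# A few rows of the suggested table: each character maps to one or more
# two-digit representations (homophones). This subset is for illustration
# only; a real table would carry all 41 entries.
TABLE = {
    "A": ["11", "27"], "E": ["15", "26"], "T": ["46"],
    "H": ["24"], "Q": ["43"], "U": ["56"], "I": ["25", "28"],
    "C": ["13"], "K": ["36"],
}
# Reverse map for decoding: every representation points back to its character.
REVERSE = {code: ch for ch, codes in TABLE.items() for code in codes}

rng = random.SystemRandom()

def encode(text):
    """Pick a representation at random for each character; uneven tables
    flatten the ciphertext frequency distribution."""
    return "".join(rng.choice(TABLE[ch]) for ch in text)

def decode(digits):
    """Split the digit stream into two-digit groups and map each back."""
    return "".join(REVERSE[digits[i:i+2]] for i in range(0, len(digits), 2))
```

Because characters such as "A" and "E" have more than one homophone, repeated encodings of the same plaintext generally differ, while `decode` always recovers the original.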

What if you want to use 3 digit codes?
Three digit codes have a great deal of viability here, but one sees trade-offs similar to those of the recommended two digit codes.

Issue                  | 4 Digits                    | 3 Digits      | 2 Digits          | Difference                | Note
-----------------------+-----------------------------+---------------+-------------------+---------------------------+---------
Numeric Range          | 1000 ... 9999               | 100 ... 999   | 10 ... 99         | 899/89 = 10.1x            | More
Symbol Density         | ~3.8                        | 3             | 2                 | 3/2 = 1.5x; 3.8/3 = 1.27x | Varies
Mix 2, 3 & 4           | NO, Macro                   | NO, Macro     | NO, Macro         | Macro needed              | Method?
Compressibility        | Codebook-like               | Better than 2 | Less as symbols++ | Method?                   |
Symbol Code Density    | Codebook-like, 10.1x better | = NA =        |                   | Symbols >> Alphabet       |
ECM Codebreaking       | Codebook-like, Easier       |               | Harder            | Padding insertion rule    | Variable
ECM Coding             | Codebook-like, Same         | Same          |                   | Lower Code Densities      |
Asian language utility | Excellent                   | Excellent     | Poorer            | Symbol Code Space         | Variable
Change Symbol Space    | 4x per year                 | 5x per year   | 40 days           | Symbols >> Alphabet       |







The pre-processor should use the "Macro" and "Terminate Macro" commands to encode Upper Case (CAPS ON), Lower Case (CAPS OFF), Text Formatting Symbols, Cyrillic {ON / OFF}, etc ... using a structure like
  1. Macro Indicator
  2. Macro State = {"A" ... "Z"} + ({"A" ... "Z"} and {"0" ... "9"}) + [optional ({"A" ... "Z"} and {"0" ... "9"})]; preferred formatting states being $# or $#($ | #) or $$#. [$ : Alpha, # : Number]
  3. Macro Data = ({"A" ... "Z"} + {"A" ... "Z"}), allowing for 26^2 = 676 macro data groups.
  4. Terminate Macro (a syntactical necessity to allow macros to be used at all)
It is up to the HQ end user (not the agent) to determine how macros can or should be used, and how the pre-processor should function. Without macros, message complexity is greatly reduced -- phrases, equations and other important content cannot be transmitted except with great difficulty.

With macros, codebook access becomes possible -- so entire dictionaries (typically under 35,000 words) can be used to compress the message with long macros.
  • Short macros (with no data) could be used to implement limited scale trigram word compression.
  • Compression is important to lessen the quality of the intercept.
  • The master dictionaries should be updated every 3 to 6 months, as they can be, among other things, an important database of place names.

This current specification allows for [{26} x {36}] + [{26} x {36} x {36}] macros, or 936 + 33696 = 34632 macros -- more than enough for any possible user. Macros can be used with or without data; it is assumed that short macros can exist without data.
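The macro-space arithmetic above can be restated as a quick check (illustrative Python; the variable names are ad hoc and nothing here is part of the specification itself):

```python
# Macro State: one alpha followed by one alphanumeric, with an
# optional third alphanumeric position.
two_position_states = 26 * 36            # $# form
three_position_states = 26 * 36 * 36     # $## / $$# forms
total_states = two_position_states + three_position_states

# Macro Data: two alpha characters
data_groups = 26 ** 2

print(two_position_states, three_position_states, total_states, data_groups)
```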





The encoding chain (the decoding chain works in reverse)
  1. Accept raw text (RTF, UTF8-HTML, ASCII), typically under 6 KB. This is to allow non-Roman character set information to be sent.
  2. If the message is greater than 6 KB, initialize message fragmentation procedures.
  3. A Secure Header is developed from the inputs "To" + "Subject" + "Message Length" + "Words" + "Filler".
  4. The pre-processor emits its optimized output, with a nominal header and version number.
  5. The crypto encoder takes over.
  6. Message compression via macros is initiated, with a nominal header and version number.
  7. The message is converted into a stream of message fragments.
  8. Encode the message fragments.
Within each message fragment
  1. The sequence of raw numbers is broken into number blocks 80, 82, 84 ... 90 digits long. Odd-length blocks are probably acceptable and feasible, but are ignored here. To keep the computational complexity low, blocks should stay under 100 digits.
  2. Add a beginning and terminating random number, with beginning number not permitted to be "0" for mathematical reasons. Alternately, up to 5 'base-10' and 'base-11' check digits could be inserted at fixed points in the numeric message block to distort the 2 digit pattern.
  3. A final number for the message fragment is known, so save state.
  4. Factor the number using ECM.
  5. Take the ECM factors of the message fragment, and store in a bracketed or delimited format.
  6. Format the message fragments for transmission using a low grade cipher that hides the prime number message stream.
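Steps 1 and 2 above -- blocking the digit stream and wrapping each block in random guard digits -- can be sketched like this. The 82-digit payload (84 digits per fragment) is one arbitrary choice from the 80 ... 90 range, and `make_fragments` is an illustrative name, not part of the specification:

```python
import random

rng = random.SystemRandom()

def make_fragments(digits, payload_len=82):
    """Split a digit string into blocks and wrap each in guard digits.
    82 payload digits + 2 guard digits = an 84-digit fragment number."""
    fragments = []
    for i in range(0, len(digits), payload_len):
        block = digits[i:i + payload_len]
        lead = str(rng.randint(1, 9))   # leading digit must not be "0"
        tail = str(rng.randint(0, 9))
        fragments.append(int(lead + block + tail))
    return fragments
```

Each resulting fragment number is then handed to the ECM factoring stage.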





Message fragment mathematical encoding properties
  1. Assuming one is working in pure cipher mode : 84 digits = 82 payload digits = 41 symbols @ 4.55 char/word ~ 9 words.
  2. In pure cipher mode, one can assume : 84 digits / 9 words ~ 9.33 digits per word, or about a 10% error margin per word.





Sample message : "The Quick Gray Fox Jumped Over The Lazy Brown Dog"
  1. "The Quick" = 46/24/15/49/43/56/25/13/36
  2. Add One Time Pad to message (Optional)
  3. Add initial and terminating numbers :: 7/46/24/15/49/43/56/25/13/36/1; optionally one could add base-10 or base-11 check digits in fixed locations inside the message, in gobs of 3 or 5.
  4. Final form :: 74624154943562513361
  5. Factor :: 74 624 154 943 562 513 361 = 9 x 227 x 3121 x 6263 x 18371 x 101719
  6. If the (huge) number is prime, send it as a single group.
  7. Format for transmission :: AA {9  227  3121  6263  18371  101719} (a nominal message pre-processor will be required)
  8. Optional, use a low density One Time Pad to fuzzy the numbers : AA {9  229  3133  6271  18376  101722}
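The factoring step of the worked example can be reproduced with plain trial division, since every prime factor of the sample number happens to be small; a production system would use real ECM for general inputs. Note that the "9" in the listing above is 3 x 3, i.e. the listing groups a repeated prime. This is an illustrative sketch; `small_factor` is a hypothetical helper.

```python
def small_factor(n, limit=200_000):
    """Factor n by trial division. Sufficient here because every prime
    factor of the sample number is below the limit; a general-purpose
    implementation would use ECM instead."""
    factors = []
    d = 2
    while d * d <= n and d <= limit:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2
    if n > 1:
        factors.append(n)  # leftover cofactor (prime for this sample)
    return factors

n = 74624154943562513361
factors = small_factor(n)
print(factors)  # two 3s appear where the example groups them as "9"
```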
If the message is to be transmitted via high speed Morse Code, and there is a desire to hide the numeric origin of the message, then use the Western Union "Numbers-to-Letters Substitution Table" that was used by nearly all users of One Time Pad technology during World War II. "Space" and "Null", and possibly "{" and "}", would have to be added to the Western Union table.

Use of the Western Union 'Numbers to Letters' table could allow for a synthetic 3 rotor device to be used to encode each line. This could force the agent to remember yet another key parameter. No plugboard or reflector should be used with this rotor system, and the rotor should reset to the "Agent Key" on a line by line basis. It is really up to the designers of the coding system to mull over what to do here, as these are just suggestions.

Subjecting the numeric 2 digit code groups of the final plain text encoding to a 'low density' One Time Pad (~80% of "0") is probably a good solution if you don't want to add checksums or initial or final numbers. Another separate low density One Time Pad could be applied to all the prime numbers in the output that are greater than 1 digit long. You don't need a high density One Time Pad to do a lot of cryptographic damage.





What does the reporting party have to remember?
  • Assuming that the Whirlpool Hash Function could be used to create a One Time Pad : the One Time Pad "initialization vector" could be something like "Snowflake" (the encoder could add "-Day_of_Year" to add entropy). If the decimal digits created from such a mnemonic are rationed sparsely, then this should generate more than enough entropy for a 2 KB message. As One Time Pads are optional, this should be taken as advisory.
  • The Random Number Generator could use the same "initialization vector" as the One Time Pad, but it is a different codebase.
  • Any "Low Density" One Time Pads should not have to be remembered by the agent.
  • Low complexity rotor devices, in software, could be used at some stages of message encoding. The default agent key would have to be remembered, but it is assumed that the key would change about 12 times per year.
  • An agent "Rotor Key" if a rotor system is used in the final encipherment stage before transmission.
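The mnemonic-to-pad idea above can be sketched as follows. The spec names the Whirlpool hash; because Whirlpool is not in the Python standard library, SHA-512 is used here purely as a stand-in, and `pad_digits` is a hypothetical helper built around the suggested "Snowflake" + Day_of_Year seed.

```python
import hashlib

def pad_digits(mnemonic, day_of_year, count):
    """Derive a stream of decimal pad digits from a memorable seed.
    Each hash block yields 64 digits; a counter extends the stream."""
    digits = []
    counter = 0
    seed = f"{mnemonic}-{day_of_year}"
    while len(digits) < count:
        block = hashlib.sha512(f"{seed}:{counter}".encode()).digest()
        # Byte mod 10 is slightly biased toward 0-5; a production design
        # would use rejection sampling instead.
        digits.extend(b % 10 for b in block)
        counter += 1
    return digits[:count]
```

The stream is deterministic, so sender and receiver derive identical pads from the shared mnemonic and date.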





Seeking workable solutions
The near inevitable output of this encryption format is a series of sequences of prime numbers. Given the automated signal processing technology available, which allows signals and messages to be processed over many different domains of analysis -- this cannot be good.

Forbidden or Forced
  • You cannot do any digit re-grouping in the plaintext domain.
  • You must however scramble the ECM prime numbers in the ciphertext domain.
Maybe
  • You could insert non-prime numbers into the message stream, but the injection rate would have to be high -- something like 80%. The numbers would have to be filtered out. Sounds simple, but it's probably a mess.
  • A well designed 8999 or 18999 element codebook could allow for some hefty message compression, but the codebook digits would be required not to collide with the alpha numeric substitution designators. Codebooks are best when implemented in the Macro section to avoid this possible conflict.

Things that must be done to lessen the possibility of the message origin being identified
  1. All bracketed output sequences must be scrambled. Sorting of the multipliers via value should be forbidden.
  2. All message sequence indicators are OK, but the message itself must have its segments sent out of sequence.
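Requirement 1 above -- scrambling the bracketed multiplier groups -- is just a random permutation of the factor list before formatting. A sketch (the function names are illustrative, and `SystemRandom` stands in for an approved randomness source):

```python
import random

rng = random.SystemRandom()

def scramble_factors(factors):
    """Return the ECM factor group in a random order, so the transmitted
    sequence never reveals a sorted-by-value pattern."""
    shuffled = list(factors)
    rng.shuffle(shuffled)
    return shuffled

def format_fragment(tag, factors):
    """Bracketed transmission format from the worked example,
    e.g. AA {227  3121  9 ...}, with the multipliers scrambled."""
    body = "  ".join(str(f) for f in scramble_factors(factors))
    return f"{tag} {{{body}}}"
```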
Encoding State Machine Issues
  • I don't believe in the encoding state machine changing 365 times per year; that is just overkill.
  • The encoding state machine should change no more often than every 40 days and no less often than every 90 days.
  • Agent telecom uplinks are not expected to be any more than 52 times per year, with 36 being more typical.
  • The encoding state machine should change adequately to allow for a network of 400 users to use this technology, based on a probable message frequency of 36 nominal + 8 emergency messages per year.


Conclusions

Although this ECM cryptography system can be a powerful one, the day to day message transmission issues may betray the ECM origin of the message. There are many problems that may plague this transmission system and hinder its development and security.
  • Although it may be possible to hide the bracketed "multiplication groups" of primes in each message fragment, this does come at the cost of message length and increases encoder and decoder complexity.
  • This cryptography system does have error correction issues. Without error correction levels similar to the "Voyager Code" or "Cassini Code", messages may be damaged on arrival, forcing extra work to recover all the codegroups.
  • There are a lot of extra decoder blocks (as well as encoder blocks) due to using ECM -- and this increase in complexity will cause multiple debugging and deployment problems.
  • The "baseband" transmission formats for this format are more limited due to its error correction properties. RTTY formats with error correction are preferred over Morse Code, not only for their higher transmission rates but their self correcting nature.
  • Bell 103 and Bell 202 modems are recommended as alternate transmission formats to self-correcting RTTY transmission systems, provided the message is transmitted twice.

Providing that the known issues can be overcome, this system can provide a viable and secure communications channel for messages of 2 KB to 4 KB in maximal length.





2015 Research


All of the 2010 ideas are well and good, but no message format experimentation took place in that era. So, as it goes -- only with experimentation can flaws be found.

One should at least have a test header to see how message coding as a whole might be done.

Test Header 01
"9000201503330102315011110573097622224868333343644444"
  • "9000" blockfill
  • "2015033301" => 2015, 333rd day, Message 01; the two extra 0s are blockfill
  • "02315" => UTC Time of Message Coding; the extra 0 is blockfill
  • "01111" blockfill
  • "0573" From or To, addressed person
  • "0976" From or To, addressed person
  • "2222" blockfill
  • "4868" Message Characteristic
  • "3333" blockfill
  • "4364" Message Characteristic
  • "4444" blockfill, Terminating. The last digit can be {0...9} to make the maths more workable
  • The idea of this header format is to keep the number of factorable digits under 60, or at least under 75. 
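Unpacking Test Header 01 amounts to fixed-width slicing. A sketch, where the field names (`party_a`, `characteristic_1`, etc.) are invented labels for the annotations above, and the blockfill spans are simply discarded:

```python
def parse_header(h):
    """Split the 52-digit test header into its annotated fields,
    dropping the blockfill spans."""
    assert len(h) == 52
    return {
        "year": h[4:8],          # after the "9000" blockfill
        "day_of_year": h[8:12],  # zero-padded, e.g. "0333"
        "message_no": h[12:14],
        "utc_time": h[15:19],    # "02315" minus its blockfill "0"
        "party_a": h[24:28],     # From or To, addressed person
        "party_b": h[28:32],     # From or To, addressed person
        "characteristic_1": h[36:40],
        "characteristic_2": h[44:48],
    }
```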
With ECM Factoring, one gets this

9 000 201 503 330 102 315 011 110 573 097 622 224 868 333 343 644 444 =

2 ^ 2 x 17 x 1 358 393 x 52 430 489 737 x 2 231 479 141 000 129 x 832 800 810 775 396 247

Number of divisors:  96

Sum of divisors:  16 676 856 239 262 777 960 261 369 310 279 408 898 180 331 233 009 280

Euler's Totient:  4 235 385 824 722 005 280 787 810 296 291 640 758 367 141 977 915 392

Moebius: 0

Sum of squares: a^2 + b^2 + c^2 + d^2
a = 63 289 502 379 433 337 755 779 660
b = 57 016 321 069 173 391 532 155 502
c = 33 857 132 603 682 543 686 021 394
d = 24 443 283 238 701 692 392 635 902

As far as it can be understood, the factors could be thrown away and this remaining data kept :

"header()"
{
Number of divisors:  96

Moebius: 0

Sum of squares: a^2 + b^2 + c^2 + d^2
a = 63 289 502 379 433 337 755 779 660
b = 57 016 321 069 173 391 532 155 502
c = 33 857 132 603 682 543 686 021 394
d = 24 443 283 238 701 692 392 635 902
}

Taking this into account, what impact does changing the final header digit have?

Old_Header + 1 =

9000201503330102315011110573097622224868333343644445

via ECM factoring one gets :

header()
{
Number of divisors:  64

Moebius: 1

Sum of squares: a^2 + b^2 + c^2
a = 88 894 212 471 717 732 018 118 309
b = 33 136 392 265 350 884 817 616 930
c = 8
}

In essence, not only can changing the final terminating digit make the number factorable (as that is the primary purpose for allowing for this) -- but it also allows for massive data compression.
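The "Number of divisors: 96" and "Moebius: 0" lines for the original header can be checked directly from the factorization given earlier: for n with prime exponents e_i, the divisor count is the product of (e_i + 1), and Moebius(n) = 0 whenever any exponent exceeds 1. A quick arithmetic check (illustrative Python; the variable names are ad hoc):

```python
# ECM factorization reported for Test Header 01:
# 2^2 x 17 x 1358393 x 52430489737 x 2231479141000129 x 832800810775396247
header = 9000201503330102315011110573097622224868333343644444
primes = [2, 17, 1358393, 52430489737, 2231479141000129, 832800810775396247]
exponents = [2, 1, 1, 1, 1, 1]

# Rebuild the header from its factors.
product = 1
for p, e in zip(primes, exponents):
    product *= p ** e

# d(n) = prod(e_i + 1); exponents (2,1,1,1,1,1) give 3 * 2^5 = 96.
divisor_count = 1
for e in exponents:
    divisor_count *= e + 1

# Moebius(n) = 0 whenever a squared prime divides n.
moebius_is_zero = any(e > 1 for e in exponents)
```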







Revision History

Original Ideas Created:  10 June 2009
Last Modified:           04 March 2010
Last Change:             01 December 2015 (update minor parts of old research)
Revision State:          Revisable