According to Encoding Theory and Information Theory:
How many binary Bits does it take to Encode the State(fulness) of a NOUN or VERB?
Abstract
In Information and Coding Theory (as well as Cryptography) one
question has come up repeatedly in the modern era: how many bits are needed to
encode a grammatical structure? It is not at all clear that
anyone has a viable answer to this basic coding problem.
Some notable grammatical structures have a known number of bits to encode them.
Yet, as a whole -- for most languages commonly used in telecommunications -- there is not a unified view on the bits needed to encode the most basic parts of speech : Nouns and Verbs.
One must remember that there is an entire branch of languages in Asia (and Australasia) that don't have (and have never used) Adverbs and Adjectives.
In "content containerization" terms Adverbs and Adjectives are at
best "a blight and annoyance of Indo-European Languages" and are
thus not needed to determine the stateful complexity of a
language's Nouns or Verbs.
Separating out the different grammatical components from Nouns
and Verbs is an absolute requirement, but it comes at some
cost to accuracy: these grammatical objects are always bound to
other grammatical objects to some degree, and that binding
carries some encoding bits no matter what.
Error Correction Aspect
If each Noun or Verb used in a language has up to 32 bits of
state associated with it globally, then these "statefulness bits"
could be considered Error Correction mechanisms. Error correction
mechanisms can have checksums, hashsums and distributed check
bits.
Considering the theoretical possibility of a phrase having 5 words, its Noun Subject and primary Verb (32 bits each, 64 bits together) would give it ~2^64 (18,446,744,073,709,551,616) encoding complexity.
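The arithmetic behind this figure can be checked directly. The following is a minimal sketch, assuming 32 state bits per word as stated above and counting only the Noun Subject and the primary Verb; the names `BITS_PER_WORD` and `state_space` are illustrative and not part of any established method.

```python
BITS_PER_WORD = 32  # assumption taken from the text: up to 32 state bits per Noun or Verb


def state_space(counted_words: int, bits_per_word: int = BITS_PER_WORD) -> int:
    """Number of distinct bit patterns for the counted words of a phrase."""
    return 2 ** (counted_words * bits_per_word)


# Noun Subject + primary Verb = 2 counted words = 64 bits
combined = state_space(2)
print(f"{combined:,}")  # 18,446,744,073,709,551,616
```

Each additional counted word at 32 bits multiplies the space by a further 2^32.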
The primary components of a phrase would [by nature] have
Hashsum encoding complexity (CRC128-ECMA, MD5, SHA-1 and, for
longer grammatical structures, Whirlpool) if this is true.
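To put these names on a common scale, the sketch below prints the output width in bits of a few checksum and hash functions from Python's standard library. It is only an illustration: the sample phrase is arbitrary, CRC-32 stands in for the CRC family because it is the variant `zlib` provides, and Whirlpool is left out because `hashlib` does not guarantee it on every platform.

```python
import hashlib
import zlib

phrase = "The quick Gray Fox jumped over the lazy Brown Dog.".encode("utf-8")

# zlib.crc32 returns an unsigned 32-bit checksum
crc = zlib.crc32(phrase)

# digest_size is reported in bytes, so multiply by 8 to get bits
widths = {
    "CRC-32": 32,
    "MD5": hashlib.md5(phrase).digest_size * 8,
    "SHA-1": hashlib.sha1(phrase).digest_size * 8,
}

for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```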
Because Error Correction, Cryptography and Data Compression
are [at a basic mathematical level] equivalent functions, as shown by
the Information and Coding Theory realms of research, one must
at least assume that any grammatically correct phrase that is
deterministically correct cannot exist in any other
combination of words or word-atoms.
Yet, in word order languages (English, Cantonese, Mandarin etc.)
word-atoms are functionally replaceable in most common phrase
structures, at least up to 4 thousand million permutations.
Hints at the hidden complexity
Even with simple Nouns there is a lot of hidden complexity to
take into account for the simplest of usages.
English (a Celtic-Germanic hybrid with Old French and Latin-Greek
vocabulary inheritances)
The English that existed around 1080 CE had a lot more in common (in grammatical stateful complexity) with modern-day Czech or Croatian than with modern English. So there has been a lot of change in form and function over the past 1100 years, but some languages may not change that much over as long a time. Koranic Arabic has had only modest changes in the past 500 years, but somehow the changes have been greater overall than in Sanskrit.
"Separating out a Named Object"
Verbs have just as much complexity as Nouns do, but in a
different way due to the entirely separate way Verbs are used.
Hungarian and Finnish (and the Asian languages they are related to)
Rings of Deterministic Bits
Grammar does strange things to the stateful organization of determinism of a word. Grammar is about getting
Global bit settings for each word or word-atom (used here to account for compound nouns and verbs) are not enough to make a grammatically correct phrase.
Locality | Notes | Examples |
Global | Every word or word-atom in a language has global settings that determine how the grammatical system may use it. | |
Phrase | Within each phrase's local structure there may be exceptions to the Global bit settings. These exceptions (local bit reset switches) may overrule part of a grammatical structure based on custom or word-atom dependencies. | Polish preserves 2 or 3 instances this way of its long-abandoned Binary Plurals. No other Slavic Language is known to have ever had Binary Plurals. |
Word or Word-atom | Words themselves (or more often word-atoms) may have their own stateful rules that force them to be used in a phrase in some ways but not others. | In English "A" or "AN" before a named object (Noun) changes based on the object starting with a CONSONANT or VOWEL. |
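The three localities in the table can be sketched as layered bit settings. The example below is a hypothetical illustration only: the flag names and their layout are invented here, the phrase-level "reset switch" is modelled as a mask that clears Global bits, and the "A"/"AN" rule follows the table's letter-based wording.

```python
# Hypothetical flag layout, invented for illustration only
PLURAL = 0b001
COUNTABLE = 0b010
PROPER = 0b100

VOWELS = {"a", "e", "i", "o", "u"}


def effective_bits(global_bits: int, phrase_reset_mask: int = 0) -> int:
    """Apply phrase-level 'reset switches': clear the masked Global bits."""
    return global_bits & ~phrase_reset_mask


def indefinite_article(noun: str) -> str:
    """Word-level rule as stated in the table: choose by the first letter.

    Simplification: real English usage follows the initial sound, not the letter.
    """
    return "an" if noun[:1].lower() in VOWELS else "a"


# Global setting for a word, then a phrase that overrules the PLURAL bit
fox_global = COUNTABLE | PLURAL
print(bin(effective_bits(fox_global, PLURAL)))
print(indefinite_article("apple"), indefinite_article("fox"))
```

Keeping the Global bits untouched and applying the phrase mask at lookup time means the same word can behave differently in two phrases without its Global entry changing.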
Basic Principles of Analysis
There are certain rules that must be in place to make a reasonably correct analysis of Noun and Verb stateful complexity. A lot of the complexity of Nouns and Verbs is hidden in their extreme use and commonality. Hidden complexity in almost all linguistic concept encoding runs deep...
Conclusion: Noun & Verb Complexity is ~2^28 = 268,435,456. So 268m ways to get it wrong!
Phrase --» | The | quick | Gray | Fox | jumped | over | the | lazy | Brown | Dog. | | Sub Total |
Bits Global (to the language as a whole) | 3 | NA | 2 | 2 | Probable 8 | PP | 3 | NA | 2 | 2 | = | 22 |
Bits Local (to the word-atom's local state) | 1 | NA | 1 | 1 | Probable 2 | PP | 1 | NA | 1 | 1 | = | 6 |
NA : Not counted (Adverb or Adjective). PP : Preposition, a grammatical Case not in the analysis.
NOTE : REVERSION 3_2015 | Grand Total | 28 |
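The conclusion figure can be recomputed from the table's stated subtotals. The sketch below assumes the Global and Local subtotals exactly as printed (22 and 6) and treats each counted bit as doubling the number of possible states; the variable names are illustrative.

```python
# Subtotals as stated in the table
GLOBAL_BITS = 22
LOCAL_BITS = 6

# Global row entries that carry a count (NA and PP cells are skipped)
global_row = {"The": 3, "Gray": 2, "Fox": 2, "jumped": 8, "the": 3, "Brown": 2, "Dog": 2}
assert sum(global_row.values()) == GLOBAL_BITS

total_bits = GLOBAL_BITS + LOCAL_BITS
states = 2 ** total_bits
print(total_bits, f"{states:,}")  # 28 268,435,456
```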
Text TBA
Overall conclusions, or
at least observations...
References
Languages
Information Theory
Coding Theory
Error Correction
Initial Idea | Created | Last Update | Last Change | Version | Revision State |
15 February 2015 | 31 August 2015 | 17 September 2015 | Minor fixes | 0.18 | Initial, revisable |