According to Encoding Theory and Information Theory :

How many binary Bits does it take to Encode the State(fulness) of a NOUN or VERB?



Abstract

In Information and Coding Theory (as well as in Cryptography), one question has kept arising in the modern era: how many bits are needed to encode a grammatical structure? It is not at all clear that anyone has a viable answer to this basic coding problem.

Some notable grammatical structures have a known number of bits needed to encode them.

Yet, taken as a whole, for most languages commonly used in telecommunications there is no unified view of the bits needed to encode the most basic parts of speech : Nouns and Verbs.

One must remember that there is an entire branch of languages in Asia (and Australasia) that do not have (and have never used) distinct Adverb and Adjective word classes.

In "content containerization" terms, Adverbs and Adjectives are at best "a blight and annoyance of Indo-European Languages" and are therefore not needed to determine the stateful complexity of a language's Nouns or Verbs.

Separating the other grammatical components out from Nouns and Verbs is an absolute requirement. It is not without some cost to accuracy, because grammatical objects are always bound to other grammatical objects to some degree, and some encoding bits exist in that binding no matter what.


Error Correction Aspect

If each Noun or Verb used in a language has up to 32 bits of state associated with it globally, then these "statefulness bits" could be considered Error Correction mechanisms. Error correction mechanisms can use checksums, hash sums and distributed check bits.

Consider the theoretical case of a phrase with 5 words. Its Noun Subject and primary Verb, at up to 32 bits each, would give it ~2^64 (18,446,744,073,709,551,616) encoding complexity.
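The ~2^64 figure follows directly from the 32-bit assumption. A minimal sketch, assuming (as the paragraph does) that only the Noun Subject and the primary Verb carry state, 32 bits each, and that the other three words add nothing; `phrase_state_space` is a hypothetical helper name.

```python
# Assumption taken from the text: up to 32 global "statefulness" bits per Noun or Verb.
BITS_PER_WORD = 32


def phrase_state_space(stateful_words: int, bits_per_word: int = BITS_PER_WORD) -> int:
    """Count the distinct states when each stateful word contributes bits_per_word bits."""
    total_bits = stateful_words * bits_per_word
    return 2 ** total_bits


# Noun Subject + primary Verb: 2 stateful words, 64 bits in total.
print(f"{phrase_state_space(2):,}")  # 18,446,744,073,709,551,616
```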

If this is true, the primary components of a phrase would by nature have hash-sum encoding complexity (CRC128-ECMA, MD5, SHA-1 and, for longer grammatical structures, Whirlpool).
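For scale, the named hash sums produce outputs much wider than the 64-bit Noun-plus-Verb figure. A minimal sketch that reads the digest widths from Python's `hashlib`. Two assumptions apply: Whirlpool is only present if the underlying OpenSSL build provides it, and the CRC variant is omitted because the standard library only ships 32-bit CRCs (`zlib.crc32`, `binascii.crc32`).

```python
import hashlib
from typing import Optional

PHRASE_BITS = 64  # Noun Subject + primary Verb at 32 bits each (from the text)


def digest_bits(algorithm: str) -> Optional[int]:
    """Digest size in bits for a hashlib algorithm, or None if this build lacks it."""
    if algorithm not in hashlib.algorithms_available:
        return None
    try:
        return hashlib.new(algorithm).digest_size * 8
    except ValueError:  # listed but unusable, e.g. an OpenSSL "legacy" algorithm
        return None


for name in ("md5", "sha1", "whirlpool"):
    bits = digest_bits(name)
    if bits is None:
        print(f"{name}: not available in this Python/OpenSSL build")
    else:
        print(f"{name}: {bits} bits, {bits - PHRASE_BITS:+d} relative to {PHRASE_BITS}")
```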

Because Error Correction, Cryptography and Data Compression are (at a basic mathematical level) equivalent functions, as shown by research in Information and Coding Theory, one must at least assume that any grammatically correct phrase that is deterministically correct cannot exist in any other combination of words or word-atoms.

Yet, in word-order languages (English, Cantonese, Mandarin, etc.), word-atoms are functionally replaceable in most common phrase structures, up to at least 4 thousand million (roughly 2^32) permutations.
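A permutation count converts to bits by taking its base-2 logarithm. A minimal sketch using the 4 thousand million figure above, which lands almost exactly on the 32-bit per-word budget used earlier:

```python
import math

PERMUTATIONS = 4_000_000_000  # "4 Thousand Millions" of permutations, from the text

exact_bits = math.log2(PERMUTATIONS)  # information content in (fractional) bits
whole_bits = math.ceil(exact_bits)    # whole bits needed to give every permutation an index
print(f"{exact_bits:.2f} bits -> {whole_bits} whole bits")  # 31.90 bits -> 32 whole bits
```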


Hints at the hidden complexity

Even with simple Nouns there is a lot of hidden complexity to take into account for the simplest of usages.

English (a Celtic-Germanic hybrid with Old French and Latin-Greek vocabulary inheritances)

The English that existed around 1080 CE had much more in common (in grammatical stateful complexity) with modern-day Czech or Croatian. So there has been a lot of change in form and function since then, but some languages may not change that much over as long a time. Koranic Arabic has had only modest changes in the past 500 years, yet somehow those changes have been greater overall than the changes in Sanskrit.

"Separating out a Named Object"

Using "A" or "AN" to separate out a Noun as having a notable quality has 3 bits of determinism.

Verbs have just as much complexity as Nouns do, but of a different kind, because Verbs are used in an entirely different way.

Hungarian and Finnish (and the Asian languages they are related to)

Rings of Deterministic Bits

Grammar does strange things to the stateful organization of determinism of a word. Grammar is about getting

Global bit settings for each word or word-atom (the term used here to account for compound nouns and verbs) are not enough to make a grammatically correct phrase.
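One way to picture Global settings being overruled locally is with bit masks. Each word-atom carries a Global bit field, and a phrase applies local reset switches that replace selected bits. A minimal sketch; the `WordBits` flags, their bit positions and `apply_local_overrides` are hypothetical illustrations, not the bit assignments of this model.

```python
from enum import IntFlag


class WordBits(IntFlag):
    """Hypothetical Global feature bits for one word-atom (illustration only)."""
    NONE = 0
    SEPARABLE = 1 << 0       # may take "A"/"AN"/"THE"
    PLURAL_ALLOWED = 1 << 1
    BINARY_PLURAL = 1 << 2   # e.g. a relic Binary (dual) Plural form
    GENDERED = 1 << 3


def apply_local_overrides(global_bits: WordBits, mask: WordBits, values: WordBits) -> WordBits:
    """Phrase-local reset switches: bits selected by `mask` are taken from `values`,
    and every other bit keeps its Global setting."""
    g, m, v = int(global_bits), int(mask), int(values)
    return WordBits((g & ~m) | (v & m))


# A noun whose Global settings exclude Binary Plurals, switched on locally in one phrase.
noun = WordBits.SEPARABLE | WordBits.PLURAL_ALLOWED
local = apply_local_overrides(noun, WordBits.BINARY_PLURAL, WordBits.BINARY_PLURAL)
print(noun, "->", local)
```

The Global field itself is never modified, which keeps global and local bit encoding separate.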


Locality | Notes | Examples
Global | Every word or word-atom in a language has global settings that determine how the grammatical system may use it. |
Phrase | Within each phrase's local structure there may be exceptions to the Global bit settings. These exceptions (local bit reset switches) may overrule part of a grammatical structure based on custom or word-atom dependencies. | Polish preserves 2 or 3 instances of its long-abandoned Binary Plurals this way. Binary Plurals (the dual number) survive as a living category in only a few Slavic languages, such as Slovene and Sorbian.
Word or Word-atom | Words themselves (or more often word-atoms) may have their own stateful rules that force them to be used in a phrase in some ways but not others. | In English, "A" or "AN" before a named object (Noun) changes based on whether the object starts with a CONSONANT or VOWEL sound.
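The A/AN rule is a compact example of a word-atom rule. In English it depends on whether the next word begins with a vowel sound or a consonant sound ("an hour", "a unicorn"). A minimal sketch that approximates this from spelling; the exception lists and the `indefinite_article` name are hypothetical, and a faithful version would need pronunciation data.

```python
# Hypothetical, deliberately incomplete exception lists: the real rule follows
# the initial *sound*, which spelling only approximates.
VOWEL_SOUND_EXCEPTIONS = {"heir", "honest", "honor", "honour", "hour"}
CONSONANT_SOUND_EXCEPTIONS = {"european", "once", "one", "unicorn", "university", "user"}


def indefinite_article(noun: str) -> str:
    """Return 'a' or 'an' for noun, approximating the vowel-sound rule by spelling."""
    word = noun.lower()
    if word in VOWEL_SOUND_EXCEPTIONS:
        return "an"
    if word in CONSONANT_SOUND_EXCEPTIONS:
        return "a"
    return "an" if word and word[0] in "aeiou" else "a"


for noun in ("Fox", "owl", "hour", "unicorn"):
    print(indefinite_article(noun), noun)
```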


Basic Principles of Analysis

Certain rules must be in place to make a reasonably correct analysis of Noun and Verb stateful complexity. Much of the complexity of Nouns and Verbs is hidden by how extremely common and frequently used they are. Hidden complexity in almost all linguistic concept encoding runs deep...
  1. All languages have a huge amount of encoded redundancy in the Phonetic and Typographical domains. The minimal redundancy rates found are universally above 40% and below 55% (based on 1935 to 1960 cryptanalytic research using both codebook and letter frequencies : Digraphs, Trigraphs, etc.).
  2. All Typography must be ignored; this is about grammatical statefulness.
  3. Phonetic statefulness should only be considered when it affects the boundary issues of how a Noun or Verb is encoded. As a rule, Nouns and Verbs are universally reshaped by nearby Vowels and Consonants as a matter of linguistic redundancy (and error correction).
  4. Every attempt must be made to include the encoding complexities of non-European languages. The Eurosphere notably has 3 well-known non-Indo-European languages : Basque (Pre-Roman), and Finnish and Hungarian (Asiatic Branch).
  5. Agreement with other parts of speech must be considered in the complexity analysis, including Adverbs and Adjectives or their analogues.
  6. The number of "Cases" a language has (or uses) should not affect the maths relating to Noun or Verb encoding complexity.
  7. A language apparently has either Prepositions (lightweight) or Cases (more deterministic), but usually not both. Neither is a Noun or Verb, so neither can or should be counted.
  8. All subclassifications should have subtotals, as a matter of debugging the complexity model.
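The redundancy figures in principle 1 are of the form R = 1 - H / log2(alphabet size), where H is the entropy per letter. A minimal sketch of how such an estimate is computed from letter (order 1) or digraph (order 2) frequencies. Assumptions: a 26-letter alphabet, typography (case, spaces, punctuation) stripped as principle 2 asks, and no attempt to reproduce the 1935 to 1960 studies themselves. First-order estimates for English text typically fall well below the 40% to 55% range, because most of the redundancy lies in longer-range structure.

```python
import math
from collections import Counter


def ngram_entropy(text: str, n: int) -> float:
    """Shannon entropy, in bits, of the overlapping n-gram distribution of text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def redundancy(text: str, order: int = 1, alphabet_size: int = 26) -> float:
    """Estimate R = 1 - H / log2(alphabet_size).

    H is the per-letter entropy given the previous (order - 1) letters,
    computed as H(order-grams) - H((order - 1)-grams). Short samples bias
    higher-order estimates, so use long texts for order >= 2.
    """
    letters = "".join(ch for ch in text.lower() if ch.isalpha())  # drop typography
    h = ngram_entropy(letters, order)
    if order > 1:
        h -= ngram_entropy(letters, order - 1)
    return 1 - h / math.log2(alphabet_size)


sample = "The quick Gray Fox jumped over the lazy Brown Dog."
print(f"first-order redundancy of the sample: {redundancy(sample):.0%}")
```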

Notes


These practices must be in place so as to avoid intertwining global and local bit encoding schemes.

Example of Local vs Global Bits encoding a Phrase

Phrase » The quick Gray Fox jumped over the lazy Brown Dog.

Word | Bits Global (to the language as a whole) | Bits Local (to the word atom's local state)
The | 3 | 1
quick | NA | NA
Gray | 2 | 1
Fox | 2 | 1
jumped | Probable 8 | Probable 2
over | PP | PP
the | 3 | 1
lazy | NA | NA
Brown | 2 | 1
Dog. | 2 | 1
Sub Total | = 22 | = 6
Grand Total (Global + Local) | 28 |

NA : Not counted (Adverb or Adjective)
PP : Preposition, a grammatical Case not in the analysis
NOTE : REVERSION 3_2015

Conclusion:
Noun & Verb Complexity is ~2^28 = 268,435,456
So roughly 268 million ways to get it wrong!
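Principle 8 asks for subtotals as a debugging aid, and the Global subtotal in the phrase example can be recomputed mechanically. A minimal sketch under these assumptions: the per-word values are copied from the table, "Probable 8" is counted as 8, NA and PP entries are skipped, and the Local subtotal of 6 is taken as stated rather than recomputed.

```python
# Global bits per word, copied from the table. None marks NA (adjective)
# or PP (preposition), which the analysis does not count.
PHRASE = [
    ("The", 3), ("quick", None), ("Gray", 2), ("Fox", 2), ("jumped", 8),  # "Probable 8"
    ("over", None), ("the", 3), ("lazy", None), ("Brown", 2), ("Dog", 2),
]
LOCAL_SUBTOTAL = 6  # taken as stated in the table

global_subtotal = sum(bits for _, bits in PHRASE if bits is not None)
grand_total = global_subtotal + LOCAL_SUBTOTAL
print(global_subtotal, grand_total, f"{2 ** grand_total:,}")  # 22 28 268,435,456
```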




Nouns

Text TBA



Functional Encoding Aspect | Global Bits | Notes

Separable Status (Category Total: 6)
Does Separable Status exist at all? | 1 | As in "A" vs "AN" vs "THE"
Force No Separable Status absolutely and exclusively? | 1 | Slavic Languages
How many kinds of Separable Status exist? | 2 | English
Is Separable Status universally forced on all Nouns? | 1 | All forced : German, French, Arabic ...
Does Separable Status exist only by (Case or Preposition)? | 1 |

Imposed Object Gender
Use no Gender at all? (all Neuter (N)) | 1 | English
Force Neuter absolutely and exclusively? | 1 | English
Use Masculine (M) & Feminine (F)? | 1 | Many Indo-European languages
Force M & F absolutely? | 1 | French, Spanish, Portuguese ...
M & F & N & mixed in 4 states? | 1 | Czech
Inanimate Object Gender? | 1 | Indigenous Languages Globally
Inanimate Object Gender agrees with M/F/N? | 1 | Indigenous Languages Globally
Force Inanimate Object Gender to Neuter? | 1 | Indigenous Languages Globally

Plurals (Multiplicity)
Use Plurals at all? | 1 | Indonesian has no plural inflection : dog dog = dogs
Are Binary Plurals used? | 1 | Polish historical use
Use Gender Agreement? | 1 | French, Spanish, Portuguese, ...

Compound Nouns (Concatenation)
Compound Nouns Forbidden? | 1 | Almost true in French, Spanish, Italian...
Compound Nouns permitted at all? | 1 | Mostly not in French, Spanish, Italian...
Upper limit on Compound Noun units? | 3 | Most Germanic Languages, Hungarian
Force Compound Nouns to Neuter? | 1 | German
Permit Adjectives in Compound Nouns? | 1 | Hungarian
Permit Prepositional Case in Compound Nouns? | 1 | Hungarian

Agreement with (-)
Previous Adjective? | 1 |
Previous Adjective Gender State? | 1 |
Previous Adjective Plural State? | 1 |
Previous Noun Gender State? | 1 |
Previous Noun Plural State? | 1 |
"Case" or Prepositional use at all? | 1 | Hungarian : Londonban = in London
"Case" or Prepositional use forced for only one Case? | 1 | English Possessive
"Case" or Prepositional use universally forced? | 1 | Possibly Sámi (Lappish)
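In the spirit of principle 8, the category subtotals of the Nouns table can be checked mechanically. A minimal sketch: the per-aspect bit values are copied row by row from the table, only the Separable Status total (6) is stated in the table itself, and the other subtotals are simply the sums of the listed rows.

```python
# Global bits per Functional Encoding Aspect, copied row by row from the Nouns table.
NOUN_ASPECTS: dict[str, list[int]] = {
    "Separable Status": [1, 1, 2, 1, 1],
    "Imposed Object Gender": [1, 1, 1, 1, 1, 1, 1, 1],
    "Plurals (Multiplicity)": [1, 1, 1],
    "Compound Nouns (Concatenation)": [1, 1, 3, 1, 1, 1],
    "Agreement with (-)": [1, 1, 1, 1, 1, 1, 1, 1],
}
STATED_TOTALS = {"Separable Status": 6}  # the only Category Total given in the table


def category_subtotals(aspects: dict[str, list[int]]) -> dict[str, int]:
    """Sum the bit values of every aspect within each category."""
    return {category: sum(bits) for category, bits in aspects.items()}


subtotals = category_subtotals(NOUN_ASPECTS)
for category, stated in STATED_TOTALS.items():  # debugging check against stated totals
    assert subtotals[category] == stated, f"{category}: {subtotals[category]} != {stated}"
for category, subtotal in subtotals.items():
    print(f"{category}: {subtotal} bits")
```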




















Verbs

Text TBA



Functional Encoding Aspect | Bits | Notes

Separable Status (Can a Verb be used alone?)
Permit Verb Compounding? | 1 | EN : "Will have done"
Absolutely forbid Verb Compounding? | 1 |

Temporal Adaptability
Allow Past Tenses? | 1 |
Allow Past Past Interaction in same phrase? | 1 |
Allow Future Tenses? | 1 |
Allow Future Future Interaction in same phrase? | 1 |
Allow Future & Past Interaction in same phrase? | 1 |

Perfection or Indifference to Probable Completion State
Verbs have Imperfective States? | 1 |
Preferred rate of Imperfective States? | 4 | Decimal Binary Fraction
Verb "Indifference to Completion" via Word Order alone? | 1 | English's Subjunctive is via Word Order
Verb "Indifference to Completion" via Word Order exclusivity? | 1 | Mandarin, Cantonese?
Is the Imperfective State different from Ordinary (Present / Past / Future)? | 1 | EN: "-ING" terminating Verb ending; FR "-ANT" ...
Allow Present Tense Only? | 1 | EN: Doing, going, ...
Allow in Future Tenses? | 1 |
Allow Future Tense via Compounding Only? | 1 | EN: Will be going...
Allow in Past Tenses? | 1 |
Allow Past Tense via Compounding Only? | 1 | EN: Was going...
Permitted Subjunctive <-- --> Imperfective "Functional Equivalence"? | 1 | Translation (Romance <-- --> Slavic)

Past Tenses
Use Past Imperfection? | 1 | EN: Was doing, was going ...
Use Coupled Past Imperfection & Perfection? | 1 |
Use Past Perfection? | 1 | EN: Was finished.
Use Past Total Perfection? | 1 |

Present Tenses

Future Tenses

Compound or Concatenation State

Overall conclusions, or at least observations...





Initial Idea
Created
Last Update
Last Change
Version
Revision State

15 February 2015
31 August 2015
17 September 2015

Minor fixes

0.18
Initial, revisable