Neurological Properties to Circumvent AI’s Error Reduction Impasse

Thaddeus JA Kobylarz; Erik J Kobylarz

ISSN: 2641-3086

Trends in Computer Science and Information Technology

Review Article Open Access Peer-Reviewed

Thaddeus JA Kobylarz¹ and Erik J Kobylarz²

Author and article information

¹Department of Wireless Technology, Bell Laboratories Retiree, USA
²Geisel School of Medicine & Thayer School of Engineering, Dartmouth College, USA

*Corresponding author: Thaddeus JA Kobylarz, PhD, Department of Wireless Technology, Bell Laboratories Retiree, USA, Tel: 973-539-3086, 873-539-3086; E-mail: t.kobylarz@ieee.org

ORCiD : https://orcid.org/0000-0003-4129-5944

doi : 10.17352/tcsit.000070

Received: 08 September, 2023 | Accepted: 20 September, 2023 | Published: 21 September, 2023

Keywords: Neural networks; Structural neural plasticity; Functional neural plasticity; Associative learning; Inter-association; Intra-association; Memory formation; “Gestalt” phenomenon; Linearly separable; Nonlinearly separable; Excitatory synapses; Inhibitory synapses

Cite this as

Kobylarz TJA, Kobylarz EJ (2023) Neurological Properties to Circumvent AI’s Error Reduction Impasse. Trends Comput Sci Inf Technol 8(3): 061-072. DOI: 10.17352/tcsit.000070

Copyright License

© 2023 Kobylarz TJA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Our paper proposes significant changes to AI technology. We believe this is necessary because current implementations have stagnated at average error rates of approximately 8%. Implementers hope that further improvements will lower error rates to 5% by 2025. This would require 10²⁸ floating-point operations, which is not possible with today’s algorithms and computer technology. Even errors of 5% are excessive for many practical applications.

Current AI implementations have made ignominious errors. AI errors led to the near bankruptcy of a prominent real estate corporation and to the forced resignation of an elected government official. The errors responsible were ludicrous and unlikely to have been made by humans. Applications of AI are therefore limited to those for which errors are nugatory.

In contrast, the human brain’s capabilities and efficiency are astonishing. In significant contrast to current AI models, the human brain is impressive in terms of its relatively small size (adult average 79 in³), weight (approximately 4#), and power consumption (nominally 15W). We feel that this implies that AI technology needs to adopt excluded neurological properties.

The current AI neuron model is an overly simplified linear model, which was proposed about 70 years ago. We propose emulating the neurological neuron’s nonlinear capabilities. The versatility of the improved AI model would be many orders of magnitude beyond that of the currently implemented linear neuron models.

The proposed neurological properties also include those of neural plasticity. Specifically, we describe the neurological associative learning aspect of neuroplasticity, partitioning associative plasticity into “inter-association” (neural network structure) and “intra-association” (neuron functioning).

Main article text

Abbreviations

AI: Artificial Intelligence; CO2: Carbon Dioxide; LED: Light Emitting Diode; W: Watts; #: Pounds; in: Inches.

Introduction

In this section, we argue for the need to drastically alter current AI technology. The ongoing small improvements will not successfully circumvent the error reduction impasse of today’s AI technology. Colossal failures represent a rude awakening that the current AI technology has reached a dead end regarding applications for which errors are intolerable. The frequent hype of being able to arrive at decisions more accurately than humans, because of the ability to digest immense amounts of data, has been shown to be untrue for such applications.

To circumvent the error reduction impasse, we propose changes that will require a substantial investment of work and expense. Such an investment needs extensive justification. The justification is presented as a detailed discussion of the reasons for the impasse and of other problems with current AI implementations.

An AI application for buying and selling properties, at first touted to revolutionize [1] how real estate is bought and sold, nearly drove Zillow into bankruptcy [2]. Zillow lost $304 million, its stock dropped 18%, and 2000 employees were laid off in the third quarter of 2021. Needless to say, their AI division was dissolved [3]. Interestingly, the decisions to buy or sell properties were so ridiculous that most humans would not have made them [3].

The inadequacy of current AI technology also destroyed the political career of a Netherlands prime minister [4]. This leader decided to use AI to determine the qualifications of families for the government’s childcare allowance. The AI algorithm developed a pattern of falsely labeling claims as fraudulent, and harried civil servants rubber-stamped the fraud labels. So, for years the tax authority baselessly ordered thousands of families to pay back their claims, pushing many into onerous debt and destroying many lives in the process.

These are only two examples of erroneous conclusions made by the current AI technology. Many more exist. For example, training an AI program to perform mathematics problems [5], using hundreds of thousands of examples with step-by-step solutions, was a disaster. Following this training, the AI program yielded an average accuracy of only 5 percent (95% errors) for high school algebra and trigonometry problems.

Very poor results also exist for examples of AI use in medical applications. It has been reported that 85 percent of studies using machine learning to detect COVID-19 in chest scans failed reproducibility and quality checks. None of the models was deemed ready for use in clinics [6]. In addition, it has been discovered that a nationally deployed healthcare AI algorithm in the United States was racially biased, affecting millions of Americans [7]. The AI algorithm was designed to identify which patients would benefit most from intensive-care programs, but it routinely enrolled healthier white patients into such programs ahead of black patients who were generally in poorer health.

There is now overwhelming evidence that AI performance can often prove to be unstable [8]. A slight alteration in received data can lead to a wild change in outcomes. For example, it has been demonstrated that changing a single pixel on an image can cause an AI application to consider a horse image to be a frog [9]. In addition, medical images can be modified in a way that is imperceptible to the human eye such that a misdiagnosis of cancer occurs 100 percent of the time [8].

The best AI deep-learning systems designed for recognizing objects currently have an error rate of approximately 8% and are projected to improve only to about 5% by 2025 [10]. It is anticipated that achieving a 5% error rate would require 10¹⁹ billion floating-point operations [10], which is regarded to not be possible with today’s AI algorithms and computer technology [10]. Nevertheless, the 5% error rate is unacceptable for many applications, such as AI autonomous vehicles. At a 5% error rate, an error is expected for every 20 times the vehicle is used. Because an error may result in an accident that could include a fatality, the error rate should be no more than the deadly crash rate of 0.00135% [11] per 1000 miles of travel in 2022. We are confident all AI practitioners will agree this error rate is impossible with today’s technology. There is no alternative but to make major AI technology changes in order to further reduce error rates.
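As a quick check of the figures quoted in this paragraph (simple arithmetic on the stated numbers only, with no additional data assumed), the operation count and the error frequency can be written as:

```latex
10^{19}\ \text{billion} = 10^{19} \times 10^{9} = 10^{28}\ \text{floating-point operations}
\qquad
5\% = \frac{5}{100} = \frac{1}{20}\ \text{(one expected error per 20 uses)}
```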

Another application, which obligates a near 0% error rate, is for an internet search engine. Software programs have been developed by Microsoft (Bing Chat) and Google (Bard) to perform online searches using AI [12]. These programs are intended to allow personalized, conversational search experiences by means of a chat between the user and the search engine. Bing Chat has been found to fabricate sources and facts when none are available in order to support its assertions [12]. “The danger is that, taken at face value and without checking the sources, the user may be misinformed or even misled by the result” [12]. Google’s Bard had the same problem having produced factual errors in its demo [12,13]. Even a promoter (Riedl) expressed, “It’s difficult to predict if or when conversational AI searches will reach a level of accuracy that’s acceptable for users”. This suggests that after an AI-guided search is completed, one needs to perform a conventional search on the topic to ascertain AI’s search accuracy. Our conclusion is “Why bother with the AI search in the first place?”. The anticipated cost to implement an AI search engine using current AI technology is over $100 billion to develop the new server and new network infrastructure. In addition, new recurring costs for such technology are estimated to be $36 billion [12]. Therefore, a more efficient and accurate AI technology is required to significantly reduce these astronomical costs.

Catastrophic forgetting, the tendency of AI implementations to entirely and abruptly forget previously known information after acquiring new information, is another AI weakness [9]. Due to inefficient memory use, AI programs overwrite past knowledge with new knowledge. As a consequence, the abysmal memory of artificial neural networks imposes many hours of repeated training.

Errors represent only one aspect of AI’s deficiency. Significant energy consumption for training makes AI impractical for many applications. Training a single AI model can result in the emission of as much carbon as five cars produce in their lifetimes [14]. While training an AI application to only 13 percent completion, its graphics processing unit emitted almost as much carbon as powering a home for a year in the United States [15]. A study found that training an off-the-shelf AI language-processing system produced 1,400 pounds of carbon emissions. This is approximately the amount produced by flying one person round trip between New York and San Francisco [16]. The full suite of experiments needed to build and train an AI language system from scratch can generate up to 78,000 pounds of CO2, which is twice as much as the average American exhales over an entire lifetime [16].

By comparison, the human brain, which is less than the size of three softballs, can perform astounding reasoning with very little energy consumption. The total human power rate typically varies between 45 and 85 Watts [17]. Of the total power, 20% to 25% may be consumed by the human brain. This implies human brain consumption is between 9 and 21 Watts (power in the range of a single LED lamp). Obviously, AI researchers have much to learn from the human brain’s efficiency. In the sections that follow, neurological properties are described from which significantly more efficacious neural networks will result.
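The 9 to 21 Watt range follows from pairing the lowest total power with the lowest brain share and the highest total power with the highest share; the pairing is our reading of the text, and the upper value is rounded down:

```latex
P_{\min} = 0.20 \times 45\ \text{W} = 9\ \text{W}
\qquad
P_{\max} = 0.25 \times 85\ \text{W} = 21.25\ \text{W} \approx 21\ \text{W}
```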

Two significant conclusions are that current AI applications should be limited to those that can justify the pollution resulting from training and have only innocuous errors. When errors of serious consequences may result, then conclusions drawn from AI applications should be considered as suggestions for actions, and humans need to intercede in order to determine the proper actions. A frequent response to AI errors is that larger neural networks and more training would preclude such errors. The problem with this retort is that current AI technology has encountered a size/training limit [10]. In order to improve error rates, the computing resources and energy required to train such a future system would be enormous, leading to the emission of as much carbon dioxide as New York City generates in one month [10].

The neuron model currently being deployed is based on neurological evidence published about 70 years ago [18-20]. For AI to progress and evolve beyond its current limit, more neuroscience principles need to be adopted [21]. In this paper, we have undertaken the task of introducing to AI technology neurological properties not heretofore considered. The neurological aspects of plasticity’s associative learning and reasoning are proposed to be the basis of the next generation of AI technology.

Neural plasticity

Neural plasticity, also known as neuroplasticity, is the ability of neural networks in the brain to change through growth and reorganization [22]. Through this mechanism, the brain is rewired to function in some way that differs from how it previously functioned. Neural plasticity was once thought to occur only during childhood, but research in the latter half of the 20th century showed that many aspects of the brain can be altered (or are “plastic”) even through adulthood [23]. It is true that the developing childhood brain exhibits a higher degree of plasticity than the adult brain. However, it should be noted that neuroplasticity can occur not only with brain development, but also in neurologic diseases, such as epilepsy, as well as recovery from neurologic insults, e.g., following a stroke, and brain injury [24].

The associative learning aspect of neuroplasticity is examined in this paper. It is through associative recall that such learning is utilized. We regard associative learning and its corresponding associative recall to be an extremely important, if not the most important, cognitive process. To demonstrate associative recall, consider the word “dog”. After reading this word, one or more associations come to mind. For example, an image of one’s dog may appear, or one may be reminded of an incident with an aggressive dog, etc. Such associations are vital to our thinking process. Recalling procedures used in solving past problems provides insight to solve new problems. Associations guide us through a day’s activities, such as dressing, washing, eating, etc.

Two broad categories of neural plasticity exist [19]: 1) structural neural plasticity (the brain’s ability to change its neuronal connections) and 2) functional neural plasticity (the brain’s ability to alter and adapt the functional properties of neurons). Virtually all past work has involved functional plasticity and nearly none concerns structural plasticity. However, structural plasticity is indeed very important to learning and takes place throughout a human’s lifetime; this suggests the performance of artificial neural networks would benefit from this type of learning. The associative aspect of plasticity will be applied to each of these two categories. For structural neural plasticity, inter-association learning and recall will be described. For functional neural plasticity, a description of intra-association learning and recall is presented.

Structural neural plasticity’s inter-association

Inter-association deals with the axonal (output) connection of one neuron that connects to the dendrites (inputs) of another neuron. This is a structural connection that establishes an association between the two neurons.

An example of structural plasticity’s inter-associative learning is illustrated in Figure 1 [25]. The growth of a neural circuit can be considered to consist of three steps. Its catalyst is the simultaneous axonal firing of neurons. When pairs of neurons fire together (step 1) they become strongly linked (as with memory development). Their simultaneous firing or “potentiating” may also forge a link (step 2) to a nearby third neuron (network growth).

Through repeated firing, the 3 neurons become strongly linked (step 3). The newly formed neuron interconnections link or inter-associate the neurons. This phenomenon enables the formation of recall memories as a result of neuron network development. The inter-association process of Figure 1 is a potential basis for an artificial neural network growth algorithm.
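The three steps above can be sketched as a toy growth rule. This is only an illustrative sketch, not the authors' algorithm: the function name `grow_network`, the representation of firing events as sets of neuron ids, and the `link_threshold` parameter (how many simultaneous firings forge a link) are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations

def grow_network(firing_events, link_threshold=3):
    """Return the connections forged by repeated simultaneous firing.

    firing_events: iterable of sets of neuron ids that fired together.
    link_threshold: hypothetical number of co-firings needed to form a link.
    """
    co_firings = Counter()   # how often each pair of neurons has fired together
    connections = set()      # the network starts with no interconnections
    for event in firing_events:
        for pair in combinations(sorted(event), 2):
            co_firings[pair] += 1
            if co_firings[pair] >= link_threshold:
                connections.add(pair)
    return connections

# Neurons 0 and 1 fire together, then recruit neuron 2; neuron 3 fires alone.
events = [{0, 1}] * 3 + [{0, 1, 2}] * 3 + [{3}]
links = grow_network(events)
print(sorted(links))
```

Only pairs that actually fire together acquire a connection, so neuron 3 stays unlinked; no connection is created in advance of an associative need.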

It is important to underscore the difference between the example just provided and current AI neural networks. The current AI neural networks have fixed configurations and only the weights of connections change, usually by means based on the Hebb/Allport posit [26]. The AI deep-learning models are overparameterized, which is to say they have more parameters than there are data points available for training [10]. These neural networks are overly cumbersome, in that many extraneous interconnections exist.

No superfluous neuron connections occur in the process just described. By employing structural plasticity only necessary connections are forged with inter-associated neuron activity. Thus, the neural network has changed because of an associative need and not with an a priori-defined structure. This results in a more efficient use of the software for the neural network of an application.

A potential scenario for network growth by inter-association begins with a primitive network, which only contains the minimal connections necessary for growth. Included in the primitive network are peripheral neurons (peripheral nervous system [27]) connected to sensory detectors (afferent division [27]), as for visual, auditory, somatosensory, etc. inputs. A second layer of neurons (central nervous system [27]) exists beyond the peripheral neuron layer. Interconnections between the peripheral and central layers are, at first, minimal. As training ensues, the inter-associative firing of neurons will create more neuronal connections. It is also possible for a primitive network connection to be removed and even some neuronal atrophy may take place during training.

Many neuronal central nervous system layers exist. These deeper layers can be relegated to specialized applications, such as pattern recognition, memory recall, etc. Interconnections among deeper layers are also minimal, prior to training. Inter-associative learning will result in the growth of more interconnections and inter-associations, which are possible among these application layers. For example, a pattern recognition application could become interconnected with an application that recalls name spelling.

Functional neural plasticity will also develop via intra-association, simultaneously with structural growth as training proceeds. The next section addresses functional neural plasticity. Papers are being prepared that address algorithms for the previous inter-association scenario and a forthcoming intra-association scenario.

Another important neurological structural property to incorporate is the “Gestalt” phenomenon. A simple definition of Gestalt is to observe the whole. Gestalt psychologists emphasize that organisms perceive entire patterns or configurations, not merely individual components [28]. The view is sometimes summarized using the adage, “The whole is more than the sum of its parts”. The Gestalt principles of proximity, similarity, continuity, closure, and connection describe human perception in connection with different objects and environments [29].

The utilization of Gestalt in AI can be demonstrated in Figure 2. Termed “invariance” [30], the same object is concluded in its various orientations. This is because each orientation shows the same salient features to create a holistic image of the object.

The salient features provide the holistic Gestalt view of an image, irrespective of size, orientation, etc. Considering features, rather than the current AI approach of considering pixels, will reduce errors considerably. Mistaking a horse image to be a frog, due to a single pixel change [9], likely evokes the reaction “This is ridiculous, a horse looks nothing like a frog”. This reaction implies a human’s recognition of a horse by means of holistic features yields no similarity to the features of a frog. Changing one pixel may distort a feature, but it will not alter the remaining features. This strongly implies that the current AI “pixel only” means for recognition is neurologically inconsistent and Gestalt feature recognition is the proper methodology.

Additionally, the current AI limitation of confidently recognizing only images that have been used in training will not exist if the Gestalt holistic means of recognition is adopted. Being able to recognize an image not used in training is currently an AI matter of luck. Hence, the Gestalt feature recognition process will considerably reduce errors, training time, and memory requirements.

A caveat exists for the Gestalt holistic approach. In Figure 3, a triangle is perceived although no triangle exists [30]. This is because our brains utilize the features of three vertices to identify the image of a triangle. Although a triangle is visualized, the image is of three black globs, each having a “V” cut. A triangle contains three lines with the three possible line pairs connected at three vertices without crossing the lines. For such mistaken conclusions, training must add features, such as a straight line connecting each pair of vertices, to remedy errors.

The illusory triangle is shown in Figure 3 for another purpose. It manifests the point that human image recognition is based on salient features to represent an entire object. Because the three vertices are used to identify the triangle, the visualized triangle can have its shape changed or be rotated and still be identified as a triangle, as long as the features exist. Figure 4 illustrates this phenomenon, showing a rotation and a size/shape change. This illustrates that the adoption of Gestalt feature recognition in AI has the potential to reduce errors, training, and memory requirements.

We envision the transformation from pixels to features to be performed by means of a layered neuron network. The layering will not resemble that currently used for AI deep learning networks. Rather, the first layer will begin with primitive features formed by the pixels, which constitute elementary shapes. By training via inter-association, succeeding layers will make connections that compound features of prior layers to form more complex features. The final layers will combine the most complex features to identify observed objects.

Functional neural plasticity’s intra-association

Intra-association deals with the relationship among a neuron’s dendrites (inputs). A neuron has many dendrites connected to many neurons. These many dendrites play a functional role in a neuron’s operation, termed a functional intra-association of a neuron’s plasticity.

A major purpose of this section is to provide sufficient information to write a computer program for a new neuron model. Hence, it contains a mathematical definition, a proof of completeness, a basis for a training algorithm, training examples, and sundry model properties. A casual reader may desire to merely skim through the new model’s definition and perhaps read only portions of interest.

Functional neural plasticity involves neuronal properties. Although the currently deployed AI neuron model and the model we propose both deal with weights associated with a neuron’s axonal inputs, our model is significantly different.

A major distinction is linearity. The AI-deployed neuron model’s linear threshold function severely limits its logic capability to only linearly separable functions [31].
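The limitation can be illustrated with two inputs. The sketch below is a brute-force illustration rather than a proof: it searches a small integer grid of weights and thresholds (the grid range is an arbitrary assumption) for each of the 16 two-input logic functions, using a linear threshold neuron that outputs 1 when the weighted sum reaches the threshold.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # the four rows of a 2-input truth table

def linear_threshold(weights, theta, x):
    """Linear threshold neuron: output 1 iff the weighted sum reaches theta."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= theta)

def linearly_realizable(truth, grid=range(-3, 4)):
    """Search a small integer grid (illustrative assumption) for weights and theta."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(linear_threshold((w1, w2), theta, x) == truth[x] for x in INPUTS):
            return True
    return False

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
all_functions = [dict(zip(INPUTS, outputs)) for outputs in product((0, 1), repeat=4)]
realizable_count = sum(linearly_realizable(f) for f in all_functions)
print(linearly_realizable(xor), realizable_count, len(all_functions))
# prints: False 14 16
```

The two functions the search cannot realize are exclusive-OR and its complement, which are the standard examples of functions that are not linearly separable for any real-valued weights.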

Table 1 shows the number of linearly separable logic functions L(n) [32] and the total number of logic functions T(n), with respect to the number (n) of axonal inputs.

Where the total number T(n) of logic functions for “n” inputs is given by:

$T(n) = 2^{2^{n}} \quad (1)$

The exceedingly rapid growth of T(n), as shown in Table 1, illustrates the weakness of the currently deployed linear neuron model. For n=7 there are greater than 4.06×10²⁸ (41 octillion) times more functions available than those that can be realized by AI linear deployments. Even more startling is that a human neuron receives an average of 10³ to 10⁴ inputs [33] from other neurons, suggesting the percentage of linearly separable functions is essentially zero for physiological neurons.
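The n = 7 figure can be checked numerically. In the sketch below, `total_functions` implements equation (1); the constant `L7` is an assumption, since the value of L(7) is not given in this text: it is the commonly published count of linearly separable functions of seven inputs.

```python
def total_functions(n):
    """Equation (1): the number of logic functions of n binary inputs."""
    return 2 ** (2 ** n)

# Assumption: published count of 7-input linearly separable functions, L(7).
L7 = 8_378_070_864

ratio = total_functions(7) / L7
print(f"T(7) = {total_functions(7):.4e}, T(7)/L(7) = {ratio:.3e}")
```

Under that assumed value of L(7), the ratio comes out slightly above 4.06×10²⁸, in agreement with the figure quoted above.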

A partial plot of the ratio T(n)/L(n) versus the number of inputs is shown in Figure 5. Notice that the ordinate has a logarithmic scale. Logarithmic scales will plot exponential growth as a straight line. Even with the logarithmic scale, the plotted rate of growth is much more rapid than exponential, suggesting that for thousands of inputs this ratio is far beyond astronomical.

Moreover, it is absurd to think that Mother Nature, after expending the effort to create an exceedingly complicated neuron with thousands of inputs, would limit a neuron’s operation to the vanishing percentage of linearly separable functions. One is obliged to conclude that the neurological neuron is nonlinear, as well as linear, and possesses a nonlinear threshold function portion to maximize the versatility of a neuron.

Adopting a nonlinear neuron model function can be viewed as an increase of power over the strictly linear neuron model. This means that a nonlinear neuron model of seven inputs is 4.06×10²⁸ times more powerful than a linear model of seven inputs. Seven inputs represent the tip of the iceberg, as one expects the new generation of AI neurons to have hundreds of inputs.

A neuron model, in general, has two functional components. The axonal output portion of the model is binary digital; i.e., variables possess one of the two per unit values “0” or “1”. The other component is an analog threshold function. The two functions interrelate to provide an axonal output, computed according to:

$f(x) = \begin{cases} 1 \Leftrightarrow F(X) \geq \theta \\ 0 \Leftrightarrow F(X) < \theta \end{cases} \quad (2)$

Where: F is an analog threshold function performed by a neuron’s cell,

θ is a threshold value contained within a neuron’s cell,

x = (x₁, . . . ,x_m) represents pre-synaptic values of binary components,

X = (X₁, . . . ,X_m) represents post-synaptic values of binary components,

x = X, the vectors have equal per unit value,

⇔ represents “if and only if”; hence, the inverse exists.

We introduce here a threshold function (3) that, unlike a linear threshold function, can be applied to all possible logic functions and therefore is “complete”:
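Equation (3) itself is not reproduced here. A form consistent with the surrounding description (a linear part, then products of two variables, and so on up to the product of all n variables, each term with its own weight) would be the following. The summation notation is our assumption, as is the subscript on the final weight, which corresponds to the “ultimate” weight W_u of Table 2:

```latex
F_n(X) = \sum_{i=1}^{n} W_i X_i
       + \sum_{1 \le i < j \le n} W_{i,j} X_i X_j
       + \cdots
       + W_{1,2,\ldots,n} X_1 X_2 \cdots X_n \quad (3)
```

Under this reading, F_n (X) has one weight for every non-empty subset of the n variables, that is, 2ⁿ − 1 weights, one for each truth table row other than row 0.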

The complete threshold function F_n (X) is not new. It was first presented in 1967 [34], having been conceived at the Stevens Institute of Technology during the summer of 1966, in a quest to find a threshold function not limited to linearly separable logic functions. Seed funding, provided by the institute, had paid for this endeavor with the intent to acquire future research support. At approximately the same time, research in artificial intelligence had obtained a bad reputation, due to the many unachieved promises and that no utilitarian results were realized. This resulted in no acquired research support and the seed funding led to no fruition. Because financial support for artificial intelligence research had disappeared, nonlinear modeling research had to be postponed for many years. Artificial intelligence and modeling itself languished for this long time period, devoid of interest until the recent progress of intuitively developed pattern recognition algorithms.

The single 1967 publication [34] was merely an abstract without any detail of this work. Lack of interest in AI plus the abstract’s obscurity resulted in having the model overlooked in any subsequent work to realize the separation of nonlinear logic functions. The only other nonlinear separation work, which was unsuccessful, described the use of polynomials and splines [35]. Considering that nonlinear separations needed to take place in hyperspace, the failure of these attempts is not surprising.

The equation (3) neuron model could not be found elsewhere in the literature. Observe that equation (3) contains both a linear portion and a nonlinear portion. Rather than referring to this neuron model as linear and nonlinear, we assign the name “Kobylarz model” to equation (3). This is akin to referring to the solely linear model as the McCulloch-Pitts model [18], a name we will use henceforth.

It is not proposed that the complete equation F_n (X) be used in its entirety for neuron models existing in a network. The equation represents a template from which pertinent variable product terms are extracted by means of intra-associative learning. Assignments of non-zero weights, by means of a training algorithm, provide the neuron model.

Examples will be given of the intra-associative learning process. But first, a proof of completeness will be provided. The proof is by means of an algorithm that will assign threshold weights for an arbitrary logic function, linear or nonlinear, according to the complete equation. Demonstrating that the algorithm will always provide an assignment of threshold weights such that F_n (X) satisfies an arbitrary f_n(x) of “n” variables represents proof that the equation is complete. The algorithm utilizes a truth table (Table 2) to assign the weights to the complete threshold function, according to any logic function f_n(x).

Table 2: Generalized Truth Table and Threshold Function Value Assignments.
Row	x_n	x_n-1	. .	x₃	x₂	x₁	f_n Value and Weight Assignment
0	0	0	. .	0	0	0	If f_n = 0, then assign θ_n > 1; else assign θ_n < 0.
1	0	0		0	0	1	If f_n = 0 and F_n (X) ≥ θ_n, assign W₁ such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W₁ such that F_n (X) ≥ θ_n.
2	0	0		0	1	0	If f_n = 0 and F_n (X) ≥ θ_n, assign W₂ such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W₂ such that F_n (X) ≥ θ_n.
3	0	0		1	0	0	If f_n = 0 and F_n (X) ≥ θ_n, assign W₃ such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W₃ such that F_n (X) ≥ θ_n.
. . .							. . .
n	1	0		0	0	0	If f_n = 0 and F_n (X) ≥ θ_n, assign W_n such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W_n such that F_n (X) ≥ θ_n.
n+1	0	0		0	1	1	If f_n = 0 and F_n (X) ≥ θ_n, assign W_1,2 such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W_1,2 such that F_n (X) ≥ θ_n.
n+2	0	0		1	0	1	If f_n = 0 and F_n (X) ≥ θ_n, assign W_1,3 such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W_1,3 such that F_n (X) ≥ θ_n.
. . .							. . .
(n² + n) / 2	1	1		0	0	0	If f_n = 0 and F_n (X) ≥ θ_n, assign W_n-1,n such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W_n-1,n such that F_n (X) ≥ θ_n.
. . .							. . .
r	r_n	r_n-1		r₃	r₂	r₁	If f_n = 0 and F_n (X) ≥ θ_n, assign W_r such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W_r such that F_n (X) ≥ θ_n.
. . .							. . .
2ⁿ - 2	1	1		1	1	0	If f_n = 0 and F_n (X) ≥ θ_n, assign W_p such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W_p such that F_n (X) ≥ θ_n. Where “p” indicates penultimate.
2ⁿ - 1	1	1		1	1	1	If f_n = 0 and F_n (X) ≥ θ_n, assign W_u such that F_n (X) < θ_n; if f_n = 1 and F_n (X) < θ_n, assign W_u such that F_n (X) ≥ θ_n. Where “u” indicates ultimate.

The truth table (Table 2) has its variables specified in the complete threshold function F_n (X), with columns ordered as (x_n, . . . ,x₁). The rows are arranged according to the computation sequence specified in equation (3).

The truth table begins with row “0”, and thereafter rows are successively numbered. Row “0” has all (x_n, . . . ,x₁) components equal to “0” and is the means to assign the threshold “θ_n” value. With all variables equal to “0”, every product term of F_n (X) vanishes, so the output for this row is determined by the threshold alone. The row “0” assignment for “θ_n” is “1” if the logic function has a “0” value. Otherwise, the assignment for “θ_n” is “0”.

Except for row 0, each successive row is intended to assign a weight for the F_n (X) variable product term associated with the row; i.e., for the variables with a “1” value in F_n (X); where (X_n, . . . , X₁ ) = (x_n, . . . , x₁ ). The F_n (X) equation begins with the linear part, which has only one variable of (x_n, . . . ,x₁), per row, equal to “1”, and the remaining variables all equal “0”. The row variable succession of single “1’s” is x₁, x₂, . . .,x_n. The F_n (X) equation then addresses two variables having “1” values. This leads to row (n² + n) / 2. The process of successively including one more variable continues until the last row, in which all “n” variables have a value of “1”.

Although not required, initially all weights of the F_n (X) equation equal “0”. Weight changes are made for a row only when necessary. That is, if F_n (X) satisfies the f_n value at a particular row, the weight for its corresponding product of variables remains the same. However, should f_n = 0 and F_n (X) ≥ θ_n, a weight, for the corresponding product of variables of the row, is assigned to make F_n (X) < θ_n. Also, if f_n = 1 and F_n (X) < θ_n, a weight, for the corresponding product of variables of the row, is assigned to make F_n (X) ≥ θ_n. That is, a weight is changed to a different value only if needed to have F_n (X) satisfy f_n (x) for the row being considered.
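The row-by-row procedure just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `assign_weights` and `evaluate`, the zero-based index tuples that label product terms, and the choice of the smallest integer weight that restores consistency are all hypothetical. Rows are visited in order of increasing number of “1” inputs; the order within a group of equal size is simply the one produced by `itertools.combinations`.

```python
from itertools import combinations


def assign_weights(f, n):
    """Sketch of the weight assignment procedure (names are illustrative).

    f maps a tuple (x_1, ..., x_n) of 0/1 values to 0 or 1.
    Returns (weights, theta); weights maps a tuple of zero-based input
    indices (one product term) to its integer weight. Terms whose weight
    stays at the initial value 0 are not stored.
    """
    # Row 0: all inputs are 0, so F = 0. theta = 1 forces F < theta when
    # f = 0; theta = 0 gives F >= theta when f = 1.
    theta = 1 if f((0,) * n) == 0 else 0
    weights = {}
    for size in range(1, n + 1):
        for term in combinations(range(n), size):
            x = tuple(1 if i in term else 0 for i in range(n))
            # Sum of previously assigned weights whose product term is 1
            # for this row (the row's own weight is still 0).
            partial = sum(w for t, w in weights.items() if set(t) <= set(term))
            if f(x) == 1 and partial < theta:
                weights[term] = theta - partial        # raise F to theta
            elif f(x) == 0 and partial >= theta:
                weights[term] = theta - 1 - partial    # lower F below theta
    return weights, theta


def evaluate(weights, theta, x):
    """Return 1 if F(x) >= theta, else 0."""
    total = sum(w for t, w in weights.items() if all(x[i] for i in t))
    return 1 if total >= theta else 0


# Usage: two-input exclusive or.
xor = lambda x: x[0] ^ x[1]
weights, theta = assign_weights(xor, 2)
print(weights, theta)  # {(0,): 1, (1,): 1, (0, 1): -2} 1
```

In this sketch the key `(0, 1)` stands for the product term X₁ X₂, so the printed result corresponds to weights of 1 for X₁ and X₂, a weight of -2 for X₁ X₂, and a threshold of 1.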

Theorem

All switching functions “f_n” of “n” variables have a realization by means of the complete threshold function “F_n”.

Proof

The proof is based on the preceding truth table, which represents an algorithm termed “weight assignment”. Each row of the truth table implies an independent weight. This independent weight can be assigned any value to satisfy the needed weight summation for that row’s switching function value. This weight independence permits a set of weights to accommodate any switching function. Showing that the weight assignment algorithm assigns the “F_n” weights and a “θ_n” that satisfy f_n (x_n, . . . ,x₁) for all 2ⁿ possible input variable values represents proof that “F_n” is complete.

Consider now row “r”. If “F_n” for row “r” is consistent with the required value for “f_n”, the weight “W_r” is not to change. Maintaining the value of “W_r” will have no impact on any preceding row’s weight. This is because the input variables of every preceding row have either the same quantity of “1” values or fewer “1” values. Hence, at least one of the variables of row “r’s” product term (a variable with a “1” value) has a “0” value in each preceding row. This means that row “r” has no impact on preceding rows.

There may be an impact on a succeeding row. However, a succeeding row will have an opportunity to adjust its weight, following the consideration of row “r”. Just as the weight of row “r” had no impact on a preceding row’s weight, the weights of rows succeeding row “r” will have no impact on row “r”.

If “W_r = 0” and “F_n” is consistent with the required value for “f_n”, then the “F_n” product term for row “r” is unnecessary and should not appear in the neuron network. Eliminating this term will save memory, reduce training, and increase processing speed. If the value “W_r” causes “F_n” to be inconsistent with the required value for “f_n”, then the algorithm instructs “W_r” to be changed in a manner that makes “F_n” and “f_n” consistent. As in the case where “W_r” needs no change, changing the value of “W_r” will also have no impact on rows preceding “r”. Although rows succeeding “r” may be impacted, when these rows are considered, their weights are adjusted so that “F_n” is consistent with the required value for “f_n”.

Hence, execution of the assignment algorithm will result in having all 2ⁿ rows consistent with the defining equation relating “f_n” and “F_n”. This implies that it is possible to assign weights that provide a threshold function to satisfy an arbitrary logic function, having any number of variables, and the threshold function is complete. Q.E.D.

As an example, the weight assignment algorithm will now be used to determine the nonlinear threshold function for the logic function of Table 3.

f_e = x₁ x₂’ x₃’ + x₁’ x₂ x₃’ + x₁ x₂’ x₃ + x₁’ x₂ x₃ (4)

For three variables, the complete threshold function is:

F_e = X₁ W₁ + X₂ W₂ + X₃ W₃ + X₁ X₂ W₁₂ + X₁ X₃ W₁₃ + X₂ X₃ W₂₃+ X₁ X₂ X₃ W₁₂₃ (5)

The assignment algorithm truth table is shown in Table 4. Initial weights and “θ” are assigned “0”. Only integer values, beginning with “0” will be used.

Observe that “W₃” remains at its initial value of “0”, implying that the input “x₃” is superfluous and can be removed from the input. Setting “x₃= 0” yields the “exclusive or” function:

f_e = x₁ x₂’ + x₁’ x₂ (6)

The corresponding threshold function for the example is:

F_e = X₁ + X₂ - 2 X₁ X₂; θ_e = 1. (7)

Also observe that for three variables, equation (3) has seven terms (2ⁿ - 1). However, “F_e” contains only three terms, or approximately 43%. More will be discussed later concerning having fewer realization terms than included in the complete equation.

Attempts have been made to determine functions to create multi-dimensional surfaces to separate nonlinear logic functions [32]. For example, threshold functions represented by polynomials and also splines were attempted, but with very limited success. Such functions could only be applied to certain classes of nonlinearly separable logic functions and were never adopted by an AI system. The only other more recent references were abstracts published by the authors of this publication [36,37].

We contend that multi-dimensional hyperbolic functions are capable of separating any nonlinear function. A basic hyperbola, in two dimensions, can be expressed as X₂ = C_1,2/X₁ (where “C_1,2 ” is a constant). Alternatively, it can be expressed as X₁ X₂ = C_1,2. Such products of variables are the main ingredient of the complete threshold function “F_n”. It is for this reason that hyperbolic functions are considered to be the means for the binary separation of nonlinearly separable functions.

Consider the “exclusive or” as an example:

f_e = x₁ x₂’ + x₁’ x₂ (8)

F_e = X₁ + X₂ - 2 X₁ X₂; θ_e = 1 (9)

Manipulation of the equation for F_e = θ_e = 1 results in:

X₂ = (1 - X₁ ) / ( 1 - 2 X₁ ) (10)

The locus of X₂ is plotted in Figure 6. Because we chose to satisfy the condition f_n = 1 ⇔ F_n ≥ θ, by substituting f_n = 1 ⇔ F_n = θ = 1, the values on the drawn hyperbolic lines represent a logical “1” result.
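As a quick numerical check of equations (9) and (10), the following Python sketch confirms that the threshold function reproduces the “exclusive or” on the four logic inputs and that sample points of the locus lie on F_e = 1. The function names and the sample points are illustrative assumptions; exact rational arithmetic is used so that the equality tests are not affected by rounding.

```python
from fractions import Fraction


def F_e(x1, x2):
    """Threshold function of equation (9): X1 + X2 - 2 X1 X2."""
    return x1 + x2 - 2 * x1 * x2


def locus(x1):
    """Equation (10): the X2 for which F_e = theta_e = 1 (undefined at X1 = 1/2)."""
    return (1 - x1) / (1 - 2 * x1)


THETA_E = 1

# The four logic inputs reproduce the exclusive or.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert (F_e(x1, x2) >= THETA_E) == bool(x1 ^ x2)

# Sample points on the locus lie exactly on F_e = 1.
for x1 in (Fraction(0), Fraction(1, 4), Fraction(3, 4), Fraction(1), Fraction(2)):
    assert F_e(x1, locus(x1)) == THETA_E
```

The locus passes through the two logical “1” points, (X₁, X₂) = (0, 1) and (1, 0), which is consistent with the statement that values on the drawn lines represent a logical “1” result.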

To suit the multi-dimensional aspect of our switching functions, we extrapolate hyperbolic functions to multi-dimensional hyperspace. For two dimensions a single asymptote, with separation lines, exists. Multiple asymptotes, with separation surfaces, exist in a hyperspace beyond two dimensions. For example, the product of (X₁ X₂ X₃ W₁₂₃), within “F_n”, would possess a three-dimensional hyperbolic surface with two asymptotes for the binary separation.

Corollary

All nonlinearly separable functions can be separated by n-dimensional hyperbolic surfaces.

Proof

The previous theorem proved that the general threshold function “F_n” can realize any switching function. The nonlinear part of “F_n” contains products of variables in the form (W_i…j X_i…X_j). For multi-dimensional space, the hyper-dimensional hyperbola is given by a product of two or more variables equal to a constant; i.e., (X_i…X_j = C_ij). This signifies that all nonlinear terms of “F_n” correspond to hyperbolic surfaces.

Assume that the product (W_i…j X_i…X_j) is encountered while deploying the weight assignment algorithm and a positive weight is assigned to “W_i…j”. Such an assignment is done to have F_n ≥ θ. It further reveals that no smaller subset of {X_i ,…, X_j }, that forms a product, would have resulted in F_n ≥ θ. Otherwise, a positive weight is not assigned to “W_i…j”. The hyperspace for (X_i…X_j = C_ij) can therefore be used to form a hyper-dimensional hyperbolic surface for the nonlinear separation. For F_n ≥ θ, all of the associated variables must satisfy X_i = … = X_j = 1. If one or more of these equals “0”, then F_n < θ.

Hence, all nonlinearly separable switching functions are separable by a hyperbolic function of two or more dimensions. For the linearly separable part of “F_n” a line or a (hyper) plane provides the separation. To create the hyperbolic function, one merely assigns F_c = θ_c (θ_c is the threshold constant) and algebraically manipulates the equation to have only one dependent variable (the equation’s left side). Q.E.D.

It should be noted that, although digital computations may be precise, a neuron contains analog computations having margins of error. Hence, any hyperbolic separation will contain an indeterminate region for which a value is uncertain. This indicates that variables within a threshold function must take on values such that “F_n” is always outside of its indeterminate region. To accomplish this, consider an indeterminate width of “Δ”. One assigns weights an amount “Δ” above that required in the original algorithm.

For the previous example, the threshold function becomes:

F_e = (1 + Δ) X₁ + (1 + Δ) X₂ -2 (1 + Δ) X₁ X₂; θ_e = 1 (11)

Resulting in:

X₁ = 1 Λ X₂ = 0 => F_e = (1 + Δ) > θ (12)

X₁ = 0 Λ X₂ = 1 => F_e = (1 + Δ) > θ, (13)

X₁ = 1 Λ X₂ = 1 => F_e = 0 < θ (14)

By assigning F_e = 1 the equation becomes:

(1 + Δ) X₁ + (1 + Δ) X₂ - 2 (1 + Δ) X₁ X₂ = 1 (15)

Algebraic manipulation, to determine the hyperbola, yields:

X₂ = [1 - (1 + Δ) X₁] / [(1 + Δ) (1 - 2 X₁ )] (16)

Resulting in:

X₁ = 0 => X₂ = 1 / (1 + Δ) (17)

(That is, the left line of Figure 6 is lowered for Δ > 0.)

And:

X₁ = 1 => X₂ = Δ / (1 + Δ) (18)

(That is, the right line of Figure 6 is raised for Δ > 0.)
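Equations (11) through (18) can be checked numerically with a short Python sketch. The function names and the value Δ = 1/10 are assumptions chosen only for illustration; they do not appear in the original derivation.

```python
from fractions import Fraction


def F_e_margin(x1, x2, delta):
    """Equation (11): every weight of equation (7) scaled by (1 + delta)."""
    k = 1 + delta
    return k * x1 + k * x2 - 2 * k * x1 * x2


def locus_margin(x1, delta):
    """Equation (16): X2 on the separating hyperbola F_e = 1."""
    k = 1 + delta
    return (1 - k * x1) / (k * (1 - 2 * x1))


delta = Fraction(1, 10)  # illustrative indeterminate width, not from the paper
print(F_e_margin(1, 0, delta))  # 11/10, above theta_e = 1, equation (12)
print(F_e_margin(1, 1, delta))  # 0, below theta_e = 1, equation (14)
print(locus_margin(0, delta))   # 10/11 = 1 / (1 + delta), equation (17)
print(locus_margin(1, delta))   # 1/11 = delta / (1 + delta), equation (18)
```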

Figure 7 [38] represents a means to posit the neurophysiological process that performs the “exclusive or” logic function. Observe that an axon may have branches that form synapses on both the soma and dendrites. The effect on the postsynaptic neuron is determined by the type of receptor that is activated, not by the presynaptic neuron.

Receptors can be either excitatory or inhibitory. This is significant because excitatory (Type I) synapses are typically located on the shafts or the spines of dendrites, whereas inhibitory (Type II) synapses are typically located on a cell body [39]. The different locations of Type I and Type II synapses divide a neuron into two zones: an excitatory dendritic tree and an inhibitory cell body. This means that the postsynaptic neuron may interpret presynaptic axonal stimuli as excitatory or inhibitory. Whether the presynaptic neuron’s axonal signal is excitatory or inhibitory depends upon the specific neurotransmitter released, as well as the training or the learning experience (intra-associative plasticity) of the postsynaptic neuron. That is, the neuron-applied weights of the excitatory and inhibitory synapses determine which of the two possible reactions will predominate.

With respect to the “exclusive or” function (f_e = x₁ x₂’ + x₁’ x₂), when “x₁ = 1” and “x₂ = 0”, the neuron soma training will have its “X₁” excitatory weight supersede its inhibitory weight. This also exists for “X₂” when “x₁ = 0” and “x₂ = 1”. However, by intra-association, the weight for the “X₁ X₂” associated product will have a neuron soma-trained weight for which the inhibitory response prevails. Hence, an “exclusive or” function is achieved. An analysis of chemical/electrical neuron properties also reveals the neuron’s capability of performing nonlinear logic [40].

As indicated earlier, not all terms of equation (3) will appear in the threshold function realization. The nonlinearly separable logic function f₄ = x₁ x₂ + x₃ x₄ provides such an example. By using the assignment algorithm, the threshold function is:

F₄ = X₁ X₂ + X₃ X₄; θ₄ = 1 (19)

Equation (3) will contain 15 terms (2ⁿ - 1), whereas “F₄” has only two product terms (13%). We anticipate that usually 20% to 40% of the equation (3) terms will appear in a realization.
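The realization in equation (19) and the term count can be verified exhaustively. The Python sketch below is illustrative only; the function names are assumptions, and the check simply enumerates all 16 input rows.

```python
from itertools import product


def f4(x1, x2, x3, x4):
    """Logic function f4 = x1 x2 + x3 x4 (logical OR of two ANDs)."""
    return (x1 & x2) | (x3 & x4)


def F4(x1, x2, x3, x4):
    """Threshold function of equation (19), with theta_4 = 1."""
    return x1 * x2 + x3 * x4


THETA_4 = 1
n = 4

# Exhaustive check over all 2**n input rows.
assert all((F4(*x) >= THETA_4) == bool(f4(*x)) for x in product((0, 1), repeat=n))

complete_terms = 2**n - 1   # number of terms in the complete threshold function
used_terms = 2              # X1 X2 and X3 X4
print(complete_terms, round(100 * used_terms / complete_terms, 1))  # 15 13.3
```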

The AI McCulloch-Pitts neuron model possesses other disparities with the physiological neuron. For instance, an AI “and” logic function requires a large percentage variation of the threshold as the number of inputs increases. A ten-input “and” logic function requires a threshold five times as large as a two-input “and” logic function. There is no neurological evidence that a neuron’s threshold varies to such a degree.

We believe that the physiological neuron does not vary its threshold to accommodate “and” logic functions. Although some threshold variation has been observed, “Actual threshold variance is relatively low” [41]. An in vivo threshold, measured at a soma, varies between -52.1 mV and -42.2 mV [42]. By using -70 mV as a resting potential, one can show that the variation is only about 40%, which cannot even support an “and” function of two variables, since a two-variable “and” function requires a threshold doubling. This indicates that the McCulloch-Pitts model is neurologically inconsistent for “and” function realizations.

Published explanations attribute the threshold variation to the neuron’s accommodation of synaptic ionic densities, not to logic. It was stated that “two known ionic mechanisms were found to make the threshold adapt to the membrane potential, thus providing the cell with a form of gain control” [42].

Our weight assignment algorithm results in a per unit threshold value of “1” when not at “0”, which is consistent with neurological evidence of only a per unit threshold value equal to 1. Given this evidence, a McCulloch-Pitts model becomes incapable of performing “and” logic. (A threshold of “0” is indicative of an unstable neuron, as action potentials occur continually when there is no stimulus to terminate the instability. This may suggest a means to terminate an epileptic seizure without medication.)

Another neurological disparity exists for the AI McCulloch-Pitts model’s use of “and” logic threshold variation. Unsustainably tight analog tolerances would be required of the analog portion of a physiological neuron, should a physiological neuron be only linear.

The relationship between the number of “and” logic inputs and acceptable tolerance for a linear threshold function, can be established by considering the following worst case scenario:

“Consider an “n” input “and” function such that all but one input has a “1” logic value.”

The single nonconforming input does not occur and therefore has a “0” value. The following would therefore represent the threshold function and its threshold relation.

$F_{n} (\underline{X}) = \sum_{i=1}^{n-1} W_{i} X_{i} < \theta$ (20)

Representing “Δ” as the percentage variation of the variables results in the worst case equation:

$F_{n} (\underline{X}) = \sum_{i=1}^{n-1} [(1 + \Delta) W_{i} X_{i}] < \theta, \quad \Delta > 0$ (21)

Substituting per unit values of the threshold function permits the simplification:

$F_{n} (\underline{X}) = \sum_{i=1}^{n-1} (1 + \Delta) < n$ (22)

Manipulation results in:

n < (1/Δ) + 1 (23)

Considering that analog systems have difficulty achieving better than 1% (Δ = 0.01) accuracy, had current AI systems incorporated analog threshold functions, “and” logic functions would not be error free with more than 100 inputs. Since a physiological neuron possesses thousands of inputs, which would demand an infinitesimal tolerance, an analog linear threshold function calculation for “and” logic is precluded. It is possible to simulate such analog operations on digital computers. However, to accommodate the large number of “and” logic inputs, this model is very wasteful of memory, training, and computer processing.
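The bound of equation (23) can be confirmed by direct search. In the Python sketch below, the function names and the search loop are illustrative assumptions; the inequality tested is the per unit worst case of equation (22), in which the n - 1 unit terms of the sum give (n - 1)(1 + Δ) < n.

```python
from fractions import Fraction


def linear_and_is_safe(n, delta):
    """Worst case of equation (22): n - 1 active unit inputs, each inflated by
    (1 + delta), must still sum to less than the per unit threshold n."""
    return (n - 1) * (1 + delta) < n


def max_safe_inputs(delta):
    """Largest n satisfying equation (23), n < 1/delta + 1 (search is illustrative)."""
    n = 1
    while linear_and_is_safe(n + 1, delta):
        n += 1
    return n


print(max_safe_inputs(Fraction(1, 100)))  # 100
print(max_safe_inputs(Fraction(1, 10)))   # 10
```

For Δ = 0.01 the search stops at 100 inputs, in agreement with the limit stated in the text.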

The threshold tolerance problem is avoided by intra-association within the Kobylarz model. As Mother Nature’s infinite wisdom would conclude, rather than a sum, a product of 1’s and 0’s is utilized to evaluate an “and” function. The intra-associated product terms in the complete threshold function “F_n” are the analog portion representation of an “and” function. If any input component of the “and” logical function does not exist, its value of “0” is used in the product. Consequently, no threshold tolerance problem exists.
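The contrast between the sum and the product can be illustrated with a small Python sketch for the same worst case, in which all but one input is present. This is a sketch under assumptions that are not in the original text: the function names are hypothetical, the variation “Δ” is applied to each active input as a factor (1 + Δ) with Δ > 0 as in equation (21), and the values Δ = 0.01 and n = 200 are chosen only for illustration.

```python
from fractions import Fraction
from math import prod


def sum_and(inputs, delta):
    """Linear model: unit weights, threshold n, active inputs inflated by (1 + delta)."""
    n = len(inputs)
    return sum((1 + delta) * x for x in inputs) >= n


def product_and(inputs, delta):
    """Product term: threshold 1 regardless of n; any 0 input zeroes the term."""
    return prod((1 + delta) * x for x in inputs) >= 1


delta = Fraction(1, 100)
n = 200
one_missing = [1] * (n - 1) + [0]   # worst case: all but one input present
all_present = [1] * n

print(sum_and(one_missing, delta))      # True: a false "and" response
print(product_and(one_missing, delta))  # False: correctly rejected
print(product_and(all_present, delta))  # True
```

Under these assumptions the sum exceeds its threshold even though one input is absent, while the product term is forced to “0” by the single absent input for any number of inputs.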

Conclusion

Current AI implementations have an average error rate of approximately 8%. It is projected that a 5% error rate will not be achieved until 2025, and at the enormous cost of 10¹⁹ billion floating-point operations. Computational cost is estimated to grow as at least the fourth power of the improvement factor. A 10-fold improvement, for example, would require at least a 10,000-fold increase in computation. In practice, the actual requirements have scaled with at least the ninth power. This means that to halve the error rate, one can expect to need more than 500 times the computational resources, which is a devastatingly high price.
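The scaling figures quoted above follow from simple power-law arithmetic, as the following Python sketch shows. The function name is an illustrative assumption, and the power-law form (cost multiplier equals improvement factor raised to an exponent) is the reading of the text adopted here.

```python
def compute_multiplier(improvement_factor, exponent):
    """Computation multiplier if cost grows as improvement_factor ** exponent."""
    return improvement_factor ** exponent


print(compute_multiplier(10, 4))  # 10000: 10-fold improvement at the fourth power
print(compute_multiplier(2, 9))   # 512: halving the error at the ninth power
print(10**19 * 10**9 == 10**28)   # True: 10^19 billion operations is 10^28
```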

These statistics indicate that the current AI methodology has reached an impasse to error reduction that is essential for its progress. Because of the human brain’s efficiency, the obvious strategy is to incorporate more of the brain’s behavior into AI technology. We believe that the incorporation of plasticity into the structure and function of AI will yield a dramatic, versatile performance improvement. This will allow for the building of novel neural networks that evade the current error reduction impasse experienced by today’s AI technology.

The inter-association of structural plasticity, incorporating Gestalt principles, will reduce network size and training extent. During training, networks will be created that only have the number of neuron models and their interconnections needed for an application. Functional plasticity and its intra-association training, using the Kobylarz neuron model template, will create networks of neuron models many orders of magnitude more powerful than the current AI neuron model. This vast efficiency improvement will establish a new generation of AI with a performance capability heretofore unavailable.

The development of algorithms represents future work. We believe that the basic principles presented for structural plasticity’s inter-associative property can lead to an algorithm that grows a neural network from training. Because the network is evolved by training, only relevant neurons having relevant connections will result, unlike the current a priori defined AI networks, which possess extraneous network connections and superfluous neurons. The description of functional intra-association includes a mathematical definition, a proof of completeness, and a weight assignment procedure. This information should be sufficient to compose an algorithm. Because the Kobylarz neuron model is vastly more versatile than the currently used McCulloch-Pitts model, the size of neural networks and their efficacy will be substantially improved. Hence, errors will be reduced.

Acknowledgement

I, Thaddeus Kobylarz, acknowledge the assistance of Dr. George Stibitz and Dr. Preston Clement with my portion of this paper. My mentor and introducer to digital theory, Dr. Stibitz, advised me during my Master’s thesis research. Before I left Vermont in 1960, I asked him to suggest a dissertation topic. He provided me with a paper describing Rosenblatt’s Perceptron and stated that this could be a kernel for future digital processing. I eventually built a neuron model for my doctoral dissertation research. It consisted of an analog computer with interfacing digital circuitry. Being a linear neuron model, it became immediately apparent to me that no threshold weight convergence existed for certain switching functions.

While a young professor at the Stevens Institute of Technology, I was able to persuade my Electrical Engineering Department Chairman, Dr. Clement, to request Stevens Administration’s support for performing research on neuron modeling during the summer of 1966. The purpose was to conceive a neuron model that did not have the switching function limitation of linear neuron models and afterwards to solicit funding for implementation of this model in a neural network. The outcome was the neuron model template described in this paper. Unfortunately, additional funding for this endeavor was denied by the National Science Foundation and no further AI work was done.

References

This work is licensed under a Creative Commons Attribution 4.0 International License.

Table 1: Limitation of the AI Linear Neuron Model.
n	Linearly Separable Logic Functions L(n) [32].	Total Logic Functions T(n)	Ratio T(n)/L(n)
1	4	4	1
2	14	16	1.143
3	104	256	2.462
4	1,882	65,536	34.82
5	94,572	4,294,967,296	45,415
6	15,028,134	1.84467440737x10¹⁹	1.227x10¹²
7	8,378,070,864	3.40282366921x10³⁸	4.062x10²⁸
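The ratios in Table 1 can be recomputed from the listed L(n) values, since the total number of logic functions of n variables is T(n) = 2^(2ⁿ). The Python sketch below is illustrative; the L(n) values are copied from the table rather than derived, and the names are assumptions.

```python
# L(n) values copied from Table 1 (attributed there to reference [32]).
LINEARLY_SEPARABLE = {1: 4, 2: 14, 3: 104, 4: 1_882, 5: 94_572,
                      6: 15_028_134, 7: 8_378_070_864}


def total_functions(n):
    """Total number of logic functions of n binary variables: 2 ** (2 ** n)."""
    return 2 ** (2 ** n)


for n, l_n in LINEARLY_SEPARABLE.items():
    t_n = total_functions(n)
    # Print n, L(n), T(n), and the ratio T(n)/L(n) to four significant digits.
    print(n, l_n, t_n, f"{t_n / l_n:.4g}")
```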
