Natural computation entered week 6 with an introduction to unsupervised learning, that is, learning in a neural network without a target output. This is generally achieved through classification/clustering/self-organising maps [SOM].
Self organising maps
The networks for SOMs are actually a little simpler than an MLP, and the process for creating clusters is quite intuitive. Each neuron in the feature map layer has a unique weight vector. If an input results in that neuron being the most activated (i.e. its weight vector has the lowest Euclidean distance from the input vector), then its weights are moved closer to the input:
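As a rough sketch of that winner-takes-all update (my own Python, not from the lecture; the learning rate value is just an assumption):

import numpy as np

def find_winner(weights, x):
    # weights: one weight vector per feature map neuron, shape (n_neurons, n_inputs)
    # the winner is the neuron whose weight vector is closest (Euclidean) to the input x
    distances = np.linalg.norm(weights - x, axis=1)
    return np.argmin(distances)

def update_winner(weights, x, learning_rate=0.1):
    winner = find_winner(weights, x)
    # move the winning neuron's weights a fraction of the way towards the input vector
    weights[winner] += learning_rate * (x - weights[winner])
    return winner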
The concept of decaying the learning rate was introduced during the lecture, but this must be done carefully. If one trains until the weight adjustments stabilise, a decaying learning rate means training will end after a certain number of epochs regardless of how well the network has clustered the data.
Finally, the concept of a ‘topological neighbourhood’ was introduced. In actual brains, the weights of neighbouring neurons are also updated when a neuron wins the competitive activation. Logically this results in similar classifications being held by neighbouring neurons. The update of the neighbouring weights can be scaled using Gaussian or exponential decay functions:
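A sketch of a Gaussian neighbourhood update (again my own Python; the grid layout and the decay schedules are assumptions, not the lecture's exact formulas):

import numpy as np

def som_update(weights, positions, x, learning_rate, sigma):
    # positions: grid coordinates of each feature map neuron, shape (n_neurons, 2)
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Gaussian neighbourhood: neurons close to the winner on the grid get a larger update
    grid_dist_sq = np.sum((positions - positions[winner]) ** 2, axis=1)
    influence = np.exp(-grid_dist_sq / (2 * sigma ** 2))
    weights += learning_rate * influence[:, np.newaxis] * (x - weights)

# both learning_rate and sigma are typically decayed over the epochs,
# e.g. learning_rate * np.exp(-epoch / time_constant)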
Week 6 continued on the topic of firewalls. We got into some more detailed discussion about the implementation of firewalls, for example the use of hardware/software and integration into the OS kernel. The question of where and how the firewall engine [FE] should be implemented generates a number of options. One thing to remember is that (all else being equal) a more secure configuration will result in less convenience for the business/users. The security and convenience requirements of each case must be the driving force behind decisions such as the FE implementation.
Next week’s lecture will be focussed on the IP layer and IP security. This week the focus for network security has been the RSA assignment, so this is a short post!
Taking a turn away from first order logic, search algorithms and planning, week 5 introduced the key issues around natural language processing [NLP] and the programming language Prolog.
The logic programming paradigm used by Prolog is something I have not learned about before. The development of axioms and problem solving by querying the axioms is the foundation of languages such as Prolog. The engine of Prolog is a backward chaining theorem prover. The axioms in logic programming need to be Horn clauses: disjunctions of literals with at most one positive literal (exactly one in the case of Prolog's definite clauses).
An example:
% king(X) ∧ greedy(X) → evil(X), written as a Prolog rule:
evil(X) :- king(X), greedy(X).
king(john).
greedy(john).
?- evil(john).
In the tutorial we were able to do some basic experimentation with a toy implementation by Lloyd Allison:
Prolog relies very heavily on unification, a process that we were actually unable to correctly re-enact in the tutorial.
% p(X, L) succeeds when X is a member of the list L (lists built from c/2 and nil)
p(X, c(X,Y)).
% X is a member of the list if it is a member of the tail Z
p(X, c(Y,Z)) <= p(X,Z).
?p(A, c(1,c(2,c(3,nil)))).
Prolog Answer:
p(1, c(1, c(2, c(3, nil)))) yes
p(2, c(1, c(2, c(3, nil)))) yes
p(3, c(1, c(2, c(3, nil)))) yes
After reading the tutorial solution, I am not really much clearer on the proofs for each of these outcomes. I will have to follow up in the lecture.
NLP
We discussed the surface level methodologies for NLP:
Lexical analysis
Syntactic analysis
Semantic analysis
Pragmatic analysis
The focus of the lecture was, however, on the limitations of NLP: how the ambiguity of words, their meaning and context makes effective NLP very difficult. Implied meaning was another issue for NLP covered for some time.
Next came some approaches for overcoming the challenges of NLP, such as statistical N-gram analysis. This veered the lecture into information retrieval, discussing the techniques used by search engines such as Google to interpret searches.
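To make the N-gram idea concrete, here is a minimal bigram count in Python (my own toy example, not from the lecture):

from collections import Counter

def bigram_counts(text):
    words = text.lower().split()
    # count each adjacent pair of words
    return Counter(zip(words, words[1:]))

counts = bigram_counts("the cat sat on the mat the cat slept")
print(counts.most_common(3))
# ('the', 'cat') appears twice, so 'cat' is the most likely continuation of 'the'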
On the topic of NLP I wondered if there were any large knowledge bases being assembled to try and assist in the task. Yahoo have a cluster of computers working on this:
-- list the functions owned by the current user
SELECT object_name FROM user_objects WHERE object_type = 'FUNCTION';
-- view the source of the PATIENT_AGE function
SELECT * FROM user_source WHERE name = 'PATIENT_AGE';
-- view any compilation errors
SELECT * FROM user_errors;
David Taniar made a good portion of source code available on Moodle for students, which will assist greatly.
The components that we have covered so far are not particularly complicated. However, the number of processes we need to remember when implementing them is beginning to grow. Some practical revision of the work done in the tutorials will be needed.
This week’s tutorial involved the implementation of a naive library program. For me, implementing the stored procedures that we had just learnt was the easy part. Remembering how to create a button to interact with a LOV object proved more elusive. I will endeavour to find some time whilst at uni to do some revision of the LOV and button objects!
Part 2 of the MLP lectures was completed in week 5. We ran through some extended examples including batch and online learning methods. The issues of local minima and overfitting were also introduced, along with some ways of overcoming the limitations they impose.
It turns out that batch learning is the most common method of learning. We ran through an example of weight updates proportional to the Mean Square Error [MSE] gradient, then a further example applying momentum.
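A rough sketch of a single weight update with momentum (my own Python; the learning rate and momentum values are assumptions, not the lecture's numbers):

def batch_update(weights, gradient, velocity, learning_rate=0.1, momentum=0.9):
    # gradient: dMSE/dw accumulated over the whole batch
    # momentum carries a fraction of the previous update into the new one,
    # which helps push through flat regions and small local minima
    velocity = momentum * velocity - learning_rate * gradient
    return weights + velocity, velocity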
The concept and reasoning behind each operation in back-propagation and batch learning are quite clear; however, I definitely need to do some repetition to memorise the process for exam conditions.
The next topic was network generalisation, whereby the fitting of the model is relaxed. This ensures that noise and patterns specific to the sample data do not have a negative impact on the ability of an NN-generated model to generalise to unseen values.
Other methods for preventing overfitting, thrashing and intractable learning were:
Early stopping [set number of epochs]
Regularization/weight decay
Data normalization
More will be covered in next week’s lecture
The tutorial enabled us to start using MATLAB. The nprtool and nntool were used to create neural networks which could then be exported and manually modified to specific requirements. I found MATLAB fairly easy to use, with the exception of the plotting tools, which I was unable to make produce what I wanted.
Firewalls were the topic of week 5’s lecture. We began by discussing the definition of a firewall. We put it simply:
A firewall is a “choke point/guard box” for controlling and monitoring network traffic.
It allows interconnections between different networks with some level of trust.
It imposes restrictions on network services (only authorized traffic is allowed).
It enforces auditing and controlling access (alarms of abnormal behavior can be generated).
It provides perimeter defence.
Ideally a firewall will block all ‘bad’ traffic whilst allowing all good traffic. Differentiating between good and bad traffic is a very difficult task.
Some slides were dedicated to the demilitarized zone [DMZ]: a subnetwork which is exposed to the internet. One would usually see servers such as web, email and DNS in the DMZ.
After running through the key components of firewall architecture, the lecture focussed on the importance of organisation structure and needs. Knowing which services are required, who should be able to use them and from which locations they can be used is necessary knowledge.
Some firewall types were also explored in the lecture notes:
Packet filtering [network layer]
Stateful packet filtering
Circuit level [transport layer]
Proxy firewalls [Application level]
There was a lot more detail in the lecture notes, which I look forward to hearing about in next week’s lecture.
The fourth week of Intelligent systems provided an introduction to Planning. The question that stands out in my mind from the tutorial is: what is the difference between problem solving via search and problem solving via planning? After running through the weeks material I hope to be able to answer that question.
First off, the two learning objectives for the week (hmm, seems too simple):
Goal stack planning
Partial order planning
What do we need for goal-based planning?
a world model
an action model
a problem solving strategy
Planning uses a divide and conquer strategy, allowing sub-goals to be identified. One must be careful to ensure that interrelations between sub-goals are identified. This can reduce branching factors and assist in problems where heuristics are difficult to define.
Planning algorithms can work forwards or backwards; it seems that in most situations working backwards from the goal proves to be more efficient and further reduces branching factors.
Here is an example of a Goal stack planner from the lecture:
World model: Objects, states and goals
Action model: Operators
Problem solving strategy: Goal-stack planning
States are defined using first order logic, i.e.: AT(home) ∧ HAVE(milk) ∧ HAVE(bananas)
At this point we were introduced to the frame problem:
The frame problem is that specifying only which conditions are changed by an action does not allow us, in logic, to conclude that all other conditions remain unchanged.
I don’t really understand why this is such an issue at present; the human brain does not reconfirm every fact after an action and it can still function effectively. Perhaps the reason this is considered such a big issue will become more apparent as I learn a bit more on the topic.
So, an action in a goal stack planning system [using STRIPS] would appear as such:
ACTION: GO(market)
PRECONDITION: AT(home) ∧ CAR-AT(home)
ADD: AT(market), CAR-AT(market)
DELETE: AT(home), CAR-AT(home)
From the goal state, one can identify the preconditions, determine which action is required to generate those conditions, and work back until the initial state is reached. I do have some questions about how actions are selected and how backtracking occurs. In the box world example, if B is picked up, why would it not be placed back on C (see lecture notes) unless there is an explored set?
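To make the action model concrete, here is a small sketch of representing and applying an action (my own Python, loosely following the STRIPS-style GO(market) action above):

# a state is a set of conditions; an action has preconditions, an add list and a delete list
go_market = {
    "pre": {"AT(home)", "CAR-AT(home)"},
    "add": {"AT(market)", "CAR-AT(market)"},
    "delete": {"AT(home)", "CAR-AT(home)"},
}

def apply(state, action):
    # the action is only applicable if every precondition holds in the current state
    if not action["pre"] <= state:
        return None
    return (state - action["delete"]) | action["add"]

state = {"AT(home)", "CAR-AT(home)"}
print(apply(state, go_market))
# {'AT(market)', 'CAR-AT(market)'}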
After the box world example we moved onto Partial order Planning:
Partial order planning allows for less commitment in the search, reducing backtracking. I am still a little fuzzy on how this would be implemented, so I will have to review the textbook.
So, planning is clearly different from searching which simply applies valid operations to an initial state until it stumbles onto the goal state (this stumbling can be guided by heuristics).
Unfortunately I was absent for week 4’s lecture and tutorial. My review of the week will be limited to the printed material. PL/SQL was the topic of week 4.
OK, so to start with, I have not used PL/SQL before so it is worth defining: a procedural programming language developed by Oracle as an extension for their relational databases. It allows complex applications to further leverage the DB layer.
The general structure of PL/SQL:
DECLARE
<variable declarations>
BEGIN
<program statements>
EXCEPTION
<error handling statements>
END;
Any oracle datatypes can be used (CHAR, VARCHAR2, NUMBER, etc).
Constructs covered in the lecture were:
IF/THEN/ELSE
Loops, pretest and posttest
Cursor (implicit[select,from,where] and explicit[FOR DroomRow IN DCursor LOOP])
Exception(pre-defined, undefined, user defined) and error handling
Triggers are an interesting topic; I have always been of the opinion that this sort of procedure should be in the application layer (aside from logging). The syntax:
CREATE OR REPLACE TRIGGER trigger_name
{BEFORE|AFTER|INSTEAD OF}
{INSERT|UPDATE|DELETE}
[OF <attribute_name>] ON Table_name
[FOR EACH ROW] [WHEN (condition)]
BEGIN
Trigger_body
END;
I will need to work through the tutorial work to get some practice with this material!
Natural computation, week number 4 -> Multilayer perceptron [MLP] for non-linear data analysis.
MLP is one of several neural networks for non-linear modelling but is the most popular, hence our focus on it.
MLPs are very popular because they can model any non-linear pattern. Everything sounds great if we compare an MLP network to the single layer networks we discussed earlier. However, the learning of the MLP network is quite a bit more complex due to the indirect relationship between the hidden layer weights and the output error. As discussed previously, neural networks learn [supervised learning] by adjusting the weights applied to the inputs of neurons. The weight adjustment must be connected to the output error, and the only way to propagate that error back through the network is through differentiation.
At this point it is worth noting that the activation function for MLP neurons must be continuous (and differentiable) so that the chain rule can be applied backwards through the network.
The lecture notes set out the notation for this chain-rule example. We need to find the error gradient with respect to b (the output neuron weights) and a (the hidden neuron weights): first a round of differentiation for the output layer, then another for the hidden layer.
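While I wait to confirm the spreadsheet, here is a rough sketch of the same gradient calculation in Python for a single hidden layer (my own code, reusing the post's naming of a for hidden weights and b for output weights; sigmoid activations and MSE are assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, target, a, b):
    # forward pass
    h = sigmoid(a @ x)          # hidden layer outputs
    y = sigmoid(b @ h)          # network output
    error = y - target          # dE/dy for E = 0.5 * (y - target)^2

    # gradient with respect to b (output neuron weights)
    delta_out = error * y * (1 - y)          # chain rule through the output activation
    grad_b = np.outer(delta_out, h)

    # gradient with respect to a (hidden neuron weights), chaining back through b
    delta_hidden = (b.T @ delta_out) * h * (1 - h)
    grad_a = np.outer(delta_hidden, x)
    return grad_a, grad_b

The weights would then be adjusted by subtracting learning_rate times each gradient (or the gradients accumulated over a batch first).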
In the tutorial that followed, we completed this process in Excel to see the learning process.
I will be uploading a copy of this once I confirm it is correct.
The fourth lecture for network security continued down the path of encryption. We completed week 3’s material, discussing the reasoning behind and applications for encryption. The focus of week 4’s material was the public key encryption framework, specifically the RSA system. Although we had little time to discuss them, hash functions and digital signatures were also on the agenda.
The point of the public key system is that two people can have no shared secret key, yet still communicate securely.
Shared information: Public key[encrypt] (which may be certified by a certificate authority) + encryption algorithm
Private information: Private key [decrypt]
In this way party A can send an encrypted message to party B using party B’s public certificate.
The mathematical process for generating RSA public and private key pairs is amazingly simple and effective:
1) Generate 2 prime numbers as randomly as possible; p and q
2) n = pq
3) as p and q are prime numbers -> φ(n) = (p – 1)(q – 1) [φ – number of coprimes]
4) Generate 2 numbers, e and d where:
-> (ed) mod φ(n) = 1
e is easy to generate as it must be coprime with φ(n).
then d = e^(-1) mod φ(n)
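As a toy illustration of these steps (my own Python with tiny textbook primes; real keys use primes hundreds of digits long and much more careful prime generation):

p, q = 61, 53                      # 1) two primes (far too small for real use)
n = p * q                          # 2) n = 3233
phi = (p - 1) * (q - 1)            # 3) φ(n) = 3120
e = 17                             # 4) e must be coprime with φ(n)
d = pow(e, -1, phi)                # d = e^(-1) mod φ(n), here 2753 (Python 3.8+)
print((e * d) % phi)               # prints 1, as required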
With this in mind, it can be seen that d [the private key] could in principle be found from (e,n) [the public key] by factoring n. However, as the key size increases, the computing power [workload] required to break the encryption increases dramatically.
“RSA claims that 1024-bit keys are likely to become crackable some time between 2006 and 2010 and that 2048-bit keys are sufficient until 2030.”
In any case, now that we have our public and private keys, how do we use them?
Let’s say Alice wants to send a message ‘m’ to Bob.
Alice gets Bob’s public key – (e,n)
Alice encrypts her message before sending it: m^e (mod n) -> encrypted message [c]
Bob decrypts using his private key (d,n): c^d (mod n) -> message 😀
Some constraints to note:
The message needs to be an integer between 1 and n… not difficult considering ASCII
Long messages need to be encrypted in blocks.
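Continuing the toy numbers from the key generation sketch above, the encrypt/decrypt round trip looks like this (again my own Python, with a single small integer standing in for a message block):

m = 65                             # the message, an integer between 1 and n
c = pow(m, e, n)                   # Alice: c = m^e mod n -> 2790
print(pow(c, d, n))                # Bob: c^d mod n -> 65, the original message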
The lecture notes say that a good point of public key systems is that key distribution is not a problem.
This fact does not, however, protect Alice from being tricked by a MITM attack when she is retrieving Bob’s public key. Let’s hope that Alice checks the validity of Bob’s public key and that the CA and software vendors ensure that vulnerabilities such as certificate chaining are avoided.