Categories
Intelligent systems

FIT5047 – Intelligent Systems Week 11

Week 11 moved into recommender systems, perhaps one of the most popular and commonly used forms of AI. Sites such as Google and Amazon built their success on the effectiveness of their recommender systems (now I guess their brands can carry them for a while). The first topic of the lecture was association mining: given a large dataset, how do we find useful associations between attributes?

Support and confidence were proposed as useful metrics to drive this process. Unfortunately we found some conflicting definitions among the Data Mining, Weka and R&N texts. When in doubt, check Wikipedia:

The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset.

supp(X) = P(X)

The confidence of a rule is defined as: conf(X -> Y) = P(Y | X) = P(X and Y) / P(X)

The lift of a rule is defined as: lift(X -> Y) = conf(X -> Y) / supp(Y) = P(X and Y) / (P(X)P(Y))

The leverage of a rule is defined as:

leverage(X -> Y) = P(X and Y) – (P(X)P(Y))

The source listed on Wikipedia for these definitions is: http://michael.hahsler.net/research/association_rules/measures.html

In our lecture notes we had the support for a rule A -> B as the support of the union of A and B; this confused me, as I still think of support as the intersection of A and B. (The union is over the itemsets, i.e. all items of A and B occurring together, which as an event corresponds to the intersection of the transactions containing A and those containing B, so the two views agree.)

The rules described above are quite intuitive when worked through in an example. Lift feels like an extension of confidence that takes independence into account: a lift greater than 1 implies dependence.

Zero leverage implies independence between the attributes, and vice versa.
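To make these definitions concrete, here is a minimal Python sketch (the toy transactions are made up for illustration) computing support, confidence, lift and leverage for a single rule X -> Y:

# toy transaction set; each transaction is a set of items
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    # proportion of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"bread"}, {"milk"}
supp_xy  = support(X | Y)                      # P(X and Y)
conf     = supp_xy / support(X)                # P(Y | X)
lift     = conf / support(Y)                   # P(X and Y) / (P(X)P(Y))
leverage = supp_xy - support(X) * support(Y)   # P(X and Y) - P(X)P(Y)

print(supp_xy, conf, lift, leverage)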

This topic was closed with the conclusion that relying on these measures alone is in fact bad practice, as variance and standard deviation are completely ignored.

A quick review of collaborative and content-based filtering was covered next. Content-Based Filtering [CBF] (haha) can be implemented using an array of machine learning techniques already covered: Naive Bayes, neural networks and decision trees are all classification methods that can be applied to CBF. The pre-processing involved with CBF seems to be the most limiting factor. Term frequency and inverse document frequency can be compiled into tables allowing for effective searching. Considering the vast size of the data sets that these systems would be applied to, this can seem a bit daunting.
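As a quick illustration, here is a hedged sketch of content-based filtering using TF-IDF vectors and cosine similarity from scikit-learn; the item descriptions and the 'liked' item are invented for the example:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = [
    "action movie with car chases and explosions",
    "romantic comedy set in Paris",
    "documentary about deep sea exploration",
    "action thriller with explosions and spies",
]

tfidf = TfidfVectorizer()
vectors = tfidf.fit_transform(items)         # one TF-IDF vector per item

liked = 0                                    # the user liked the first item
scores = cosine_similarity(vectors[liked], vectors).ravel()
scores[liked] = -1                           # don't recommend the item itself
print("recommend item", scores.argmax())     # most similar remaining item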

Collaborative filtering [CF] seems a bit easier to implement, but it then relies on user participation. The introduction in the lecture felt very similar to the basics of Self Organising Maps: vectors are created to represent instances (in this case users), Euclidean distance (or some spin-off of it) is used to measure instance 'likeness', and missing values in an instance's vector can then be predicted from the instances that are considered 'like' it. There was quite a bit of mathematical methodology described on the lecture slides which would be required when implementing a CF system.
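A minimal sketch of that idea, user-based collaborative filtering over a toy ratings matrix (0 means 'not rated'), might look like this:

import numpy as np

R = np.array([                  # rows = users, columns = items
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 5, 4],
], dtype=float)

def similarity(u, v):
    # cosine similarity over the items both users have rated
    mask = (u > 0) & (v > 0)
    if not mask.any():
        return 0.0
    return np.dot(u[mask], v[mask]) / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask]))

def predict(user, item):
    # similarity-weighted average of other users' ratings for the item
    sims = np.array([similarity(R[user], R[v]) if v != user and R[v, item] > 0 else 0.0
                     for v in range(len(R))])
    if sims.sum() == 0:
        return 0.0
    return np.dot(sims, R[:, item]) / sims.sum()

print(predict(user=0, item=2))   # estimate user 0's rating for item 2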

Collaborative Filtering vs Content-Based Filtering (source: Week 11 lecture notes)
Categories
Intelligent systems

FIT5047 – Intelligent Systems Week 10

Week 10 moved on from classification to clustering. Although, conceptually, there was a close relation to topics covered in Natural Computation, the methods discussed were new. Again, Euclidean distance is a fundamental measure of similarity/uniqueness.

The first method introduced was Hierarchical Clustering. This introduction was very brief, and reference to the text would need to be made for issues such as linkages.

The next method was K-Means clustering.

 

kmeans
As can be seen, with K = 3 we can move the centers but the number of clusters is static

 

 

I find the limitation of assuming the number of clusters [K] goes close to invalidating this methodology in its basic form. Of course, the algorithm can be extended to an exhaustive or stochastic search where multiple K values are compared and contrasted. The idea of clustering is to simplify data sets, in essence reducing dimensionality. With this in mind there must be a penalty in extended K-means algorithms for the number of clusters; otherwise the best clustering would always result in K = the number of unique instances. MML, MDL and BIC are examples of criteria that incorporate these penalties. Interestingly, I came across MDL when looking for an effective method for discretizing continuous variables. It now seems obvious that discretization is a form of clustering where there need to be penalties for an increasing number of clusters. For more info on using MDL to discretize continuous variables see:

Fayyad, U., Irani, K., 1993, Multi-interval discretization of continuous-valued attributes for classification learning, Thirteenth International Joint Conference on Artificial Intelligence, 1022–1027

Interestingly, Usama Fayyad is now Chief Data Officer and Executive Vice President, Yahoo! Inc… for next time anyone says research in this field is pointless for a career.
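Coming back to the cluster-count problem: as a rough illustration of the 'penalise extra clusters' idea, the sketch below uses a Gaussian mixture's BIC score from scikit-learn as a stand-in for the MML/MDL/BIC criteria mentioned above; the two-dimensional data is synthetic.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# three well-separated synthetic clusters
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

scores = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(data)
    scores[k] = gm.bic(data)          # lower BIC = better fit after the complexity penalty

best_k = min(scores, key=scores.get)
print(scores, "-> best K:", best_k)   # should prefer K = 3 for this data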

The lecture continued to introduce issues and algorithms which require a great deal of reading and writing to do justice (which I am yet to complete).

Categories
Natural computation for intell. sys.

FIT5167 – Natural Computation Week 10

Time series forecasting was the topic of week 10’s lecture. To complete time series forecasting we first need to remove anything that is easy to forecast from the data:

  • Trends
  • Cycles (Cyclical Components)
  • Seasonal variations
  • ‘Irregular’ components: the hardest to predict, and the component which our neural networks will be attempting to forecast.

Autocorrelation is generally stronger for recent data items and degrades in quality as we step back through the time series. Autocorrelation uses past data items in an attempt to predict n time steps into the future. It must be noted that errors incurred in the earlier forecast steps are more than likely to grow as the prediction continues to step forward. Although this point seems obvious, when viewing data predictions that seem intuitively correct our own confirmation bias often outweighs awareness of a model's limitations.

 

autocorrellationUsingMLP
Autocorrelation using MLP
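A minimal sketch of this kind of autocorrelation-based forecasting, using a synthetic sine series, lagged values as MLP inputs, a chronological train/validation split and RMSE on the held-out tail (the lag length and network size are arbitrary choices):

import numpy as np
from sklearn.neural_network import MLPRegressor

t = np.arange(300)
series = np.sin(t / 10.0) + 0.1 * np.random.default_rng(0).normal(size=t.size)

lags = 5
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])  # lagged inputs
y = series[lags:]                                                      # next value to predict

split = int(0.8 * len(X))              # chronological split: no shuffling for time series
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
print("validation RMSE:", rmse)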

Spatio-temporal models incorporate a principal component. This is a variable (or variables) whose influence on future timesteps is significant. An example in our water use prediction would be the rainfall of previous months: low rainfall would suggest higher water usage. There are many methods for identifying principal components; Karl Pearson pioneered this field with the introduction of his product-moment correlation analysis.

Forecasting linear time series can be conducted using a single layer perceptron. It is questionable, however, how much this tool would be superior to more simplistic modelling methods. Auto-regressive with external variables [ARX] models utilize both previous time series data and principal component states for generating forecasts.

Evaluating model accuracy can be done in rudimentary fashion using root mean square error [RMSE].

Moving past the simplistic single layer networks, we reviewed time lagged feed forward networks:

 

timeLaggedFeedFor
Time Lagged feed forward networks (Non-Linear)

We then moved to Non-Linear Auto-regressive with external variables [NArx] networks:

 

Narx1
A recurrent NArx network
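A rough sketch of NArx-style input construction, reusing the rainfall/water-use example from above (the synthetic relationship, lag length and variable names are illustrative only): past values of the target series and of the exogenous variable are concatenated into one input vector per time step.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
rainfall = rng.random(300)                       # exogenous (principal component) variable
water_use = np.empty(300)
water_use[0] = 0.5
# toy relationship: this month's water use rises when last month's rainfall was low
water_use[1:] = 1.0 - rainfall[:-1] + 0.05 * rng.normal(size=299)

lags = 3
X = np.array([np.concatenate([water_use[i:i + lags], rainfall[i:i + lags]])
              for i in range(len(water_use) - lags)])
y = water_use[lags:]

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)
print(model.predict(X[:1]))                      # one-step-ahead forecast for the first window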

The same training principles as with standard NNs apply to time series forecasting. Importantly, the training data must be viewed in chronological order, as forecasting would suggest, in contrast to classification.

 

nArxtraining
Minimize RMSE on validation, not training data!

Again, awareness must be given to over/under fitting. Minimizing RMSE on training data does not imply an accurate model for all/future data.

 

 

 

Categories
Network security

FIT5044 – Network Security Week 10

We moved up to the application layer and looked at Web/Email security and malicious programs in week 10. Some of the possible (or impossible) improvements to email systems were discussed. I feel that some of the options we looked at would be impossible given POP3 style communications. The predominance of IMAP and web based email clients could allow for extensions of email systems in the future. The same key issues of data communications were cited as areas where email needs to be improved:

  • Privacy/Confidentiality
  • Authentication
  • Non-repudiation
  • Proof of submission

Some suggested areas of reading on these topics for email: Secure/Multipurpose Internet Mail Extensions (S/MIME) and Privacy Enhanced Mail (PEM).

Integrity and DoS were issues raised when discussing web server security, in addition to the concerns for web server clients. The increasing emergence of cross-site scripting (XSS) indeed puts many users at risk. The automation of modern web browsers, the high usage of cookies and the large amount of confidential information stored within most people's browsers serve to further increase the threat of XSS.

Given such a large array of risks, one could ask how we can protect systems.

malProgramProtection
A simple illustration of the architecture and components of a protected system

In addition to firewalls, anti-virus programs should be utilized. Due to the large amount of computing resources required, dedicated scanning machines for systems are becoming more common.

Malicious programs come in a wide variety of forms and functions; they can be a field of research all on their own and definitely require more than a week to understand.

Categories
Intelligent systems

FIT5047 – Intelligent Systems Week 9

Supervised learning was covered in week 9. Happily, I found that there was some overlap here with FIT5167 Natural Computation. The introduction defined machine learning as a parent of data mining. Supervised and unsupervised learning were skimmed along with definitions of what learning actually is (i.e. pattern classification / time series forecasting / clustering). The concept of splitting data for training and testing was also brought up.

Getting into the points that were different from Natural Computation (neural networks): decision trees! Decision trees are a fairly simple concept. Constructing them can involve some calculation though. For example, given a set of 8 attributes and one classifying attribute, which attribute do we split the tree on first when building a decision tree? The answer to this question can be found in Shannon's information theory. The mathematics for calculating information gain, as explained on the lecture slides, was not really intuitive for me. In the context of decision trees, the information gain can be calculated as the initial entropy (the 'unsortedness' of the values) minus the entropy after splitting the data on that attribute. The split that subtracts the least (i.e. has the lowest post-split entropy) therefore leaves the largest information gain. Simply, this means that if we have a bag with 20 hats, 10 white and 10 black, the entropy would equal 1. If a split of the bag on, say, brand left us with 2 bags, one with the 10 black hats and one with the 10 white hats, the entropy would then be equal to 0. Everything in between can be calculated using the following:

Entropy
The mathematical notation for calculating entropy: H = -Σ p(i) log2 p(i)

I found and slightly modified a Python implementation (http://mchost/sourcecode/fit5047/shannonsEntropyCalc.py):

## initial source: http://code.activestate.com/recipes/577476/
##
import math

tmp = raw_input("Enter a set:")
st = []
st.extend(tmp) # input string, split into individual characters

print 'Input string:'
print st
print
stList = list(st)
alphabet = list(set(stList)) # list of unique symbols in the string
print 'Alphabet of symbols in the string:'
print alphabet
print
# calculate the frequency of each symbol in the string
freqList = []
for symbol in alphabet:
    ctr = 0
    for sym in stList:
        if sym == symbol:
            ctr += 1
    freqList.append(float(ctr) / len(stList))
print 'Frequencies of alphabet symbols:'
print freqList
print
# Shannon entropy: -sum(p * log2(p)) over all symbols
ent = 0.0
for freq in freqList:
    ent = ent + freq * math.log(freq, 2)
ent = -ent
print 'Shannon entropy:'
print ent
print 'Minimum number of bits required to encode each symbol:'
print int(math.ceil(ent))
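As a small extension of the entropy calculation (not from the lecture), the information gain for the bag-of-hats example above can be computed like this:

import math

def entropy(labels):
    n = float(len(labels))
    return -sum((labels.count(l) / n) * math.log(labels.count(l) / n, 2)
                for l in set(labels))

bag = ['black'] * 10 + ['white'] * 10         # 20 hats: entropy = 1.0
split = [['black'] * 10, ['white'] * 10]      # splitting on 'brand' gives two pure bags

after = sum(len(s) / float(len(bag)) * entropy(s) for s in split)
print('information gain:', entropy(bag) - after)   # 1.0 - 0.0 = 1.0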
Categories
Adv. programming for DB apps.

FIT5059 – Adv. Prog. for DB Applications Week 9

Menus were the topic of week 9:

  • Pop up
  • Pull down
  • Tabbed

pulldown
Pull-down menus are canvas independent
Pop-up menus must have programmed triggers
Categories
Natural computation for intell. sys.

FIT5167 – Natural Computation Week 9

Natural computation’s 9th week saw an introduction to associative memory networks.

 

Two major types of associative networks

Initialization of a Bi-Directional Associative Memory (BAM) network involves establishing a weight matrix using input and output pairs:

BAM initialization

It seems much easier to write a simple script which demonstrates understanding of the weight initialization and memory recall algorithms. Hopefully I can do that this week. The major question that comes to mind after this lecture was why these networks would be used instead of a SOM.
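A minimal numpy sketch of what that script might look like, with made-up bipolar training pairs: the weight matrix is the sum of outer products of the input/output pairs, and recall passes activations back and forth between the two layers until they stabilise.

import numpy as np

# two hypothetical bipolar (+1/-1) input/output training pairs
X = np.array([[ 1, -1,  1, -1],
              [ 1,  1, -1, -1]])
Y = np.array([[ 1, -1],
              [-1,  1]])

# weight matrix: sum of outer products of each input/output pair
W = sum(np.outer(x, y) for x, y in zip(X, Y))

def recall(x, W, steps=5):
    # recall an output pattern by bouncing activations between the two layers
    y = np.sign(x @ W)
    for _ in range(steps):
        x = np.sign(y @ W.T)   # backward pass
        y = np.sign(x @ W)     # forward pass
    return y

print(recall(np.array([1, -1, 1, -1]), W))   # should recover [ 1 -1]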

Categories
Network security

FIT5044 – Network Security Week 9

Week 9 continued from IPsec into security at the transport layer, specifically SSL. Unsurprisingly, given that public key cryptography is used at least in the initial stages of every SSL connection, distribution and authentication of public keys was the first issue raised. The use of certificate authorities providing signed keys is the current solution. Similarly to IPsec, authentication, integrity and confidentiality are the goals of SSL.

With such a wide range of computers using SSL, there needs to be provisioning for different cipher suites, which is included in the SSL handshake:

 

SSL
The SSL handshake initiates transport layer security
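As a small aside, Python's ssl module can show the outcome of that negotiation; the sketch below (host chosen arbitrarily) opens a TLS connection and prints the protocol version and cipher suite agreed during the handshake:

import socket, ssl

ctx = ssl.create_default_context()               # loads the system's trusted CA certificates
with socket.create_connection(("example.org", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.org") as tls:
        print(tls.version())                     # negotiated protocol version
        print(tls.cipher())                      # negotiated cipher suite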

 

There was also some discussion over the definition of sockets; my interpretation is that they are basically application layer ports. A better explanation can be found here: http://pro-programmers.blogspot.com/2009/02/socket-vs-port.html

Work also began on the second assignment, development of firewalls using iptables.

Categories
Intelligent systems

FIT5047 – Intelligent Systems Week 8

Intelligent decision support was the topic of week 8's lecture. This is a topic built on the 'Fundamental Preference Assumption' (given choices A and B, either A > B, B > A or A ~ B). It is closely related to our previous lectures, which centered around reasoning with uncertainty.

Rational preferences are a prerequisite for intelligent decisions. Characteristics of rational preferences are:

  • Orderability
  • Transitivity
  • Continuity
  • Substitutability
  • Monotonicity

Mapping of preferences that may not have a readily comparable outcome is achieved through utility values. We did not cover any material on the development of utility values. I believe that to increase the success rate of intelligent decision systems, collaboration between end users and implementors must be conducted. A perfect system with poorly represented utility values will fail.

Principle of Maximum Expected Utility (MEU): an agent is rational iff it makes decisions that reflect MEU (I would argue rather that an agent can't be rational if it does not make decisions based on MEU). Rationality should also encompass consideration of the source of the utility values.

Using Bayesian networks with decision and utility nodes, dynamic utility networks can be developed. Depending on the information available, the maximum expected utility of a decision can be calculated. This is a key concept for rational planning in uncertain environments. The value of new information can also be calculated using Shannon's utility gain, a topic to be discussed next lecture.

ExpectedUtility
Expected utility in uncertain environments is linked with Bayes theorem
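To make the calculation concrete, here is a toy Python sketch of choosing the decision with maximum expected utility; the decisions, outcome probabilities and utility values are all invented for illustration:

# outcome probabilities for each decision, P(outcome | decision)
outcomes = {
    "take umbrella":  {"rain": 0.3, "dry": 0.7},
    "leave umbrella": {"rain": 0.3, "dry": 0.7},
}
# utility of each (decision, outcome) pair
utility = {
    ("take umbrella", "rain"): 60, ("take umbrella", "dry"): 80,
    ("leave umbrella", "rain"): 0, ("leave umbrella", "dry"): 100,
}

def expected_utility(decision):
    return sum(p * utility[(decision, outcome)]
               for outcome, p in outcomes[decision].items())

for d in outcomes:
    print(d, expected_utility(d))
print("MEU decision:", max(outcomes, key=expected_utility))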
Categories
Adv. programming for DB apps.

FIT5059 – Adv. Prog. for DB Applications Week 8

Tabbed forms

As it seems this course is in essence a revision of the Oracle Forms Builder manual, theoretical revision covered in the blog is somewhat pointless. Most of my revision time for this subject will now be dedicated to the subject's assignment, which is an all-inclusive one that requires students to implement everything that has been taught.

Week 8 introduced tabbed canvases: