Automata: AI

Tuesday, February 4, 2014

Python, PyBrain, Cython and CyBrain

The best thing about Python is that it's a diverse language: some use it to build commercial web apps in Django, some to teach programming, and some in the scientific community see it as an open-source replacement for Matlab.

When I first came in touch with Python I belonged to this last group; my main motivation for using the language was its growing popularity in Artificial Intelligence and Machine Learning. In this terrain, packages like NumPy, SciPy and the SciKits collection have been doing a great job of giving the scientific community consistent tools.

For Data Mining and Machine Learning, one of Python's offerings is Scikit-Learn, an extremely well-documented module with some YouTube tutorials and a very smart team behind it. When I first started to wander the machine learning world, my main interest soon became neural networks, and not wanting to depend on Matlab I put my hopes in Scikit-Learn. But things didn't go so smoothly. It seems that they initially planned to support neural nets and at some point began to implement them, but soon decided not to and passed the ball to PyBrain, a package specialised in NN and AI.

After searching many forums and trying many modules, I finally settled on PyBrain. Its "modular" philosophy is great: you build neural networks as if they were Legos by creating layers, adding them to a network, connecting them in any (consistent) order you want, and finally training the result. In the long run, a package like this is needed to do modern machine learning because of its ability to create deep networks with a custom architecture.
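
To make the Lego analogy concrete, here is a minimal pure-Python sketch of the modular idea. It is not PyBrain's API: the Layer and Network classes, their method names, and the weights are all made up for illustration, and training is left out entirely.

```python
import math


class Layer(object):
    """A named group of neurons with an activation function (hypothetical)."""

    def __init__(self, name, size, activation=lambda v: v):
        self.name = name
        self.size = size
        self.activation = activation


class Network(object):
    """Holds layers and the weighted connections between them (hypothetical)."""

    def __init__(self):
        self.layers = []
        self.connections = {}  # target layer name -> list of (source, weights)

    def add_layer(self, layer):
        self.layers.append(layer)
        return layer

    def connect(self, source, target, weights):
        # weights[i][j] links neuron i of source to neuron j of target
        self.connections.setdefault(target.name, []).append((source, weights))

    def activate(self, inputs):
        # Assumes layers were added in feed-forward order, input layer first.
        outputs = {self.layers[0].name: list(inputs)}
        for layer in self.layers[1:]:
            sums = [0.0] * layer.size
            for source, weights in self.connections.get(layer.name, []):
                for i, value in enumerate(outputs[source.name]):
                    for j in range(layer.size):
                        sums[j] += value * weights[i][j]
            outputs[layer.name] = [layer.activation(s) for s in sums]
        return outputs[self.layers[-1].name]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# Snap the pieces together: 2 inputs -> 2 sigmoid neurons -> 1 linear output.
net = Network()
inp = net.add_layer(Layer('in', 2))
hid = net.add_layer(Layer('hidden', 2, sigmoid))
out = net.add_layer(Layer('out', 1))
net.connect(inp, hid, [[0.5, -0.5], [0.25, 0.75]])
net.connect(hid, out, [[1.0], [1.0]])

result = net.activate([0.0, 0.0])
print(result)
```

With all-zero inputs both hidden neurons output sigmoid(0) = 0.5, so the single output neuron sums them. The point is only that layers and connections are independent pieces you can arrange freely.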

As pretty as PyBrain may be, it has a huge Achilles heel: PYTHON IS SLOW! PyBrain is written in pure Python, and you will hit a dead end if you need speed. The truth is that large-scale neural nets are one of those real-world examples where you can have a function with millions of parameters that must be optimised by running through thousands of training cases. But you don't need to go to that extreme to hit the wall: just create a network with 2 or 3 hidden layers, each with about 5 neurons, and you will feel the pain of waiting many seconds for the console to show the answer. Because of this limitation, PyBrain is at best an educational (maybe not even scientific) package, since using it in a real-world scenario would be unreasonable.

PyBrain's philosophy is great, but PyBrain itself may not suit my purposes. That is why I decided to start CyBrain, a neural networks module inspired by PyBrain and written in Cython. For those who don't know what Cython is, I will just say it's an inch away from being the "perfect language". Formally, Cython is a superset of the Python language that compiles Pythonic code to optimised C. Superset means that (with a few minor exceptions) every Python statement is a valid Cython statement, but not every Cython statement is a valid Python statement. The real deal is that Cython gives you the opportunity to mix Pythonic code and pseudo-C code any way you want; specifically, Cython lets you declare C TYPES!

When you write Cython code you feel you are connecting two foreign realms, and at first it is both thrilling and confusing. The first thing you automatically do is test the speed; it's like driving a Ferrari: even if you don't like cars, you are bound to hit the accelerator. Cython is fast, C fast. The first bit is a little rough: since Cython is a compiled language, you have to arrange all the parts in a setup script. The documentation helps in this first stage, but I really took off after this 4-part YouTube tutorial from a guy at Enthought. Get past that initial trial, relax, and watch your Python code run from 1.4x to 7x faster; then add some types and feel the wind of 100x+ speed!!!

Back to CyBrain: I just finished the basic parts of what you could call a minimum viable product. I don't know a lot about PyBrain's internals; although I did download the project, since it's open source, looking at unfamiliar code becomes boring after a few minutes unless you want to fix it. My main design inspiration came from a section in the docs that teaches you to create your own custom neurons by subclassing the Neuron class and overriding some functions; those functions were the ones that gave me the hints.

In Cython you can do some nasty tricks like using pointers... pointers!!! This is heaven and hell at the same time. You have to malloc them (ahh!!!), but then you can insert them into C++ vectors, which Cython supports, and for free have variables like floats act like modern objects. This might not seem directly useful, but state sharing is very efficient for some applications: the weight-sharing technique is really easy with this.
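
The same state-sharing idea can be sketched in plain Python, where a small mutable object plays the role of the pointer. This is only an analogy and not CyBrain code: the Weight and Connection classes are hypothetical names.

```python
class Weight(object):
    """A mutable box around a float, standing in for a C pointer (hypothetical)."""

    def __init__(self, value):
        self.value = value


class Connection(object):
    """Scales a signal by a weight that may be shared with other connections."""

    def __init__(self, weight):
        self.weight = weight

    def forward(self, signal):
        return signal * self.weight.value


shared = Weight(0.5)
first = Connection(shared)
second = Connection(shared)   # the same Weight object, not a copy

shared.value = 2.0            # a single update...
print(first.forward(3.0))     # ...is seen by both connections
print(second.forward(3.0))
```

Because both connections hold a reference to one object, a training step only has to touch the weight once, which is the essence of weight sharing.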

Anyway, this was a long digression. I haven't compared speeds yet, but the results seem promising. If you want to fork the code, go ahead: here is the GitHub link to CyBrain. Feedback is welcome.

Sunday, December 22, 2013

Self-Modifying Code in Python

Self-modifying code seems like a scary sci-fi idea where computers write their own programs and break loose from human control. In reality, there is nothing special about self-modifying code, apart from the fact that it is hard to write in most programming languages. Think about it: you can modify files on your computer, but isn't the executable program just another file?

I am not an expert in computer architecture, but my knowledge of computation tells me that all a computer does is read and write data on some memory storage unit; when it boils down to zeros and ones, writing on top of the machine instructions is just like writing anywhere else, only the consequences are different. In low-level languages like Assembly, modifying the instructions is easy since the language itself contains operations for this purpose. (Read more: Assembly)

Anyway, the AI dream (or nightmare) of truly intelligent machines that upgrade their own code and become more intelligent does not seem to be in sight. In one of the Ulam Lectures at the Santa Fe Institute (Lecture Video), the researcher talked about their incredible work on a bug-fixing program that modified the source code of another program using a genetic algorithm, trying to make it pass all the test cases that measured the program's 'correctness'. To get good results, the program modified the abstract syntax tree of the code rather than working symbol by symbol, with operations like copying, moving, and deleting branches; thus it didn't actually create "new" code, but modified existing code as best it could. It turns out that about 70% of the bugs in most bug databases could be fixed; the small print is that most of them only required changing about 5 lines of code, but that is a good thing for us.
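
To get a feel for what operating on the tree rather than on symbols means, here is a toy sketch using Python's ast module. It is far simpler than the tool from the lecture: instead of a genetic algorithm it just tries deleting one statement at a time and keeps the first variant that passes the tests. The buggy function, the test cases, and the helper names are all invented for the example.

```python
import ast
import copy

# A function with one spurious statement (n = n + 1) that breaks it.
BUGGY_SOURCE = """
def absolute(n):
    if n < 0:
        n = -n
    n = n + 1
    return n
"""

# Test cases that measure 'correctness': (input, expected output).
TESTS = [(3, 3), (-4, 4), (0, 0)]


def passes_tests(tree):
    """Compile a candidate tree and run it against every test case."""
    namespace = {}
    try:
        exec(compile(tree, '<candidate>', 'exec'), namespace)
        return all(namespace['absolute'](arg) == want for arg, want in TESTS)
    except Exception:
        # Candidates that fail to compile or crash simply count as failures.
        return False


def repair_by_deletion(source):
    """Delete each top-level statement of the function body, one at a time."""
    original = ast.parse(source)
    function = original.body[0]
    for index in range(len(function.body)):
        candidate = copy.deepcopy(original)
        candidate.body[0].body.pop(index)   # the tree-level 'delete' operation
        if passes_tests(candidate):
            return index, candidate
    return None, None


index, fixed = repair_by_deletion(BUGGY_SOURCE)
print("deleted statement index: " + str(index))
```

Deleting a whole branch always leaves syntactically valid code, which is what makes tree-level edits so much more productive than flipping individual characters.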

Intrigued by this paradigm and by my new love for Python, I wrote a Python class named "Code" to test Python's dynamic abilities. The key ingredient is the function exec(), which takes a string as its input argument and executes it in the Python interpreter. The basic trick is shown below:

var = 'x'
equals = '='
val = '2'
exec(var + equals + val)  # executes the string 'x=2'

This creates a variable named x with a value of 2.
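
A variation worth knowing, assuming you would rather not create variables in the current scope: exec() also accepts a dictionary to use as the namespace, so the generated variable ends up there instead. A minimal sketch:

```python
# Build the assignment statement from string pieces, as above.
var = 'x'
equals = '='
val = '2'
statement = var + equals + val   # the string 'x=2'

# Run it inside a dedicated dictionary instead of the current scope.
namespace = {}
exec(statement, namespace)

print(namespace['x'])   # prints 2
```

Keeping generated names in their own dictionary makes it easier to see exactly what the executed string created.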

Here is an example of what the class can do; at the end you can find the code for the class itself. In this example, a conditional checks whether x == 1; if the condition is met, it assigns x the value 2 and then erases the whole conditional block.

This first part creates the code object and adds statements to it:

# Init variables
x = 1
# Create the Code object
code = Code()
code + 'global x, code'   # Adds a new Code instance code[0] with this line => internally code.subcode[0]
code + "if x == 1:"       # Adds a new Code instance code[1] with this line => internally code.subcode[1]
code[1] + "x = 2"         # Adds a new Code instance 0 under code[1] => internally code.subcode[1].subcode[0]
code[1] + "del code[1]"   # Adds a new Code instance 1 under code[1] => internally code.subcode[1].subcode[1]

This next part prints the Python code and shows the value of x. It just shows you the structure; it doesn't execute it.

# Prints (single-argument print(...) runs on both Python 2 and Python 3)
print("Initial Code:")
print(code)
print("x = " + str(x))

Output 1:

Initial Code:

global x, code
if x == 1:
    x = 2
    del code[1]

x = 1

As you can see, it contains the conditional if block that assigns a value to x. The last line, 'x = 1', is not a statement of the code object; as you can see in the actual code, it's just a print that shows you the value of x.

The next segment actually executes the code and then prints it.

print("Code after execution:")
code()  # Executes the code
print(code)
print("x = " + str(x))

Output 2:

Code after execution:
executing code

global x, code

x = 2

As you can see, the code changed the variable x to the value 2, but most importantly it deleted the whole conditional block! One use of this could be to avoid checking conditions once they have been met. A better use would be to create a meta-program that writes the actual program, or to have autonomous coroutines that modify themselves gradually to perform better depending on their usage history.
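
The first use, skipping a check once its condition has been met, can also be sketched without exec() by rebinding a method on the instance. This is a different technique from the Code class, and the Updater class and its method names are hypothetical:

```python
class Updater(object):
    """Checks a condition only until the first time it is met (hypothetical)."""

    def __init__(self):
        self.x = 1
        self.step = self._checked_step    # start with the version that checks

    def _checked_step(self):
        if self.x == 1:
            self.x = 2
            self.step = self._plain_step  # drop the check from now on
        return 'checked'

    def _plain_step(self):
        return 'unchecked'


updater = Updater()
first = updater.step()    # runs the conditional and swaps itself out
second = updater.step()   # the conditional is no longer evaluated
print(first + ', ' + second + ', x = ' + str(updater.x))
```

The object ends up in the same state as in the example above (x is 2 and the check is gone), but the replacement code has to be written in advance instead of being generated at run time.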

Finally, here is the code for the Code class itself:

class Code(object):
    """A line of code plus the list of lines nested under it (its block)."""

    def __init__(self, line='', indent=-1):
        # indent == -1 marks the root, which holds no code of its own
        if indent < -1:
            raise ValueError('Invalid {} indent'.format(indent))

        self.strindent = '    ' * max(indent, 0)
        self.strsubindent = '    ' + self.strindent

        self.line = line
        self.subcode = []
        self.indent = indent

    def __add__(self, other):
        # code + 'text' or code + Code(...) appends a child to this block
        if isinstance(other, str):
            other = Code(other, self.indent + 1)
        elif not isinstance(other, Code):
            return NotImplemented
        self.subcode.append(other)
        return self

    def __sub__(self, other):
        # code - 'text' removes the first child with that line;
        # code - Code(...) removes that exact child
        if isinstance(other, str):
            for code in self.subcode:
                if code.line == other:
                    self.subcode.remove(code)
                    break
        elif isinstance(other, Code):
            self.subcode.remove(other)
        else:
            return NotImplemented
        return self

    def __repr__(self):
        # Render this line and, recursively, everything nested under it
        rep = self.strindent + self.line + '\n'
        for code in self.subcode:
            rep += repr(code)
        return rep

    def __call__(self):
        # Run the generated source; it sees the globals of this module
        print('executing code')
        exec(repr(self))
        return repr(self)  # the source as it stands after execution

    def __getitem__(self, key):
        # code['text'] looks a child up by its line, code[i] by position
        if isinstance(key, str):
            for code in self.subcode:
                if code.line == key:  # compare values, not identity
                    return code
            raise KeyError(key)
        return self.subcode[key]

    def __delitem__(self, key):
        # del code['text'] removes every child with that line;
        # del code[i] removes the child at position i
        if isinstance(key, str):
            self.subcode = [c for c in self.subcode if c.line != key]
        else:
            del self.subcode[key]
