Tuesday, June 17, 2008

Programming Python, Part II

July 1st, 2007 by José P. E. "Pupeno" Fernández in HOWTOs

Having covered some advanced features in Part I, it's time to include some basics.

The tutorial in last month's issue covered the basics of installing Python, running it and using it. Then, we moved on to building a basic blog in Python. The blog was extremely simple—only a Python list. We focused on the posts and built a Post class:

class Post(object):
    def __init__(self, title, body):
        self.set_title(title)
        self.set_body(body)

    def set_title(self, title):
        self._title = title

    def get_title(self):
        return self._title

    def set_body(self, body):
        self._body = body

    def get_body(self):
        return self._body

    def __repr__(self):
        return "Blog Post: %s" % self.get_title()

In this follow-up article, let's focus on the blog itself and go further.

The Blog

Now that we have the Post class, we can make the Blog class. An initial implementation may look like this:

class Blog(object):
    def __init__(self):
        self._posts = []

    def add_post(self, post):
        self._posts.append(post)

    def get_posts(self):
        return self._posts

We are using a list to maintain the posts, but that detail is hidden behind the set of methods in the Blog class. This has a huge advantage: tomorrow we could replace that simple list with an SQL back end, and the code that uses Blog will need few, if any, changes.
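As a rough sketch of that claim, here are two interchangeable blogs. Everything in it is an assumption made for illustration: the names ListBlog, SqlBlog and titles are hypothetical, Post is cut down to two plain attributes, and the standard sqlite3 module with an in-memory database stands in for "an SQL back end". The code is written to run unchanged on Python 2 and 3.

```python
import sqlite3


class Post(object):
    """Cut-down stand-in for the article's Post class."""
    def __init__(self, title, body):
        self.title = title
        self.body = body


class ListBlog(object):
    """Same two methods as Blog, with the posts kept in a list."""
    def __init__(self):
        self._posts = []

    def add_post(self, post):
        self._posts.append(post)

    def get_posts(self):
        return self._posts


class SqlBlog(object):
    """Hypothetical variant: same two methods, posts kept in SQLite."""
    def __init__(self):
        # ":memory:" gives a throwaway database that lives in RAM.
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE posts (title TEXT, body TEXT)")

    def add_post(self, post):
        self._db.execute("INSERT INTO posts VALUES (?, ?)",
                         (post.title, post.body))

    def get_posts(self):
        rows = self._db.execute(
            "SELECT title, body FROM posts ORDER BY rowid")
        posts = []
        for title, body in rows:
            posts.append(Post(title, body))
        return posts


def titles(blog):
    """Calling code: it only ever uses get_posts."""
    result = []
    for post in blog.get_posts():
        result.append(post.title)
    return result
```

The function titles never looks at how the posts are stored, so it works with either class. That is the sense in which the calling code needs few, if any, changes.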

Notice that there's no way to delete a post. We could tamper with _posts directly, but as long as we do what the class was meant to do, we can't delete a post. That may be good or bad, but the important thing is that by defining a set of methods, we exposed the design of how the class should be used.

To Publish or Not to Publish

The method get_posts returns all the posts. When we are writing a new post, we don't want the whole world to be able to read it until it is finished. Each post needs a new member that tells whether it is published. In Post's initializer, __init__, we add the line:

self._published = False

That makes every new post private by default. To switch states, we add the methods:

def publish(self):
    self._published = True

def hide(self):
    self._published = False

def is_public(self):
    return self._published

In these methods, I introduced a new kind of variable—the boolean. Booleans are simple; they can be true or false. Let's play with that a bit:

>>> cool = blog.Post("Cool", "Python is cool")
>>> cool.is_public()
False
>>> cool.publish()
>>> cool.is_public()
True
>>> cool.hide()
>>> cool.is_public()
False
>>>

If, when you run is_public, you get:


Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "blog.py", line 25, in is_public
    return self._published
AttributeError: 'Post' object has no attribute '_published'

That's because _published was never created, most likely because the new line is missing from __init__, so it can't be used when is_public asks for it. Understanding the errors your tools report is important if you want to be a successful programmer.

In this short set of messages, the last line is the error itself. There are various types of errors, and this one is an AttributeError. A lot of important information is given in the traceback. A traceback is a list of “who called whom”, providing an idea of what was being executed when the error occurred.

The first line of the traceback doesn't give much information. It probably relates to the line we typed at the REPL. The second line tells us that the error was in the file blog.py, on line 25, in the method is_public. Now we have the line that raised the problem.

This traceback is simple. In a real application, you would have methods that call methods that call methods and so on. In those cases, it is not uncommon to see tracebacks of 25 lines or more. I've seen tracebacks of more than 150 lines, but those were extreme cases in extreme conditions.

The next step is a modification to the Blog class to pick up only published posts. So, we add a new method:

def get_public_posts(self):
    published_posts = []
    for post in self._posts:
        if post.is_public():
            published_posts.append(post)
    return published_posts
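To see the method at work, here is a self-contained sketch. It is not the full blog.py: Post is cut down to a title and the published flag, the titles are made up, and the method ends with return published_posts so that the caller actually receives the list. It runs unchanged on Python 2 and 3.

```python
class Post(object):
    """Cut-down Post: just a title and the published flag."""
    def __init__(self, title):
        self._title = title
        self._published = False   # every new post starts out private

    def get_title(self):
        return self._title

    def publish(self):
        self._published = True

    def is_public(self):
        return self._published


class Blog(object):
    def __init__(self):
        self._posts = []

    def add_post(self, post):
        self._posts.append(post)

    def get_public_posts(self):
        published_posts = []
        for post in self._posts:
            if post.is_public():
                published_posts.append(post)
        return published_posts


blog = Blog()
draft = Post("Draft")     # never published
cool = Post("Cool")
cool.publish()
blog.add_post(draft)
blog.add_post(cool)

# Collect the titles of the posts the world is allowed to see.
public_titles = []
for post in blog.get_public_posts():
    public_titles.append(post.get_title())
```

Only the post that was published ends up in public_titles; the draft stays hidden.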

Python tries to be as readable as possible, but that method introduces too many new things, so it requires some careful explanations.

Loops

One of Python's looping constructs is for. It is designed to iterate over lists, sets, dictionaries and other iterable objects. In this case, it takes all the items in self._posts and, one by one, assigns them to the variable post. In the body of the for, which is executed on each iteration, we can use the variable post.

The body of the for, as with other constructs that need a piece of code, is delimited by nothing more than the indentation. Here's an example:

>>> the_list = [1,2,3,"a","b"]
>>> for item in the_list:
...     print item
...
1
2
3
a
b
>>>

Various tasks are solved with a loop. One such task is doing something for each member of a collection, like we did in the previous example. For those types of tasks, the for construct is excellent.

Another common practice is to perform an action a given number of times—for example, printing “Hello, world” three times. To do that we can use:


>>> a = 0
>>> while a < 3:
...     print "Hello world"
...     a = a + 1
...
Hello world
Hello world
Hello world
>>>

Another loop construct is while, and it will continue to run its body until the check—that is, the expression after while and before the colon—becomes false.

We can rethink the previous loop as iterating over a list containing the numbers 0, 1 and 2. There's a way to do it with a for construct:

>>> for a in range(0,3):
...     print "Hello world"
...
Hello world
Hello world
Hello world
>>>

This is shorter and arguably more readable. What is while useful for then? It is useful any time you don't really know when you are going to stop the loop. Here are some examples:

* Reading characters from a file until you encounter the End of File (EOF).

* Reading commands from a user until the user enters the quit command.

* Reading temperatures from a sensor until the temperature is too high.

* Reading events from a user interface until the user presses the X button at the top of the window to close the program.

There's a pattern forming here—doing something until something else happens. That's what while is good for.
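The second case in that list can be sketched as follows. To keep the example self-contained, the "user" is simulated with a plain list of strings; the function name read_until_quit and the commands themselves are made up for illustration.

```python
def read_until_quit(commands):
    """Collect commands until 'quit' shows up; anything after it is ignored."""
    handled = []
    position = 0
    # Keep going while there is input left and it is not the quit command.
    while position < len(commands) and commands[position] != "quit":
        handled.append(commands[position])
        position = position + 1
    return handled


typed = ["new post", "publish", "quit", "never reached"]
handled = read_until_quit(typed)
```

Nothing in the loop says how many commands there will be. It simply stops when the condition after while becomes false, which happens at "quit" or when the input runs out.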

Some time ago, when we didn't have as many choices in programming languages and we ended up using C most of the time, the while construct tended to be much more useful than the for construct. But today, with a powerful for construct, nice functions such as range and the possibility of putting an iterator around anything, for is being used much more than while.

Here's one last example for your enjoyment:

>>> for l in "Hello World":
...     print l + " ",
...
H e l l o W o r l d

Conditionals

In the fourth line of the get_public_posts method shown earlier, if post.is_public():, we have another new construct: an if. It allows programs to make choices based on data. It needs a boolean value and a piece of code. The code is run only if the boolean is True. If you provide something that is not a boolean, Python does its best to interpret it as a boolean. For example, the number 0 is interpreted as False, but all the other numbers as True. Here are some examples:

>>> if True:
...     print "It is true!"
...
It is true!
>>> if False:
...     print "Is it false?"
...
>>>
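The conversion of non-boolean values can be checked in the same way. This small sketch uses a made-up helper, truth_of, that reports which way an if would go for a given value. The empty string and the empty list go beyond the numbers discussed above; they are treated as False as well.

```python
def truth_of(value):
    """Report how an if statement interprets value."""
    if value:
        return "treated as True"
    else:
        return "treated as False"


zero = truth_of(0)          # the number 0
seven = truth_of(7)         # any other number
negative = truth_of(-1)     # negative numbers count as "other numbers" too
empty_text = truth_of("")   # an empty string
empty_list = truth_of([])   # an empty list
```

Here zero, empty_text and empty_list come out as "treated as False", while seven and negative come out as "treated as True".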

We can perform many different types of comparisons on different kinds of objects. Note that the equality operator is ==, not = (that is, two equal signs):

>>> a = 10
>>> if a == 10:
...     print "Ten!"
...
Ten!

There are other comparisons, such as greater than (>), less than (<), greater than or equal to (>=) and not equal to (!=). You can experiment with comparisons directly on the REPL:

>>> 3 == 4
False
>>> 10 != 5
True
>>> 4 >= 1
True

It is common to run a piece of code if something is true and another piece of code if it is false. For example, we could do the following:

if a == 10:
    print "A is ten."
if a != 10:
    print "A is not ten."

This has a big problem. If we change a to b in the first case, we have to remember to change it in the second. And, the same should be done for any other little changes we make. The solution is an extension to the if construct:

if a == 10:
    print "A is ten."
else:
    print "A is not ten."

The piece of code after the else will be executed if the first piece wasn't executed.

Another common situation is having various conditionals for different cases. In that case, we use a chain of ifs and elifs:

if a == 10:
    print "A is ten."
elif a == 0:
    print "A is zero."
elif a != 30:
    print "A is not thirty."
else:
    print "Who cares about a ?"

elif is the contraction of “else if”, and indeed, the previous code could be written as:

if a == 10:
    print "A is ten."
else:
    if a == 0:
        print "A is zero."
    else:
        if a != 30:
            print "A is not thirty."
        else:
            print "Who cares about a ?"

But, that is ugly and prone to errors. If you have 10 or 15 different cases, you'll need a 29"-widescreen monitor just to view it. (Not that I have anything against such a monitor. I'd like to have one.)
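To check which branch of the elif chain fires for a given value, the chain can be wrapped in a function. The name describe is made up for this sketch, and it returns the message instead of printing it so that the result is easy to inspect.

```python
def describe(a):
    """Same chain as above, returning the message instead of printing it."""
    if a == 10:
        return "A is ten."
    elif a == 0:
        return "A is zero."
    elif a != 30:
        return "A is not thirty."
    else:
        return "Who cares about a ?"


for_ten = describe(10)
for_zero = describe(0)
for_five = describe(5)
for_thirty = describe(30)
```

Only the first matching branch runs. The value 5 is caught by the a != 30 test, and 30 is the only value that reaches the final else.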

If you come from other languages that have a switch or select or case construct and are wondering where they are in Python, I'm sorry to disappoint you. Python doesn't have such constructs. There's a proposal to include them, but it hasn't been implemented yet. Right now, the solution is to use a chain of ifs, elifs and elses. After you use this a few times, it's not so bad.

Now that you know about else, here's an interesting tidbit: for and while also can have elses. What do they do? Run Python, and try it out until you discover for yourself. While programming, you'll need to run a lot of code to find out how many undocumented, obscure, almost black-magic, things work, so starting with something simple will help you get some training.

Inheritance

The short introduction to object-oriented programming (OOP) in Part I of this article left out a big topic—inheritance. This feature is what makes OOP really useful, and as OOP tries to mimic real life, I explain inheritance here with real-life examples.

Think about a chair. A chair is made out of some kind of material, has two armrests, a back, a color, a style and maybe even a warranty. Now, think about a table. It is made out of some kind of material, might have some drawers, a color, a style and maybe a warranty. They have a lot in common! If we were to make the two classes, Chair and Table, a lot of code would be repeated. In programming, when you write the same line of code twice, you probably are doing something wrong—inheritance to the rescue.

A chair is a piece of furniture. So is a table. Such similarities can be in the Furniture class. Let's make the Furniture class have a default material and the ability to set other materials:

class Furniture(object):
    def __init__(self):
        self._material = "wood"

    def set_material(self, material):
        self._material = material

And now, a Chair class inheriting Furniture:

class Chair(Furniture):
    def __init__(self):
        self._backrest_height = 30

    def set_backrest_height(self, height):
        self._backrest_height = height

Now, you know what goes inside parentheses in the class header: the name of the class being inherited, which also is known as a super class or parent class. Let's play a bit with this, so you can see what happens:

>>> c = Chair()
>>> c.set_backrest_height(50)
>>> c._backrest_height
50
>>> c.set_material("plastic")
>>> c._material
'plastic'
>>>

As you can see, the methods of Furniture also are on Chair. I leave the definition of the Table class as an exercise for the reader. But first, here's another interaction:


>>> d = Chair()
>>> d._backrest_height
30
>>> d._material
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'Chair' object has no attribute '_material'
>>>

I bet that is not what you expected. Let's take a closer look at what happened. When we created a Chair, the method Chair.__init__ ran and set _backrest_height. But nobody called Furniture.__init__, so _material was never set. There are two solutions to that.

Setting _material in Chair.__init__ is not a solution. If we do that, the classes would be coupled, meaning the implementation of one will depend on the implementation of the other. If we change the name of _material to _materials, suddenly Chair will stop working. If you have hundreds of classes developed by hundreds of different people, keeping track of those changes is difficult. Also, Furniture will grow to have more members, so we have to remember to set all those members to the same defaults in Chair.__init__. I'm getting a headache just thinking about it.

One real solution is calling Furniture.__init__ and rewriting Chair.__init__ this way:

def __init__(self):
    Furniture.__init__(self)
    self._backrest_height = 30

We had to pass self to __init__ because we called it through the class instead of through the object, so otherwise it wouldn't know on which object to do its operations.

I personally don't like that solution, because it implies writing the name of the class in two or more places. If you ever change the name, you'll have to remember to run a search and replace. Another solution is more cryptic than it should be, but it doesn't have the problem I just mentioned:

def __init__(self):
    super(Chair, self).__init__()
    self._backrest_height = 30

In this solution, I call super, passing the current class and the current object, and it allows me to make a call to the parent class using the current object. Here we may have a problem if we change the name of the class itself, but running a search and replace on the file is a good idea when making that kind of change. You'd want to change the documentation as well. The real problem with this solution is hard to understand and to explain—it has to do with multiple inheritance. For more information, read “Python's Super Considered Harmful”. Personally, I've been using this second solution without any problems.
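Put together, the two classes with the super call look like the following sketch. It only assembles the pieces shown above into one runnable unit and repeats the earlier experiment with d; it runs unchanged on Python 2 and 3.

```python
class Furniture(object):
    def __init__(self):
        self._material = "wood"

    def set_material(self, material):
        self._material = material


class Chair(Furniture):
    def __init__(self):
        # Run Furniture.__init__ first so that _material gets its default.
        super(Chair, self).__init__()
        self._backrest_height = 30

    def set_backrest_height(self, height):
        self._backrest_height = height


d = Chair()
default_material = d._material        # set by Furniture.__init__
default_height = d._backrest_height   # set by Chair.__init__
```

This time d._material exists on a freshly created Chair, because the parent's initializer has been run before the chair sets its own member.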

You'll see that all classes I defined inherit from object. That is the most basic class: the root (or top) class. It is a good idea to make all your classes inherit from it unless they inherit from another class. If you don't do that, your class will be an old-style class, and some things won't work, such as super. It is important to know this, because you may encounter old-style classes anywhere, and you should be prepared.

Python 2.5

During the process of writing this article, with much excitement and fanfare, Python 2.5 was released. It is the most important release in almost two years, and it comes with many promises.

It promises to be more reliable due to improvements in the testing procedures used by the Python development team. It now has Buildbot, a program that continuously builds and tests Python, and whenever there's something wrong, it raises an alarm for all the world to see. The shame of being the developer who made the error will make all the developers more careful—at least, that's what happened to me when I had a Buildbot watching my code.

For some, like this author who had a new release at the worst possible time, the most important thing is that Python 2.5 is backward-compatible. All that you've learned here will work. And, not only will it work, it is still the way to do it.

The new release also promises to be faster and has many new advanced features, including new modules and packages. The future is bright for Python and Python coders.

What Now?

This was nothing but a short introduction to Python; there's still much to learn. A good place to start is the official Python Tutorial. You also can read Dive Into Python, a book that you can buy or read for free on the Web. And, of course, a lot of other books and tutorials are available. I learned Python mainly from the Python Tutorial, which is very good.

Whenever you are creating a program in Python, never, and I repeat, never, do anything without checking whether it has been done before. Python has a lot of features and a lot of built-in libraries. And, if that isn't enough, there are hundreds, maybe thousands of third-party Python libraries. In fact, the huge amount of code that's already written in Python is one of the reasons to use it.

The first stop is Python's Documentation. There we have the previously mentioned tutorial, the library reference and the language reference.

The language reference can be a bit hard to use and understand. Programming languages tend to be difficult to understand and so are their references, which often have exclusive jargon, such as lexical analysis, tokens, identifiers, keywords or delimiters. This piece of documentation can be particularly useful in showing how to use language constructs, such as for, if, while and more complex ones that I haven't mentioned, such as yield, break or continue.

The library reference lets us know about all the classes, methods and functions that Python already provides. It is so important and useful that I always have it open when I am programming in Python. In the second chapter, you can read about the built-in functions and classes. Getting familiar with them is always useful. The rest of the documentation is very specific, and each chapter deals with subjects ranging from runtime internals to strings, from the Python debugger to some generic operating system services. In that last chapter, a very important module is documented: os. I can't remember making a single program that didn't use that module.
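As a small taste of the os module, here is a sketch that builds and takes apart a file path and lists the current directory. The helper name split_path and the path blog/posts/cool.txt are made up for illustration; the os functions used (os.path.join, os.path.split, os.path.splitext, os.getcwd and os.listdir) are all part of the standard library.

```python
import os


def split_path(path):
    """Break a path into its directory, base name and extension."""
    directory, filename = os.path.split(path)
    name, extension = os.path.splitext(filename)
    return directory, name, extension


# os.path.join uses the right separator for the current operating system.
post_file = os.path.join("blog", "posts", "cool.txt")
parts = split_path(post_file)

# Names of the files and directories in the current working directory.
entries = os.listdir(os.getcwd())
```

Because the path is built with os.path.join rather than by gluing strings together with a hard-coded slash, the same code works with the path conventions of whatever system it runs on.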

Finding what you want in so much documentation can be a difficult task. A trick that I find very useful is to use Google to search in a specific site. That is achieved by adding “site:python.org” or “site:docs.python.org” to the search query. The first one is more generic and sometimes leads to countless mailing-list posts that have nothing to do with what you are looking for. In that situation, use the second one. To give it a try, search for “print site:python.org” or “options site:python.org”.

What if all of your searches return nothing? Then, you need to do a broader search to find some third-party libraries or frameworks. If you want to make a graphical user interface, I recommend both PyGTK and PyQt; both are very good and include support for their respective desktops, GNOME and KDE. I've heard good opinions of wxPython, but I've not used it myself.

If you want to build a Web application, I see two paths. If you want something not so spectacular but that gets you there fast, I recommend Django. Django is very similar to Ruby on Rails. It's a framework in which you use the model-view-controller paradigm and a relational database such as MySQL or PostgreSQL; both are well supported on Python.

The other way to build Web sites (that I know of) is Zope. Zope is a big framework with a Web server and object-oriented database. The database is different from other relational databases, and it is very powerful. It allows you to store information in a much more flexible way. Zope 3—I don't recommend the previous versions unless you have to use the award-winning content management system Plone—is prepared to help you build reliable and robust code by means of interfaces, unit testing, adapters and much more.

If you need to build any kind of dæmon (those little applications running in the background making the earth turn), take a look at Twisted Matrix. Twisted Matrix is an event-based framework that solves a lot of the common problems of building dæmons, including separation of protocol and logic. It comes with many protocols already built in, and it allows you to create new protocols. A proof of its usefulness is that Zope, after years of shipping its own Web server, has migrated to using the Twisted Matrix HTTP server.

Resources

Python Tutorial: docs.python.org/tut/tut.html

Dive Into Python: www.diveintopython.org

Python Documentation: www.python.org/doc

PyGTK: www.pygtk.org

PyQt: www.riverbankcomputing.co.uk/pyqt

Django: www.djangoproject.com

Zope: zope.org

Python's Super Considered Harmful: fuhm.net/super-harmful

José P. E. “Pupeno” Fernández has been programming since...at what age is a child capable of sitting in a chair and reaching a keyboard? He has experimented with more languages than can be listed on this page. His Web site is at pupeno.com, and he always can be reached, unless you are a spammer, at pupeno@pupeno.com.
