Is the __ prefix considered unpythonic?
by Tarek Ziadé
EDIT: Justus provided in the comments a great link where Jim Fulton argues that __ should be marked deprecated, followed by answers from Neal Norwitz and Tim Peters. This helps a lot in understanding what __ should be used for: http://mail.python.org/pipermail/python-dev/2005-December/058555.html
I am writing on Python OOP best practices and I was wondering what the best ways to name attributes in classes are. My main concern is the distinction between private and protected attributes.
- private attributes are attributes that cannot be seen or used outside of the class, even by subclasses. The Python compiler applies the name mangling algorithm when it finds them, to prevent name collisions;
- protected attributes are attributes that can be used and seen in subclasses and should not be used outside. Nothing is done to them and they can be used like public attributes.
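To make the two definitions concrete, here is a minimal sketch in Python 3 syntax. The class names `Account` and `Savings` are made up for illustration; the point is only what ends up stored on the instance.

```python
class Account:
    def __init__(self):
        self._balance = 0     # "protected": a convention only, no mangling
        self.__pin = "1234"   # "private": stored as _Account__pin


class Savings(Account):
    def peek(self):
        # Single underscore names are shared with subclasses as-is.
        return self._balance

    def leak(self):
        # Inside Savings this is compiled as self._Savings__pin,
        # which does not exist, so it raises AttributeError.
        return self.__pin


s = Savings()
print(sorted(vars(s)))    # ['_Account__pin', '_balance']
print(s.peek())           # 0

try:
    s.leak()
except AttributeError:
    print("subclass cannot reach __pin by its short name")

# The mangled name is still reachable by anyone who spells it out.
print(s._Account__pin)    # 1234
```

So "private" here means "harder to reach by accident", not "inaccessible".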
When should we use them?
If you read PEP8, it’s clearly said that name mangling (using a ‘__’ prefix) is the best way to protect an attribute from being accessed or overridden. So it should be used for all class internals that are not intended to be overridden.
But in the biggest open source code bases like Zope or Plone, ‘__’ usage is very uncommon. The simple ‘_’ prefix is often used instead, to mark attributes that are private to the class or to the module. So there is no real distinction between private and protected attributes. It seems that the ‘private’ concept is not even used, and people often split their class code into two parts: public and protected.
In other languages (like Delphi) that define protected and private levels though, protected attributes are not used a lot, and people tend to cut their code in private and public parts and make the protected layer as slim as possible.
Practical rules
Based on these remarks, here is a tentative list of best uses for the ‘__’ and ‘_’ prefixes in Python, for the use cases I know:
- use __ with property. Since properties cannot use overridden methods and are tied to the class, the methods used with them should always be private;
- use __ for methods that work with private attributes. If your methods work on private attributes, make them private too;
- use _ on methods when they are clearly intended to be overridden;
- use __ for all module functions and variables that are private. A protected level is not needed since a module cannot be overridden.
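Two of these rules are easy to check with a short sketch (Python 3 syntax; `Base`, `Child`, `User` and `__helper` are hypothetical names). The first part shows the premise behind the property rule: `property()` captures the function object defined in the base class, so an override in a subclass is not picked up unless the property looks the method up at call time. The second part shows a side effect worth knowing before applying the last rule: mangling applies to every `__name` identifier inside a class body, including references to module-level names.

```python
class Base:
    def _get_name(self):
        return "base"

    # Bound to the function object defined just above, once and for all.
    name = property(_get_name)
    # Looks the method up on the instance each time it is read.
    late_name = property(lambda self: self._get_name())


class Child(Base):
    def _get_name(self):
        return "child"


c = Child()
print(c.name)        # base
print(c.late_name)   # child


def __helper():
    # A module-level function "private" by double underscore.
    return "helper"


class User:
    def call(self):
        # Compiled as _User__helper(), which is not defined anywhere.
        return __helper()


try:
    User().call()
except NameError as exc:
    print(exc)
```

Because of the second behaviour, a single underscore is usually the less surprising choice for module-level names that classes in the same module need to call.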
Following these rules would probably make 90% of class attributes private instead of protected, and change the conventions of every code base. So I am wondering: am I being a bit unpythonic if I try to follow this standard for attribute naming? My guess is that most code bases are not clean enough in that respect. For instance, many of them mix new-style and old-style classes under Python 2.5, which leads to an MRO algorithm that differs depending on the class!
I would love to hear how you people deal with these conventions.
Stop trying to write Java.
I don’t think you read enough of PEP8 🙂
‘We don’t use the term “private” here, since no attribute is really private in Python (without a generally unnecessary amount of work).’
‘Note 3: Not everyone likes name mangling. Try to balance the need to avoid accidental name clashes with potential use by advanced callers.’
Most people have had the experience of wanting to do something with an object only to discover that the initial implementor — in their infinite wisdom of being able to foresee all possible uses and reuses of their code for all futurity — decided to make a very helpful method private. Then what do you do? Cut and paste? Fork the code base just to change “private” to “protected”?
I decided long ago to stop trying to figure out how other people might end up using my code. If someone wants to call a “private” method and it destroys their universe, that’s their prerogative. At some point I need to assume a certain level of competence among coworkers/fellow programmers.
I think it is better to document the private/public interface but let “advanced” users do whatever they want instead of having the compiler enforce things.
As you can see from the first two comments (with varying degrees of politeness), the Python community has a culture of “data encapsulation with a carrot” rather than “data encapsulation with a stick”.
You should think of the underscore simply as documentation. Methods/attributes without the underscore form the public API, and methods/attributes with an underscore are “private” only in the sense that you aren’t making any promises about whether they will stay the same in the future. It’s probably good for everyone if you document both public and private, though you would be forgiven for being more terse with private method documentation. 🙂
By making “private” a documentation convention, you enable people to do more invasive subclassing, and you also make things like whitebox testing easier. Since Python modules are nearly always distributed in source form, it’s not like people subclassing your objects can’t figure out what the expected base class behavior is.
@Justus and Stan: I’m not trying to lock the code, I was just wondering if __ could be conventionally used instead of _ to mark privacy, as *OOP documentation*. Now if you would rather use _ for all non-public attributes, what was name mangling intended for in the first place?
PEP8 says it is to avoid name collisions. In other words it is intended to mark the attribute as being private (the lock stuff is a detail):
‘If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python’s name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.’
What you say is that it’s up to the developers to be careful with such collisions through documenting (which I agree with, I was just wondering if this self-documented API could use __ in some cases instead of _).
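The collision PEP8 talks about can be sketched like this (Python 3 syntax; `Base` and `Plugin` are hypothetical names, with `Plugin` standing in for a third-party subclass written without reading the base class source):

```python
class Base:
    def __init__(self):
        self._cache = {}        # shared namespace with every subclass
        self.__state = "base"   # stored as _Base__state

    def base_state(self):
        return self.__state

    def cache_is_dict(self):
        return isinstance(self._cache, dict)


class Plugin(Base):
    def __init__(self):
        super().__init__()
        self._cache = None        # silently replaces Base's _cache
        self.__state = "plugin"   # stored as _Plugin__state, no clash


p = Plugin()
print(p.base_state())      # base
print(p.cache_is_dict())   # False
```

The single underscore documents intent but does nothing about the clash; the double underscore gives each class its own slot.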
OK, fine, then why does it exist? If it’s such a controversial feature, and you don’t use it, maybe it should be removed from the language…
Double underscores and name mangling exist so that your class will call the method you defined, even if it is subclassed. Basically, only use it if you know that you need to use it.
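A short illustration of that use, adapted from the example in the private variables section of the official Python tutorial (Python 3 syntax):

```python
class Mapping:
    def __init__(self, iterable):
        self.items_list = []
        # Always calls Mapping's own update(), whatever subclasses do.
        self.__update(iterable)

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    # Private copy of the original update(), stored as _Mapping__update.
    __update = update


class MappingSubclass(Mapping):
    def update(self, keys, values):
        # New signature; __init__ in the base class keeps working.
        for item in zip(keys, values):
            self.items_list.append(item)


m = MappingSubclass([1, 2])
m.update(["a"], ["b"])
print(m.items_list)   # [1, 2, ('a', 'b')]
```

Without the `__update` alias, `Mapping.__init__` would call the subclass's two-argument `update()` with one argument and fail.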
The single underscore is used to mark something as not being a part of the public API. It’s just a marker and has no effect.
So there is nothing that really is Private in Python (and a good thing too, as the idea of preventing your subclasses from accessing a method is useless and a pain in the ass when implemented). And there definitely isn’t anything Protected. 🙂
On-topic: the __ prefix is not unpythonic per se, but many uses of it (e.g. using it to document that your attributes are private) would be unpythonic.
Slightly off-topic: why do all your post titles say just “tarek” on Planet Python? A bug in the RSS feed?
@Marius:
on-topic: Do you have pythonic examples ?
off-topic:
I’ve seen that. I tried feedparser over my feed and it works perfectly. I tried to read it on google reader and various readers and it works. I think it’s a bug in the planet because it occurs on planet plone and planet python. It happens to all people that have their blog at wordpress.com
I didn’t have the time to set up a planet to find this issue yet.
Tarek re: removing it:
Some people have argued for exactly that. See http://preview.tinyurl.com/3y5ntx where Jim Fulton asks “Can we officially mark __private as a mistake?” and suggests strongly discouraging it in the style guide. Which explains why you don’t see much of it in zope 🙂
Later in that thread Tim Peters clarifies that the real intended use for __private is for writing base classes that you expect third-parties to subclass to prevent name collisions. Others suggest a more pythonic approach is to do explicit name mangling yourself.
Avoiding name collision and marking something as private are certainly similar but not quite precisely the same goals.
Since my python code rarely has deep or complicated inheritance hierarchies I’ve never had much of a need for __name_mangling. If I were providing base libraries to other people, I might have a different opinion 😉
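For what it’s worth, here is a rough sketch of what doing the mangling yourself might look like next to the automatic version (Python 3 syntax; the class name and the `_base_` prefix are my own choice for illustration, not something proposed in that thread):

```python
class Base:
    def __init__(self):
        self._base_cache = {}   # explicit: the name is exactly what you typed
        self.__cache = {}       # automatic: becomes _Base__cache


b = Base()

# String-based lookups are never mangled, so the hand-written name
# behaves the same everywhere, while the automatic one does not.
print(hasattr(b, "_base_cache"))    # True
print(hasattr(b, "__cache"))        # False
print(hasattr(b, "_Base__cache"))   # True
```

The explicit prefix avoids surprises with `getattr`, pickling, or debuggers, at the cost of having to keep the prefix unique by hand.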
just discovered a case where you can bypass Python's name mangling…
>>> class test:
...     def __init__(self):
...         self.__private_atr = 15
...     def get(self):
...         # create new object
...         t = test()
...         # as long as the class of 't' is the same as the
...         # containing class you can access
...         # 'private' attributes…
...         return t.__private_atr
...
>>> t = test()
>>> print(t.get())
15
>>>
@Justus: great link, thanks! I am going to add it to the ticket because it is exactly what I was looking for to understand the history of __
@nih: yes, inside the class body the name is mangled the same way when it is set and when it is read, so the two always match. Otherwise, the attribute could not be accessed at all.
ah no, i created another instance of class test inside class test and then accessed the private variable in the other instance, something i accidentally discovered.
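A small sketch of why this works (Python 3 syntax, with the class renamed to `Test`): mangling is a purely textual rewrite done at compile time on every `__name` that appears inside the class body, and it does not care which object the attribute is read from.

```python
class Test:
    def __init__(self):
        self.__private_atr = 15

    def get(self):
        other = Test()
        # Compiled as other._Test__private_atr, same as for self.
        return other.__private_atr


print(Test().get())   # 15

# The compiled code of get() only knows the mangled name.
print("_Test__private_atr" in Test.get.__code__.co_names)   # True

# Outside the class body nothing is rewritten, so the short name fails.
t = Test()
try:
    t.__private_atr
except AttributeError:
    print("not reachable by its short name from outside")
```

So this is per-class privacy rather than per-instance privacy: any code written inside `Test` can read the attribute on any `Test` instance.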