Python, Objects, and some more..

Everything in Python is an object, what does that mean? This post tries to discuss some very basic concepts.

What does the following assignment do?


a = 1

Of course, anyone dabbled in code knows this. The statement above creates a container `a` and stores the value `1` in it.

But it seem that’s not exactly what’s happening, at least from Python’s view-point.

When a = 1 is entered or executed by the python interpreter, the following happens in the backend, seemingly unknown to the user.

  • The Python interpreter evaluates the literal 1 and tries to understand what data type can be assigned for it.
    • There are several in-built data types such as str, float, bool, list, dict, set etc..
    • Builtin types are classes implemented in the python core.
    • For a full list of types and explanation, read the python help at python-> help()-> topics -> TYPES
    • Read the help sections for builtin types, eg.. help(int), help(list) etc..
  • The interpreter finds the appropriate builtin type for the literal. Since the literal 1 fits the type int, the interpreter creates an instance from class int() in memory.
    • This instance is called an object since it’s just a blob with some metadata.
    • This object has a memory address, a value, a name in one or more namespace, some metadata etc..
    • type(a) helps in understanding the instance type.
    • In short, an assignment statement simply creates an instance in memory from a pre-defined class.
  • The interpreter reads the LHS (Left hand side) of the statement a = 1, and creates the name a in the current namespace.
    • The name in the namespace is a reference to the object in memory.
    • Through this reference, we can access the data portion as well as the attributes of that object.
    • A single object can have multiple names (references).
  • The name a created in the current namespace is linked to the corresponding object in memory.

When a name that’s already defined is entered at the python prompt, the interpreter reads the namespace, finds the name (reference), goes to the memory location it’s referring to, and pull the value of the object, and prints it on-screen.

Every object has the following features:

  • A single value, available in its data section.
In [1]: a = 1

In [2]: a
Out[2]: 1
  • A single type, since the object is an instance of a pre-defined type class such as int , float etc..
In [3]: type(a)
Out[3]: int
  • Attributes either inherited from the parent type class or defined by the user.
In [10]: dir(a)
Out[10]:
['__abs__',
'__add__',
'__and__',
'__bool__',
'__ceil__',
'__class__',
'__delattr__',
'__dir__',
'__divmod__',
'__doc__',
'__eq__',
'__float__',
...[content omitted]
'__setattr__',
'__sizeof__',
'__str__',
'__sub__',
'__subclasshook__',
'__truediv__',
'__trunc__',
'__xor__',
'bit_length',
'conjugate',
'denominator',
'from_bytes',
'imag',
'numerator',
'real',
'to_bytes']
  • One or more base classes. All new-stlye classes in Python ultimately inherits from the object class.
In [4]: type(a)
Out[4]: int

In [5]: int.mro()
Out[5]: [int, object]

NOTE: a is an instance of the int class, and int inturn inherits from the object class. Read more on Method Resolution Order.


  • A unique ID representing the object.
In [6]: id(a)
Out[6]: 140090033476640
  • Zero, One, or more names.
    • Use dir() to check the current namespace.
    • Use dir(<object-name>) to refer the indirect namespace.

Several other builtins are available in the default namespace without defining them specifically, possible due to the inclusion of the builtin module available under the reference __builtin__ in the current namespace.

For a full list of the pre-defined variables, refer dir(__builtins__)help(__builtin__) or help(builtins) after an import builtins.

A few questions and observations:

Q1. How can an assignment have zero names in the namespace?

Ans: An assignment such as a = 1 creates an object in memory and creates a corresponding name (a in our case) in the namespace. a acts as a reference to the object in memory.

But, simply entering 1 at the python prompt creates an object in memory which is an instance of a type class, without creating the reference in the namespace.

Objects which don’t have a reference from the current namespace are usually garbage-collected due to lack of references. Hence, an object which doesn’t have a reference (a name), or had multiple references (more than one names) but had them deleted (for example, del() gets garbage-collected by python.

If the assignment 1 happens to be at a python prompt, it echoes the literal back after creating the object and reference since the prompt is essentially a REPL (Read Eval Print loop)

Q2. Can an object have more than one name references?

Ans: It’s perfectly fine to have more than one reference to a single object. The example below should explain things very well.

In [1]: a = 5000

In [2]: id(a)
Out[2]: 140441367080400

In [3]: b = a

In [4]: b
Out[4]: 5000

In [5]: id(b)
Out[5]: 140441367080400

In [6]: c = 5000

In [7]: id(c)
Out[7]: 140441367080432

In [8]: a is b
Out[8]: True

In [9]: a == b
Out[9]: True

In [10]: a is c
Out[10]: False

In [11]: a == c
Out[11]: True

The example shown above creates an object with value 5000 and assign it a name a in the current namespace. We checked the identifier of the object using id(a) and found out it to be 140441367080400.

As the next step, we created another name in the namespace, ie.. b which takes in whatever a points to. Hence, b would default to 5000 and it will have the same identifier as a.

This shows that an object in memory can have multiple references in a namespace.

Another object of value 5000 is created with a name c , but we can see that the identifier differs from what id(a) and id(b) is. This shows that c points to an entirely different object in memory.

To test if a is exactly the same object as b, use the keyword is. Meanwhile, if you want to test if two objects contain the same value, use the equality == symbol.

Max file-name length in an EXT4 file system.

A recent discussion at work brought up the question “What can be the length of a file name in EXT4”. Or in other words, what would be the maximum character length of the name for a file in EXT4?

Wikipedia states that it’s 255 Bytes, but how does that come to be? Is it 255 Bytes or 255 characters?

In the kernel source for the 2.6 kernel series (the question was for a RHEL6/EXT4 combination), in  fs/ext4/ext4.h, we’d be able to see the following:


#define EXT4_NAME_LEN 255

struct ext4_dir_entry {
    __le32 inode;             /* Inode number */
    __le16 rec_len;           /* Directory entry length */
    __le16 name_len;          /* Name length */
    char name[EXT4_NAME_LEN]; /* File name */
};

/*
* The new version of the directory entry. Since EXT4 structures are
* stored in intel byte order, and the name_len field could never be
* bigger than 255 chars, it's safe to reclaim the extra byte for the
* file_type field.
*/

struct ext4_dir_entry_2 {
    __le32 inode;             /* Inode number */
    __le16 rec_len;           /* Directory entry length */
    __u8 name_len;            /* Name length */
    __u8 file_type;
    char name[EXT4_NAME_LEN]; /* File name */
};

This shows that there are two versions of the directory entry structure, ie.. ext4_dir_entry and ext4_dir_entry_2

A directory entry structure carries the file/folder name and the corresponding inode number under every directory.

Both structs use an element named name_len to denote the length of the file/folder name.

If the EXT filesystem feature filetype is not set, the directory entry structure falls to the first method ext4_dir_entry, else it’s the second, ie.. ext4_dir_entry_2.

By default, the file system feature filetype is set, hence the directory entry structure is ext4_dir_entry_2 . As seen above, in this case, the name_len field is set to 8 bits.

__u8 represents an unsigned 8-bit integer in C, and can store values from 0 to 255.

ie.. 2^8 = 255 (0 t0 255 == 256)

ext4_dir_entry has a name_len of __le16, but it seems that the file-name length can only go to a max of 256.

Observations:

  1. The maximum name length is 255 characters on Linux machines.
  2. The actual name length of a file/folder is stored in name_len in each directory entry, under its parent folder. So if the file name length is 5 characters, 5 would be the value set for name_len for that particular file. ie.. the actual length.
  3. A character will consume a byte of storage, so the number of characters in a file name will map to the respective number bytes. If so, a file with a name_len of 5 will be using 5 bytes of memory to store the name.

Hence, name_len denotes the number of characters that a file can have. Since U8 is 8-bits, name_len can store a file name with upto 255 chars.

Now the actual memory being consumed for storing these characters is not denoted by name_len. Since the size of a character translates to a byte, the maximum size wrt memory that a file name can have is 255 Bytes.

NOTE:

The initial dir entry structure ext4_dir_entry had __le16 for name_len, it was later re-sized to __u8 in ext4_dir_entry_2 , by culling 8 bits from the existing 16 bits of name_len.

The remaining free space culled from name_len was assigned to store the file type, in ext4_dir_entry_2. It was named file_type with size __u8.

file_type helps to identity the file types such as regular files, sockets, character devices, block devices etc..

References:

  1. RHEL6 kernel-2.6.32-573.el6 EXT4 header file (ext4.h)
  2. EXT4 Wiki – Disk layout
  3. http://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs

Inheritance and super() – Object Oriented Programming

super() is a feature through which inherited methods can be accessed, which has been overridden in a class. It can also help with the MRO lookup order in case of multiple inheritance. This may not be obvious first, but a few examples should help to drive the point home.

Inheritance and method overloading was discussed in a previous post, where we saw how inherited methods can be overloaded or enhanced in the child classes.

In many scenarios, it’s needed to overload an inherited method, but also call the actual method defined in the Parent class.

Let’s start off with a simple example based on Inheritance, and build from there.

Example 0:

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class!")

class ChildClass(MyClass):
    pass

my_instance_1 = ChildClass()
my_instance_1.func()

This outputs:

In [18]: %run /tmp/super-1.py
I'm being called from the Parent class

In Example 0, we have two classes, MyClass and ChildClass. The latter inherits from the former, and the parent class MyClass has a method named func defined.

Since ChildClass inherits from MyClass, the child class has access to the methods defined in the parent class. An instance is created my_instance_2, for ChildClass.

Calling my_instance_1.func() will print the statement from the Parent class, due to the inheritance.

Building up on the first example:

Example 1:

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class")

class ChildClass(MyClass):

    def func(self):
        print("I'm being called from the Child class")

my_instance_1 = MyClass()
my_instance_2 = ChildClass()

my_instance_1.func()
my_instance_2.func()

This outputs:

In [19]: %run /tmp/super-1.py
I'm being called from the Parent class
I'm being called from the Child class

This example has a slight difference, both the child class as well as the parent class have the same method defined, ie.. func. In this scenario, the parent class’ method is overridden by the child class method.

ie.. if we call the func() method from the instance of ChildClass, it need not go a fetch the method from its Parent class, since it’s already defined locally.

NOTE: This is due to the Method Resolution Order, discussed in an earlier post.

But what if there is a scenario that warranties the need for specifically calling methods defined in the Parent class, from the instance of a child class?

ie.. How to call the methods defined in the Parent class, through the instance of the Child class, even if the Parent class method is overloaded in the Child class?

In such a case, the inbuilt function super() can be used. Let’s add to the previous example.

Example 2:

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class")

class ChildClass(MyClass):

    def func(self):
        print("I'm actually being called from the Child class")
        print("But...")
        # Calling the `func()` method from the Parent class.
        super(ChildClass, self).func()

my_instance_2 = ChildClass()
my_instance_2.func()

This outputs:

In [21]: %run /tmp/super-1.py
I'm actually being called from the Child class
But...
I'm being called from the Parent class

How is the code structured?

  1. We have two classes MyClass and ChildClass.
  2. The latter is inheriting from the former.
  3. Both classes have a method named func
  4. The child class ChildClass is instantiated as my_instance_2
  5. The func method is called from the instance.

How does the code work?

  1. When the func method is called, the interpreter searches it using the Method Resolution Order, and find the method defined in the class ChildClass.
  2. Since it finds the method in the child class, it executes it, and prints the string “I’m actually being called from the Child class”, as well “But…”
  3. The next statement is super which calls the method func defined in the parent class of ChildClass
  4. Since the control is now passed onto the func method in the Parent class via super, the corresponding print() statement is printed to stdout.

Example 2 can also be re-written as :

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class")

class ChildClass(MyClass):

    def func(self):
        print("I'm actually being called from the Child class")
        print("But...")
        # Calling the `func()` method from the Parent class.
        # super(ChildClass, self).func()
        MyClass.func(self)  # Call the method directly via Parent class

my_instance_2 = ChildClass()
my_instance_2.func()

 

NOTE: The example above uses the Parent class directly to access it’s method. Even though it works, it is not the best way to do it since the code is tied to the Parent class name. If the Parent class name changes, the child/sub class code has to be changed as well. 

Let’s see another example for  super() . This is from our previous article on Inheritance and method overloading.

Example 3:

import abc

class MyClass(object):

    __metaclass__ = abc.ABCMeta

    def my_set_val(self, value):
        self.value = value

    def my_get_val(self):
        return self.value

    @abc.abstractmethod
    def print_doc(self):
        return

class MyChildClass(MyClass):

    def my_set_val(self, value):
        if not isinstance(value, int):
            value = 0
        super(MyChildClass, self).my_set_val(self)

    def print_doc(self):
        print("Documentation for MyChild Class")

my_instance = MyChildClass()
my_instance.my_set_val(100)
print(my_instance.my_get_val())
print(my_instance.print_doc())

The code is already discussed here. The my_set_val method is defined in both the child class as well as the parent class.

We overload the my_set_val method defined in the parent class, in the child class. But after enhancing/overloading it, we call the my_set_val method specifically from the Parent class using super() and thus enhance it.

Takeaway:

  1. super() helps to specifically call the Parent class method which has been overridden in the child class, from the child class.
  2. The super() in-built function can be used to call/refer the Parent class without explicitly naming them. This helps in situations where the Parent class name may change. Hence, super() helps in avoiding strong ties with class names and increases maintainability.
  3. super() helps the most when there are multiple inheritance happening, and the MRO ends up being complex. In case you need to call a method from a specific parent class, use super().
  4. There are multiple ways to call a method from a Parent class.
    1. <Parent-Class>.<method>
    2. super(<ChildClass>, self).<method>
    3. super().<method>

References:

  1. https://docs.python.org/2/library/functions.html#super
  2. https://rhettinger.wordpress.com/2011/05/26/super-considered-super/
  3. https://stackoverflow.com/questions/222877/how-to-use-super-in-python

Sharding the Ceph RADOS Gateway bucket index

Sharding is the process of breaking down data onto multiple locations so as to increase parallelism, as well as distribute load. This is a common feature used in databases. Read more on this at Wikipedia.

The concept of sharding is used in Ceph, for splitting the bucket index in a RADOS Gateway.

RGW or RADOS Gateway keeps an index for all the objects in its buckets for faster and easier lookup. For each RGW bucket created in a pool, the corresponding index is created in the XX.index pool.

For example, for each of the buckets created in .rgw pool, the bucket index is created in .rgw.buckets.index pool. For each bucket, the index is stored in a single RADOS object.

When the number of objects increases, the size of the RADOS object increases as well. Two problems arise due to the increased index size.

  1. RADOS does not work good with large objects since it’s not designed as such. Operations such as recovery, scrubbing etc.. work on a single object. If the object size increases, OSDs may start hitting timeouts because reading a large object may take a long time. This is one of the reason that all RADOS client interfaces such as RBD, RGW, CephFS use a standard 4MB object size.
  2. Since the index is stored in a single RADOS object, only a single operation can be done on it at any given time. When the number of objects increases, the index stored in the RADOS object grows. Since a single index is handling a large number of objects, and there is a chance the number of operations also increase, parallelism is not possible which can end up being a bottleneck. Multiple operations will need to wait in a queue since a single operation is possible at a time.

In order to work around these problems, the bucket index is sharded into multiple parts. Each shard is kept on a separate RADOS object within the index pool.

Sharding is configured with the tunable bucket_index_max_shards . By default, this tunable is set to 0 which means that there are no shards.

How to check if Sharding is set?

  1. List the buckets
    # radosgw-admin metadata bucket list
    [
     "my-new-bucket"
    ]
    
  2. Get information on the bucket in question
    
    # radosgw-admin metadata get bucket:my-new-bucket
    {
        "key": "bucket:my-new-bucket",
        "ver": {
            "tag": "_bGZAVUgayKVwGNgNvI0328G",
            "ver": 1
        },
        "mtime": 1458940225,
        "data": {
            "bucket": {
                "name": "my-new-bucket",
                "pool": ".rgw.buckets",
                "data_extra_pool": ".rgw.buckets.extra",
                "index_pool": ".rgw.buckets.index",
                "marker": "default.2670570.1",
                "bucket_id": "default.2670570.1"
            },
            "owner": "rgw_user",
            "creation_time": 1458940225,
            "linked": "true",
            "has_bucket_info": "false"
        }
    }
    
    
  3. Use the bucket ID to get more information, including the number of shards.
radosgw-admin metadata get bucket.instance:my-new-bucket:default.2670570.1
{
    "key": "bucket.instance:my-new-bucket:default.2670570.1",
    "ver": {
        "tag": "_xILkVKbfQD7reDFSOB4a5VU",
        "ver": 1
    },
    "mtime": 1458940225,
    "data": {
        "bucket_info": {
            "bucket": {
                "name": "my-new-bucket",
                "pool": ".rgw.buckets",
                "data_extra_pool": ".rgw.buckets.extra",
                "index_pool": ".rgw.buckets.index",
                "marker": "default.2670570.1",
                "bucket_id": "default.2670570.1"
            },
            "creation_time": 1458940225,
            "owner": "rgw_user",
            "flags": 0,
            "region": "default",
            "placement_rule": "default-placement",
            "has_instance_obj": "true",
            "quota": {
                "enabled": false,
                "max_size_kb": -1,
                "max_objects": -1
            },
            "num_shards": 0,
            "bi_shard_hash_type": 0
        },
        "attrs": [
            {
                "key": "user.rgw.acl",
                "val": "AgKPAAAAAgIaAAAACAAAAHJnd191c2VyCgAAAEZpcnN0IFVzZXIDA2kAAAABAQAAAAgAAAByZ3dfdXNlcg8AAAABAAAACAAAAHJnd191c2VyAwM6AAAAAgIEAAAAAAAAAAgAAAByZ3dfdXNlcgAAAAAAAAAAAgIEAAAADwAAAAoAAABGaXJzdCBVc2VyAAAAAAAAAAA="
            },
            {
                "key": "user.rgw.idtag",
                "val": ""
            },
            {
                "key": "user.rgw.manifest",
                "val": ""
            }
        ]
    }
}

Note that `num_shards` is set to 0, which means that sharding is not enabled.

How to configure Sharding?

To configure sharding, we need to first dump the region info.

NOTE: By default, RGW has a region named default even if regions are not configured.

# radosgw-admin region get > /tmp/region.txt 

# cat /tmp/region.txt
{
    "name": "default",
    "api_name": "",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "master_zone": "",
    "zones": [
        {
            "name": "default",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 0
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement"
}

Edit the file /tmp/region.txt, change the value for `bucket_index_max_shards` to the needed shard value (we’re setting it to 8 in this example), and inject it back to the region.

# radosgw-admin region set < /tmp/region.txt
{
    "name": "default",
    "api_name": "",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "master_zone": "",
    "zones": [
        {
            "name": "default",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 8
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement"
}

Reference:

  1. Red Hat Ceph Storage 1.3 Rados Gateway documentation
  2. https://en.wikipedia.org/wiki/Shard_(database_architecture)

Inheritance and Method overloading – Object Oriented Programming

Inheritance is a usual theme in Object Oriented Programming. Because of Inheritance, the functions/methods defined in parent classes can be called in Child classes which enables code reuse, and several other features. In this article, we try to understand some of those features that come up with Inheritance.

We’ve discussed Abstract Methods in an earlier post, which is a feature part of Inheritance, and can be applied on child classes that inherits from a Parent class.

E the methods which are inherited can also be seen as another feature or possibility in Inheritance. In many cases, it’s required to override or specialize the methods inherited from the Parent class. This is of course possible, and is called as ‘Method Overloading’.

Consider the two classes and its methods defined below:

Example 0:

import abc

class MyClass(object):

    __metaclass__ = abc.ABCMeta

    def __init__(self):
        pass

    def my_set_method(self, value):
        self.value = value

    def my_get_method(self):
        return self.value

    @abc.abstractmethod
    def printdoc(self):
        return

class MyChildClass(MyClass):

    def my_set_method(self, value):
        if not isinstance(value, int):
            value = 0
        super(MyChildClass, self).my_set_method(self)

    def printdoc(self):
        print("\nDocumentation for MyChildClass()")

instance_1 = MyChildClass()
instance_1.my_set_method(10)
print(instance_1.my_get_method())
instance_1.printdoc()

 

We have two classes, the parent class being MyClass and the child class being MyChildClass.

MyClass has three methods defined.

  • my_set_method()
  • my_get_method()
  • printdoc()

The printdoc() method is an Abstract method, and hence should be implemented in the Child class as a mandatory method.

The child class MyChildClass inherits from MyClass and has access to all it’s methods.

Normally, we can just go ahead and use the methods defined in MyClass , in MyChildClass. But there can be situations when we want to improve or build upon the methods inherited. As said earlier, this is called Method Overloading.

MyChildClass extends the parent’s my_set_method() function by it’s own implementation. In this example, it does an additional check to understand if the input value is an int or not, and then calls the my_set_method() of it’s parent class using super. Hence, this method in the child class extends the functionality prior calling method in the parent. A post on super is set for a later time.

Even though this is a trivial example, it helps to understand how the features inherited from other classes can be extended or improved upon via method overloading.

The my_get_method() is not overridden in the child class but still called from the instance, as instance_1.my_get_method(). We’re using it as it is available via Inheritance. Since it’s defined in the parent class, it works in the child class’ instance when called, even if not overridden.

The printdoc() method is an abstract method and hence is mandatory to be implemented in the child class, and can be overridden with what we choose to do.

Inheritance is possible from python builtins, and can be overridden as well. Let’s check out another example:

Example 1:

class MyList(list):

    def __getitem__(self, index):
        if index == 0:
            raise IndexError
        if index &gt; 0:
            index -= 1
        return list.__getitem__(self, index)

    def __setitem__(self, index, value):
        if index == 0:
            raise IndexError
        if index &gt; 0:
            index -= 1
        list.__setitem__(self, index, value)

x = MyList(['a', 'b', 'c'])
print(x)
print("-" * 10)

x.append('d')
print(x)
print("-" * 10)

x.__setitem__(4, 'e')
print(x)
print("-" * 10)

print(x[1])
print(x.__getitem__(1))
print("-" * 10)

print(x[4])
print(x.__getitem__(4))

This outputs:

['a', 'b', 'c']
----------
['a', 'b', 'c', 'd']
----------
['a', 'b', 'c', 'e']
----------
a
a
----------
e
e

How does the code work?

The class MyList() inherits from the builtin list. Because of the inheritance, we can use list’s available magic methods such as __getitem__() , __setitem__() etc..

NOTE: In order to see the available methods in list, use dir(list).

  1. We create two functions/methods named `__getitem__()` and `__setitem__()` to override the inherited methods.
  2. Within these functions/methods, we set our own conditions.
  3. Wie later call the builtin methods directly within these functions, using
    1. list.__getitem__()
    2. list.__setitem__()
  4. We create an instance named x from MyList().
  5. We understand that
    1. x[1] and x.__getitem__(1) are same.
    2. x[4, 'e'] and x.__setitem__(4, 'e') are same.
    3. x.append(f) is same as x.__setitem__(<n>, f) where <n> is the element to the extreme right which the python interpreter iterates and find on its own.

Hence, in Inheritance, child classes can:

  • Inherit from parent classes and use those methods.
    • Parent classes can either be user-defined classes or buitins like list , dict etc..
  • Override (or Overload) an inherited method.
  • Extend an inherited method in its own way.
  • Implement an Abstract method the parent class requires.

Reference:

  1. Python beyond the basics – Object Oriented Programming

 

Abstract Base Classes/Methods – Object Oriented Programming

Abstract classes, in short, are classes that are supposed to be inherited or subclassed, rather than instantiated.

Through Abstract Classes, we can enforce a blueprint on the subclasses that inherit the Abstract Class. This means that Abstract classes can be used to define a set of methods that must be implemented by it subclasses.

Abstract classes are used when working on large projects where classes have to be inherited, and need to strictly follow certain blueprints.

Python supports Abstract Classes via the module abc from version 2.6. Using the abc module, its pretty straight forward to implement an Abstract Class.

Example 0:

import abc

class My_ABC_Class(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def set_val(self, val):
        return

    @abc.abstractmethod
    def get_val(self):
        return

# Abstract Base Class defined above ^^^

# Custom class inheriting from the above Abstract Base Class, below

class MyClass(My_ABC_Class):

    def set_val(self, input):
        self.val = input

    def get_val(self):
        print("\nCalling the get_val() method")
        print("I'm part of the Abstract Methods defined in My_ABC_Class()")
        return self.val

    def hello(self):
        print("\nCalling the hello() method")
        print("I'm *not* part of the Abstract Methods defined in My_ABC_Class()")

my_class = MyClass()

my_class.set_val(10)
print(my_class.get_val())
my_class.hello()

In the code above, set_val() and get_val() are both abstract methods defined in the Abstract Class My_ABC_Class(). Hence it should be implemented in the child class inheriting from My_ABC_Class().

In the child class MyClass() , we have to strictly define the abstract classes defined in the Parent class. But the child class is free to implement other methods of their own. The hello() method is one such.

This will print :

# python abstractclasses-1.py

Calling the get_val() method
I'm part of the Abstract Methods defined in My_ABC_Class()
10

Calling the hello() method
I'm *not* part of the Abstract Methods defined in My_ABC_Class()

The code gets executed properly even if the  hello() method is not an abstract method.

Let’s check what happens if we don’t implement a method marked as an abstract method, in the child class.

Example 1:

import abc

class My_ABC_Class(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def set_val(self, val):
        return

    @abc.abstractmethod
    def get_val(self):
        return

# Abstract Base Class defined above ^^^

# Custom class inheriting from the above Abstract Base Class, below

class MyClass(My_ABC_Class):

    def set_val(self, input):
        self.val = input

    def hello(self):
        print("\nCalling the hello() method")
        print("I'm *not* part of the Abstract Methods defined in My_ABC_Class()")

my_class = MyClass()

my_class.set_val(10)
print(my_class.get_val())
my_class.hello()

Example 1 is the same as Example 0 except we don’t have the get_val() method defined in the child class.

This means that we’re breaking the rule of abstraction. Let’s see what happens:

# python abstractclasses-2.py
Traceback (most recent call last):
  File "abstractclasses-2.py", line 50, in
    my_class = MyClass()
TypeError: Can't instantiate abstract class MyClass with abstract methods get_val

The traceback clearly states that the child class MyClass() cannot be instantiated since it does not implement the Abstract methods defined in it’s Parent class.

We mentioned that an Abstract class is supposed to be inherited rather than instantiated. What happens if we try instantiating an Abstract class?

Let’s use the same example, this time we’re instantiating the Abstract class though.

Example 2:

import abc

class My_ABC_Class(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def set_val(self, val):
        return

    @abc.abstractmethod
        def get_val(self):
            return

# Abstract Base Class defined above ^^^

# Custom class inheriting from the above Abstract Base Class, below

class MyClass(My_ABC_Class):

    def set_val(self, input):
        self.val = input

    def hello(self):
        print("\nCalling the hello() method")
        print("I'm *not* part of the Abstract Methods defined in My_ABC_Class()")

my_class = My_ABC_Class()    # <- Instantiating the Abstract Class

my_class.set_val(10)
print(my_class.get_val())
my_class.hello()

What does this output?

# python abstractclasses-3.py
Traceback (most recent call last):
    File "abstractclasses-3.py", line 54, in <module>
       my_class = My_ABC_Class()
TypeError: Can't instantiate abstract class My_ABC_Class with abstract methods get_val, set_val

As expected, the Python interpreter says that it can’t instantiate the abstract class My_ABC_Class.

Takeaway: 

  1. An Abstract Class is supposed to be inherited, not instantiated.
  2. The Abstraction nomenclature is applied on the methods within a Class.
  3. The abstraction is enforced on methods which are marked with the decorator @abstractmethod or @abc.abstractmethod, depending on how you imported the module, from abc import abstractmethod or import abc.
  4. It is not mandatory to have all the methods defined as abstract methods, in an Abstract Class.
  5. Subclasses/Child classes are enforced to define the methods which are marked with @abstractmethod in the Parent class.
  6. Subclasses are free to create methods of their own, other than the abstract methods enforced by the Parent class.

Reference:

  1. https://pymotw.com/2/abc/
  2. Python beyond the basics – Object Oriented Programming

Instance, Class, and Static methods – Object Oriented Programming

Functions defined under a class are also called methods. Most of the methods are accessed through an instance of the class.

There are three types of methods:

  1. Instance methods
  2. Static methods
  3. Class methods

Both Static methods and Class methods can be called using the @staticmethod and @classmethod syntactic sugar respectively.

Instance methods

Instance methods are also called Bound methods since the instance is bound to the class via self. Read a simple explanation on self here.

Almost all methods are Instance methods since they are accessed through instances.

For example:

class MyClass(object):

def set_val(self, val):
    self.value = val

def get_val(self):
    print(self.value)
    return self.value

a = MyClass()
b = MyClass()

a.set_val(10)
b.set_val(100)

a.get_val()
b.get_val()

The above code snippet shows manipulating the two methods  set_val() and get_val() . These are done through the instances a and b. Hence these methods are called Instance methods.

NOTE: Instance methods have self as their first argument. self is the instance itself.

All methods defined under a class are usually called via the instance instantiated from the class. But there are methods which can work without instantiating an instance.

Class methods and Static methods don’t require an instance, and hence don’t need self as their first argument.

Static methods

Static methods are functions/methods which doesn’t need a binding to a class or an instance.

  1. Static methods, as well as Class methods, don’t require an instance to be called.
  2. Static methods doesn’t need  self or cls as the first argument since it’s not bound to an instance or class.
  3. Static methods are normal functions, but within a class.
  4. Static methods are defined with the keyword @staticmethod above the function/method.
  5. Static methods are usually used to work on Class Attributes.

=============================
A note on class attributes

Attributes set explicitly under a class (not under a function) are called Class Attributes.

For example:

class MyClass(object):
    value = 10

    def my_func(self):
        pass

In the code snippet above, value = 10 is an attribute defined under the class MyClass() and not under any functions/methods. Hence, it’s called a Class attribute.
=============================

Let’s check out an example on static methods and class attributes:

class MyClass(object):
# A class attribute
    count = 0

    def __init__(self, name):
        print("An instance is created!")
        self.name = name
        MyClass.count += 1

    # Our class method
    @staticmethod
    def status():
        print("The total number of instances are ", MyClass.count)

print(MyClass.count)

my_func_1 = MyClass("MyClass 1")
my_func_2 = MyClass("MyClass 2")
my_func_3 = MyClass("MyClass 3")

MyClass.status()
print(MyClass.count)

This prints the following:

# python statismethod.py

0
An instance is created!
An instance is created!
An instance is created!

The total number of instances are 3
3

How does the code work?

  1. The example above has a class  MyClass() with a class attribute count = 0.
  2. An __init__ magic method accepts a name variable.
  3. The __init__ method also increments the count in the count counter at each instantiation.
  4. We define a staticmethod status() which just prints the number of the instances being created. The work done in this method is not necessarily associated with the class or any functions, hence its defined as a staticmethod.
  5. We print the initial value of the counter count via the class, as MyClass.count. This will print 0since the counter is called before any instances are created.
  6. We create three instances from the class  MyClass
  7. We can check the number of instances created through the status() method and the count counter.

Another example:

class Car(object):

    def sound():
        print("vroom!")

The code above shows a method which is common to all the Car instances, and is not limited to a specific instance of Car. Hence, this can be called as a staticmethod since it’s not necessarily bound to a Class or Instance to be called.

class Car(object):

    @staticmethod
    def sound():
        print("vroom!")

Class methods

We can define functions/methods specific to classes. These are called Class methods.

The speciality of a class methods is that an instance is not required to access a class method. It can be called directly via the Class name.

Class methods are used when it’s not necessary to instantiate a class to access a method.

NOTE: A method can be set as a Class method using the decorator @classmethod.

Example:

class MyClass(object):
    value = 10

    @classmethod
    def my_func(cls):
        print("Hello")


NOTE: Class methods have cls as their first argument, instead of self.

Example:

class MyClass(object):
    count = 0

    def __init__(self, val):
        self.val = val
        MyClass.count += 1

    def set_val(self, newval):
        self.val = newval

    def get_val(self):
        return self.val

    @classmethod
    def get_count(cls):
        return cls.count

object_1 = MyClass(10)
print("\nValue of object : %s" % object_1.get_val())
print(MyClass.get_count())

object_2 = MyClass(20)
print("\nValue of object : %s" % object_2.get_val())
print(MyClass.get_count())

object_3 = MyClass(40)
print("\nValue of object : %s" % object_3.get_val())
print(MyClass.get_count())

Here, we use a get_count() function to get the number of times the counter was incremented. The counter is incremented each time an instance is created.

Since the counter is not really tied with the instance but only counts the number of instance, we set it as a classmethod, and calls it each time using MyClass.get_count()when an instance is created. The output looks as following:

# python classmethod.py

Value of object : 10
1

Value of object : 20
2

Value of object : 40
3

 

Courtsey: This was written as part of studying class and static methods. Several articles/videos have helped including but not limited to the following:

  1. https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/
  2. Python beyond the basics – Object Oriented Programming – O’Reilly Learning Paths