Callables in Python

Introduction

A callable is an object in Python that can be executed when invoked with parentheses ( ). Classes and functions are callable.

Callables can be a class, a function, a builtin, or an instance of a class. In simple terms, a class/function/instance/builtin is callable if it gets executed when called with parentheses ().

Example 1:


In [1]: help()
Welcome to Python 3.6's help utility!
-- content omitted --
--------
In [2]: int()
Out[2]: 0

In [3]: callable(int)
Out[3]: True

--------
In [4]: callable(help)
Out[4]: True

--------
In [5]: def hello():
   ...:     print("Howdy!!")

In [6]: hello()
Howdy!!

In [7]: callable(hello)
Out[7]: True

In Example 1, we can see that builtins like help(), a pre-defined type such as int(), and a custom function hello() are all callable. These can be executed by calling them with parentheses.

The __call__() method

The callable() builtin helps to determine if an object is callable or not. Internally, a call on an object translates to the magic method __call__().

In short:

my_object(*args) translates to my_object.__call__(*args)

All classes and functions are callable, as are instances of classes with the __call__ magic method. An instance of a class is usually not callable (even though the class itself is), unless the class carries a __call__ magic method.

In other words, an instance is callable only if the class it is instantiated from contains the __call__ magic method.

  • The inbuilt documentation on callable states:
In [1]: print(callable.__doc__)
Return whether the object is callable (i.e., some kind of function).

Note that classes are callable, as are instances of classes with a
__call__() method.

Example 2:


In [5]: def hello():
   ...:     print("Howdy!!")

In [6]: hello()
Howdy!!

In [7]: hello.__call__()
Howdy!!

In [8]: callable(hello)
Out[8]: True

Example 2 shows that calling a function with parentheses (including any required arguments) is equivalent to calling its __call__() magic method. i.e., calling a function/class with parentheses translates to calling the __call__() magic method.

NOTE:
Read more on Magic methods in Python

Example 3: Non-callable Instances


In [1]: type(1)
Out[1]: int

In [2]: callable(1)
Out[2]: False

In [3]: x = 1

In [4]: type(x)
Out[4]: int

In [5]: callable(int)
Out[5]: True

In [6]: callable(x)
Out[6]: False

Example 3 above shows that even though the int class is callable, the instances created from the int class are not.

Remember that instances will only be callable if the class from which they were instantiated contains a __call__ method. Inspecting the methods of the int class reveals that it does not have a __call__ method.

NOTE: You can view the methods of the int class using help(int) or dir(int).

Example 4: Another example with Classes


In [52]: class tell:
    ...:     def __call__(self):
    ...:         pass

In [53]: telling = tell()

In [54]: callable(tell)
Out[54]: True

In [55]: callable(telling)
Out[55]: True

--------

In [56]: class test:
    ...:     pass

In [57]: testing = test()

In [58]: callable(test)
Out[58]: True

In [59]: callable(testing)
Out[59]: False

Since all classes are callable by default, both the classes tell and test in Example 4 are callable. But the instances of these classes need not be. Since the class tell has the magic method __call__, the instance telling is callable. But the instance testing, instantiated from the class test, is not, since that class does not have the magic method. Another set of examples follows.

Example 5: Non-callable instance of a class


In [1]: class new:
   ...:     def foo(self):
   ...:         print("Hello")

In [2]: n = new()

In [3]: n()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3> in <module>()
----> 1 n()

TypeError: 'new' object is not callable


Example 6: Callable instance of the same class

In [4]: class new:
   ...:     def __call__(self):
   ...:         print("I'm callable!")

In [5]: n = new()

In [6]: n
Out[6]: <__main__.new at 0x7f7a614b1f98>

In [7]: n()
I'm callable!

Example 5 and Example 6 show how a class is itself callable, but unless it carries a __call__() method, the instances spawned from it are not.

 

References:

  1. http://docs.python.org/3/library/functions.html#callable
  2. http://eli.thegreenplace.net/2012/03/23/python-internals-how-callables-work/
  3. https://docs.python.org/3/reference/datamodel.html#object.__call__

Writing a minimalistic kernel module in Linux – Part 1

Introduction

Loadable Kernel Modules (LKM) are object code that can be loaded into memory, often used for supporting hardware or enabling specific features. Kernel modules allow the core kernel to stay minimal, with features loaded as required.

A kernel module is a normal file, usually suffixed with .ko, denoting that it's a kernel object file. It contains compiled code from one or more source files, gets linked to the kernel when loaded, and runs in kernel space. It dynamically adds functionality to a running kernel, without requiring a reboot.

Linux kernel modules are written in C (not sure if anything else like C++ is possible), and are compiled for a specific kernel version. This is the ideal practice, since kernel data structures may change across versions, and a module compiled for one version may break on another.

Since kernel modules can be loaded and unloaded at will, it is pretty easy to unload an older version and load a newer one. This helps immensely in testing out new features since it is easy to change the source code, re-compile, unload the older version, load the newer version, and test the functionality.

Structure

Modules are expected to be under /lib/modules/$(uname -r)/, within directories organized according to use case.

[root@centos7 3.10.0-514.26.2.el7.x86_64]# ls -l
total 2940
lrwxrwxrwx. 1 root root 43 Jul 8 05:10 build -> /usr/src/kernels/3.10.0-514.26.2.el7.x86_64
drwxr-xr-x. 2 root root 6 Jul 4 11:17 extra
drwxr-xr-x. 12 root root 128 Jul 8 05:10 kernel
-rw-r--r--. 1 root root 762886 Jul 8 05:11 modules.alias
-rw-r--r--. 1 root root 735054 Jul 8 05:11 modules.alias.bin
-rw-r--r--. 1 root root 1326 Jul 4 11:17 modules.block
-rw-r--r--. 1 root root 6227 Jul 4 11:15 modules.builtin
-rw-r--r--. 1 root root 8035 Jul 8 05:11 modules.builtin.bin
-rw-r--r--. 1 root root 240071 Jul 8 05:11 modules.dep
-rw-r--r--. 1 root root 343333 Jul 8 05:11 modules.dep.bin
-rw-r--r--. 1 root root 361 Jul 8 05:11 modules.devname
-rw-r--r--. 1 root root 132 Jul 4 11:17 modules.drm
-rw-r--r--. 1 root root 110 Jul 4 11:17 modules.modesetting
-rw-r--r--. 1 root root 1580 Jul 4 11:17 modules.networking
-rw-r--r--. 1 root root 90643 Jul 4 11:15 modules.order
-rw-r--r--. 1 root root 89 Jul 8 05:11 modules.softdep
-rw-r--r--. 1 root root 350918 Jul 8 05:11 modules.symbols
-rw-r--r--. 1 root root 432831 Jul 8 05:11 modules.symbols.bin
lrwxrwxrwx. 1 root root 5 Jul 8 05:10 source -> build
drwxr-xr-x. 2 root root 6 Jul 4 11:17 updates
drwxr-xr-x. 2 root root 95 Jul 8 05:10 vdso
drwxr-xr-x. 2 root root 6 Jul 4 11:17 weak-updates

As we can see, there are several files that deal with the inter-dependencies of modules. These are used by modprobe to understand which modules to load before the one actually being requested.

For example:

  • modules.block lists the modules for block devices
  • modules.networking lists the ones for network devices.
  • modules.builtin lists the path of modules included in the kernel.
  • modules.devname lists the ones that would be loaded automatically if a particular device is created.

The kernel folder contains modules divided according to their use cases.

[root@centos7 3.10.0-514.26.2.el7.x86_64]# ls -l kernel/
total 16
drwxr-xr-x. 3 root root 17 Jul 8 05:10 arch
drwxr-xr-x. 3 root root 4096 Jul 8 05:10 crypto
drwxr-xr-x. 67 root root 4096 Jul 8 05:10 drivers
drwxr-xr-x. 26 root root 4096 Jul 8 05:10 fs
drwxr-xr-x. 3 root root 19 Jul 8 05:10 kernel
drwxr-xr-x. 5 root root 222 Jul 8 05:10 lib
drwxr-xr-x. 2 root root 32 Jul 8 05:10 mm
drwxr-xr-x. 33 root root 4096 Jul 8 05:10 net
drwxr-xr-x. 11 root root 156 Jul 8 05:10 sound
drwxr-xr-x. 3 root root 17 Jul 8 05:10 virt

Each directory within kernel contains modules depending on the area they’re used for. For example, kernel/fs/ contains filesystem drivers.

[root@centos7 3.10.0-514.26.2.el7.x86_64]# ls -l kernel/fs
total 48
-rw-r--r--. 1 root root 21853 Jul 4 11:51 binfmt_misc.ko
drwxr-xr-x. 2 root root 22 Jul 8 05:10 btrfs
drwxr-xr-x. 2 root root 27 Jul 8 05:10 cachefiles
drwxr-xr-x. 2 root root 21 Jul 8 05:10 ceph
drwxr-xr-x. 2 root root 21 Jul 8 05:10 cifs
drwxr-xr-x. 2 root root 23 Jul 8 05:10 cramfs
drwxr-xr-x. 2 root root 20 Jul 8 05:10 dlm
drwxr-xr-x. 2 root root 23 Jul 8 05:10 exofs
drwxr-xr-x. 2 root root 21 Jul 8 05:10 ext4
drwxr-xr-x. 2 root root 51 Jul 8 05:10 fat
drwxr-xr-x. 2 root root 24 Jul 8 05:10 fscache
drwxr-xr-x. 2 root root 36 Jul 8 05:10 fuse
drwxr-xr-x. 2 root root 21 Jul 8 05:10 gfs2
drwxr-xr-x. 2 root root 22 Jul 8 05:10 isofs
drwxr-xr-x. 2 root root 21 Jul 8 05:10 jbd2
drwxr-xr-x. 2 root root 22 Jul 8 05:10 lockd
-rw-r--r--. 1 root root 19597 Jul 4 11:51 mbcache.ko
drwxr-xr-x. 6 root root 128 Jul 8 05:10 nfs
drwxr-xr-x. 2 root root 40 Jul 8 05:10 nfs_common
drwxr-xr-x. 2 root root 21 Jul 8 05:10 nfsd
drwxr-xr-x. 2 root root 4096 Jul 8 05:10 nls
drwxr-xr-x. 2 root root 24 Jul 8 05:10 overlayfs
drwxr-xr-x. 2 root root 24 Jul 8 05:10 pstore
drwxr-xr-x. 2 root root 25 Jul 8 05:10 squashfs
drwxr-xr-x. 2 root root 20 Jul 8 05:10 udf
drwxr-xr-x. 2 root root 20 Jul 8 05:10 xfs

depmod, and related commands

Modules can export the features they carry, called symbols, which can be used by other modules. If module A depends on a symbol exported by module B, module B should be loaded first, followed by module A.

depmod creates a list of the symbol dependencies each module has, so that modprobe can load the modules serving those symbols prior to loading the actual module.

depmod works by:

  1. Creating a list of symbols each module exports.
  2. Creating a list of symbol dependencies each module has.
  3. Dumping the list of symbols each module exports, to lib/modules/$(uname -r)/modules.symbols.bin and /lib/modules/$(uname -r)/modules.symbols
  4. Dumping the module dependency information to /lib/modules/$(uname -r)/modules.dep.bin and /lib/modules/$(uname -r)/modules.dep.
  5. Creating /lib/modules/$(uname -r)/modules.devname, which contains the device file information (device type, major:minor number) that gets created at boot for the module to function properly.

NOTE:

  • modprobe refers to /lib/modules/$(uname -r)/modules.dep.bin to understand the dependencies each module requires. A human-readable version of this file is maintained at /lib/modules/$(uname -r)/modules.dep, but modprobe does not read that one.
  • The binary file modules.symbols.bin carries the symbols exported (if any) by each module, one symbol per line. A human-readable version of the same is kept at modules.symbols.

A sneak peek into modules.symbols and modules.dep:

modules.symbols

[root@centos7 3.10.0-514.26.2.el7.x86_64]# head modules.symbols
# Aliases for symbols, used by symbol_request().
alias symbol:cfg80211_report_obss_beacon cfg80211
alias symbol:drm_dp_link_train_channel_eq_delay drm_kms_helper
alias symbol:__twofish_setkey twofish_common
alias symbol:mlx4_db_free mlx4_core
alias symbol:nf_send_unreach nf_reject_ipv4
alias symbol:sdhci_remove_host sdhci
alias symbol:videobuf_dma_init_kernel videobuf_dma_sg
alias symbol:ar9003_paprd_is_done ath9k_hw
alias symbol:cxgbi_ep_disconnect libcxgbi

modules.dep

[root@centos7 3.10.0-514.26.2.el7.x86_64]# head modules.dep
kernel/arch/x86/kernel/cpu/mcheck/mce-inject.ko:
kernel/arch/x86/kernel/test_nx.ko:
kernel/arch/x86/kernel/iosf_mbi.ko:
kernel/arch/x86/crypto/ablk_helper.ko: kernel/crypto/cryptd.ko
kernel/arch/x86/crypto/glue_helper.ko:
kernel/arch/x86/crypto/camellia-x86_64.ko: kernel/crypto/xts.ko kernel/crypto/lrw.ko kernel/crypto/gf128mul.ko kernel/arch/x86/crypto/glue_helper.ko
kernel/arch/x86/crypto/blowfish-x86_64.ko: kernel/crypto/blowfish_common.ko
kernel/arch/x86/crypto/twofish-x86_64.ko: kernel/crypto/twofish_common.ko
kernel/arch/x86/crypto/twofish-x86_64-3way.ko: kernel/arch/x86/crypto/twofish-x86_64.ko kernel/crypto/twofish_common.ko kernel/crypto/xts.ko kernel/crypto/lrw.ko kernel/crypto/gf128mul.ko kernel/arch/x86/crypto/glue_helper.ko
kernel/arch/x86/crypto/salsa20-x86_64.ko:

lsmod is a parser that reads through /proc/modules and presents it in an easy-to-read format.

Note how lsmod parses through the content of /proc/modules below:

[root@centos7 3.10.0-514.26.2.el7.x86_64]# head /proc/modules
test 12498 0 - Live 0xffffffffa0492000 (POE)
binfmt_misc 17468 1 - Live 0xffffffffa048c000
uhid 17369 0 - Live 0xffffffffa0486000
ipt_MASQUERADE 12678 2 - Live 0xffffffffa0481000
nf_nat_masquerade_ipv4 13412 1 ipt_MASQUERADE, Live 0xffffffffa0451000
xt_addrtype 12676 2 - Live 0xffffffffa044c000
br_netfilter 22209 0 - Live 0xffffffffa0468000
dm_thin_pool 65565 1 - Live 0xffffffffa046f000
dm_persistent_data 67216 1 dm_thin_pool, Live 0xffffffffa0456000
dm_bio_prison 15907 1 dm_thin_pool, Live 0xffffffffa043f000

[root@centos7 3.10.0-514.26.2.el7.x86_64]# lsmod | head
Module Size Used by
test 12498 0
binfmt_misc 17468 1
uhid 17369 0
ipt_MASQUERADE 12678 2
nf_nat_masquerade_ipv4 13412 1 ipt_MASQUERADE
xt_addrtype 12676 2
br_netfilter 22209 0
dm_thin_pool 65565 1
dm_persistent_data 67216 1 dm_thin_pool

NOTE:

  1. The first field lists the module name.
  2. The second field lists the size of the module in memory.
  3. The third field lists the number of times the module is in use. `0` means the module is not used despite it being loaded.
  4. The fourth field lists the modules which use this module as a dependency.
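
Based on that field layout, the kind of parsing lsmod does over /proc/modules can be approximated in a few lines of Python. This is only an illustrative sketch, not the actual lsmod implementation:

# Rough sketch of what lsmod does: parse and pretty-print /proc/modules.
# Fields per line: name, size, use-count, users, state, address.
with open("/proc/modules") as f:
    print("{0:<24}{1:>8}  {2}".format("Module", "Size", "Used by"))
    for line in f:
        fields = line.split()
        name, size, count = fields[0], fields[1], fields[2]
        # The fourth field is "-" when no other module uses this one
        users = "" if fields[3] == "-" else fields[3].rstrip(",")
        print("{0:<24}{1:>8}  {2} {3}".format(name, size, count, users))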

Creating a dummy module

The steps for creating a kernel module include:

  1. Writing the module file.
  2. Writing the Makefile for the module.
  3. Compiling the module file using make, which refers to the Makefile to build it.

The module file and its corresponding Makefile are put in a separate directory, so as to keep the kernel module directory clean. Once the module code and the Makefile are ready, the following make command is used to build the module, $(PWD) being the directory where the module code and its Makefile are present.

# make -C /lib/modules/$(uname -r)/build M=$PWD modules

The make command above does the following:

  1. Change to the path mentioned after -C, i.e., to the location where the kernel Makefile is present (/lib/modules/$(uname -r)/build/).
  2. Use the kernel Makefile's macro M, which denotes the location of the code to be compiled; in this case, the PWD where the module code/Makefile is present.
  3. Use the target modules which tells make to build the module.

Hence, the above command is trying to build a module in the current working directory, using the kernel Makefile at /lib/modules/$(uname -r)/build/Makefile

If we have a module file named test.c and its corresponding Makefile in $(PWD), the make command would follow the steps below:

  1. make calls the modules target and refers to the kernel Makefile.
  2. The kernel Makefile looks for the module Makefile in $PWD.
  3. The kernel Makefile reads the module's Makefile and gets a list of the objects assigned to the macro obj-m.
  4. The make command builds modules for each object assigned to the macro obj-m.

Writing a simple module

The following is a very simple module, which prints a message while loading, and another one while unloading.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

int test_module(void)
{
    printk("Loading the test module!\n");
    return 0;
}

void unload_test(void)
{
    printk("Unloading the test module!\n");
}

module_init(test_module)
module_exit(unload_test)

This has two functions, test_module() and unload_test(), which simply print a text banner upon loading and unloading respectively.

module_init() registers the function to run when the module is loaded, and that function can do whatever is needed to initialize the module. We pass our test_module() function to module_init() so that it gets executed when the module is loaded.

module_exit() registers the function called whenever the module has to be unloaded, and it can run whatever functions are required for a proper cleanup (if required) prior to the module being unloaded. We pass our unload_test() function to module_exit().

Writing a Makefile

Since the kernel Makefile will look for the obj-m macro in the module Makefile, with the object filename as its argument, add the following to the Makefile:

obj-m := test.o

make will create an object file test.o from test.c, and then create a kernel object file test.ko.

Compiling the module with `make`

Let’s compile the module

[root@centos7 test]# pwd
/lib/modules/3.10.0-514.26.2.el7.x86_64/test
[root@centos7 test]# ls
Makefile  test.c
[root@centos7 test]# make -C /lib/modules/$(uname -r)/build M=$PWD modules
make: Entering directory `/usr/src/kernels/3.10.0-514.26.2.el7.x86_64'
  CC [M]  /lib/modules/3.10.0-514.26.2.el7.x86_64/test/test.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /lib/modules/3.10.0-514.26.2.el7.x86_64/test/test.mod.o
  LD [M]  /lib/modules/3.10.0-514.26.2.el7.x86_64/test/test.ko
make: Leaving directory `/usr/src/kernels/3.10.0-514.26.2.el7.x86_64'

Listing the contents shows a lot of new files, including the module code, the Makefile, the object file test.o created from test.c, and the kernel object file test.ko.

test.mod.c contains generated code which is what ultimately gets built into test.ko, but that should be for another post, since much more is yet to be read/learned on what's happening there.

[root@centos7 test]# ls -l
total 292
-rw-r--r--. 1 root root     16 Jul 27 11:52 Makefile
-rw-r--r--. 1 root root     60 Jul 27 11:57 modules.order
-rw-r--r--. 1 root root      0 Jul 27 11:57 Module.symvers
-rw-r--r--. 1 root root    281 Jul 27 11:53 test.c
-rw-r--r--. 1 root root 137768 Jul 27 11:57 test.ko
-rw-r--r--. 1 root root    787 Jul 27 11:57 test.mod.c
-rw-r--r--. 1 root root  52912 Jul 27 11:57 test.mod.o
-rw-r--r--. 1 root root  87776 Jul 27 11:57 test.o

Loading/Unloading the module

Loading and unloading the module should print the messages passed via printk in dmesg.

[root@centos7 test]# insmod ./test.ko
[root@centos7 test]# lsmod | grep test
test                   12498  0
[root@centos7 test]# rmmod test

Checking dmesg shows the informational messages in the module code:

[root@centos7 test]# dmesg | tail
[35889.187282] test: loading out-of-tree module taints kernel.
[35889.187288] test: module license 'unspecified' taints kernel.
[35889.187290] Disabling lock debugging due to kernel taint
[35889.187338] test: module verification failed: signature and/or required key missing - tainting kernel
[35889.187548] Loading the test module!
[35899.216954] Unloading the test module!

Note the messages about the module test tainting the kernel. Read more on how a module can taint the kernel, at https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html.

More on customizing the Makefile in another post.

Recursion – Algorithm Study

Recursion is a technique by which a function calls itself until a condition is met.

Introduction

Loops or repetitive execution based on certain conditions are inevitable in programs. The usual constructs are while and for loops. Recursion is an entirely different way to deal with such situations, and in many cases, easier.

Recursion is when a function calls itself in each iteration till a condition is met. Ideally, the data set in each iteration gets smaller until it reaches the required condition, after which the recursive function exits.

A typical example of recursion is a factorial function.

How does Recursion work?

A recursive function ideally contains a Base case and a Recursive case.

Recursive case is when the function calls itself, until the Base case is met. Each level of iteration in the Recursive case moves the control to the next level.

Once a specific level finishes execution, the control is passed back to the previous level of execution. A Recursive function can go several layers deep until the Base condition is met. In short, a Recursive case is a loop in which the function calls itself.

The Base case is required so that the function doesn’t continue running in the Recursive loop forever. Once the Base case is met, the control moves out of the Recursive case, executes the conditions in the Base case (if any), and exits.
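
A minimal sketch of this structure, using a countdown function made up for illustration:

def countdown(n):
    if n <= 0:            # Base case: ends the recursion
        print("Done!")
    else:                 # Recursive case: the function calls itself
        print(n)
        countdown(n - 1)

countdown(3)   # prints 3, 2, 1, Done!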

As mentioned in the Introduction, a factorial function can be seen as an example of recursion.

NOTE:

The Base case for a factorial function is when n == 1

Consider n!:

n! can be written as:

n x (n – 1) x (n – 2) x (n – 3) x …. x 1

n! can also be represented as:

 n!      = n * (n - 1)!       ---> [Step 1]
(n - 1)! = (n - 1) * (n - 2)! ---> [Step 2]
(n - 2)! = (n - 2) * (n - 3)! ---> [Step 3]
.
..
...
(n - (n - 1))! = 1! = 1       ---> [Base case]

Each level/step is a product of a value and all the levels below it. Hence, Step 1 will end up moving to Step 2 to get the factorial of elements below it, then to Step 3 and so on.

ie.. the control of execution move as:

[Step 1] -> [Step 2] -> [Step 3] -> ….. [Step n]

In a much easier-to-grasp example, a 5! would be:

5! = 5 * 4! ---> [Step 1]
4! = 4 * 3! ---> [Step 2]
3! = 3 * 2! ---> [Step 3]
2! = 2 * 1! ---> [Step 4]
1! = 1      ---> [Step 5] / [Base case]

The order of execution will be :

[Step 1] -> [Step 2] -> [Step 3] -> [Step 4] -> [Step 5]

As we know, in Recursion, each layer pauses itself and passes the control to the next level. Once it reaches the end, i.e., the Base case, it returns the result back to the previous level, one by one, until it reaches where it started off.

In this example, once the control of execution reaches Step 5 (the Base case), the control is returned to its previous level, Step 4. This level returns the result back to Step 3, which completes its execution and returns to Step 2, so on and so forth until it reaches Step 1.

The return control flow would be as:

[Base case / Step 5] -> [Step 4] -> [Step 3] -> [Step 2] -> [Step 1] -> Result.

This can be summed up using an awesome pictorial representation from the book Grokking Algorithms by Adit. Please check the References section for the link and more information about this awesome book.

Figure 1: Recursion, Recursive case and Base case (Copyright Manning Publications, drawn by adit.io)

Code

Example 1:

  • A factorial function in a while loop
def fact(n):
    factorial = 1
    while n > 1:
        factorial = factorial * n
        n = n - 1
    return factorial

print("Factorial of {0} is {1}".format(10, fact(10)))
print("Factorial of {0} is {1}".format(20, fact(20)))
  • The same function above, in a recursive loop
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

print("Factorial of {0} is {1}".format(10, factorial(10)))
print("Factorial of {0} is {1}".format(20, factorial(20)))

Example 2:

  • A function to sum numbers in a normal for loop.
def my_sum(my_list):
    num = 0
    for i in my_list:
        num += i
    return num

print(my_sum([10, 23, 14, 12, 11, 94, 20]))
  • The same function to add numbers, in a recursive loop
def my_sum(my_list):
    if my_list == []:
        return 0
    else:
        return my_list[0] + my_sum(my_list[1:])

print(my_sum([10, 23, 14, 12, 11, 94, 20]))

Code explanation

Both Example 1 and Example 2 are represented as an iterative function as well as a recursive function.

The iterative function loops over the entire data set, one element at a time (the for statement internally works through the iterator protocol, i.e., the __iter__() and next() magic methods). The recursive function calls itself until it reaches the base case, and then returns the result.

Observations:

While a recursive function does not necessarily give you an edge on performance, it is much easier to understand and the code is cleaner.

Recursion has a disadvantage, though, for large data sets. Each call is put on the call stack until it reaches the Base case. Once the Base case is met, the call stack is unwound back to where it started, executing each of the previous levels on the way. The examples above showed a sum function and a factorial function; on large data sets, this can lead to a deep call stack, which in turn takes a lot of memory.
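
To see this limit in practice, here is a small illustration; CPython caps the call stack depth (usually around 1000 frames) and raises a RecursionError when a recursive function digs deeper than that:

import sys

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

print(sys.getrecursionlimit())    # default cap on the call stack depth, usually 1000

try:
    factorial(5000)               # needs ~5000 stack frames, beyond the cap
except RecursionError as err:
    print("RecursionError:", err)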

References:

  1. Grokking Algorithms
  2. Data Structures and Algorithms in Python

 

Selection Sort – Algorithm Study

Selection Sort is a sorting algorithm used to sort a data set either in incremental or decremental order.

It goes through all the elements one by one, and hence it's not a very efficient algorithm for large data sets.

How does Selection sort work?

Selection sort starts with an unsorted data set. With each iteration, it builds up a sub dataset with the sorted data.

By the end of the sorting process, the sub dataset contains the entire elements in a sorted order.

1. Iterate through the data set one element at a time.
2. Find the biggest element in the data set (Append it to another if needed)
3. Reduce the sample space by the biggest element just found. The new data set becomes `n – 1` compared to the previous iteration.
4. Start over the iteration again, on the reduced sample space.
5. Continue till we have a sorted data set, either incremental or decremental

How does the data sample change in each iteration?

Consider the data set [10, 4, 9, 3, 6, 19, 8]

[10, 4, 9, 3, 6, 19, 8]      – Data set
[10, 4, 9, 3, 6, 8] – [19] – After Iteration 1
[4, 9, 3, 6, 8] – [10, 19] – After Iteration 2
[4, 3, 6, 8] – [9, 10, 19] – After Iteration 3
[4, 3, 6] – [8, 9, 10, 19] – After Iteration 4
[4, 3] – [6, 8, 9, 10, 19] – After Iteration 5
[3] – [4, 6, 8, 9, 10, 19] – After Iteration 6
[3, 4, 6, 8, 9, 10, 19]      – After Iteration 7 – Sorted data set

Let’s check what the Selection Sort algorithm has to go through in each iteration.

Dataset – [10, 4, 9, 3, 6, 19, 8]
Iter 1 – [10, 4, 9, 3, 6, 8]
Iter 2 – [4, 9, 3, 6, 8]
Iter 3 – [4, 3, 6, 8]
Iter 4 – [4, 3, 6]
Iter 5 – [4, 3]
Iter 6 – [3]

Sorted Dataset – [3, 4, 6, 8, 9, 10, 19]

Performance / Time Complexity

Selection Sort has to go through all the elements in the data set, no matter what.

Hence, the Worst case, Best case and Average Time Complexity would be O(n^2).

Since `Selection Sort` takes in `n` elements while starting, and goes through the data set `n` times (each step reducing the data set size by 1 member), the iterations would be:

n + [ (n – 1) + (n – 2) + (n – 3) + (n – 4) + … + 2 + 1 ]

We are more interested in the worst-case scenario. In a very large data set, an `n – 1`, `n – 2` etc. won't make a difference.

Hence we can re-write the above iterations as:

n + [n + n + n + n ….. n]

Or also as:

n * n = (n^2)

Code

def find_smallest(my_list):
    smallest = my_list[0]
    smallest_index = 0
    for i in range(1, len(my_list)):
        if my_list[i] < smallest:
            smallest = my_list[i]
            smallest_index = i
    return smallest_index

def selection_sort(my_list):
    new_list = []
    for i in range(len(my_list)):
        smallest = find_smallest(my_list)
        new_list.append(my_list.pop(smallest))
    return new_list
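
Assuming the two functions above are defined, a quick check against the dataset from the walkthrough:

print(selection_sort([10, 4, 9, 3, 6, 19, 8]))
# [3, 4, 6, 8, 9, 10, 19]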

Observations:

1. Selection Sort is an algorithm to sort a data set, but it is not particularly fast.
2. It takes `n` iterations in each step to find the biggest element in that iteration.
3. The next iteration has to run on a data set of `n – 1` elements compared to the previous iteration.
4. For `n` elements in a sample space, Selection Sort takes `n x n` iterations to sort the data set.

References:

1. https://en.wikipedia.org/wiki/Selection_sort
2. http://bigocheatsheet.com
3. https://github.com/egonschiele/grokking_algorithms

 

Binary Search – Algorithm Study

Introduction

Binary Search is a search method used to find an object in a data set. This is much faster compared to the Linear Search algorithm we saw in a previous post.

This algorithm works on the Divide and Conquer principle. Binary Search gets its speed by essentially dividing the list/array in half in each iteration, thus reducing the dataset size for the next iteration.

Imagine searching for an element in a rather large dataset. Searching for an element one by one using Linear Search would take n iterations. In a worst case scenario, if the element being searched is not present in the dataset or is at the end of the dataset, the time taken to find the object/element would be proportional to the size of the dataset.

The element of interest is returned if it is present in the dataset; else a NULL/None value is returned.

Note:

  • Binary search will only work effectively on a Sorted collection.
  • The code implementation will need minor changes depending on how the dataset is sorted, ie.. either in an increasing order or in a decreasing order.

Performance

1. Worst-case performance: O(log n)

As discussed in the post on Linear Search, a worst-case analysis is done with the upper bound of the running time of the algorithm, i.e., the case when the maximum number of operations are needed/executed to find/search the element of interest in the dataset.

Of course, the worst-case scenario for any search algorithm is when the element of interest is not present in the dataset. The maximum number of searches has to be done in such a case, and it still ends up with no fruitful result. A similar, slightly less bad case is when the element is found in the final (or somewhere near the last) iteration.

Due to the divide-and-conquer method, the maximum number of iterations needed for a dataset of n elements is, log(n) where the log base is 2.

Hence, for a data set of 10240 elements, Binary Search takes a maximum of 14 iterations (the ceiling of log2(10240)).

In [1]: import math

In [2]: math.log(10240, 2)
Out[2]: 13.321928094887364

For a data set of 50,000 elements, Binary Search takes 16 iterations in the worst case scenario while a Linear Search may take 50,000 iterations in a similar case.

In [1]: import math

In [2]: math.log(50000, 2)
Out[2]: 15.609640474436812

i.e., the Worst case for Binary Search takes log(n) iterations to find the element.

2. Best-case performance: O(1)

The best case scenario is when the element of interest is found in the first iteration itself. Hence the best-case would be where the search finishes in one iteration.

ie.. The best-case scenario would be O(1).

How does Binary Search work?

Imagine a sorted dataset of 100 numbers, where we're searching for whether 98 is in the list. A simple search would start from index 0, move to the element at index 1, and progress element by element until the one of interest is found. Since we're searching for 98, it'll take n iterations, depending on the number of elements between the first element in the dataset and 98.

Binary Search uses the following method, provided the dataset is sorted.

  1. Find the length of the data set.
  2. Find the lowest (index 0), highest (index n), and the middle index of the dataset.
  3. Find the subsequent elements residing in the first, last, and middle index.
  4. Check if the element of interest is the middle element.
  5. If not, check if the element-of-interest is higher or lower than the middle element.
  6. If it is higher, assuming the dataset is sorted in an increasing order, move the lower index to one above the middle index.
  7. if it is lower, move the highest index to one below the middle index.
  8. Check if the element of interest is the middle element in the new/shorter dataset.
  9. Continue till the element of interest is found.
Figure: Binary Search – Source: Wikipedia

The figure above shows how Binary Search works on a dataset of 16 elements, to find the element 7.

  • Index 0 , Index 16, and the middle index are noted.
  • Subsequent values/elements at these indices are found out as well.
  • Check if the element of interest 7 is equal to, lower, or higher than the middle element 14 at index 8.
  • Since it’s lower and the dataset is sorted in an increasing order, the search moves to the left of the middle index, ie.. from index 0 to index 7.
  • —-
  • The lower index is now 0, the higher index is now 7, and the middle index is now 3, the element in the middle index is 6.
  • Check if the element of interest 7 is lower or higher than the middle element 6 at index 3.
  • Since it’s higher and the dataset is sorted in an increasing order, the search moves to the right of the middle index, ie.. from index 4 to index 7.
  • —-
  • So on and so forth.. till we arrive at the element of interest, ie.. 7.

As noted earlier, the data set is divided into half in each iteration. A numeric representation on how Binary search progress can be seen as:

100 elements -> 50 elements -> 25 elements -> 12 elements -> 6 elements -> 3 elements -> 1 element
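
The same progression can be reproduced with a few lines of Python (a small sketch, using integer halving):

n = 100
sequence = [n]
while n > 1:
    n //= 2            # each iteration halves the remaining search space
    sequence.append(n)
print(" -> ".join(str(x) for x in sequence))
# 100 -> 50 -> 25 -> 12 -> 6 -> 3 -> 1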

Code

Example 1 : (Data set sorted in Increasing order)

def binary_search(my_list, item):
    low_position = 0
    high_position = len(my_list) - 1

    while low_position <= high_position:
        mid_position = (low_position + high_position) // 2
        mid_element = my_list[mid_position]

        if mid_element == item:
            print("\nYour search item {0} is at index {1}".format(
                  item, mid_position))
            return mid_element

        elif mid_element > item:
            high_position = mid_position - 1

        else:
            low_position = mid_position + 1
    return None

if __name__ == "__main__":
    my_list = [1, 2, 3, 4, 5, 6]
    binary_search(my_list, 3)

Example 2 : (Same as above, with statements on how the search progresses)

def binary_search(my_list, item):

    # Find and set the low and high positions of the data set
    # Note that these are not the values, but just positions.
    low_position = 0
    high_position = len(my_list) - 1

    # Calculate the Complexity
    import math
    complexity = math.ceil(math.log(len(my_list), 2))

    # Print some info on the dataset
    print("\nDataset size : {0} elements".format(len(my_list)))
    print("Element of interest : {0}".format(item))
    print("Maximum number of iterations to find {0} : {1}\n".format(
item, complexity))

    while low_position <= high_position:

        # Find the middle position from the low and high positions
        mid_position = (low_position + high_position) // 2

        # Find the element residing in the middle position of the data set.
        mid_element = my_list[mid_position]

        print("Element at min index : {0}".format(my_list[low_position]))
        print("Element at max index : {1}".format(high_position,
            my_list[high_position]))
        print("Element at mid index {0} : {1}".format(mid_position,
            mid_element))

        if mid_element == item:
            print("\nYour search item {0} is at index {1}".format(
                item, mid_position))
            return mid_element

        elif mid_element > item:
            high_position = mid_position - 1
            print("{0} in the left subset, omitting the right subset\n".format(item))

        else:
            low_position = mid_position + 1
            print("{0} in the right subset, omitting the left subset\n".format(item))

    print("Element of interest not in dataset\n")
    return None

if __name__ == "__main__":
    my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    binary_search(my_list, 13)

Observations:

  1. Binary Search needs a Sorted dataset to work, either increasing or decreasing.
  2. It finds the element of interest in logarithmic time, hence is also known as, Logarithmic Search.
  3. Binary Search searches through a dataset of n elements in log(n) iterations, in the worst case scenario.

NOTE:

All the examples used in this blog are available at  https://github.com/arvimal/DataStructures-and-Algorithms-in-Python, with more detailed notes.

References:

  1. https://en.wikipedia.org/wiki/Binary_search_algorithm
  2. http://quiz.geeksforgeeks.org/binary-search/
  3. https://www.hackerearth.com/practice/algorithms/searching/binary-search/tutorial/
  4. http://research.cs.queensu.ca/home/cisc121/2006s/webnotes/search.html

Linear Search – Algorithm Study

Introduction

Linear Search is a way to search a data set for an element of interest. It is one of the many search algorithms available, and is also the most direct and simple of the lot.

Linear search looks for the element of interest in a dataset starting from the first element and moves on to the consecutive elements till it finds the one we’re interested in. Due to this behaviour, it’s not the fastest search algorithm around.

In the worst case, when the element of interest is the last (or near-last) in the data set, Linear Search has to sift through till the end. Hence, in a worst-case scenario, the larger the data set, the more iterations it takes to find the element of interest. The performance of Linear Search thus takes a toll as the data set grows.

Linear search works on sorted and unsorted data sets equally, since it has to go through the elements one by one and so doesn’t mind if the data is ordered or not.

Performance

1. Worst-case performance: O(n)

A worst-case analysis is done with the upper bound of the running time of the algorithm. ie.. the case when the maximum number of operations are executed.

The worst-case scenario for a linear search happens when the element-of-interest is not present in the dataset. A near worst-case scenario is when the element-of-interest is the last element of the dataset. In the first case, the search has to go through each element only to find that the element is not present in the dataset. In the second scenario, the search has to be done till the last element, which still takes n iterations.

In the worst-case, the performance is O(n), where  n  is the number of elements in the dataset.

2. Best-case performance: O(1)

In the best-case, where the element-of-interest is the first element in the dataset, only one search/lookup is needed. Hence the performance is denoted as O(1), for n elements.

3. Average performance: O(n/2)

On average, the element is found after going through about half the dataset, so the performance can be denoted as O(n/2) (which still grows linearly).

Observations:

  • Linear Search iterates through every element in the dataset until it finds the match.
  • In Linear Search, the number of iterations grows linearly if the data set grows in size.
  • This algorithm is called  Linear Search  due to this linear increase in the complexity depending on the dataset.
  • The best case scenario is when the first iteration finds the element.
  • The Worst case is when the element of interest is not present in the dataset.
  • A very near worse case is when the element of interest is the last one in the dataset.

How does Linear Search work?

Linear search progresses as follows:

1. Takes in a dataset as well as an element of interest.
2. Checks if the first element is the element of interest.
3. If yes, returns the element.
4. If not, moves to the next element in the dataset.
5. Iterates till the dataset is exhausted.
6. Returns None if the element of interest is not present in the dataset.

Code

def linear_search(my_list, item):
    """Linear search"""

    low_position = 0
    high_position = len(my_list) - 1

    while low_position <= high_position:

        if my_list[low_position] == item:
            print("Your search item {0} is at position {1}".format(
                item, low_position))
            return low_position
        else:
            low_position += 1

if __name__ == "__main__":
    my_list = [1, 2, 3, 4, 5, 6]
    linear_search(my_list, 5)

 

Reference:

  1. http://quiz.geeksforgeeks.org/linear-search/
  2. http://research.cs.queensu.ca/home/cisc121/2006s/webnotes/search.html

Accessor and Mutator methods – Python

A method defined within a class can either be an Accessor or a Mutator method.

An Accessor method returns information about the object, but does not change the state of the object.

A Mutator method, also called an Update method, can change the state of the object.

Consider the following example:

In [10]: a = [1,2,3,4,5]

In [11]: a.count(1)
Out[11]: 1

In [12]: a.index(2)
Out[12]: 1

In [13]: a
Out[13]: [1, 2, 3, 4, 5]

In [14]: a.append(6)

In [15]: a
Out[15]: [1, 2, 3, 4, 5, 6]

The methods a.count() and a.index() are both Accessor methods, since they don't alter the object a in any sense, but only pull the relevant information.

But a.append() is a Mutator method, since it effectively changes the state of the object (the list a) in place.

In short, knowing the behavior of a method is helpful to understand how it alters the objects it acts upon.
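
The same distinction applies to user-defined classes. A minimal sketch (the Counter class and its method names are made up for illustration):

class Counter:
    def __init__(self):
        self.count = 0

    def value(self):      # Accessor: reports the state, doesn't change it
        return self.count

    def increment(self):  # Mutator: updates the object's state
        self.count += 1

c = Counter()
c.increment()
print(c.value())   # 1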

Python, Objects, and some more..

Everything in Python is an object, what does that mean? This post tries to discuss some very basic concepts.

What does the following assignment do?


a = 1

Of course, anyone who has dabbled in code knows this. The statement above creates a container `a` and stores the value `1` in it.

But it seems that's not exactly what's happening, at least from Python's viewpoint.

When a = 1 is entered or executed by the python interpreter, the following happens in the backend, seemingly unknown to the user.

  • The Python interpreter evaluates the literal 1 and tries to understand what data type can be assigned for it.
    • There are several in-built data types such as str, float, bool, list, dict, set etc..
    • Builtin types are classes implemented in the python core.
    • For a full list of types and explanation, read the python help at python-> help()-> topics -> TYPES
    • Read the help sections for builtin types, eg.. help(int), help(list) etc..
  • The interpreter finds the appropriate builtin type for the literal. Since the literal 1 fits the type int, the interpreter creates an instance from class int() in memory.
    • This instance is called an object since it’s just a blob with some metadata.
    • This object has a memory address, a value, a name in one or more namespace, some metadata etc..
    • type(a) helps in understanding the instance type.
    • In short, an assignment statement simply creates an instance in memory from a pre-defined class.
  • The interpreter reads the LHS (Left hand side) of the statement a = 1, and creates the name a in the current namespace.
    • The name in the namespace is a reference to the object in memory.
    • Through this reference, we can access the data portion as well as the attributes of that object.
    • A single object can have multiple names (references).
  • The name a created in the current namespace is linked to the corresponding object in memory.

When a name that’s already defined is entered at the python prompt, the interpreter reads the namespace, finds the name (reference), goes to the memory location it’s referring to, and pull the value of the object, and prints it on-screen.

Every object has the following features:

  • A single value, available in its data section.
In [1]: a = 1

In [2]: a
Out[2]: 1
  • A single type, since the object is an instance of a pre-defined type class such as int , float etc..
In [3]: type(a)
Out[3]: int
  • Attributes either inherited from the parent type class or defined by the user.
In [10]: dir(a)
Out[10]:
['__abs__',
'__add__',
'__and__',
'__bool__',
'__ceil__',
'__class__',
'__delattr__',
'__dir__',
'__divmod__',
'__doc__',
'__eq__',
'__float__',
...[content omitted]
'__setattr__',
'__sizeof__',
'__str__',
'__sub__',
'__subclasshook__',
'__truediv__',
'__trunc__',
'__xor__',
'bit_length',
'conjugate',
'denominator',
'from_bytes',
'imag',
'numerator',
'real',
'to_bytes']
  • One or more base classes. All new-style classes in Python ultimately inherit from the object class.
In [4]: type(a)
Out[4]: int

In [5]: int.mro()
Out[5]: [int, object]

NOTE: a is an instance of the int class, and int in turn inherits from the object class. Read more on Method Resolution Order.


  • A unique ID representing the object.
In [6]: id(a)
Out[6]: 140090033476640
  • Zero, One, or more names.
    • Use dir() to check the current namespace.
    • Use dir(<object-name>) to refer the indirect namespace.

Several other builtins are available in the default namespace without being defined specifically, possible due to the inclusion of the builtin module, available under the reference __builtin__ in the current namespace.

For a full list of the pre-defined names, refer to dir(__builtins__), help(__builtin__), or help(builtins) after an import builtins.
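
For example (output truncated; the exact list depends on the Python version):

import builtins
print(dir(builtins)[:5])
# e.g. ['ArithmeticError', 'AssertionError', 'AttributeError', ...]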

A few questions and observations:

Q1. How can an assignment have zero names in the namespace?

Ans: An assignment such as a = 1 creates an object in memory and creates a corresponding name (a in our case) in the namespace. a acts as a reference to the object in memory.

But, simply entering 1 at the python prompt creates an object in memory which is an instance of a type class, without creating the reference in the namespace.

Objects which don’t have a reference from the current namespace are usually garbage-collected due to lack of references. Hence, an object which doesn’t have a reference (a name), or had multiple references (more than one names) but had them deleted (for example, del() gets garbage-collected by python.

If the literal 1 is entered at a python prompt, the interpreter echoes it back after creating the object, since the prompt is essentially a REPL (Read Eval Print Loop).
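
The reference count itself can be inspected with sys.getrefcount() (a small sketch; the exact counts are CPython-specific):

import sys

a = []                      # one reference to the new list: the name `a`
print(sys.getrefcount(a))   # typically 2: `a` plus getrefcount's own argument
b = a                       # a second name for the same object
print(sys.getrefcount(a))   # one more than before
del a, b                    # no names remain; the object becomes collectable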

Q2. Can an object have more than one name reference?

Ans: It’s perfectly fine to have more than one reference to a single object. The example below should explain things very well.

In [1]: a = 5000

In [2]: id(a)
Out[2]: 140441367080400

In [3]: b = a

In [4]: b
Out[4]: 5000

In [5]: id(b)
Out[5]: 140441367080400

In [6]: c = 5000

In [7]: id(c)
Out[7]: 140441367080432

In [8]: a is b
Out[8]: True

In [9]: a == b
Out[9]: True

In [10]: a is c
Out[10]: False

In [11]: a == c
Out[11]: True

The example shown above creates an object with the value 5000 and assigns it the name a in the current namespace. We checked the identifier of the object using id(a) and found it to be 140441367080400.

As the next step, we created another name in the namespace, i.e., b, which takes in whatever a points to. Hence, b would point to the same 5000 object, and it will have the same identifier as a.

This shows that an object in memory can have multiple references in a namespace.

Another object of value 5000 is created with the name c, but we can see that its identifier differs from what id(a) and id(b) show. This shows that c points to an entirely different object in memory.

To test if a is exactly the same object as b, use the keyword is. Meanwhile, to test if two objects contain the same value, use the equality operator ==.

Inheritance and super() – Object Oriented Programming

super() is a feature through which inherited methods that have been overridden in a class can be accessed. It also helps with the MRO lookup order in the case of multiple inheritance. This may not be obvious at first, but a few examples should help drive the point home.

Inheritance and method overloading was discussed in a previous post, where we saw how inherited methods can be overloaded or enhanced in the child classes.

In many scenarios, it’s needed to overload an inherited method, but also call the actual method defined in the Parent class.

Let’s start off with a simple example based on Inheritance, and build from there.

Example 0:

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class!")

class ChildClass(MyClass):
    pass

my_instance_1 = ChildClass()
my_instance_1.func()

This outputs:

In [18]: %run /tmp/super-1.py
I'm being called from the Parent class!

In Example 0, we have two classes, MyClass and ChildClass. The latter inherits from the former, and the parent class MyClass has a method named func defined.

Since ChildClass inherits from MyClass, the child class has access to the methods defined in the parent class. An instance, my_instance_1, is created for ChildClass.

Calling my_instance_1.func() will print the statement from the Parent class, due to the inheritance.

Building up on the first example:

Example 1:

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class")

class ChildClass(MyClass):

    def func(self):
        print("I'm being called from the Child class")

my_instance_1 = MyClass()
my_instance_2 = ChildClass()

my_instance_1.func()
my_instance_2.func()

This outputs:

In [19]: %run /tmp/super-1.py
I'm being called from the Parent class
I'm being called from the Child class

This example has a slight difference: both the child class and the parent class have the same method defined, i.e., func. In this scenario, the parent class' method is overridden by the child class' method.

i.e., if we call the func() method from the instance of ChildClass, it need not go and fetch the method from its Parent class, since it's already defined locally.

NOTE: This is due to the Method Resolution Order, discussed in an earlier post.

But what if there is a scenario that warrants specifically calling methods defined in the Parent class, from the instance of a child class?

ie.. How to call the methods defined in the Parent class, through the instance of the Child class, even if the Parent class method is overloaded in the Child class?

In such a case, the inbuilt function super() can be used. Let’s add to the previous example.

Example 2:

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class")

class ChildClass(MyClass):

    def func(self):
        print("I'm actually being called from the Child class")
        print("But...")
        # Calling the `func()` method from the Parent class.
        super(ChildClass, self).func()

my_instance_2 = ChildClass()
my_instance_2.func()

This outputs:

In [21]: %run /tmp/super-1.py
I'm actually being called from the Child class
But...
I'm being called from the Parent class

How is the code structured?

  1. We have two classes MyClass and ChildClass.
  2. The latter is inheriting from the former.
  3. Both classes have a method named func
  4. The child class ChildClass is instantiated as my_instance_2
  5. The func method is called from the instance.

How does the code work?

  1. When the func method is called, the interpreter searches it using the Method Resolution Order, and find the method defined in the class ChildClass.
  2. Since it finds the method in the child class, it executes it, and prints the string “I’m actually being called from the Child class”, as well “But…”
  3. The next statement is super which calls the method func defined in the parent class of ChildClass
  4. Since the control is now passed onto the func method in the Parent class via super, the corresponding print() statement is printed to stdout.

Example 2 can also be re-written as :

class MyClass(object):

    def func(self):
        print("I'm being called from the Parent class")

class ChildClass(MyClass):

    def func(self):
        print("I'm actually being called from the Child class")
        print("But...")
        # Calling the `func()` method from the Parent class.
        # super(ChildClass, self).func()
        MyClass.func(self)  # Call the method directly via Parent class

my_instance_2 = ChildClass()
my_instance_2.func()

 

NOTE: The example above uses the Parent class directly to access its method. Even though it works, it is not the best way to do it, since the code is tied to the Parent class name. If the Parent class name changes, the child/sub class code has to be changed as well.

Let’s see another example for  super() . This is from our previous article on Inheritance and method overloading.

Example 3:

import abc

class MyClass(object):

    __metaclass__ = abc.ABCMeta

    def my_set_val(self, value):
        self.value = value

    def my_get_val(self):
        return self.value

    @abc.abstractmethod
    def print_doc(self):
        return

class MyChildClass(MyClass):

    def my_set_val(self, value):
        if not isinstance(value, int):
            value = 0
        super(MyChildClass, self).my_set_val(value)

    def print_doc(self):
        print("Documentation for MyChild Class")

my_instance = MyChildClass()
my_instance.my_set_val(100)
print(my_instance.my_get_val())
print(my_instance.print_doc())

The code is already discussed here. The my_set_val method is defined in both the child class as well as the parent class.

We overload the my_set_val method defined in the parent class, in the child class. But after enhancing/overloading it, we call the my_set_val method specifically from the Parent class using super(), and thus extend it.

Takeaway:

  1. super() helps to specifically call the Parent class method which has been overridden in the child class, from the child class.
  2. The super() in-built function can be used to call/refer the Parent class without explicitly naming them. This helps in situations where the Parent class name may change. Hence, super() helps in avoiding strong ties with class names and increases maintainability.
  3. super() helps the most when there is multiple inheritance happening, and the MRO ends up being complex. In case you need to call a method from a parent class, use super().
  4. There are multiple ways to call a method from a Parent class, as sketched after this list.
    1. <Parent-Class>.<method>
    2. super(<ChildClass>, self).<method>
    3. super().<method>
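
A minimal sketch showing all three forms side by side (the class names here are made up; the zero-argument super() form works only in Python 3):

class Parent:
    def func(self):
        print("I'm being called from the Parent class")

class Child(Parent):
    def func(self):
        Parent.func(self)           # 1. direct call, tied to the parent's name
        super(Child, self).func()   # 2. explicit form, works in Python 2 and 3
        super().func()              # 3. zero-argument form, Python 3 only

Child().func()   # prints the parent's message three times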

References:

  1. https://docs.python.org/2/library/functions.html#super
  2. https://rhettinger.wordpress.com/2011/05/26/super-considered-super/
  3. https://stackoverflow.com/questions/222877/how-to-use-super-in-python

Inheritance and Method overloading – Object Oriented Programming

Inheritance is a usual theme in Object Oriented Programming. Because of Inheritance, the functions/methods defined in parent classes can be called in child classes, which enables code reuse, among several other features. In this article, we try to understand some of the features that come with Inheritance.

We’ve discussed Abstract Methods in an earlier post, which is a feature part of Inheritance, and can be applied on child classes that inherits from a Parent class.

Extending the methods which are inherited can also be seen as another feature or possibility in Inheritance. In many cases, it's required to override or specialize the methods inherited from the Parent class. This is of course possible, and is called 'Method Overloading'.

Consider the two classes and its methods defined below:

Example 0:

import abc

class MyClass(object):

    __metaclass__ = abc.ABCMeta

    def __init__(self):
        pass

    def my_set_method(self, value):
        self.value = value

    def my_get_method(self):
        return self.value

    @abc.abstractmethod
    def printdoc(self):
        return

class MyChildClass(MyClass):

    def my_set_method(self, value):
        if not isinstance(value, int):
            value = 0
        super(MyChildClass, self).my_set_method(value)

    def printdoc(self):
        print("\nDocumentation for MyChildClass()")

instance_1 = MyChildClass()
instance_1.my_set_method(10)
print(instance_1.my_get_method())
instance_1.printdoc()

 

We have two classes, the parent class being MyClass and the child class being MyChildClass.

MyClass has three methods defined.

  • my_set_method()
  • my_get_method()
  • printdoc()

The printdoc() method is an Abstract method, and hence should be implemented in the Child class as a mandatory method.
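
As a side note, the example above uses the Python 2 style __metaclass__ declaration. Under Python 3, the same enforcement is usually written with abc.ABC; a minimal sketch (the class names are made up for illustration):

import abc

class Base(abc.ABC):
    @abc.abstractmethod
    def printdoc(self):
        return

class Incomplete(Base):
    pass                    # printdoc() deliberately not implemented

try:
    Incomplete()            # fails, since the abstract method is missing
except TypeError as err:
    print(err)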

The child class MyChildClass inherits from MyClass and has access to all it’s methods.

Normally, we can just go ahead and use the methods defined in MyClass, in MyChildClass. But there can be situations when we want to improve or build upon the inherited methods. As said earlier, this is called Method Overloading.

MyChildClass extends the parent's my_set_method() function with its own implementation. In this example, it does an additional check to understand if the input value is an int or not, and then calls the my_set_method() of its parent class using super. Hence, this method in the child class extends the functionality prior to calling the method in the parent. A post on super is set for a later time.

Even though this is a trivial example, it helps to understand how the features inherited from other classes can be extended or improved upon via method overloading.

The my_get_method() is not overridden in the child class, but is still called from the instance, as instance_1.my_get_method(). We're using it as-is, since it is available via Inheritance. As it's defined in the parent class, it works on the child class' instance when called, even if not overridden.

The printdoc() method is an abstract method, and hence it is mandatory to implement it in the child class, where it can be overridden with whatever we choose to do.

Inheritance is possible from python builtins as well, and the inherited methods can be overridden too. Let's check out another example:

Example 1:

class MyList(list):

    def __getitem__(self, index):
        if index == 0:
            raise IndexError
        if index &gt; 0:
            index -= 1
        return list.__getitem__(self, index)

    def __setitem__(self, index, value):
        if index == 0:
            raise IndexError
        if index &gt; 0:
            index -= 1
        list.__setitem__(self, index, value)

x = MyList(['a', 'b', 'c'])
print(x)
print("-" * 10)

x.append('d')
print(x)
print("-" * 10)

x.__setitem__(4, 'e')
print(x)
print("-" * 10)

print(x[1])
print(x.__getitem__(1))
print("-" * 10)

print(x[4])
print(x.__getitem__(4))

This outputs:

['a', 'b', 'c']
----------
['a', 'b', 'c', 'd']
----------
['a', 'b', 'c', 'e']
----------
a
a
----------
e
e

How does the code work?

The class MyList() inherits from the builtin list. Because of the inheritance, we can use list’s available magic methods such as __getitem__() , __setitem__() etc..

NOTE: In order to see the available methods in list, use dir(list).

  1. We create two functions/methods named `__getitem__()` and `__setitem__()` to override the inherited methods.
  2. Within these functions/methods, we set our own conditions.
  3. We later call the builtin methods directly within these functions, using
    1. list.__getitem__()
    2. list.__setitem__()
  4. We create an instance named x from MyList().
  5. We understand that
    1. x[1] and x.__getitem__(1) are the same.
    2. x[4] = 'e' and x.__setitem__(4, 'e') are the same.
    3. x.append(f) is similar to x.__setitem__(<n>, f), where <n> is the position at the extreme right, which the python interpreter iterates to and finds on its own.

Hence, in Inheritance, child classes can:

  • Inherit from parent classes and use those methods.
    • Parent classes can either be user-defined classes or builtins like list, dict etc.
  • Override (or Overload) an inherited method.
  • Extend an inherited method in its own way.
  • Implement an Abstract method the parent class requires.

Reference:

  1. Python beyond the basics – Object Oriented Programming