Python, Objects, and some more..

Everything in Python is an object, what does that mean? This post tries to discuss some very basic concepts.

What does the following assignment do?

a = 1

Of course, anyone dabbled in code knows this. The statement above creates a container `a` and stores the value `1` in it.

But it seem that’s not exactly what’s happening, at least from Python’s view-point.

When a = 1 is entered or executed by the python interpreter, the following happens in the backend, seemingly unknown to the user.

  • The Python interpreter evaluates the literal 1 and tries to understand what data type can be assigned for it.
    • There are several in-built data types such as str, float, bool, list, dict, set etc..
    • Builtin types are classes implemented in the python core.
    • For a full list of types and explanation, read the python help at python-> help()-> topics -> TYPES
    • Read the help sections for builtin types, eg.. help(int), help(list) etc..
  • The interpreter finds the appropriate builtin type for the literal. Since the literal 1 fits the type int, the interpreter creates an instance from class int() in memory.
    • This instance is called an object since it’s just a blob with some metadata.
    • This object has a memory address, a value, a name in one or more namespace, some metadata etc..
    • type(a) helps in understanding the instance type.
    • In short, an assignment statement simply creates an instance in memory from a pre-defined class.
  • The interpreter reads the LHS (Left hand side) of the statement a = 1, and creates the name a in the current namespace.
    • The name in the namespace is a reference to the object in memory.
    • Through this reference, we can access the data portion as well as the attributes of that object.
    • A single object can have multiple names (references).
  • The name a created in the current namespace is linked to the corresponding object in memory.

When a name that’s already defined is entered at the python prompt, the interpreter reads the namespace, finds the name (reference), goes to the memory location it’s referring to, and pull the value of the object, and prints it on-screen.

Every object has the following features:

  • A single value, available in its data section.
In [1]: a = 1

In [2]: a
Out[2]: 1
  • A single type, since the object is an instance of a pre-defined type class such as int , float etc..
In [3]: type(a)
Out[3]: int
  • Attributes either inherited from the parent type class or defined by the user.
In [10]: dir(a)
...[content omitted]
  • One or more base classes. All new-stlye classes in Python ultimately inherits from the object class.
In [4]: type(a)
Out[4]: int

In [5]: int.mro()
Out[5]: [int, object]

NOTE: a is an instance of the int class, and int inturn inherits from the object class. Read more on Method Resolution Order.

  • A unique ID representing the object.
In [6]: id(a)
Out[6]: 140090033476640
  • Zero, One, or more names.
    • Use dir() to check the current namespace.
    • Use dir(<object-name>) to refer the indirect namespace.

Several other builtins are available in the default namespace without defining them specifically, possible due to the inclusion of the builtin module available under the reference __builtin__ in the current namespace.

For a full list of the pre-defined variables, refer dir(__builtins__)help(__builtin__) or help(builtins) after an import builtins.

A few questions and observations:

Q1. How can an assignment have zero names in the namespace?

Ans: An assignment such as a = 1 creates an object in memory and creates a corresponding name (a in our case) in the namespace. a acts as a reference to the object in memory.

But, simply entering 1 at the python prompt creates an object in memory which is an instance of a type class, without creating the reference in the namespace.

Objects which don’t have a reference from the current namespace are usually garbage-collected due to lack of references. Hence, an object which doesn’t have a reference (a name), or had multiple references (more than one names) but had them deleted (for example, del() gets garbage-collected by python.

If the assignment 1 happens to be at a python prompt, it echoes the literal back after creating the object and reference since the prompt is essentially a REPL (Read Eval Print loop)

Q2. Can an object have more than one name references?

Ans: It’s perfectly fine to have more than one reference to a single object. The example below should explain things very well.

In [1]: a = 5000

In [2]: id(a)
Out[2]: 140441367080400

In [3]: b = a

In [4]: b
Out[4]: 5000

In [5]: id(b)
Out[5]: 140441367080400

In [6]: c = 5000

In [7]: id(c)
Out[7]: 140441367080432

In [8]: a is b
Out[8]: True

In [9]: a == b
Out[9]: True

In [10]: a is c
Out[10]: False

In [11]: a == c
Out[11]: True

The example shown above creates an object with value 5000 and assign it a name a in the current namespace. We checked the identifier of the object using id(a) and found out it to be 140441367080400.

As the next step, we created another name in the namespace, ie.. b which takes in whatever a points to. Hence, b would default to 5000 and it will have the same identifier as a.

This shows that an object in memory can have multiple references in a namespace.

Another object of value 5000 is created with a name c , but we can see that the identifier differs from what id(a) and id(b) is. This shows that c points to an entirely different object in memory.

To test if a is exactly the same object as b, use the keyword is. Meanwhile, if you want to test if two objects contain the same value, use the equality == symbol.

Ceph Rados Block Device (RBD) and TRIM

I recently came across a scenario where the objects in a RADOS pool used for an RBD block device doesn’t get removed, even if the files created through the mount point were removed.

I had an RBD image from an RHCS1.3 cluster mapped to a RHEL7.1 client machine, with an XFS filesystem created on it, and mounted locally. Created a 5GB file, and I could see the objects being created in the rbd pool in the ceph cluster.

1.RBD block device information

# rbd info rbd_img
rbd image 'rbd_img':
size 10240 MB in 2560 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1fcbe.2ae8944a
format: 1

An XFS file system was created on this block device, and mounted at /test.

2.Write a file onto the RBD mapped mount point. Used ‘dd’ to write a 5GB file.

# dd if=/dev/zero of=/mnt/rbd_image.img bs=1G count=5
 5+0 records in
 5+0 records out
 5368709120 bytes (5.4 GB) copied, 8.28731 s, 648 MB/s

3.Check the objects in the backend RBD pool

# rados -p rbd ls | wc -l
 &lt; Total number of objects in the 'rbd' pool&gt;

4.Delete the file from the mount point.

# rm /test/rbd_image.img -f
 # ls /test/

5.List the objects in the RBD pool

# rados -p rbd ls | wc -l
< Total number of objects in the 'rbd' pool >

The number of objects doesn’t go down as we expect, after the file deletion. It remains the same, wrt to step 3.

Why does this happen? This is due to the fact that traditional file systems do not delete the underlying data blocks even if the files are deleted.

The process of writing a file onto a file system involves several steps like finding free blocks and allocating them for the new file, creating an entry in the directory entry structure of the parent folder, setting the name and inode number in the directory entry structure, setting pointers from the inode to the data blocks allocated for the file etc..

When data is written to the file, the data blocks are used to store the data. Additional information such as the file size, access times etc.. are updated in the inode after the writes.

Deleting a file involves removing the pointers from the inode to the corresponding data blocks, and also clearing the name<->inode mapping from the directory entry structure of the parent folder. But, the underlying data blocks are not cleared off, since that is a high I/O intensive operation. So, the data remains on the disk, even if the file is not present. A new write will make the allocator take these blocks for the new data, since they are marked as not-in-use.

This applies for the files created on an RBD device as well. The files created on top of the RBD-mapped mount point will ultimately be mapped to objects in the RADOS cluster. When the file is deleted from the mount point, since the entry is removed, it doesn’t show up in the mount point.

But, since the file system doesn’t clear off the underlying block device, the objects remain in the RADOS pool. These would be normally over-written when a new file is created via the mount point.

But this has a catch though. Since the pool contains the objects even if the files are deleted, it consumes space in the rados pool (even if they’ll be overwritten). An administrator won’t be able to get a clear understanding on the space usage, if the pool is used heavily, and multiple writes are coming in.

In order to clear up the underlying blocks, or actually remove them, we can rely on the TRIM support most modern disks offer. Read more about TRIM at Wikipedia.

TRIM is a set of commands supported by HDD/SSDs which allow the operating systems to let the disk know about the locations which are not currently being used. Upon receiving a confirmation from the file system layer, the disk can go ahead and mark the blocks as not used.

For the TRIM commands to work, the disks and the file system has to have the support. All the modern file systems have built-in support for TRIM. Mount the file system with the ‘discard‘ option, and you’re good to go.

# mount -o discard /dev/rbd{X}{Y} /{mount-point}