Noah Watkins

Dynamic RADOS object interfaces with Lua

In this post I’m going to demonstrate how to dynamically extend the interface of objects in RADOS using the Lua scripting language, and then build an example service for image thumbnail generation and storage that performs remote image processing inside a target object storage device (OSD). We’re gonna have a lot of fun.

Note that this is a re-post of the article appearing at http://ceph.com/rados/dynamic-object-interfaces-with-lua/ which was published on October 29, 2013.

RADOS Object Classes

One of the less publicized features of the RADOS object store is the ability to extend the object interface by writing C/C++ plugins that add new remote execution targets that may perform arbitrary operations on object data. The ability to add user-defined functionality to the OSD is a very powerful feature allowing applications to reduce network round-trips and data movement, exploit remote resources, and simplify otherwise complex interfaces by taking advantage of the transactional context within which remote operations execute. But that’s enough marketing—here is a very simple example that computes the MD5 hash of an object without transferring the object payload over the network.

Example: MD5 Hash of Object

The straightforward method for a client to compute the MD5 hash of an object is to first retrieve the entire object and then apply the MD5 hash function to the data locally. Using librados and the Crypto++ (cryptopp) library, this might look something like the following:

bufferlist data;

ioctx.read("my_obj", data, 0, 0);

// size the buffer by the MD5 digest length rather than the AES block size
byte digest[MD5::DIGESTSIZE];
MD5().CalculateDigest(digest, (byte*)data.c_str(), data.length());

Here the client first reads the entire object over the network, and then computes the MD5 hash of the object data. However, transferring the entire object to the client can be avoided by introducing a custom object interface for computing the MD5 hash within the storage system. The following code snippet illustrates the basics of how an MD5 hash could be computed using the object class facility. Note that the following code would in practice be compiled into a shared library and loaded dynamically into a running OSD process, but we have omitted the deployment details to keep things simple (there are links at the end of this section to more information on getting started with object classes).

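int compute_md5(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  // find out how large the object is
  uint64_t size;
  int ret = cls_cxx_stat(hctx, &size, NULL);
  if (ret < 0)
    return ret;

  // read the entire payload locally, inside the OSD
  bufferlist data;
  ret = cls_cxx_read(hctx, 0, size, &data);
  if (ret < 0)
    return ret;

  // size the buffer by the MD5 digest length rather than the AES block size
  byte digest[MD5::DIGESTSIZE];
  MD5().CalculateDigest(digest, (byte*)data.c_str(), data.length());

  // only the digest travels back to the client
  out->append((const char*)digest, sizeof(digest));
  return 0;
}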

Before explaining the function compute_md5, let’s see how a client would remotely invoke compute_md5 to calculate the hash:

bufferlist input, output;
ioctx.exec("my_obj", "my_hash_class", "compute_md5", input, output);

Here the client runs the librados exec method to invoke the compute_md5 function remotely on the object named “my_obj”. Note that “my_hash_class” is a name that identifies the plugin (not shown in this tutorial), which may contain many functions that can be invoked remotely. Now, through the power of networking, and lots of hand waving, a client can invoke the compute_md5 function above, which will run remotely on the OSD storing the target object (there are lots of gory details about how this actually happens that are beyond the scope of this document). When the remote method is executed, it performs a transaction that atomically reads the object payload and computes the MD5 hash, all within the OSD process, avoiding any network transfer of object data. At the end of the compute_md5 function the digest is written into the out parameter, which is marshaled back to the client.

Now that is some pretty magical stuff right there. But there are situations where the overhead of compiling C/C++ into a shared library (potentially for multiple target architectures) is too heavyweight. It’d be nice if we could inject and alter object interfaces on-the-fly. To address this need, we have created a mechanism for defining new object classes using the Lua scripting language, which I’ll describe next.

Additional Resources: Object Class Development

While it was necessary to introduce the concept of object classes, unfortunately a full tutorial on the subject is not in the scope of this post. Located on GitHub is a “Hello, World” example object class containing extensive documentation. This resource is a good starting point, and if you have questions, please do not hesitate to ask on the Ceph mailing lists or IRC channels.

Dynamic Object Classes With Lua

In order to support dynamic generation of object interfaces, we have embedded the LuaJIT VM inside the OSD process. Why Lua, you may ask? The Lua language and its run-time are specifically designed as an embedded language, and when coupled with the LuaJIT virtual machine, near native performance can be achieved. Briefly, the current implementation expects a Lua script defining any number of functions to be sent to the OSD along with a client request that specifies which specific function in the script to execute. Now let’s dig into the details.

A Lua object class is an arbitrary Lua script containing at least one exported function handler that a client may invoke remotely. By building up a collection of handlers, new and interesting interfaces to objects can be constructed and dynamically loaded into a running RADOS cluster. The basic structure of a Lua object class is shown in the following code snippet:

-- helper modules
-- helper functions
-- etc...

function helper()
end

function handler1(input, output)
  helper()
end

function handler2(input, output)
end

objclass.register(handler1)
objclass.register(handler2)

In the above Lua script any number of functions and modules can be used to support the behavior exported by the functions handler1 and handler2. A client can remotely execute any registered function, provide an arbitrary input, and receive an arbitrary output.

Handler Registration

Object classes written in Lua may have many functions, only a subset of which are handlers available to be directly invoked by a client. In order to make a Lua function available, the function must be exported by registering it. This is done using the objclass.register function. The following code snippet illustrates how this works.

function helper()
  -- help out with stuff
end

function thehandler(input, output)
  helper()
end

objclass.register(thehandler)

In the above example objclass.register(thehandler) exports the function thehandler, making it available for clients to call. A client that attempts to call the helper function (which is not registered) will receive a return value of -ENOTSUPP.
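
To make the registration rule concrete, here is a minimal, self-contained sketch of how a run-time could keep track of exported handlers and reject everything else. This is not the actual cls-lua implementation: the dispatch function, the env table standing in for the script's globals, and the numeric value used for ENOTSUPP are all assumptions made for illustration.

```lua
-- Illustrative stand-in for the OSD-side run-time; NOT the real cls-lua code.
local ENOTSUPP = 524   -- placeholder value chosen for this sketch

local registered = {}  -- set of exported handler functions

local objclass = {}
function objclass.register(fn)
  assert(type(fn) == "function", "register expects a function")
  registered[fn] = true
end

-- Look up `name` in the script environment and run it only if registered.
local function dispatch(env, name, input, output)
  local fn = env[name]
  if type(fn) ~= "function" or not registered[fn] then
    return -ENOTSUPP
  end
  local ret = fn(input, output)
  return ret or 0  -- a handler that returns nothing counts as success
end

-- A tiny "script": one helper and one exported handler.
local env = {}
function env.helper() end
function env.thehandler(input, output)
  env.helper()
  output[#output + 1] = "handled " .. input
end
objclass.register(env.thehandler)

local out = {}
print(dispatch(env, "thehandler", "req", out), out[1])  -- registered handler runs
print(dispatch(env, "helper", "req", {}))               -- unregistered: rejected
```

Keying the set on the function value itself (rather than its name) means a helper cannot be reached just because a client guesses what it is called.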

Error Handling Semantics

In the previous section we presented an example object class method written in C++ that calculated the MD5 hash of an object. Returning to this example, notice that each operation on the object is carefully checked for failure, and an error code is returned if any operation fails. When a negative value is returned from an object class handler the current transaction will be aborted, and the return value is passed back to the client. When the handler has completed successfully a return value of zero will commit the transaction. While in C++ we must perform these checks explicitly, in Lua this common pattern for handling errors can be fully managed. Take as an example the following C++ object class handler:

int handle1(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  int ret = cls_cxx_create(hctx, true);
  if (ret < 0)
    return ret;
  ...
  return 0;
}

The handler handle1 will return -EEXIST if the object already exists (or any other error encountered when running cls_cxx_create), and return zero if the handler completes successfully. The same functionality can be constructed in Lua, but when error handling fits this common pattern of aborting automatically, the Lua object class run-time will automagically select the correct return value. For instance in the following example, handle2 and handle3 have identical semantics to handle1 defined above in C++.

function handle2(input, output)
  objclass.create(true);
  return 0;
end

function handle3(input, output)
  objclass.create(true);
end

objclass.register(handle2)
objclass.register(handle3)
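
If you are wondering how a run-time can pick the return value on its own, the following stand-alone sketch shows one way it could work. It assumes that a failing operation raises its negative error code as a Lua error; the run_handler wrapper, the fake create function, and the -EIO fallback are hypothetical and only mimic the behavior described above.

```lua
-- Sketch only: how a run-time might turn handler outcomes into return codes.
local EEXIST, EIO = 17, 5

local function run_handler(handler, input, output)
  local ok, ret = pcall(handler, input, output)
  if not ok then
    -- a numeric error value is passed through as the return code;
    -- anything else is mapped to -EIO (an assumption of this sketch)
    if type(ret) == "number" then return ret end
    return -EIO
  end
  return ret or 0  -- no explicit return value means success
end

-- Fake exclusive create that raises -EEXIST on the second call.
local exists = false
local function create(exclusive)
  if exists and exclusive then error(-EEXIST, 0) end
  exists = true
end

local function handle2() create(true); return 0 end
local function handle3() create(true) end

print(run_handler(handle2))  -- first create succeeds
print(run_handler(handle3))  -- object now exists, so the create fails
```

With a wrapper like this, handle2 and handle3 behave the same: both yield zero on success and the raised error code on failure.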

Some operations return error codes that we may want to handle directly. For example, when retrieving a value from the object map, -ENOENT is used to indicate that the given key was not found. If the handler code can deal with this case (e.g. creating and initializing a new key), then it is simple enough to just return all other error codes. This exact scenario is shown in the following C++ handler, in which we abort on any error code that is not -ENOENT.

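int handle(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  // decode the key from the client input
  string key;
  bufferlist::iterator it = in->begin();
  ::decode(key, it);

  // look up the key; a missing key is not fatal
  bufferlist bl;
  int ret = cls_cxx_map_get_val(hctx, key, &bl);
  if (ret < 0 && ret != -ENOENT)
    return ret;
  if (ret == -ENOENT) {
    /* initialize new key */
  }
  ...
  return 0;
}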

The same handler can be constructed in Lua as follows:

function handle(input, output)
  local key = input:str()
  local ok, ret_or_val = pcall(objclass.map_get_val, key)
  local val
  if ok then
    val = ret_or_val
  elseif ret_or_val ~= -objclass.ENOENT then
    return ret_or_val
  else
    -- initialize new key
  end
  ...
  return 0
end

The trick here is to call the objclass.map_get_val in protected mode via the Lua pcall function, which prevents any errors from being automatically propagated to the caller, allowing our handler to examine the return value.
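
The pcall pattern can be tried out in any plain Lua interpreter. The sketch below replaces the object map with an ordinary table and assumes that a missing key is reported by raising the negative error code as the error value; map_get_val, store, and lookup here are stand-ins, not the real objclass API.

```lua
-- Stand-in for an objclass call; how the real run-time raises errors is assumed.
local ENOENT = 2
local store = { color = "blue" }

local function map_get_val(key)
  local val = store[key]
  if val == nil then
    error(-ENOENT, 0)  -- raise the negative errno as the error value
  end
  return val
end

local function lookup(key)
  local ok, ret_or_val = pcall(map_get_val, key)
  if ok then
    return ret_or_val          -- success: second result is the value
  end
  if ret_or_val ~= -ENOENT then
    error(ret_or_val, 0)       -- unexpected failure: re-raise to abort
  end
  store[key] = "default"       -- missing key: initialize it
  return store[key]
end

print(lookup("color"))
print(lookup("shape"))
```

Note that pcall returns a status flag first, so the second result is either the value or the error, never both.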

Logging

An object class can write into the OSD log (e.g. /var/log/ceph/osd-0.log) to record debugging information using the objclass.log function. The function takes any number of arguments which are converted into strings and separated by spaces in the final output. If the first argument is numeric then it is interpreted as a log-level. If no log-level is specified a default log-level is used.

objclass.log('hi')         -- will log 'hi'
objclass.log(0, 'ouch')    -- log 'ouch' at log-level = 0
objclass.log('foo', 'bar') -- log 'foo bar'
objclass.log(1)            -- will log '1' at default log-level

Logging is useful in debugging script execution and can also be used to provide more detailed error information.
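
The argument rules are easy to get wrong, so here is a small sketch that mirrors the four examples above. It is only a model of the described behavior: format_log and DEFAULT_LEVEL are hypothetical names, the default level value is made up, and the rule that a lone number is logged as text is inferred from the last example.

```lua
-- Model of the argument handling described above; not the real objclass.log.
local DEFAULT_LEVEL = 10  -- made-up default for this sketch

local function format_log(...)
  local n = select("#", ...)
  local args = { ... }
  local level, first = DEFAULT_LEVEL, 1
  -- a leading number is treated as a level only when a message follows it
  if n > 1 and type(args[1]) == "number" then
    level, first = args[1], 2
  end
  local parts = {}
  for i = first, n do
    parts[#parts + 1] = tostring(args[i])
  end
  return level, table.concat(parts, " ")
end

print(format_log("hi"))
print(format_log(0, "ouch"))
print(format_log("foo", "bar"))
print(format_log(1))
```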

Object Payload I/O

The payload data of an object can be read from and written to using the objclass.read and objclass.write functions. Each function takes an offset and length parameter.

size, mtime = objclass.stat()
data = objclass.read(0, size)          -- size bytes from offset 0
objclass.write(0, data:length(), data) -- length of data at offset 0

Index Access

A key/value store supporting range queries (based on Google’s LevelDB) can be accessed using the objclass.map_set_val and objclass.map_get_val functions. A key can be any string and a value is a standard blob of any size.

function handler(input, output)
  objclass.map_set_val("foo", input)
  data = objclass.map_get_val("foo")
  assert(data == input)
end

Additional Resources

The Lua object class facility is not yet in the mainline Ceph tree. The feature is located in the cls-lua branch, and can be checked out from github:

git clone --branch cls-lua git://github.com/ceph/ceph.git

The normal procedures for building and installing Ceph from source apply, and the only additional dependency is that the LuaJIT development libraries be installed. These dependencies are available on Ubuntu. In addition, more functionality than is listed in this post has been implemented, and a set of unit tests is available in the source tree demonstrating the full range of features.

Lua Client Libraries

Before we jump into the sample application, I’ll introduce two additional components that will make our life easier. The first is Lua bindings for librados, and the second is a Lua library that hides the details of serializing Lua scripts for execution within the OSD.

Lua-RADOS

Lua bindings for the librados client library are available on Github in the lua-rados project. Here we will provide a brief overview for context. Please consult the full documentation for additional information. Ok, let’s jump right in. The following code snippet shows how to connect to a RADOS cluster:

local rados = require "rados"

local cluster = rados.create()
cluster:conf_read_file()
cluster:connect()

Next, open a client I/O context for a particular pool:

local ioctx = cluster:open_ioctx('data')

Now the Lua client can interact with objects, such as setting an extended attribute:

local name = 'xattr key'
local data = 'i am some important data'
ioctx:setxattr('my_obj', name, data, #data)

Those are the basics of writing RADOS clients in Lua. Now, let’s run some remote scripts from a Lua client.

Cls-Lua Client

The protocol for sending a script to an OSD is fairly simple, but is easily wrapped up in a convenience library. The cls-lua-client library does just that, building on top of the lua-rados library described in the previous section. Assuming that we have connected to a RADOS cluster and constructed an I/O context object, a remote Lua script can be executed as in the following example. First, let’s create a Lua string containing the script we want to execute.

local script = [[
function say_hello(input, output)
  output:append("Hello, ")
  if #input == 0 then
    output:append("world")
  else
    output:append(input:str())
  end
  output:append("!")
end
objclass.register(say_hello)
]]

The script above will send to its output the string “Hello, world!” if the input is zero-length. Otherwise, it will reply with “Hello, input!”, where input is substituted with the input sent from the client. This can be remotely executed using the cls-lua-client library as follows:

local ret, outdata = clslua.exec(ioctx, "oid", script, "say_hello", "")
print(outdata)

local ret, outdata = clslua.exec(ioctx, "oid", script, "say_hello", "John")
print(outdata)

Executing this would produce the output:

Hello, world!
Hello, John!

Great, now we have all the pieces to start building a sample application!

Example Application: Image Thumbnail Service

As a driving example we will construct a service on top of RADOS that stores and generates image thumbnails. The service is very simple, and has the following properties.

  • Writing an image into an object sets the “base” or “original” image data.
  • A thumbnail computed from the base image can be generated remotely inside the OSD.
  • The original image and any generated thumbnail can be retrieved.

In the following examples I’ll demonstrate the core of the service. In practice these routines would be added to a larger project or executable, and of course made more robust against errors and different edge case scenarios. A fully functional example of this can be found in the cls-lua-client project on github.

Storing an Image

To store an image in RADOS we first read it from a local file, and then write it to the object. In order to support storage and retrieval of different thumbnails, we record the location and size of an image blob in the object index under a key describing it. In this simple example writing an image sets its base image, so we store it under the key “original”.

function put(object, filename)
  -- read in image blob from file
  local file = assert(io.open(filename, "rb"))
  local img = file:read("*all")
  file:close()

  -- write the blob into the object
  local size, offset = #img, 0
  ioctx:write(object, img, size, offset)

  -- record size/offset in the object index
  local loc_spec = size .. "@" .. offset
  ioctx:omapset(object, {
    original = loc_spec,
  })
end
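
Since the “size@offset” string is the glue for the whole service, it helps to see the round trip in isolation. The helpers below are hypothetical names used for illustration; they run in a plain Lua interpreter with no cluster involved.

```lua
-- Hypothetical helpers for the "size@offset" location spec.
local function encode_loc(size, offset)
  return size .. "@" .. offset
end

local function decode_loc(loc_spec)
  -- anchor the pattern so stray characters are rejected
  local size, offset = string.match(loc_spec, "^(%d+)@(%d+)$")
  if not size then
    return nil, "malformed location spec: " .. tostring(loc_spec)
  end
  -- string.match returns strings, so convert before doing arithmetic
  return tonumber(size), tonumber(offset)
end

local spec = encode_loc(48213, 0)
print(spec)
print(decode_loc(spec))
print(decode_loc("not a spec"))
```

Returning nil plus a message for bad input keeps a corrupted index entry from silently turning into a bogus read.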

Reducing Round-trips

In the previous example two round-trips were required to 1) set the object data and 2) update the index. These can be done atomically in a single round-trip by using a co-designed interface, demonstrated in the following script:

function put_smart(object, filename)
  -- define the script to execute remotely
  local script = [[
  function put(input, output)
    -- write the input blob
    local size, offset = #input, 0
    objclass.write(offset, size, input)

    -- update the leveldb index
    local loc_spec_bl = bufferlist.new()
    local loc_spec = size .. "@" .. offset
    loc_spec_bl:append(loc_spec)
    objclass.map_set_val("original", loc_spec_bl)
  end
  objclass.register(put)
  ]]

  -- read the input image blob from the file
  local file = assert(io.open(filename, "rb"))
  local img = file:read("*all")
  file:close()

  -- remotely execute script with image as input
  clslua.exec(ioctx, object, script, "put", img)
end

The client function reads the image from the file and sends it as the input to a script that executes on the OSD, taking care of the write and the index update in a single transaction. Neat!

Retrieving an Image

To read a particular version of an image we need to look-up the offset and length for the target image blob stored in the object index. In the following example the index look-up and object read are performed remotely, and the image is returned to the client if it exists. In the next section I’ll show how the spec string is stored, but for context it describes the specification for creating a thumbnail (e.g. 500×400 pixels).

function get(object, filename, spec)
  local script = [[
  function get(input, output)
    -- lookup the location of the image given the spec
    local loc_spec_bl = objclass.map_get_val(input:str())
    local size, offset = string.match(loc_spec_bl:str(), "(%d+)@(%d+)")

    -- read and return the image blob from the object
    local out_bl = objclass.read(offset, size)
    output:append(out_bl:str())
  end
  objclass.register(get)
  ]]

  -- execute script remotely
  local ret, img = clslua.exec(ioctx, object, script, "get", spec)

  -- write image to output file
  local file = assert(io.open(filename, "wb"))
  file:write(img)
  file:close()
end

The image returned from the script is then written to the output file.

Generating Thumbnails

Thumbnails are generated using Lua wrappers to ImageMagick available on github at https://github.com/leafo/magick. A thumbnail is generated using the magick.thumb function, passing in an image blob and a thumbnail specification string (e.g. 500×300 pixels). The script that runs remotely first reads the original image, computes the thumbnail, appends the thumbnail to the object payload, and then records the offset and size of the thumbnail in the object index under a key equal to the specification string.

function thumb(object, spec_string)
  local script = [[
  local magick = require "magick"

  function get_orig_img()
    -- lookup the location of the original image
    local loc_spec_bl = objclass.map_get_val("original")
    local size, offset = string.match(loc_spec_bl:str(), "(%d+)@(%d+)")

    -- read image into memory
    return objclass.read(offset, size)
  end

  function thumb(input, output)
    -- apply thumbnail spec to original image
    local spec_string = input:str()
    local blob = get_orig_img()
    local img = assert(magick.load_image_from_blob(blob:str()))
    img = magick.thumb(img, spec_string)

    -- append thumbnail to object
    local obj_size = objclass.stat()
    local img_bl = bufferlist.new()
    img_bl:append(img)
    objclass.write(obj_size, #img_bl, img_bl)

    -- save location in leveldb
    local loc_spec = #img_bl .. "@" .. obj_size
    local loc_spec_bl = bufferlist.new()
    loc_spec_bl:append(loc_spec)
    objclass.map_set_val(spec_string, loc_spec_bl)
  end

  objclass.register(thumb)
  ]]

  clslua.exec(ioctx, object, script, "thumb", spec_string)
end
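
To picture what the object looks like after a thumbnail has been appended, here is a small in-memory model of the layout. It uses a Lua string as the payload and a table as the index, so it needs neither a cluster nor ImageMagick; the helper names and the "500x300" key format are assumptions for illustration, and the byte strings are stand-ins for real image data.

```lua
-- In-memory model of one object: a payload string plus a key/value index.
local object = { payload = "", index = {} }

-- Append a blob to the payload and record "size@offset" under `key`.
local function append_blob(obj, key, blob)
  local offset = #obj.payload
  obj.payload = obj.payload .. blob
  obj.index[key] = #blob .. "@" .. offset
end

-- Look up `key` in the index and slice the blob out of the payload.
local function read_blob(obj, key)
  local loc = obj.index[key]
  if not loc then return nil end
  local size, offset = string.match(loc, "(%d+)@(%d+)")
  size, offset = tonumber(size), tonumber(offset)
  -- Lua strings are 1-based while the stored offsets are 0-based
  return string.sub(obj.payload, offset + 1, offset + size)
end

append_blob(object, "original", "ORIGINAL-IMAGE-BYTES")
append_blob(object, "500x300", "THUMB-BYTES")

print(object.index["original"], object.index["500x300"])
print(read_blob(object, "500x300"))
```

Because each thumbnail is only ever appended, existing index entries stay valid no matter how many specifications are generated later.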

And that’s it folks… on-the-fly custom RADOS object interfaces! Want to contribute? We are continually improving the Lua bindings and the internal Lua object class API and are always looking for feedback. Thanks for stopping by!