QuantCell Feature Blog

Here we discuss upcoming features and talk about interesting use cases.

Deployment Into Production From QuantCell

Agust Egilsson - Saturday, November 16, 2013

This blog entry contains a tutorial that explains how QuantCell uses the Tools language to deploy models and applications created in the system as production-ready Java APIs.

Deployment and Packaging

QuantCell allows the developer to create models and applications using code snippets from a variety of programming languages. Currently the spreadsheet supports expressions written as traditional spreadsheet formulas, Java code snippets, or SQL, R, Python or Groovy code. In addition to these languages, the system supports expressions from a so-called “Tools” language that is specific to QuantCell and is used to deploy solutions created in QuantCell directly into production environments or to cloud infrastructures. An example of such a sheet, combining SQL, code snippets and spreadsheet syntax into an application, is shown below.

Figure: SQL and other expressions displayed in QuantCell's JavaFX client.

The user application created is interactive just like a regular spreadsheet, but nonetheless every piece of it is represented as Java code and then translated into byte code using the Java Compiler API (JSR 199), before being further optimized by the JVM.
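To give a feel for what compiling a snippet through the Compiler API involves, here is a minimal sketch. It is not QuantCell's internal code: the names SnippetCompiler, StringSource and CellD7 are made up for this illustration, and the sketch simply compiles a one-line class held in a string into a temporary directory, loads it and calls it. It assumes it runs on a JDK, since a plain JRE does not ship the system compiler.

```java
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SnippetCompiler {

    /** Holds Java source in memory so the compiler can read it like a file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the given source into outputDir; returns true on success. */
    static boolean compile(String className, String source, Path outputDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK");
        }
        List<String> options = Arrays.asList("-d", outputDir.toString());
        List<JavaFileObject> units =
                Collections.<JavaFileObject>singletonList(new StringSource(className, source));
        return compiler.getTask(null, null, null, options, null, units).call();
    }

    /** Compiles a tiny hypothetical "cell" class, loads it and calls its static method. */
    static int compileAndRun() {
        String source = "public class CellD7 { public static int value() { return 6 * 7; } }";
        try {
            Path dir = Files.createTempDirectory("snippet-classes");
            if (!compile("CellD7", source, dir)) {
                throw new IllegalStateException("Compilation failed");
            }
            // Load the freshly written class file from the output directory.
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> cls = loader.loadClass("CellD7");
                return (Integer) cls.getMethod("value").invoke(null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndRun()); // prints 42
    }
}
```

Once the class is loaded, the JVM treats it like any other byte code, which is what allows the just-in-time compiler to optimize it further.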

The Tools Language

In order for applications created in QuantCell to be immediately useful outside the spreadsheet, we have built deployment paths that take a model or application created in QuantCell and represent it as a service, as Java APIs packaged in a jar file, or otherwise as a particular implementation of a class. These are just a few of the deployment options that need to be available to the developer and the end-user. Deployment is implemented using the Tools language in QuantCell. It is the only language in the QuantCell environment that is specifically designed to address deployment from QuantCell and to make models and applications independent of QuantCell. The features and commands in the Tools language will grow as we add new deployment paths. Currently, for example, the Tools language supports deployment of QuantCell models as Java archive files (.jar files). When using the Tools language to deploy anonymous classes from models, as compiled byte code or source code stored in Java archives, one can use the package command from the language as follows:

(Anonymously defined cell variables, …) = [tools] package
    -jar <preferred location of jar containing byte code>
    -java <preferred location of jar containing source code>
    -source <source code version>
    -target <preferred byte code version>
    -override
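For readers curious about what writing such an archive looks like in plain Java, the following is a minimal sketch using the standard java.util.jar classes. It is only an illustration and not the Tools language implementation: JarPackager, writeJar, listEntries and the file name q-demo-java.jar are hypothetical, and the boolean overwrite parameter is merely a guess at how an overwrite-style option might behave. The example stores a source file, in the spirit of the -java option.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class JarPackager {

    /** Writes each (entry name, content) pair into a jar at the given location. */
    static void writeJar(Path jar, Map<String, byte[]> entries, boolean overwrite) {
        // Assumed behavior: refuse to replace an existing jar unless asked to.
        if (Files.exists(jar) && !overwrite) {
            throw new IllegalStateException("Jar already exists: " + jar);
        }
        Manifest manifest = new Manifest();
        // Set the manifest version, which the jar format expects.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                jarOut.putNextEntry(new JarEntry(entry.getKey()));
                jarOut.write(entry.getValue());
                jarOut.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the names of all entries stored in the jar. */
    static List<String> listEntries(Path jar) {
        List<String> names = new ArrayList<>();
        try (JarFile file = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> all = file.entries();
            while (all.hasMoreElements()) {
                names.add(all.nextElement().getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Packages one hypothetical source file twice and lists the result. */
    static List<String> demo() {
        try {
            Path jar = Files.createTempDirectory("package-demo").resolve("q-demo-java.jar");
            Map<String, byte[]> entries = new LinkedHashMap<>();
            entries.put("CellD7.java",
                    "public class CellD7 { }".getBytes(StandardCharsets.UTF_8));
            writeJar(jar, entries, false);
            writeJar(jar, entries, true); // succeeds only because overwrite is set
            return listEntries(jar);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A byte code jar would be written the same way, with compiled .class files as the entry contents instead of source text.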

A typical usage of the Tools language in QuantCell that we will come back to later looks like this:

Figure: Tools language command

This is useful when writing Mappers, Reducers and Combiners for Hadoop jobs running from QuantCell. But let’s start with a less complicated example.

Assume you create an icon/path using JavaFX from QuantCell as shown in cell d7 here:

Figure: Simple deployment example

By issuing the package command seen in cell d9, i.e., “(d7) = [tools] package -jar c:\temp\q-icon-code.jar -java c:\temp\q-icon-javafx.jar -overwrite”, QuantCell returns the application created in this sheet as a Java program and API written to the “temp” directory. The actual code created by the package command is shown below, as opened and formatted in an IDE.

Figure: Deployed code

The class name, shown above, looks a little mechanical, but then again, it is mechanically created anyway.

Hadoop and MapReduce

The above example explains a feature of the Tools language used in QuantCell for deployment. On the other hand, the example is not necessarily practical. Let’s look at a more comprehensive example where we build mappers and reducers in QuantCell and send the analysis to a Rackspace Hadoop cluster for evaluation on a particular data set. In this example, the packaging command becomes necessary, since the analysis is run on an outside cluster and our code has to be deployed onto that cluster. In this case the MapReduce analysis is created in many pieces, in the form of custom functions, reference data and formulas created in different areas of the sheet. The QuantCell client is used to interactively write, test and execute the analysis. It also supports logging of activities on the cluster and callbacks, in addition to providing a form of documentation of the code. Below is a screenshot showing the complete sheet and, in particular, the code used to create the Hadoop Mapper in cell c3 of the sheet.

Figure: MapReduce analysis

The Mapper object in cell c3 is seen to depend on the function “traffic” defined in cell c16, which itself depends on cells c14, c15 and g4. The packager in the Tools language has to account for all these relationships when creating the byte code that is sent to the Apache Hadoop cluster. Since deployment of code is core functionality in QuantCell, it is efficiently handled by the system internals. The package command creates as many classes in the resulting jar as specified by the input variables. In the case at hand, QuantCell creates only the Mapper and Reducer classes from cells c3 and c4 and makes sure that all the logic is incorporated into these two classes. The Tools command used in cell c5 to interactively package the analysis is “(c3,c4) = [tools] package -jar .\temp\mr-class.jar -java .\temp\mr-java.jar -source 1.6 -target 1.6 -overwrite”. This expression should be self-explanatory, but it should be noted that the jar location is just some location where the Hadoop client can find the API and send it to the Hadoop server.
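The bookkeeping described above amounts to collecting every cell that the packaged cells depend on, directly or indirectly. The sketch below shows one common way to compute such a set with a simple graph traversal. It is an assumption about the general idea, not a description of QuantCell's internals: CellDependencies, closure and trafficSheet are hypothetical names, and the dependency map contains only the relationships mentioned in the text (the dependencies of c4 are not stated, so none are listed).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CellDependencies {

    /** Returns the root cells plus every cell reachable through the dependency map. */
    static Set<String> closure(Map<String, List<String>> dependsOn, List<String> roots) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String cell = pending.pop();
            // add() returns false for cells already seen, which also guards against cycles.
            if (visited.add(cell)) {
                for (String dep : dependsOn.getOrDefault(cell, Collections.<String>emptyList())) {
                    pending.push(dep);
                }
            }
        }
        return visited;
    }

    /** Dependencies as described in the text: c3 uses c16, which uses c14, c15 and g4. */
    static Map<String, List<String>> trafficSheet() {
        Map<String, List<String>> dependsOn = new HashMap<>();
        dependsOn.put("c3", Arrays.asList("c16"));
        dependsOn.put("c16", Arrays.asList("c14", "c15", "g4"));
        return dependsOn;
    }

    public static void main(String[] args) {
        // The package command in the example is given the cells c3 and c4.
        System.out.println(closure(trafficSheet(), Arrays.asList("c3", "c4")));
    }
}
```

For the inputs c3 and c4, the resulting set contains c3, c4, c16, c14, c15 and g4, that is, everything whose logic would have to end up inside the two packaged classes.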

It remains to be explained how the Hadoop client is able to take advantage of this code. This is something that is part of Hadoop. Basically, one only has to tell the Hadoop configuration object where to locate the code. In our example above, this is done in cell c6, which contains the Configuration object.

Figure: Location of the Java archive specified

Line 5 in the code (cell c6), above, points Apache Hadoop to the location of the Java archive, created from the QuantCell sheet, that contains the Mapper and Reducer classes and, as required, other class definitions. The Java archive pointed to is always kept up to date. In other words, changes in the analysis in the QuantCell sheet are immediately written to the jar by the package statement in cell c5. This is, essentially, what enables the Hadoop server to use the QuantCell sheet as the backbone of the analysis being performed.

QuantCell Independent Deployment

Just like in the previous, shorter example, the Java archive created by the packaging command contains only user-defined code. In other words, no part of the QuantCell environment is mixed into the API communicated to the Hadoop server. This is important since it mimics how other, more traditional IDEs work, i.e., the code created does not depend on the IDE used by the user to write the analysis.

The above example can be experimented with by installing QuantCell, available from quantcell.com, and opening the included example “CDH4 on Rackspace - Traffic”. The Hadoop setup used in the example is based on Cloudera’s Open Source Distribution including Hadoop (CDH4), distributed from Cloudera’s homepage (cloudera.com). A CDH4 client is retrieved and installed when the example is opened, but the example is not specific to CDH4; other Hadoop environments can be configured as well.

Polyglot Programming Features Scheduled for Fall

Agust Egilsson - Wednesday, July 17, 2013

There are numerous new Polyglot programming features scheduled for the Fall/Winter release of QuantCell. Polyglot programming means that the user is able to use multiple languages when creating spreadsheet formulas in the QuantCell environment. We already support four languages in the QuantCell spreadsheet, listed below:
  • Java
  • SQL
  • the Tools language
  • 4gl wizards
The languages being added now include:
  • R
  • Scala
  • Python (Jython)
We may also take on other language projects soon if realistic proposals are suggested!

Using Java code snippets to define spreadsheet functions and formulas is an extremely powerful way to work within the QuantCell spreadsheet. The advantage of Java code snippets is that they give the user direct access to any Java library of algorithms and methods available to them, whether proprietary or open source. This is especially nice since most Big Data Analytics libraries are built using Java and can therefore be used in spreadsheet functions within QuantCell immediately after release.

SQL is really a requirement for analysts and data scientists, and most have a good understanding of the language. We started supporting SQL last spring and have tested it against most large databases, including Oracle and MySQL. We have also used it to run real-time big data queries against Cloudera's Impala.

Tools language. The Tools language is an evolving scripting language used for simplifying deployment of solutions. It is specifically designed so that models created in the QuantCell spreadsheet can be easily exported to other systems, clouds and services, and as Java libraries.

4gl wizards. The visual formula generators used in QuantCell form an evolving visual programming language that automates the creation of complicated formulas.

With respect to ongoing work, we are eager to see R realized as one of the languages supported in QuantCell when creating formulas. This is scheduled for release this fall or winter. It will give statisticians something that they are used to working with, and it will be fun to experiment with, especially once we are able to import the extensive R formula libraries into QuantCell. Scala and Jython are also scheduled to be included in releases this fall or winter, and we know that there are users eager to take advantage of these languages, especially within the financial industry.