Why Visi?

Table of Contents

1 Why Visi?

1.1 What is Visi?

Visi is a computer language that allows expression of big data jobs in a syntax that's not frightening to people familiar with Excel formula functions. There is more in Visi than there is in Excel formula functions, but the "more" is adding to what's in Excel.

Let's start with a simple expression… a thing that executes code and returns a value:

1 + 1

That's simple, the expression evaluates to 2. Yay!

In Excel, each cell is a variable… something that can contain a value. In Excel, cells can be constants (values that do not change) or formulas. If a cell contains a formula and that formula refers to another cell, when the predicate cell's value changes, all the dependent cells update their values.

In Visi, you name the placeholders for values. So, to create a names:

x = 5

y = x + 5

The above code created names "x" and "y". "x" is a constant and "y" is dependent on "x".

In Visi and Excel, you can apply a function to a value:

## Define a name that's associated with a String
a1 = "The Brown Cow"

## Calculate the length of the string
length(a1)
13

Visi groups names of functions and other values (in Visi functions are values) into different packages. Functions in packages can be referenced with package::name like:

Math::sin(0.5)

Computes to:

0.479425538604203

In Excel, there are lots and lots of different functions. They are all in the same group, although the user interface (Insert Function) lets you pick functions from categories like "Statistical" or "Financial".

In Visi, functions are grouped in separately named "packages" so that it's easier to find a function you're looking for and sometimes, like in the above example, we have to be explicit about which package the function lives in. The :: between Math and sin tells Visi to look in the Math package for the function.

With Visi, you can use any Java package as part of your Visi notebook. We will cover Java "Host Interoperability" and how to load Java JAR files into your Visi notebook later in this document.

A declaration is the association of a value with a name such that the named declaration can be accessed by other parts of your notebook. For example:

Math::cos(Math::PI / 3) ## expression

cos_third_pi = Math::cos(Math::PI / 3) ## declaration

cos_third_pi ## referencing the name

In Visi, you can create your own functions. So you can use the same logic in many places without having to copy/paste that logic over and over:

calc_tax(income) = income * if(income > 5,000, 40%, 20%)

The above function computes the tax amount depending on the income level.

And functions can call other functions:

tax_rate(income) = if(income > 5,000, 40%, 20%)

calc_tax(income) = income * tax_rate(income)

Visi syntax can span multiple lines and if/then/else can be a formula and it has its own syntax that might be easier to read:

tax_rate(income) =
  if income > 5,000 then
    40%
  else
    20%

calc_tax(income) = income * tax_rate(income)

And Visi code can contain comments so that you and others who read the code can understand what's going on:

tax_rate(income) =
  if income > 5,000 then
    40% ## for high incomes
  else
    20% ## for lower incomes

calc_tax(income) = income * tax_rate(income)

To summarize, Visi syntax is a lot like Excel syntax, but there are extensions to what you are familiar with in Excel to make writing more complex Visi notebooks easier.

1.2 Core Concepts

In Visi, there are three constructs: expressions, declarations, and housekeeping.

An expression is a value that's computed. Expressions can evaluate to either values or functions. Or more precisely, functions are values just like numbers and strings and collections.

What's a collection? It's a group of other values. For example:

[1, 2, 3, 4]

Is a collection of numbers. The specific collection type is called a Vector. It's a 1 dimensional ordered array of values. Ordered means that the Vector retains the elements in the order that it was originally created.

Visi also supports sets. Sets are unordered collections that contain unique values. So:

#{"foo", "bar", "baz", "foo", "dog", "dog", "baz"}

Only contains 4 elements:

#{"foo", "bar", "dog", "baz"}

Another collection is an association. An association is collection of unique keys and values. Both the keys and values can be any Visi value. Let's create an association of people and ages:

{"David" 51, "Archer" 11, "Tessa" 2}

Note that like sets, the keys in an association are unordered.

If you are familiar with JavaScript, associations in Visi are like Objects in JavaScript… collections of key/value pairs.

Visi also has a keyword type. Keywords are handy for giving a common/shared name to a value in an association. For example:

[{name: "David", type: "Human", age: 51},
 {name: "Archer", type: "Dog", age: 11},
 {name: "Tessa", type: "Cat", age: 2}]

If you're familiar with JavaScript or JSON, the Visi syntax for defining an association looks just like JSON. Familiarity is good.

In Visi, functions are values just like strings and numbers and keywords. The declaration of a function is just fancy syntax, both of the following declarations mean the same thing (don't worry about the function expression syntax for the moment):

## assign a function to a name
plus_one = x => x + 1

another_plus_one(x) = x + 1

z = 99

plus_one(z) == another_plus_one(z)

Are the two functions the same?

true

The first example, assigns the expression x => x + 1 to plus_one. That expression evaluates to a function. The second example does the same thing with different syntax. The latter is "syntactic sugar" for the former.

You may be asking, "why do you have more than one way to say the same thing?" Good question. Visi creates syntactic sugar to give you a more concise or more natural way of expressing the same code. In different contexts, the different syntax may seem more natural. For example, the first declaration looks kind of odd where the second looks like the way we learned functions in math class. We will get to some examples of passing functions as parameters in a little while.

In Visi, top level names (those names defined outside another assignment) can be accessed by any expression in a Visi notebook, just like values in cells in Excel.

1.2.1 But why functions?

Functions take parameters and perform operations on the parameters and return a value.

You're familiar with functions in Excel. Functions are built into Excel, can be added via add-ins, and via VisualBasic.

In Visi, there are plenty of sources of functions. Some are built in, some can be packaged as JAR files and downloaded, and some can be defined in your notebook.

Functions in Visi can be applied to every element of a collection. This is called a map operation. Hey… that's part of MapReduce… yes!

We're going to build a map/reduce job and see how it works locally. The cool thing about Visi is that you can explore a job locally and then deploy the same job to a cluster of computers running Apache Storm, Apache Spark, Apache Tez, and other cluster frameworks. This is because the way you describe a pipeline of operations in Visi can be converted into a pipeline in many different frameworks.

Let's get started.

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

get_age(r) = get(r, age:)

## Get the ages
ages = map(get_age, residents)

sink the_ages = ages

And we get:

[51, 11, 3]

Note that we define the source and sink for the calculation. A source is the name of a place that the information comes from. When you're using your Visi notebook in interactive mode, you can define a sample of data for a source. But when you deploy the Visi notebook to your big data cluster, you can associate sources and sinks with platform dependent data… for example files on your HDFS cluster. This mechanism let's your interactively play with your calculations, get them right, and then deploy the same Visi notebook to your cluster.

Okay… back to creating a map/reduce job.

We've seen that we can map over data to get the age of each record, the syntax is a little verbose. Let's build an inline function that does the same thing:

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(r => get(r, age:), residents)

sink the_ages = ages
[51, 11, 3]

A little better, but still more verbose than we'd like. It turns out that a keyword is also a function, so we can shorten the function to age: like so:

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(age:, residents)

sink the_ages = ages
[51, 11, 3]

Now, let's compute the sum of ages using the reduce function. Yep, we're going to map and reduce… woo hoo:

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(age:, residents)

age_sum = reduce((+), ages)

sink sum = age_sum
65

The reduce((+ ), ages) expression adds up each element in the ages collection. The (+) expression creates a function that takes two parameters and adds them up. The (operator) syntax is another way of creating a function.

And we can compute the average age:

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(age:, residents)

age_sum = reduce((+), ages)

sink sum = age_sum / count(residents)
65/3

Note that Visi can express rational numbers… numbers that are ratios of each other. This means that Visi avoids some floating point related issues.

So, yay… we have created our first map/reduce job, except for one thing… the count function doesn't work so well over cluster, so we'll have to aggregate the count as part of the reduce phase:

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(# {count: 1, age: it.age}, residents)

age_sum = reduce(| merge_with((+)), ages)

sink sum = age_sum.age / age_sum.count
65/3

We've done a couple of things above… the first is we've introduced another mechanism for defining a function. Putting a # and then a space before an expression turns the expression into a function that takes one parameter and that parameter is named it (a little homage to Groovy).

The next thing we've introduced is another mechanism for getting a value from an association. The it.age syntax is JavaScript-like syntax that does the same thing as get(age: it).

In the age_sum expression, we reduce over the ages collection. But instead of just adding the elements, we must merge the elements because they are associations. merge_with takes a function and two associations and merges them. Each for any common key in each association, the function is called to combine the values at each shared key.

The | followed by a space is yet another syntax for creating functions. In this case, they are partially applied functions. A partially applied function has some parameters filled in and when it's called, the balance of the parameters are filled in. The following function declarations each do the same thing:

  • (x,y) => merge_ with((q, r) => q + r, x, y)
  • #2 merge_with((+ ), it1, it2)
  • | merge_with((+))

So, we've created a series of steps to run a map/reduce job in Visi. But Visi has some syntax that helps to make map/reduce jobs easier to write and more obvious. Visi allows you to write "transformation pipes" that allow you to express the transformations in a single pipeline.

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

## Run the whole job
job = residents |> map # {count: 1,
                          age: it.age}
                |> reduce | merge_with((+))

sink average_age = job.age / job.count
65/3

Finally, Visi has helper functions that make your life a lot easier… like creating count associations based on a key or merge/summing:

## Who lives with us?
source residents =
 [{name: "David", type: "Human", age: 51},
  {name: "Archer", type: "Dog", age: 11},
  {name: "Tessa", type: "Cat", age: 3}]

## Run the whole job
job = residents |> map count_for(age:)
                |> reduce merge_sum

sink average_age = job.age / job.count
65/3

The count_for function returns a function that builds a count association based on the key. The merge_sum function is the same as merge_with((+), x, y).

So, you've built your first map/reduce job in Visi and it's pretty simple.

Next, we're going to go through some Visi pieces/parts.

You can also declare names that are only visible to expressions that come after the declaration in the expression. This allows you to compute an expression and use the result in many places within a larger expression. For example:

test_income(income) =
  mag = Math::log10(income) ## the magnitude of the income
  if mag < 3 then "low"
  else if mag < 5 then "med"
  else "high"

map(x => str(x, ": ", test_income(x)),
    [300, 50,000, 250,000])
["300: low", "50000: med", "250000: high"]

We compute the value of mag and then reference that name in the if/then/else expression.

Also, note that the comma can be used as a number place separator. Put spaces after commas to help the Visi parser distinguish between [100,240] and [100, 240].

Visi supports multiline, complex strings. A normal String is enclosed in double quotes: ="I am a String"=. But sometimes, you might want to have a double-quote in a string… for example if you paste a bunch of data you got off the Internet, you don't want to have to escape the Strings. Visi has a handy, string literal. Any sequence of characters that starts with # and two or more single quote ('), double quote (") or carrot (^) and ends with the same number/type of delimiter will be treated as a single string. This is especially useful for putting CSV data into your Visi code:

[
 #'''Year,Make,Model,Description,Price''',
 #'''1997,Ford,E350,"ac, abs, moon",3000.00''',
 #'''1999,Chevy,"Venture ""Extended Edition""","",4900.00''',
 #'''1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00''',
 #'''1996,Jeep,Grand Cherokee,"MUST SELL!
 air, moon roof, loaded",4799.00'''
 ]
["Year,Make,Model,Description,Price", "1997,Ford,E350,\"ac, abs, moon\",3000.00", "1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00", "1999,Chevy,\"Venture \"\"Extended Edition, Very Large\"\"\",,5000.00", "1996,Jeep,Grand Cherokee,\"MUST SELL!\n air, moon roof, loaded\",4799.00"]

Now that we've got the pieces out to the way, let's do a word count example in Visi:

source king_james =
 [
   #'''Gen|1|1| In the beginning God created the heaven and the earth.~''',
   #'''Gen|1|2| And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.~''',
   #'''Gen|1|3| And God said, Let there be light: and there was light.~''',
   #'''Gen|1|4| And God saw the light, that it was good: and God divided the light from the darkness.~''',
   #'''Gen|1|5| And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.~''',
   #'''Gen|1|6| And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters.~''',
   #'''Gen|1|7| And God made the firmament, and divided the waters which were under the firmament from the waters which were above the firmament: and it was so.~''',
   #'''Gen|1|8| And God called the firmament Heaven. And the evening and the morning were the second day.~''',
   #'''Gen|1|9| And God said, Let the waters under the heaven be gathered together unto one place, and let the dry land appear: and it was so.~''',
   #'''Gen|1|10| And God called the dry land Earth; and the gathering together of the waters called he Seas: and God saw that it was good.~''',
   #'''Gen|1|11| And God said, Let the earth bring forth grass, the herb yielding seed, and the fruit tree yielding fruit after his kind, whose seed is in itself, upon the earth: and it was so.~''',
   #'''Gen|1|12| And the earth brought forth grass, and herb yielding seed after his kind, and the tree yielding fruit, whose seed was in itself, after his kind: and God saw that it was good.~''',
   #'''Gen|1|13| And the evening and the morning were the third day.~''',
   #'''Gen|1|14| And God said, Let there be lights in the firmament of the heaven to divide the day from the night; and let them be for signs, and for seasons, and for days, and years:~''',
   #'''Gen|1|15| And let them be for lights in the firmament of the heaven to give light upon the earth: and it was so.~''',
   #'''Gen|1|16| And God made two great lights; the greater light to rule the day, and the lesser light to rule the night: he made the stars also.~''',
   #'''Gen|1|17| And God set them in the firmament of the heaven to give light upon the earth,~''',
   #'''Gen|1|18| And to rule over the day and over the night, and to divide the light from the darkness: and God saw that it was good.~''',
   #'''Gen|1|19| And the evening and the morning were the fourth day.~''',
   #'''Gen|1|20| And God said, Let the waters bring forth abundantly the moving creature that hath life, and fowl that may fly above the earth in the open firmament of heaven.~''',
   #'''Gen|1|21| And God created great whales, and every living creature that moveth, which the waters brought forth abundantly, after their kind, and every winged fowl after his kind: and God saw that it was good.~''',
   #'''Gen|1|22| And God blessed them, saying, Be fruitful, and multiply, and fill the waters in the seas, and let fowl multiply in the earth.~''',
   #'''Gen|1|23| And the evening and the morning were the fifth day.~''',
 ]

to_remove = #{"the", "and"}

counted_words = king_james
  |> map # re_replace(it, #//[^a-zA-Z]+//, " ") ## strip non-words
  |> map # toLowerCase(it) ## convert to lower case
  |> flatmap | re_seq(#//\w+//) ## split by word
  |> filter (|> #{"the", "and", "gen"} |> not) ## remove "the", "gen", and "and"
  |> map # {it 1} ## create association between word and number 1
  |> reduce merge_sum ## count by word
  |> sort second, descending ## sort by count (the second part of the word/count pair)

sink words = counted_words
[["god", 22], ["was", 13], ["waters", 11], ["let", 11], ["in", 10], ["it", 10], ["earth", 10], ["light", 10], ["day", 9], ["of", 9]]

1.3 Immutability

Like Excel, Visi doesn't let you change a value once it's computed. This is strange for users of Java and JavaScript. In addition to Visi's syntax which is simple and free from complex stuff from Java like void and class and such, Visi's types can be converted into a form that can be distributed across a cluster of computes. Visi data types can be turned into bytes that get turned back into the same data types on the other side of the network.

Between serialization and immutability, Visi provides tools that make writing distributed computing jobs simple.

1.4 Host Interoperability

Visi sits on top of two host systems. All Visi code is compiled to (well, more technically, transliterated to) Clojure. Then the Clojure code is compiled into Java Virtual Machine byte-code. This means that Visi code works with any Clojure and Java library and Visi code can be used in and called by any JVM-based Big Data platform. Being able to work with any Java and Clojure library means that you have a wealth of libraries to use. But how do you use these libraries in Visi?

First, let's deal with method calls. Java objects all have methods. Rather than using functions like Visi and Clojure, all the actions that can be done to a particular Object is by invoking a method on an Object.

Visi mostly makes method invocation seamless. If there's no function matching the function name, then Visi invokes that named method on the object. For example:

toString([1, 2, 3])
"[1 2 3]"

Boom, the toString method is invoked on the object. In this case, the Vector in Visi is actually represented by a Clojure Vector and we've called the .toString method on it.

You can be explicit about calling a method if there's ambiguity about a name being in scope. By prefixing a function name with $., Visi forces the method invocation:

$.toString($.size([1, 2, 3]))
"3"

Visi also supports Java/JavaScript style method invocation:

x = [1, 2, 3]
  x.size().toString()
"3"

And if you want to force a method invocation rather than a function call (if there is a function in scope with the same name as the method you want to call):

x = [1, 2, 3]
  x$.size()$.toString()
"3"

Why the different ways to do the same thing? Mostly because there will be times when it's cleaner/easier to read/update when you use one style or the other.

Mostly with the $package declaration. Note the $. This means that you're explicitly using some form of host interoperability.

Using $package, you declare the name of the package your code runs in (usually, visi.notebook) and you load any packages you might need off the Internet, for example, the Apache Commons CSV package.

$package(visi.notebook,
  load([org.apache.commons/commons-csv, "1.0"]),
  import(org.apache.commons.csv.CSVParser,
         org.apache.commons.csv.CSVFormat))

(CSVParser::parse(#'''1,2,"hello, moose",33,"thing"''', CSVFormat::DEFAULT))
  |> first ## get the first element
  |> seq ## turn it into a sequence
["1", "2", "hello, moose", "33", "thing"]

Author: David Pollak

Created: 2015-02-07 Sat 15:50

Emacs 24.3.1 (Org mode 8.2.10)

Validate