Why Visi?
1 Why Visi?
1.1 What is Visi?
Visi is a computer language that allows expression of big data jobs in a syntax that's not frightening to people familiar with Excel formula functions. There is more in Visi than there is in Excel formula functions, but the "more" is adding to what's in Excel.
Let's start with a simple expression… a thing that executes code and returns a value:
1 + 1
That's simple: the expression evaluates to 2. Yay!
In Excel, each cell is a variable… something that can contain a value. In Excel, cells can be constants (values that do not change) or formulas. If a cell contains a formula that refers to another cell, then when the referenced cell's value changes, all the dependent cells update their values.
In Visi, you name the placeholders for values. So, to create names:
x = 5
y = x + 5
The above code created names "x" and "y". "x" is a constant and "y" is dependent on "x".
In Visi and Excel, you can apply a function to a value:
## Define a name that's associated with a String
a1 = "The Brown Cow"

## Calculate the length of the string
length(a1)
13
Visi groups names of functions and other values (in Visi, functions are values) into different packages. Functions in packages can be referenced with package::name like:
Math::sin(0.5)
Computes to:
0.479425538604203
In Excel, there are lots and lots of different functions. They are all in the same group, although the user interface (Insert Function) lets you pick functions from categories like "Statistical" or "Financial".
In Visi, functions are grouped in separately named "packages" so that it's easier to find a function you're looking for. Sometimes, as in the above example, we have to be explicit about which package the function lives in. The :: between Math and sin tells Visi to look in the Math package for the function.
With Visi, you can use any Java package as part of your Visi notebook. We will cover Java "Host Interoperability" and how to load Java JAR files into your Visi notebook later in this document.
A declaration is the association of a value with a name such that the named declaration can be accessed by other parts of your notebook. For example:
Math::cos(Math::PI / 3) ## expression

cos_third_pi = Math::cos(Math::PI / 3) ## declaration

cos_third_pi ## referencing the name
In Visi, you can create your own functions. So you can use the same logic in many places without having to copy/paste that logic over and over:
calc_tax(income) = income * if(income > 5,000, 40%, 20%)
The above function computes the tax amount depending on the income level.
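To check the arithmetic, here is a small Python analogue of the same rule. It is not Visi code; it assumes that 40% and 20% mean 0.40 and 0.20, as they do in Excel, and uses exact fractions so the results are not affected by floating-point rounding.

```python
from fractions import Fraction


def calc_tax(income):
    # Assumption: 40% and 20% mean 0.40 and 0.20, as they do in Excel.
    rate = Fraction(40, 100) if income > 5000 else Fraction(20, 100)
    return income * rate


print(calc_tax(6000))  # 2400
print(calc_tax(4000))  # 800
```

An income of exactly 5,000 is not greater than 5,000, so it falls into the 20% branch.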
And functions can call other functions:
tax_rate(income) = if(income > 5,000, 40%, 20%)

calc_tax(income) = income * tax_rate(income)
Visi syntax can span multiple lines, and if/then/else has its own syntax that might be easier to read than the if(…) formula:
tax_rate(income) =
  if income > 5,000 then
    40%
  else
    20%

calc_tax(income) = income * tax_rate(income)
And Visi code can contain comments so that you and others who read the code can understand what's going on:
tax_rate(income) =
  if income > 5,000 then
    40% ## for high incomes
  else
    20% ## for lower incomes

calc_tax(income) = income * tax_rate(income)
To summarize, Visi syntax is a lot like Excel syntax, but there are extensions to what you are familiar with in Excel to make writing more complex Visi notebooks easier.
1.2 Core Concepts
In Visi, there are three constructs: expressions, declarations, and housekeeping.
An expression is a value that's computed. Expressions can evaluate to either values or functions. Or more precisely, functions are values just like numbers and strings and collections.
What's a collection? It's a group of other values. For example:
[1, 2, 3, 4]
Is a collection of numbers. The specific collection type is called a Vector. It's a one-dimensional ordered array of values. Ordered means that the Vector retains its elements in the order in which they were given when it was created.
Visi also supports sets. Sets are unordered collections that contain unique values. So:
#{"foo", "bar", "baz", "foo", "dog", "dog", "baz"}
Only contains 4 elements:
#{"foo", "bar", "dog", "baz"}
Another collection is an association. An association is a collection of unique keys and their values. Both the keys and values can be any Visi value. Let's create an association of people and ages:
{"David" 51, "Archer" 11, "Tessa" 2}
Note that like sets, the keys in an association are unordered.
If you are familiar with JavaScript, associations in Visi are like Objects in JavaScript… collections of key/value pairs.
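If Python is more familiar than JavaScript, the three collection types map roughly onto a list, a set, and a dict. This is only an analogy: Python dicts remember insertion order, whereas the keys of a Visi association are described above as unordered.

```python
vector = [1, 2, 3, 4]  # ordered, like a Visi Vector

# Duplicates collapse, just as in the Visi set example.
names = {"foo", "bar", "baz", "foo", "dog", "dog", "baz"}

# Keys map to values, like a Visi association.
ages = {"David": 51, "Archer": 11, "Tessa": 2}

print(len(names))      # 4
print(ages["Archer"])  # 11
```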
Visi also has a keyword type. Keywords are handy for giving a common/shared name to a value in an association. For example:
[{name: "David", type: "Human", age: 51}, {name: "Archer", type: "Dog", age: 11}, {name: "Tessa", type: "Cat", age: 2}]
If you're familiar with JavaScript or JSON, the Visi syntax for defining an association looks just like JSON. Familiarity is good.
In Visi, functions are values just like strings and numbers and keywords. The declaration of a function is just fancy syntax; both of the following declarations mean the same thing (don't worry about the function expression syntax for the moment):
## assign a function to a name
plus_one = x => x + 1

another_plus_one(x) = x + 1

z = 99

plus_one(z) == another_plus_one(z)
Do the two functions return the same result for z?
true
The first example assigns the expression x => x + 1 to plus_one. That expression evaluates to a function. The second example does the same thing with different syntax. The latter is "syntactic sugar" for the former.
You may be asking, "why do you have more than one way to say the same thing?" Good question. Visi creates syntactic sugar to give you a more concise or more natural way of expressing the same code. In different contexts, the different syntax may seem more natural. For example, the first declaration looks kind of odd where the second looks like the way we learned functions in math class. We will get to some examples of passing functions as parameters in a little while.
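The same pair of spellings exists in many languages. As a rough Python analogue (not Visi code), a lambda bound to a name and a def declaration produce functions that behave identically:

```python
plus_one = lambda x: x + 1  # a function expression bound to a name


def another_plus_one(x):    # the declaration form
    return x + 1


z = 99
print(plus_one(z) == another_plus_one(z))  # True
```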
In Visi, top level names (those names defined outside another assignment) can be accessed by any expression in a Visi notebook, just like values in cells in Excel.
1.2.1 But why functions?
Functions take parameters and perform operations on the parameters and return a value.
You're familiar with functions in Excel. Some functions are built into Excel, and more can be added via add-ins or Visual Basic.
In Visi, there are plenty of sources of functions. Some are built in, some can be packaged as JAR files and downloaded, and some can be defined in your notebook.
Functions in Visi can be applied to every element of a collection. This is called a map operation.
Hey… that's part of MapReduce… yes!
We're going to build a map/reduce job and see how it works locally. The cool thing about Visi is that you can explore a job locally and then deploy the same job to a cluster of computers running Apache Storm, Apache Spark, Apache Tez, and other cluster frameworks. This is because the way you describe a pipeline of operations in Visi can be converted into a pipeline in many different frameworks.
Let's get started.
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

get_age(r) = get(r, age:)

## Get the ages
ages = map(get_age, residents)

sink the_ages = ages
And we get:
[51, 11, 3]
Note that we define the source and sink for the calculation. A source is the name of a place that the information comes from, and a sink is the name of a place that results go to. When you're using your Visi notebook in interactive mode, you can define a sample of data for a source. But when you deploy the Visi notebook to your big data cluster, you can associate sources and sinks with platform-dependent data… for example, files on your HDFS cluster. This mechanism lets you interactively play with your calculations, get them right, and then deploy the same Visi notebook to your cluster.
Okay… back to creating a map/reduce job.
We've seen that we can map over data to get the age of each record, but the syntax is a little verbose. Let's build an inline function that does the same thing:
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(r => get(r, age:), residents)

sink the_ages = ages
[51, 11, 3]
A little better, but still more verbose than we'd like. It turns out that a keyword is also a function, so we can shorten the function to age: like so:
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(age:, residents)

sink the_ages = ages
[51, 11, 3]
Now, let's compute the sum of ages using the reduce function. Yep, we're going to map and reduce… woo hoo:
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(age:, residents)

age_sum = reduce((+), ages)

sink sum = age_sum
65
The reduce((+), ages) expression adds up each element in the ages collection. The (+) expression creates a function that takes two parameters and adds them up. The (operator) syntax is another way of creating a function.
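For comparison, the same map-then-reduce shape can be sketched in Python. This is an analogue rather than Visi code: operator.add plays the role of (+), and a small lambda stands in for the age: keyword used as a function.

```python
from functools import reduce
import operator

residents = [
    {"name": "David", "type": "Human", "age": 51},
    {"name": "Archer", "type": "Dog", "age": 11},
    {"name": "Tessa", "type": "Cat", "age": 3},
]

# map: pull the age out of every record
ages = list(map(lambda r: r["age"], residents))

# reduce: fold the ages together with a two-argument add function
age_sum = reduce(operator.add, ages)

print(ages)     # [51, 11, 3]
print(age_sum)  # 65
```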
And we can compute the average age:
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(age:, residents)

age_sum = reduce((+), ages)

sink sum = age_sum / count(residents)
65/3
Note that Visi can express rational numbers… numbers that are the ratio of two integers. This means that Visi avoids some floating-point rounding issues.
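To see what an exact ratio buys you, here is a short Python sketch using the standard fractions module. It only illustrates the idea of rational arithmetic; it says nothing about how Visi represents these numbers internally.

```python
from fractions import Fraction

average = Fraction(65, 3)  # kept as an exact ratio, never rounded
print(average)             # 65/3
print(average * 3 == 65)   # True

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly...
print(0.1 + 0.2 == 0.3)    # False

# ...but the same sum is exact with rationals.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```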
So, yay… we have created our first map/reduce job, except for one thing… the count function doesn't work so well over a cluster, so we'll have to aggregate the count as part of the reduce phase:
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

## Get the ages
ages = map(# {count: 1, age: it.age}, residents)

age_sum = reduce(| merge_with((+)), ages)

sink sum = age_sum.age / age_sum.count
65/3
We've done a couple of things above… the first is that we've introduced another mechanism for defining a function. Putting a # and then a space before an expression turns the expression into a function that takes one parameter, and that parameter is named it (a little homage to Groovy).
The next thing we've introduced is another mechanism for getting a value from an association. The it.age syntax is JavaScript-like syntax that does the same thing as get(it, age:).
In the age_sum expression, we reduce over the ages collection. But instead of just adding the elements, we must merge the elements because they are associations. merge_with takes a function and two associations and merges them. For any key common to both associations, the function is called to combine the values at that shared key.
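A minimal Python sketch of that merging rule follows. It is an illustration, not Visi's implementation, and it makes one assumption the text above does not state: a key present in only one association is carried over unchanged (which is how Clojure's merge-with behaves).

```python
import operator


def merge_with(f, a, b):
    """Merge dicts a and b, combining values at shared keys with f."""
    merged = dict(a)
    for key, value in b.items():
        # Assumption: keys found in only one dict are copied as-is.
        merged[key] = f(merged[key], value) if key in merged else value
    return merged


david = {"count": 1, "age": 51}
archer = {"count": 1, "age": 11}

print(merge_with(operator.add, david, archer))  # {'count': 2, 'age': 62}
```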
The | followed by a space is yet another syntax for creating functions. In this case, it creates a partially applied function. A partially applied function has some parameters filled in, and when it's called, the balance of the parameters are filled in. The following function declarations each do the same thing:
- (x, y) => merge_with((q, r) => q + r, x, y)
- #2 merge_with((+), it1, it2)
- | merge_with((+))
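Partial application is not unique to Visi. As a hedged Python analogue, functools.partial fills in the first argument of a function and leaves the rest for later. The merge_with helper here is a hypothetical stand-in written for this sketch, not Visi's own function.

```python
from functools import partial
import operator


def merge_with(f, a, b):
    # Hypothetical stand-in: combine values at shared keys with f.
    merged = dict(a)
    for key, value in b.items():
        merged[key] = f(merged[key], value) if key in merged else value
    return merged


# Spelled out in full, like the first form in the list above.
explicit = lambda x, y: merge_with(lambda q, r: q + r, x, y)

# Only the function argument is filled in; the two dicts come later.
partially_applied = partial(merge_with, operator.add)

a = {"count": 1, "age": 51}
b = {"count": 1, "age": 3}
print(explicit(a, b) == partially_applied(a, b))  # True
```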
So, we've created a series of steps to run a map/reduce job in Visi. But Visi has some syntax that helps to make map/reduce jobs easier to write and more obvious. Visi allows you to write "transformation pipes" that allow you to express the transformations in a single pipeline.
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

## Run the whole job
job = residents
      |> map # {count: 1, age: it.age}
      |> reduce | merge_with((+))

sink average_age = job.age / job.count
65/3
Finally, Visi has helper functions that make your life a lot easier… like creating count associations based on a key or merge/summing:
## Who lives with us?
source residents = [{name: "David", type: "Human", age: 51},
                    {name: "Archer", type: "Dog", age: 11},
                    {name: "Tessa", type: "Cat", age: 3}]

## Run the whole job
job = residents
      |> map count_for(age:)
      |> reduce merge_sum

sink average_age = job.age / job.count
65/3
The count_for function returns a function that builds a count association based on the key. The merge_sum function is the same as merge_with((+), x, y).
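One way to picture the two helpers is the Python sketch below. The behavior of count_for is an assumption inferred from the earlier pipeline (it is taken to tag each record with a count of 1 alongside the chosen key), and both function bodies are hypothetical stand-ins rather than Visi's definitions.

```python
from fractions import Fraction
from functools import reduce


def count_for(key):
    # Assumed behavior: build {count: 1, <key>: value} for each record.
    return lambda record: {"count": 1, key: record[key]}


def merge_sum(a, b):
    # Assumed to match merge_with((+), a, b): add values key by key.
    merged = dict(a)
    for k, v in b.items():
        merged[k] = merged[k] + v if k in merged else v
    return merged


residents = [
    {"name": "David", "type": "Human", "age": 51},
    {"name": "Archer", "type": "Dog", "age": 11},
    {"name": "Tessa", "type": "Cat", "age": 3},
]

job = reduce(merge_sum, map(count_for("age"), residents))
print(job)                                 # {'count': 3, 'age': 65}
print(Fraction(job["age"], job["count"]))  # 65/3
```

Because the count travels with each record, the average needs only the final merged association, which is what makes this shape friendly to a cluster.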
So, you've built your first map/reduce job in Visi and it's pretty simple.
Next, we're going to go through some Visi pieces/parts.
You can also declare names that are only visible to expressions that come after the declaration in the expression. This allows you to compute an expression and use the result in many places within a larger expression. For example:
test_income(income) =
  mag = Math::log10(income) ## the magnitude of the income
  if mag < 3 then "low"
  else if mag < 5 then "med"
  else "high"

map(x => str(x, ": ", test_income(x)), [300, 50,000, 250,000])
["300: low", "50000: med", "250000: high"]
We compute the value of mag and then reference that name in the if/then/else expression.
Also, note that the comma can be used as a number place separator, as in 50,000 above. Put spaces after the commas that separate elements to help the Visi parser distinguish between [100,240] and [100, 240].
Visi supports multiline, complex strings. A normal String is enclosed in double quotes: "I am a String". But sometimes, you might want to have a double quote in a string… for example, if you paste a bunch of data you got off the Internet, you don't want to have to escape the Strings. Visi has a handy string literal for this. Any sequence of characters that starts with # and two or more single quotes ('), double quotes ("), or carets (^) and ends with the same number and type of delimiter will be treated as a single string. This is especially useful for putting CSV data into your Visi code:
[
#'''Year,Make,Model,Description,Price''',
#'''1997,Ford,E350,"ac, abs, moon",3000.00''',
#'''1999,Chevy,"Venture ""Extended Edition""","",4900.00''',
#'''1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00''',
#'''1996,Jeep,Grand Cherokee,"MUST SELL!
 air, moon roof, loaded",4799.00'''
]
["Year,Make,Model,Description,Price", "1997,Ford,E350,\"ac, abs, moon\",3000.00", "1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00", "1999,Chevy,\"Venture \"\"Extended Edition, Very Large\"\"\",,5000.00", "1996,Jeep,Grand Cherokee,\"MUST SELL!\n air, moon roof, loaded\",4799.00"]
Now that we've got the pieces out of the way, let's do a word count example in Visi:
source king_james = [
  #'''Gen|1|1| In the beginning God created the heaven and the earth.~''',
  #'''Gen|1|2| And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.~''',
  #'''Gen|1|3| And God said, Let there be light: and there was light.~''',
  #'''Gen|1|4| And God saw the light, that it was good: and God divided the light from the darkness.~''',
  #'''Gen|1|5| And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.~''',
  #'''Gen|1|6| And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters.~''',
  #'''Gen|1|7| And God made the firmament, and divided the waters which were under the firmament from the waters which were above the firmament: and it was so.~''',
  #'''Gen|1|8| And God called the firmament Heaven. And the evening and the morning were the second day.~''',
  #'''Gen|1|9| And God said, Let the waters under the heaven be gathered together unto one place, and let the dry land appear: and it was so.~''',
  #'''Gen|1|10| And God called the dry land Earth; and the gathering together of the waters called he Seas: and God saw that it was good.~''',
  #'''Gen|1|11| And God said, Let the earth bring forth grass, the herb yielding seed, and the fruit tree yielding fruit after his kind, whose seed is in itself, upon the earth: and it was so.~''',
  #'''Gen|1|12| And the earth brought forth grass, and herb yielding seed after his kind, and the tree yielding fruit, whose seed was in itself, after his kind: and God saw that it was good.~''',
  #'''Gen|1|13| And the evening and the morning were the third day.~''',
  #'''Gen|1|14| And God said, Let there be lights in the firmament of the heaven to divide the day from the night; and let them be for signs, and for seasons, and for days, and years:~''',
  #'''Gen|1|15| And let them be for lights in the firmament of the heaven to give light upon the earth: and it was so.~''',
  #'''Gen|1|16| And God made two great lights; the greater light to rule the day, and the lesser light to rule the night: he made the stars also.~''',
  #'''Gen|1|17| And God set them in the firmament of the heaven to give light upon the earth,~''',
  #'''Gen|1|18| And to rule over the day and over the night, and to divide the light from the darkness: and God saw that it was good.~''',
  #'''Gen|1|19| And the evening and the morning were the fourth day.~''',
  #'''Gen|1|20| And God said, Let the waters bring forth abundantly the moving creature that hath life, and fowl that may fly above the earth in the open firmament of heaven.~''',
  #'''Gen|1|21| And God created great whales, and every living creature that moveth, which the waters brought forth abundantly, after their kind, and every winged fowl after his kind: and God saw that it was good.~''',
  #'''Gen|1|22| And God blessed them, saying, Be fruitful, and multiply, and fill the waters in the seas, and let fowl multiply in the earth.~''',
  #'''Gen|1|23| And the evening and the morning were the fifth day.~''',
  ]

to_remove = #{"the", "and"}

counted_words = king_james
  |> map # re_replace(it, #//[^a-zA-Z]+//, " ") ## strip non-words
  |> map # toLowerCase(it) ## convert to lower case
  |> flatmap | re_seq(#//\w+//) ## split by word
  |> filter (|> #{"the", "and", "gen"} |> not) ## remove "the", "gen", and "and"
  |> map # {it 1} ## create association between word and number 1
  |> reduce merge_sum ## count by word
  |> sort second, descending ## sort by count (the second part of the word/count pair)

sink words = counted_words
[["god", 22], ["was", 13], ["waters", 11], ["let", 11], ["in", 10], ["it", 10], ["earth", 10], ["light", 10], ["day", 9], ["of", 9]]
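To make the individual steps easier to follow, here is the same sequence sketched in Python on just two of the verses. It is an analogue of the pipeline, not a translation of Visi's runtime behavior: collections.Counter stands in for the map-to-{word 1} and merge_sum steps, and most_common for the descending sort.

```python
import re
from collections import Counter

verses = [
    "Gen|1|1| In the beginning God created the heaven and the earth.~",
    "Gen|1|3| And God said, Let there be light: and there was light.~",
]
stop_words = {"the", "and", "gen"}

words = []
for verse in verses:
    cleaned = re.sub(r"[^a-zA-Z]+", " ", verse)  # strip non-words
    lowered = cleaned.lower()                    # convert to lower case
    for word in re.findall(r"\w+", lowered):     # split by word
        if word not in stop_words:               # remove "the", "and", "gen"
            words.append(word)

counts = Counter(words)          # count by word
ranked = counts.most_common()    # sort by count, highest first

print(counts["god"])    # 2
print(counts["light"])  # 2
print(counts["earth"])  # 1
```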
1.3 Immutability
Like Excel, Visi doesn't let you change a value once it's computed. This may seem strange to users of Java and JavaScript. Visi's syntax is simple and free from Java constructs like void and class. In addition, Visi's types can be converted into a form that can be distributed across a cluster of computers: Visi data types can be turned into bytes that get turned back into the same data types on the other side of the network.
Between serialization and immutability, Visi provides tools that make writing distributed computing jobs simple.
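The pairing of immutable values with byte-level serialization can be pictured with a small Python sketch. It uses the standard pickle module purely as an illustration; Visi's actual wire format is not described here.

```python
import pickle

# An immutable record: a tuple of (key, value) pairs.
resident = (("name", "Archer"), ("type", "Dog"), ("age", 11))

wire_bytes = pickle.dumps(resident)  # what would travel over the network
restored = pickle.loads(wire_bytes)  # rebuilt on the "other side"

print(restored == resident)  # True

try:
    resident[0] = ("name", "Rex")  # tuples cannot be changed in place
except TypeError:
    print("immutable")
```

Because the value cannot change after it is created, the copy on the other machine can never drift out of step with the original.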
1.4 Host Interoperability
Visi sits on top of two host systems. All Visi code is compiled to (well, more technically, transliterated to) Clojure. Then the Clojure code is compiled into Java Virtual Machine byte-code. This means that Visi code works with any Clojure and Java library and Visi code can be used in and called by any JVM-based Big Data platform. Being able to work with any Java and Clojure library means that you have a wealth of libraries to use. But how do you use these libraries in Visi?
First, let's deal with method calls. Java objects all have methods. Rather than using functions like Visi and Clojure, Java performs every action on a particular Object by invoking a method on that Object.
Visi mostly makes method invocation seamless. If there's no function matching the function name, then Visi invokes that named method on the object. For example:
toString([1, 2, 3])
"[1 2 3]"
Boom, the toString method is invoked on the object. In this case, the Vector in Visi is actually represented by a Clojure Vector and we've called the .toString method on it.
You can be explicit about calling a method if there's ambiguity about a name being in scope. By prefixing a function name with $., Visi forces the method invocation:
$.toString($.size([1, 2, 3]))
"3"
Visi also supports Java/JavaScript style method invocation:
x = [1, 2, 3]
x.size().toString()
"3"
And if you want to force a method invocation rather than a function call (if there is a function in scope with the same name as the method you want to call):
x = [1, 2, 3]
x$.size()$.toString()
"3"
Why the different ways to do the same thing? Mostly because there will be times when it's cleaner/easier to read/update when you use one style or the other.
So how do you load libraries into your notebook? Mostly with the $package declaration. Note the $. This means that you're explicitly using some form of host interoperability. Using $package, you declare the name of the package your code runs in (usually, visi.notebook) and you load any packages you might need off the Internet, for example, the Apache Commons CSV package.
$package(visi.notebook,
         load([org.apache.commons/commons-csv, "1.0"]),
         import(org.apache.commons.csv.CSVParser,
                org.apache.commons.csv.CSVFormat))

(CSVParser::parse(#'''1,2,"hello, moose",33,"thing"''', CSVFormat::DEFAULT))
  |> first ## get the first element
  |> seq ## turn it into a sequence
["1", "2", "hello, moose", "33", "thing"]
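The result shows why a real CSV parser is worth loading: the comma inside "hello, moose" stays part of one field. For readers who want to confirm that behavior without a Visi notebook, the same line can be parsed with Python's standard csv module (an analogue, not the Apache Commons CSV API):

```python
import csv
import io

line = '1,2,"hello, moose",33,"thing"'

# csv.reader honors the double quotes, so the embedded comma is preserved.
row = next(csv.reader(io.StringIO(line)))

print(row)  # ['1', '2', 'hello, moose', '33', 'thing']
```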