D.2 Cluster Arrays
A cluster array contains rows and columns of data, like an in-memory
spreadsheet.
In fact, cluster arrays can easily contain spreadsheet information. Because
of this, Sheerpower includes a rich set of features to input spreadsheet
files directly into clusters and perform operations on clusters including
database-like operations such as sorting, including, excluding, and
searching.
Now, let's make a cluster array to hold student information. For this, we will
be using a cluster array called STUDENT. Each row of the cluster will be a
different student. We will keep track of each student's name, age, and grade
level:
cluster student: name$, age, level
The most common method to put data into a cluster is to add a new row to the
end of the cluster using the ADD CLUSTER statement.
add cluster student
student->name$ = "Joan Ark"
student->age = 18
student->level = 12
The add cluster statement establishes a new cluster row. In this case, since
the cluster was empty, this is row one. The next lines of code store
information into each variable (think "column") of that cluster row. So, the
name of the first student is "Joan Ark", she is 18 years old, and she is at
grade level 12.
Now, let's put in two more students.
add cluster student
student->name$ = "John Smith"
student->age = 16
student->level = 10
add cluster student
student->name$ = "Desmond Jones"
student->age = 15
student->level = 10
Directly after adding a new row, that row is said to be CURRENT. In this
example, row three would be current.
print student->name$ // "Desmond Jones"
print student->age // 15
print student->level // 10
In order to print information about row one, you would first make row one
current, and then access the variables in that row.
set cluster student: row 1
print student->name$ // Joan Ark
To find out how many rows are in a cluster, use the SIZE() function:
print size(student) // 3
To ask which row is current, use the ask cluster statement:
set cluster student: row 2
ask cluster student: row x
print x // 2
When working with cluster arrays, we typically want to operate on each row,
one at a time. To do this we can use the COLLECT and END COLLECT
statements.
Let's print out the name, age, and grade level of each student and calculate
their average age (total ages divided by number of students).
ages = 0
counter = 0
collect cluster student
print student->name$, student->age, student->level
ages = ages + student->age
counter++
end collect
print 'The average age is '; ages/counter
The COLLECT/END COLLECT block iterates through each row of a cluster. While
doing so, it creates a COLLECTION of rows. A collection can be a subset of
the entire cluster array and can be sorted by various criteria. To iterate
through a collection, use the FOR EACH and NEXT statements. Let's sort the
students by their name and print out the sorted list:
collect cluster student
sort by student->name$
end collect
for each student
print student->name$; ' '; student->age
next student
We can also include or exclude students:
collect cluster student
include student->age > 16
sort by student->name$
end collect
for each student
print student->name$; ' '; student->age
next student
Any number of include, exclude, or sort statements can be used on a cluster
array.
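The text above mentions exclude but does not show it. The following is a
minimal sketch, assuming exclude takes a condition in the same form as the
include statement shown earlier. With the three students added above, it
should leave only John Smith and Desmond Jones in the collection:

```
collect cluster student
  // assumption: exclude accepts a condition the same way include does
  exclude student->level = 12
  sort by student->name$
end collect
for each student
  print student->name$; ' '; student->level
next student
```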
We can also search a cluster array for information using the findrow()
function. Given the cluster name and cluster variable to search in, and the
data to be searched for, findrow() returns the number of the first row where
the data was found, or zero if the data was not found. By default, findrow()
does a case-insensitive search.
print findrow(student->name$, "Joan Ark") // 1
The findrow() function is highly optimized. Over five million searches can be
done per second. This makes findrow() ideal for tasks that require fast
lookups. If the search is successful, the matching row becomes the current
row.
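As a sketch of how the return value and the current row can be used together,
the lines below look up a student and then read the other variables from the
row that was found. The variable name found_row is chosen here for
illustration, and the search text is deliberately in lower case to rely on
the default case handling described above:

```
found_row = findrow(student->name$, "john smith")
if found_row = 0 then
  print 'Student not found'
else
  // a successful search has made the matching row current
  print student->name$; ' is in row '; found_row
  print 'Age: '; student->age
end if
```

Checking for zero first matters because an unsuccessful search leaves no
matching row to read from.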