D.7 Using Clusters to Work with Spreadsheets

SheerPower clusters allow you to work with spreadsheets (.csv files) as if they are database tables. Clusters are very fast, processing millions of rows per second. Any field can be treated as a lookup key. Lookup speeds of over 10 million per second are typical.

More with Spreadsheets

The cities dataset is 10,000 rows of the largest cities in the world. It includes the city name, country code, and population for each one.

The cities spreadsheet is input into a program using the cluster input statement. You give the name of the spreadsheet CSV file, the number of headers, and the name of the cluster you will be using.

cluster cities: City$, Country$, Population, Region$, Latitude, Longitude cluster input name '@..\safe\safe_world_cities.csv', headers 1: cities print 'cities input: '; size(cities)

CSV File

City,Country,Population Tokyo,JP,37435191 Delhi,IN,29399141 Shanghai,CN,26317104

→ cluster input

In-Memory Cluster Array

City$	Country$	Population
Tokyo	JP	37435191
Delhi	IN	29399141
Shanghai	CN	26317104

Note: The CLUSTER INPUT statement has many options and can input JSON files, tab-separated-value (TSV) files, and even X12 files into clusters. For details see: Cluster Input

SheerPower can also output a cluster as a spreadsheet:

`PRINT CLUSTER` and `ASK CLUSTER` in SheerPower

`PRINT CLUSTER` Statement

The PRINT CLUSTER statement in SheerPower allows you to output the contents of a cluster array. You can choose to print all the rows in the cluster or just the current row, depending on how you invoke the command. By default, the rows are printed in CSV format, making it easy to export and share data directly.

Syntax:

print cluster clustername: all

clustername: The name of the cluster whose contents you want to print.
all: This optional keyword prints all rows of the cluster. If omitted, only the current row is printed.

Examples:

1. Printing All Rows of a Cluster:

print cluster cities: all

This prints every row in the cities cluster.

2. Printing the Current Row Only:

print cluster cities

This prints just the current row in the cities cluster.

`ASK CLUSTER` Statement

The ASK CLUSTER statement is used to retrieve metadata about the cluster, such as the column headers, into a variable.

The ASK CLUSTER clustername: headers x$ statement in SheerPower is used to retrieve the names of the fields (headers) in a cluster. The headers are stored as a comma-separated string in the variable x$. This allows you to programmatically access and use the headers of a cluster, for example, when exporting data or generating reports.

Syntax:

ask cluster clustername: headers x$

clustername: The name of the cluster you are working with.
headers: This keyword specifies that the column headers should be returned.
x$: A string variable where the headers will be stored.

Examples:

1. Retrieving and Printing Column Headers:

ask cluster cities: headers headers$ print headers$

This retrieves the column headers from the cities cluster and stores them in the string variable headers$. The print statement then outputs the headers.

Detailed Explanation:

PRINT CLUSTER: This statement is versatile for debugging or displaying cluster contents. When you use : all, it prints all rows, which is useful for reviewing or outputting the entire dataset. Without : all, it focuses on the current row, which is helpful when iterating through rows one by one.

ASK CLUSTER: The headers keyword is particularly useful when you need to work with column names programmatically. By storing the headers in a variable like x$, you can dynamically reference or output the structure of the data in your cluster.

These commands are essential tools in SheerPower for managing and displaying cluster data, making them powerful in scenarios involving data processing, reporting, or simply verifying the structure and content of clusters.

print cluster cities: all // print all rows of the cluster

If you just want to print the current row of a cluster:

print cluster cities // just the current row

Clusters and JSON Formatted Data

SheerPower also supports loading JSON files into clusters using the cluster input statement. Below are the details and examples:

Details

The cluster input statement allows you to read JSON files and load their data into a SheerPower cluster. The structure of the JSON file should have a consistent number of name/value pairs for each "row". The data is loaded sequentially into the cluster without regard to the field names.

JSON Example

Consider a JSON file named @json_prefixed.json with the following content:

{ "people3":[ { "id":"one", "name": { "first":"Fred", "last":"Smith" }, "price":234.56 }, { "id":"two", "name": { "first":"Sally", "last":"Sue" }, "price":123.45 } ] }

To load this JSON data into a SheerPower cluster, use the following code:

cluster people3: id$, name->first$, name->last$, price cluster input name '@json_prefixed.json': people3

This code will read the JSON file and load the data into the people3 cluster. Each object in the JSON array is treated as a row in the cluster.

Cluster and EDI X12 file formats

This program demonstrates how to process an X12 file into a SheerPower cluster:

version$ = "v1.0" record_delimiter$ = "~" field_delimiter$ = "*" cluster x12_data: segment$ cluster input name 'x12_file.txt', record record_delimiter$, field field_delimiter$: x12_data collect cluster x12_data end collect for each x12_data print x12_data->segment$ next x12_data stop

Sample X12 File (x12_file.txt):

ISA*00* *00* *ZZ*MYORG *ZZ*Interchange Rec*200831*2324*U*00501*100030537*0*P*>~ GS*SH*MYORG*Interchange Rec*20200831*2324*200030537*X*005010~ ST*856*300089443~ BSN*00*P1982123*20200831*2324*0001~ DTM*011*20200831*2324~ HL*1**S~ TD1*CTN*1****G*2*LB~ TD5**2*FDE*U********3D*3D*3D~ DTM*011*20200831~ N1*ST*SOO KIM*92*DROPSHIP CUSTOMER~ N3*26218 QUENTIN TURNPIKE~ N4*REANNAFORT*MS*51135~ HL*2*1*O~ PRF*1881225***20200831~ HL*3*2*P~ MAN*CP*037178019302492~ HL*4*3*I~ LIN*1*VC*11216.32*SK*RGR-11216.32~ SN1*1*1*EA~ PID*F****ALL TERRAIN ENTRY GUARD KIT 2020 JEEP WRANGLER J~ CTT*4~ SE*20*300089443~ GE*1*200030537~ IEA*1*100030537~

CSV Spreadsheet Example

In the example code below, you can input data into the a$ variable and then click run to try different cities.

cluster cities: City$, Country$, Population, Region$, Latitude, Longitude cluster input name '@..\safe\safe_world_cities.csv', headers 1: cities print 'cities input: '; size(cities) print mycity$ = a$ do row = findrow(cities->city$, mycity$) if row = 0 then print 'Unknown city: ['; mycity$;'] Try a different value for a$ -- like Duluth' exit do end if print sprintf('Population of %s is %m', mycity$, cities->population) end do collect cluster cities include cities->population > 10_000_000 // try other values sort by ucase$(cities->city$) end collect for each cities print cities->city$;' '; cities->country$; print sprintf(', population %m', cities->population) next cities print "=== CSV formatted, with headers ===" ask cluster cities: headers headers$ print headers$ for each cities print cluster cities next cities

Program Explanation:

1. Define Clusters:

The program defines a cluster named cities to store information about cities. The cluster fields include the city name (City$), country (Country$), population (Population), region (Region$), latitude (Latitude), and longitude (Longitude).

2. Load Data from CSV:

The program reads the CSV file safe_world_cities.csv into the cities cluster. The headers 1 option indicates that the first row of the CSV file contains the column headers, which match the fields defined in the cities cluster.

3. Print Number of Cities:

The program prints the total number of cities loaded into the cities cluster.

4. Assign User Input:

The value of the variable a$ is assigned to mycity$, representing the city name that the user wants to look up.

5. Search for the City:

The program searches the cities cluster for a row where the City$ field matches mycity$. The row number is stored in the variable row.

6. Handle Unknown City:

If the city is not found (row = 0), the program prints a message indicating that the city is unknown and suggests trying a different city. The loop then exits.

7. Print Population:

If the city is found, the program prints the population of the city using the sprintf function to format the output.

8. Filter and Sort Cities:

The program starts a collect block to filter and sort the cities cluster. It includes only cities with a population greater than 10 million and sorts the cities by their name, converted to uppercase.

9. Print Filtered Cities:

The program iterates over the filtered and sorted cities cluster, printing the city name, country, and population for each city.

Summary: The program reads a file containing data about world cities and allows the user to look up the population of a specific city. It also filters and displays cities with a population greater than 10 million.

Hide Description

a$ b$ x y

Enter or modify the code below, and then click on RUN

Looking for the full power of Sheerpower?
Check out the Sheerpower website. Free to download. Free to use.

D.7 Using Clusters to Work with Spreadsheets

CSV File

In-Memory Cluster Array

PRINT CLUSTER and ASK CLUSTER in SheerPower

PRINT CLUSTER Statement

1. Printing All Rows of a Cluster:

2. Printing the Current Row Only:

ASK CLUSTER Statement

1. Retrieving and Printing Column Headers:

Detailed Explanation:

Details

JSON Example

Cluster and EDI X12 file formats

CSV Spreadsheet Example

Enter or modify the code below, and then click on RUN

`PRINT CLUSTER` and `ASK CLUSTER` in SheerPower

`PRINT CLUSTER` Statement

`ASK CLUSTER` Statement