Excel Parser

From time to time we need to include Excel files in a Solution to keep data. In some cases the speadsheets are so massive that we need a special type of parsing to make it useful, but in most cases, the spreadsheets contains a limited number of rows. This parser will allow you to load the contents of the Excel sheet into a Groovy object that you can manipulate and use in the solution, and it should be able to take up to 1000 rows spread across any number of worksheets with both legacy(.xsl) and modern(.xlsx) formats.

Installation

Add files to Resources

We start off by adding the files from excelParserLibs.zip, to the Resources in your solution and set the path to /script_lib.

  1. Click on 'Solution' in the upper left corner.
  2. Select 'Resources' on the left.
  3. Select 'File' at the top.
  4. Use 'Add' on the upper right to add the file you downloaded. Alternatively, you may 'drag and drop' it.
  5. Set the Path for this file from / to /script_lib. This ensures the file can be accessed using a Groovy script later.
  6. Hit 'Save'. In the confirmation popup that appears, click 'Continue', then close the popup.

For more details on managing files in your solution, see Resource File Manager.

The result will look like this: resouces_picture

Add Global Variables

Now it's time to create two global variables, excelData1 and excelData2 that will store the data.

  1. Select 'Globals' on the left.
  2. Select 'Variables' at the top.
  3. Add a new variable using the 'Add' button on the upper right and give it the name excelData1.
  4. Set the default value to an empty map [:].
  5. Hit 'Save'.
  6. Follow the same steps for excelData2.

Add Global Scripts

Now it's time to add global scripts to your solution.

  1. Change to the 'Script' tab at the top while still in 'Globals'.
  2. Select 'Solution Loaded' on the left-hand side in the 'Sessions' column.
  3. Click 'Edit' at the top to the very right.
  4. You can now edit the 'Begin Dialog' and paste the following code snippet below the already existing code:
    import org.apache.poi.ss.usermodel.Font
    import org.apache.poi.ss.usermodel.Workbook
    import org.apache.poi.ss.usermodel.WorkbookFactory
    import org.apache.poi.ss.usermodel.DataFormatter

    class ExcelParser {

        public static def excelData = [];

        static parse(String filename, boolean useRowsHeaders, boolean useColumnHeaders, boolean markupFormatting) {
            def wbData = [:]

        def file = FileGetter.getFile(filename)
            InputStream inp = new FileInputStream(file)
            Workbook wb = WorkbookFactory.create(inp);
            DataFormatter dataFormatter = new DataFormatter()
            boolean isOldExcel = wb.getSpreadsheetVersion().ordinal() == 0
            if (isOldExcel && markupFormatting) {
                println('WARNING - Markup formatting not supported for *.xls files. Please save your file as *.xslx to apply markup formatting.')
            }
            wb.forEach({ sheet ->

                def sheetData = [:]
                sheet.forEach({ row ->

                    def rowName;

                    if (useRowsHeaders) {
                        rowName = dataFormatter.formatCellValue(row.getCell(0))
                    } else {
                        rowName = "row" + (row.rowNum + 1)
                    }
                    def rowData = [:]
                    row.forEach({ cell ->

                            def columnName;

                            if (useColumnHeaders) {
                                columnName = dataFormatter.formatCellValue(sheet.getRow(0).getCell(cell.columnIndex))
                            } else {
                                columnName = "column" + cell.columnIndex
                            }
                            def cellData = ""
                            if (markupFormatting) {
                                def richText = cell.getRichStringCellValue()
                                int formattingRuns = richText.numFormattingRuns()
                                def plainText = richText.getString()

                                if (formattingRuns > 0 && !isOldExcel) {
                                    for (int i = 0; i < formattingRuns; i++) {
                                        int startIdx = richText.getIndexOfFormattingRun(i)
                                        int length = richText.getLengthOfFormattingRun(i)
                                        int endIdx = startIdx + length

                                        def tranche = plainText.substring(startIdx, endIdx)
                                        Font font = richText.getFontOfFormattingRun(i)
    //Skip markup if formatting run has no font
                                        if (font) {
                                            if (font.getBold()) {
                                                tranche = "<b>" + tranche + "</b>"
                                            }
                                            if (font.getItalic()) {
                                                tranche = "<i>" + tranche + "</i>"
                                            }
                                            if (font.getUnderline() > Font.U_NONE) {
                                                tranche = "<u>" + tranche + "</u>"
                                            }
                                            if (font.getStrikeout()) {
                                                tranche = "<strike>" + tranche + "</strike>"
                                            }
                                            if (font.getTypeOffset() == Font.SS_SUPER) {
                                                tranche = "<sup>" + tranche + "</sup>"
                                            }
                                            if (font.getTypeOffset() == Font.SS_SUB) {
                                                tranche = "<sub>" + tranche + "</sub>"
                                            }

                                            def xssfColor = font.getXSSFColor()
                                            if (xssfColor) {
                                                def colorRGBA = xssfColor.getARGBHex()
                                                def color = colorRGBA.substring(2)
                                                if (color != "000000") {
                                                    tranche = "<span style=\"color: #" + color + ";\">" + tranche + "</span>"
                                                }
                                            }
                                        }

                                        cellData += tranche.replaceAll(/\n/, "<br/>")
                                    }
                                } else {
                                    cellData = plainText
                                }

                            } else {
                                cellData = dataFormatter.formatCellValue(cell)
                            }

                            if (cell.columnIndex > 0 && useColumnHeaders || !useColumnHeaders) {
                                rowData.put(columnName, cellData)
                            }
                    })
                    if (row.rowNum > 0 && useRowsHeaders || !useRowsHeaders) {
                        sheetData.put(rowName, rowData)
                    }

                })
                wbData.put(sheet.sheetName, sheetData)

            })
            return wbData
        }
    }

This class contains two methods. The first excelData, is only a placeholder for the results, while the second one, parse, does the actual work.

Add Parameters

Last step in 'Solution Loaded' is to run the parser for each file you want read, with the following parameters.

Name Type Description
filename String The name of the file you want read.
rowHeaders Boolean Should the first cell of the row be considered a header. If true, the object will have the row header for a key, if false, 'rowX' will be the key, where X is the number of the header.
columnHeaders Boolean As above, but for column headers.
useMarkup Boolean If true, it will convert the styles in the Excel sheet into HTML tags. For example, if a part of a cell text is bolded in the spreadsheet, it will be surrounded by 'b' and '/b' tags in the output. If false, all formatting is ignored.

The result should look something like this: ExcelParser.excelData.add(ExcelParser.parse("modern_format.xlsx", true, true, true)) ExcelParser.excelData.add(ExcelParser.parse("legacy_format.xls", true, true, false))

Add a second script

Then in the same 'Solution Loaded' script add the FileGetter class.

Note that this class can be used for any instance where you need to read the contents of a file, and it isn’t specific to Excel.

        class FileGetter {
            static File getFile(String sResourceFileName) {
                URL url = FileGetter.class.getClassLoader().getResource(sResourceFileName);
                if (url == null) {
                    throw new RuntimeException('Failure to obtain URL for file [' + sResourceFileName + '] in Resources');
                }
                URI uri;
                try {
                    uri = url.toURI();
                } catch (exc) {
                    throw new RuntimeException('Failure to create URI for URL [' + url + '] for file [' + sResourceFileName + '] in Resources', exc);
                }
                File file;
                try {
                    return new File(uri);
                } catch (exc) {
                    throw new RuntimeException('Failure to create File object for URL [' + url + '] and URI [' + uri + '] for file [' + sResourceFileName + '] in Resources', exc);
                }
            }
    }

Add a Begin Dialog script

As a final step for the installation select 'Begin Dialog' and paste the following code:

    excelData1 = ExcelParser.excelData[0]
    excelData2 = ExcelParser.excelData[1]

Usage

Finally, you can use the values from the Excel in a flow as shown below.

To get data from sheet someSheet, for row with header someRow, corresponding with column with header someColumn.

${excelData1.someSheet.someRow.someColumn}

For sheet names or headers with spaces or special characters, use square-bracket notation.

${excelData1.["someSheet"]["some Row Header"]["some Colum Header"]}

In cases where the rowHeader or columnHeader arguments are false, the values can be invoked like this, where 'X' is the sheet, row or column number.

${excelData1.SheetX.rowX.columnX}

Was this page helpful?