Curiosity

Excel Connector

Ingest rows from an Excel workbook (.xlsx) using ClosedXML — the highest-fidelity managed library for reading and writing the Open XML format. For read-only ingest of huge sheets, ExcelDataReader is faster; both are covered below.

Packages

Curiosity.Library on NuGet ClosedXML on NuGet

dotnet add package Curiosity.Library
dotnet add package ClosedXML

If you only need to read very large sheets and don't care about formulas, conditional formatting, or styles, swap ClosedXML for ExcelDataReader + ExcelDataReader.DataSet.

Expected source shape

A worksheet named Products with a header row in row 1:

sku name category price updated_at
SKU-01 Widget Mini Hardware 4.99 2025-10-01T0000Z
SKU-02 Bracket Pack Hardware 12.50 2025-10-05T0000Z
SKU-03 Foo Cleaner Consumable 8.00 2025-10-12T0000Z

Connector code (ClosedXML)

Program.cs
using ClosedXML.Excel;
using Curiosity.Library;

[Node]
public class Product
{
    [Key]       public string         Sku       { get; set; }
    [Property]  public string         Name      { get; set; }
    [Property]  public string         Category  { get; set; }
    [Property]  public double         Price     { get; set; }
    [Timestamp] public DateTimeOffset UpdatedAt { get; set; }
}

using var graph = Graph.Connect(
    endpoint:      Environment.GetEnvironmentVariable("CURIOSITY_ENDPOINT")!,
    token:         Environment.GetEnvironmentVariable("CURIOSITY_TOKEN")!,
    connectorName: "excel-products");

await graph.CreateNodeSchemaAsync<Product>();
graph.SetAutoCommitCost(everyNodes: 10_000);

var path = args.Length > 0 ? args[0] : "products.xlsx";
using var workbook  = new XLWorkbook(path);
var       worksheet = workbook.Worksheet("Products");

// Map header text -> column number (1-based).
var headerRow = worksheet.Row(1);
var headers = headerRow.Cells(1, worksheet.LastColumnUsed().ColumnNumber())
    .ToDictionary(c => c.GetString().Trim().ToLowerInvariant(), c => c.Address.ColumnNumber);

var ingested = 0;
foreach (var row in worksheet.RowsUsed().Skip(1))   // skip header
{
    var product = new Product
    {
        Sku       = row.Cell(headers["sku"]).GetString().Trim(),
        Name      = row.Cell(headers["name"]).GetString(),
        Category  = row.Cell(headers["category"]).GetString(),
        Price     = row.Cell(headers["price"]).GetDouble(),
        UpdatedAt = new DateTimeOffset(row.Cell(headers["updated_at"]).GetDateTime(), TimeSpan.Zero),
    };

    if (string.IsNullOrWhiteSpace(product.Sku)) continue;

    graph.AddOrUpdate(product);
    ingested++;
}

await graph.CommitPendingAsync();
Console.WriteLine($"Ingested {ingested} products from {path}");

How it works

worksheet.RowsUsed() skips fully-empty rows, which is what you usually want — Excel sheets often have phantom blank rows at the end. The header-to-column-number dictionary decouples the connector from column ordering: rearrange columns in the workbook and the code keeps working as long as the headers stay the same.

ClosedXML uses IXLCell.GetString / GetDouble / GetDateTime for typed reads — if a cell contains a formula, you get the computed value, not the formula text. For raw values use cell.Value.ToString().

Fast path: ExcelDataReader for huge sheets

ClosedXML loads the whole workbook into a typed DOM. For sheets with millions of rows that's wasteful — ExcelDataReader streams.

ExcelDataReader on NuGet ExcelDataReader.DataSet on NuGet

using ExcelDataReader;
using System.Text;

// Required on .NET Core / .NET 5+ for some legacy code pages.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

using var stream = File.OpenRead(path);
using var reader = ExcelReaderFactory.CreateReader(stream);   // auto-detects .xls / .xlsx

// Move past the header row.
reader.Read();

while (reader.Read())
{
    var sku = reader.GetString(0)?.Trim();
    if (string.IsNullOrEmpty(sku)) continue;

    graph.AddOrUpdate(new Product
    {
        Sku       = sku,
        Name      = reader.GetString(1),
        Category  = reader.GetString(2),
        Price     = reader.GetDouble(3),
        UpdatedAt = new DateTimeOffset(reader.GetDateTime(4), TimeSpan.Zero),
    });
}

This is positional (reader.GetString(0), 1, …) — there's no convenient header mapping out of the box. Use IExcelDataReader's field count once on the header row if you need to handle arbitrary column orders.

Notes & pitfalls

  • Cell.GetString() vs Cell.Value.ToString(). GetString() formats the value the same way Excel displays it (respecting cell format). Value.ToString() ignores formatting and returns the raw stored value. For text columns either works; for dates and numbers prefer the typed accessors (GetDateTime, GetDouble).
  • Multiple sheets. Loop workbook.Worksheets if the file has a sheet per category/region/year. Use the sheet name as part of the [Key] to keep keys stable across sheets.
  • Merged cells. ClosedXML returns the value only on the top-left cell of a merged range; other cells in the range return empty. If your source uses merging for visual grouping, flatten it before iterating.
  • Date interpretation. Excel stores dates as floating-point days since 1900-01-01 (with the famous 1900 leap-year bug). GetDateTime() handles it correctly. If you read with GetDouble() by accident, you'll get the serial number — not what you want.
  • .xls vs .xlsx. ClosedXML only reads .xlsx. ExcelDataReader reads both. If you need to ingest legacy .xls, go with ExcelDataReader.
  • Locking on Windows. ClosedXML opens the file with shared read by default; ExcelDataReader needs an explicit FileShare.ReadWrite if the file may be open in Excel during ingest.

See also

Referenced by

© 2026 Curiosity. All rights reserved.