"DATE" 2025
"VENUE" pei devs
"DURATION" 5 min

building a bank statement analyzer with webassembly + svelte

"DESCRIPTION"

5-minute lightning talk on using wasm for compute and svelte for ui—performance without complexity

why?

i'm passionate about finance and writing webapps, and i really miss writing c++. i'm tired of the react+typescript standard that is unironically bloating the web, and i wanted to challenge myself.

so i built a bank statement analyzer that extracts transactions from PDFs—entirely in your browser. no servers, no APIs, no data leaves your machine.

what's niche about it? webassembly for compute + svelte for UI.


what is webassembly?

webassembly (wasm) = binary instruction format for a stack-based virtual machine

c++/rust → llvm → .wasm bytecode → browser jit → machine code
cx: llvm is a set of compiler and toolchain technologies. a language frontend
(clang for c++, rustc for rust) targets llvm, and llvm's backends emit the machine code or wasm.

key features:

  • linear memory model, sandboxed execution
  • typically 1.5x-4x faster than optimized js on compute-heavy code
  • 98% browser coverage, compiled up front so there's little jit warmup
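
to make "stack-based virtual machine" concrete, here is a toy sketch of how such a machine evaluates (2 + 3) * 4. the opcode names and the `Instr` / `run` helpers are made up for illustration; real wasm uses a binary encoding and many more instructions.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// toy opcodes, loosely modeled on wasm's i32.const / i32.add / i32.mul
// (hypothetical names, not the real encoding)
enum class Op { Const, Add, Mul };

struct Instr {
  Op op;
  std::int32_t value;  // only used by Op::Const
};

// run a program on an operand stack and return the value left on top
std::int32_t run(const std::vector<Instr>& program) {
  std::vector<std::int32_t> stack;
  for (const Instr& in : program) {
    if (in.op == Op::Const) {
      stack.push_back(in.value);
      continue;
    }
    // binary ops pop two operands and push one result
    if (stack.size() < 2) throw std::runtime_error("stack underflow");
    const std::int32_t rhs = stack.back(); stack.pop_back();
    const std::int32_t lhs = stack.back(); stack.pop_back();
    stack.push_back(in.op == Op::Add ? lhs + rhs : lhs * rhs);
  }
  if (stack.empty()) throw std::runtime_error("empty stack");
  return stack.back();
}
```

feeding it `Const 2, Const 3, Add, Const 4, Mul` leaves 20 on the stack: operands are pushed first, and each operator consumes the top two values.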

the problem

parse bank PDFs from any canadian/us bank using regex patterns. requirements: fast, private, offline.

universal pattern: date → date → description → amount

  • pdf.js extracts continuous text (not lines), enabling format-agnostic parsing
  • 10 specialized patterns covering 95-98% of north american banks
  • patterns 1-2 (45%): rbc, cibc, chase, boa
  • patterns 3-10: simple formats, csv exports, multi-currency
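
the ten patterns themselves aren't shown in the talk, so the following is only a sketch of how an ordered pattern list might be tried, first match wins. `NamedPattern`, `patterns()`, `firstMatchingPattern` and both regexes are illustrative assumptions, not the real ones.

```cpp
#include <regex>
#include <string>
#include <vector>

// hypothetical pairing of a label with a compiled pattern
struct NamedPattern {
  std::string name;
  std::regex regex;
};

// illustrative patterns only, ordered from most to least specific
const std::vector<NamedPattern>& patterns() {
  static const std::vector<NamedPattern> list = {
    // two month-name dates, description, amount (the "universal" shape)
    {"date-date-desc-amount",
     std::regex(R"([A-Z][a-z]{2}\s+\d{1,2}\s+[A-Z][a-z]{2}\s+\d{1,2}\s+.+?\s+-?\d[\d,]*\.\d{2})")},
    // csv-style export: iso date, description, amount
    {"csv-export",
     std::regex(R"(\d{4}-\d{2}-\d{2},[^,]+,-?\d+\.\d{2})")},
  };
  return list;
}

// return the name of the first pattern that matches anywhere in text,
// or an empty string when none do
std::string firstMatchingPattern(const std::string& text) {
  for (const NamedPattern& p : patterns()) {
    if (std::regex_search(text, p.regex)) return p.name;
  }
  return "";
}
```

compiling the regexes once in a function-local static keeps the per-statement cost down to the matching itself.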

why wasm?

speed: js regex 300-500ms → wasm 80-120ms (3-4x faster)

privacy: all processing in browser, zero network calls, gdpr-compliant by design

portability: same binary runs on x86/arm/risc-v, works everywhere (chrome/firefox/safari/node/cloudflare workers)


the c++ implementation

universal pattern in code:

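std::vector<Transaction> TransactionExtractor::extract(const std::string& text) {
  // pattern: (date) (date) (description) (amount.xx)
  // continuous text from pdf.js, not line-by-line
  // adjacent raw literals are concatenated, so no newline or indentation leaks into the pattern
  static const std::regex transactionPattern(
    R"(((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s+\d{1,2}|)"
    R"(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\s+)"
    // followed by post date, description, amount patterns
  );
  std::vector<Transaction> transactions;
  // walk matches with std::sregex_iterator and fill transactions (elided on the slide)
  return transactions;
}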

why this works:

  • pdf.js extracts continuous text with multiple spaces, not lines
  • pattern matches entire transaction sequence in stream
  • format-agnostic: works regardless of bank's layout
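
the slide elides the post date, description and amount parts, so here is a self-contained sketch of the full date → date → description → amount match over a continuous text stream. the `Transaction` fields, the exact regex (month-name dates only) and the amount cleanup are assumptions for illustration, not the production pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// assumed shape; the talk does not show the real Transaction type
struct Transaction {
  std::string date;
  std::string postDate;
  std::string description;
  double amount;
};

std::vector<Transaction> extractTransactions(const std::string& text) {
  // (date) (post date) (description) (amount.xx)
  static const std::regex pattern(
    R"(((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s+\d{1,2})\s+)"
    R"(((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s+\d{1,2})\s+)"
    R"((.+?)\s+(-?\$?\d[\d,]*\.\d{2}))");

  std::vector<Transaction> out;
  // scan the whole stream; each match is one transaction
  for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
    const std::smatch& m = *it;
    // drop '$' and thousands separators before converting to a number
    std::string digits;
    for (char c : m[4].str()) {
      if (c != '$' && c != ',') digits += c;
    }
    out.push_back({m[1].str(), m[2].str(), m[3].str(), std::stod(digits)});
  }
  return out;
}
```

the lazy `(.+?)` lets the description grow only until the first amount-shaped token, which is what allows several transactions to sit on one unbroken run of text.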

the compilation pipeline

emcc -O2 --bind -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 \
  transaction_extractor.cpp -o bank_analyzer.js

output: bank_analyzer.wasm (236KB) + bank_analyzer.js (88KB glue code)


the javascript ↔ c++ bridge

svelte:

const module = await loadAnalyzerModule();
const transactions = module.extractTransactions(text);
transactionStore.set(transactions); // reactive update

c++:

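#include <emscripten/bind.h>

EMSCRIPTEN_BINDINGS(bank_analyzer) {
    // any custom type crossing the boundary (e.g. Transaction) must be registered with embind too
    emscripten::function("extractTransactions", &extractTransactions);
}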
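
embind has to know every type that crosses the bridge, not just the function. a sketch of what the fuller binding might look like, assuming a `Transaction` struct with these (hypothetical) fields and a stand-in extractor. the bindings are guarded by `__EMSCRIPTEN__` so the file also builds with a native compiler for testing.

```cpp
#include <string>
#include <vector>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>
#endif

// assumed shape; field names are hypothetical
struct Transaction {
  std::string date;
  std::string description;
  double amount;
};

// stand-in for the real extractor so the sketch is self-contained
std::vector<Transaction> extractTransactions(const std::string& text) {
  std::vector<Transaction> out;
  if (!text.empty()) out.push_back({"Jan 12", text, 4.50});
  return out;
}

#ifdef __EMSCRIPTEN__
EMSCRIPTEN_BINDINGS(bank_analyzer) {
  // each Transaction is copied into js as a plain object: {date, description, amount}
  emscripten::value_object<Transaction>("Transaction")
      .field("date", &Transaction::date)
      .field("description", &Transaction::description)
      .field("amount", &Transaction::amount);

  // the vector arrives in js as a handle with size() and get(i), not an array
  emscripten::register_vector<Transaction>("TransactionList");

  emscripten::function("extractTransactions", &extractTransactions);
}
#endif
```

on the js side the returned list is a wrapper object, so it is worth copying it into a plain array with size() / get(i) and calling delete() on the handle before handing the data to a store.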

why svelte? reactive stores + no virtual DOM, leading to instant UI updates, and it's also really lightweight compared to react


the gotchas

⚠️ memory: use -s ALLOW_MEMORY_GROWTH=1, or large PDFs can exhaust the fixed-size heap and abort (out of memory, or "memory access out of bounds")

⚠️ async: always await loadAnalyzerModule() before calling wasm functions

⚠️ optimization: -O2 is the sweet spot (236KB); -O3 optimizes harder but compiles slower and can grow the binary, -Os/-Oz are the flags for smaller output

⚠️ debugging: add -gsource-map for chrome devtools support


performance

breakdown (chrome 131, m1 mac):

wasm load:     22ms (cached)
pdf extract:   45ms
c++ regex:     87ms
svelte render: 3ms
total:         157ms
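
the talk doesn't say how the stage timings were captured. one way to time the c++ stage from inside the module is a small `std::chrono` wrapper. `timeMs` and the `countAmounts` workload below are hypothetical helpers for illustration, and measuring from js with performance.now() would additionally include the bridge cost.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// run fn once and return the elapsed wall-clock time in milliseconds
template <typename Fn>
double timeMs(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// count amount-like tokens (e.g. 1,250.00) as a stand-in regex workload
std::size_t countAmounts(const std::string& text) {
  static const std::regex amount(R"(-?\d[\d,]*\.\d{2})");
  const auto begin = std::sregex_iterator(text.begin(), text.end(), amount);
  return static_cast<std::size_t>(std::distance(begin, std::sregex_iterator()));
}
```

a call would look like `double ms = timeMs([&] { n = countAmounts(text); });`. a single run is noisy, so averaging several runs on the same input gives a steadier number.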

vs. alternatives:

  • react + js: 395ms (2.5x slower)
  • python backend: ~700ms (4.5x slower + network)

why svelte wins:

  • no virtual dom = ui updates instantly
  • compiler-based = 12kb bundle vs react's 45kb
  • direct dom manipulation = zero reconciliation overhead
cx: the real dom is the browser's live representation of the page, and touching it
is comparatively expensive. a virtual dom is a lightweight in-memory javascript copy
that frameworks like react diff to work out which real dom nodes to change. svelte
skips the diff: its compiler generates code that updates the affected nodes directly.

production: 95%+ accuracy, zero server costs, gdpr-compliant by design


why this stack works

wasm for compute, svelte for ui = performance without complexity

WASM: near-native speed, no GC pauses, type-safe
Svelte: direct DOM updates, compile-time reactivity

when to use:

  • performance-critical apps (data viz, finance, scientific tools)
  • privacy-first (client-side compute, no telemetry)
  • edge computing (cloudflare workers + sveltekit)

when not to:

  • simple CRUD apps (svelte alone is fine, skip wasm)
  • large teams already invested in react

resources

docs:

my implementation: