txtmet

$ a command-line text metric utility implemented in both C and Rust

overview

TXTMET is a lightweight CLI tool that reads a text file and reports its word count, character count, line count, sentence count, or unique word count, depending on which flags are passed. It grew out of frustration with tools that are either too heavy or not flexible enough for simple counting tasks, and it is implemented in both C and Rust.

tech stack

  • C (standard library, getopt): streaming fgetc character processing with lastc state tracking
  • Rust (standard library): fs::read_to_string with in_word boolean state and match arms
  • getopt: Unix-style CLI flag parsing in the C version for selecting which metrics to output
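For illustration, getopt-based metric selection in the C version might look roughly like the sketch below. It assumes hypothetical flag letters (-w, -c, -l, -s, -u) and hypothetical names (struct metric_flags, parse_flags); the real tool's flags are not listed here.

```c
#define _POSIX_C_SOURCE 200809L /* expose getopt() under strict ISO C modes */
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Which metrics to print. Field names and flag letters are hypothetical. */
struct metric_flags {
    bool words, chars, lines, sentences, unique;
};

/*
 * Parse metric flags with getopt(). Returns optind (the index of the first
 * non-option argument, e.g. the input file) or -1 on an unrecognized flag.
 */
int parse_flags(int argc, char *argv[], struct metric_flags *f)
{
    int opt;

    assert(f != NULL);
    *f = (struct metric_flags){0};
    optind = 1; /* start scanning at argv[1] */

    while ((opt = getopt(argc, argv, "wclsu")) != -1) {
        switch (opt) {
        case 'w': f->words = true;     break;
        case 'c': f->chars = true;     break;
        case 'l': f->lines = true;     break;
        case 's': f->sentences = true; break;
        case 'u': f->unique = true;    break;
        default:  return -1; /* getopt() has already printed a diagnostic */
        }
    }
    return optind;
}
```

Because getopt accepts clustered short options, a call like `txtmet -wl misery.txt` would set the same fields as `txtmet -w -l misery.txt` under these assumed flags.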


how it works

Both implementations use the same core logic for all metrics. Word detection tracks a boolean state (lastc in C, in_word in Rust): a transition from non-whitespace to whitespace increments the word counter. Sentence detection counts terminal punctuation characters (., !, ?). Every such mark is counted individually, so the heuristic is naive: an ellipsis (...) counts as three sentences, "?!" counts as two, and abbreviations such as "e.g." add spurious sentences.

The C version processes the file one character at a time via fgetc, so memory usage stays flat regardless of file size. It uses getopt for Unix-style flag parsing, so flags can be combined freely, either as separate arguments or clustered into one. The Rust version reads the entire file into a string with fs::read_to_string and iterates over its characters, using match arms for the state transitions. This holds the whole file in memory at once, trading the C version's constant memory footprint for simpler code, and Rust's ownership model guarantees the buffer is freed when it goes out of scope.
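The streaming approach can be sketched as a single fgetc loop. This is not the project's code: count_stream and struct stream_counts are hypothetical names, lastc here is assumed to hold the previous character (the original's exact representation is not shown), and lines are assumed to be counted as newline characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

struct stream_counts {
    long chars;   /* bytes returned by fgetc (multibyte UTF-8 counts per byte) */
    long lines;   /* newline characters */
    long words;
    int  error;   /* nonzero if a read error stopped the loop */
};

/*
 * Stream a file one character at a time with fgetc, so memory use stays
 * constant no matter how large the input is. lastc remembers the previous
 * character so a word is counted when whitespace follows non-whitespace.
 */
struct stream_counts count_stream(FILE *fp)
{
    struct stream_counts sc = {0, 0, 0, 0};
    int c;           /* int, not char, so EOF is distinguishable from data */
    int lastc = ' '; /* treat the input as if preceded by whitespace */

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF) {
        sc.chars++;
        if (c == '\n')
            sc.lines++;
        if (isspace(c) && !isspace(lastc))
            sc.words++;
        lastc = c;
    }
    if (!isspace(lastc))     /* final word not followed by whitespace */
        sc.words++;
    sc.error = ferror(fp) != 0; /* EOF is also returned on read errors */
    return sc;
}
```

Only two ints of state survive between iterations, which is why memory stays flat; the ferror check is the kind of defensive step C requires because fgetc signals both end-of-file and failure with EOF.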

Test files include short samples and a longer text (misery.txt, 1755 words) for comparison against known counts.

takeaways

Implementing the same utility in two languages side by side makes idiomatic differences concrete rather than abstract. Some key takeaways:

  • C file I/O, streaming fgetc processing and getopt flag parsing
  • Rust ownership model and pattern matching for the same character-level logic
  • State machine character processing, tracking transitions rather than scanning for delimiters
  • The practical difference in defensive boilerplate between C and Rust for the same task