
Pipelines and Text Processing

This tutorial teaches you how to build pipelines in sh2—and when to let Bash do the heavy lifting instead.

What you’ll learn:

  • Write readable sh2 pipelines with run(...) | run(...)
  • Capture pipeline output safely
  • Understand how status() works with pipelines
  • Know when sh2 helps vs when Bash is simpler
  • Use sh(...) as an escape hatch (rarely)

Prerequisites: Complete Tutorial 01: Getting Started and Tutorial 03: Error Handling.


1. Why Pipelines Feel Easy in Bash—And Why They Become Hard

Bash pipelines are concise and powerful:

ps aux | grep nginx | grep -v grep | awk '{print $2}' | xargs kill -9

This “works” for quick commands. But as pipelines grow, problems emerge:

Problem 1: Quoting inside awk/sed

# Which quotes go where? Easy to break.
grep "error" log.txt | awk -F: '{print $1 " had " $2 " errors"}'

Problem 2: Status handling

set -o pipefail
result=$(curl -s "$url" | jq '.data')
# Did curl fail? Did jq fail? $? only tells you "something failed"

Problem 3: Readability at scale

find . -name "*.log" -mtime +7 -print0 | \
    xargs -0 grep -l "ERROR" | \
    sort -u | \
    head -20 | \
    while read f; do echo "Problem: $f"; done

This is hard to review. What happens if find fails? What if a filename has special characters? The answers require deep Bash knowledge.


2. sh2 Structured Pipelines

sh2 pipelines use | between run(...) calls. Each argument is safely quoted.

Simple example

func main() {
    run("printf", "a\nb\nc\n") | run("wc", "-l")
}

Output: 3

Slightly longer: count unique shells

func main() {
    # Get unique shells from /etc/passwd
    let shells = capture(
        run("cut", "-d:", "-f7", "/etc/passwd")
        | run("sort")
        | run("uniq", "-c")
        | run("sort", "-rn")
    )
    
    print("Shell usage:")
    print(shells)
}

What’s clearer:

  • Each stage is a distinct run(...) call
  • Arguments don’t need quoting gymnastics
  • The pipeline structure is visually obvious

3. Capturing Pipeline Output

Use capture(...) to store pipeline output in a variable:

func main() {
    let line_count = capture(
        run("find", ".", "-name", "*.txt")
        | run("wc", "-l")
    )
    
    print("Found " & trim(line_count) & " text files")
}

Use trim(...) to remove trailing whitespace

Many commands (like wc) output extra whitespace. Use trim():

let count = trim(capture(run("ls", "-1") | run("wc", "-l")))

Handling failures with allow_fail=true

For pipelines that might fail:

func main() {
    let result = capture(
        run("grep", "ERROR", "app.log")
        | run("wc", "-l"),
        allow_fail=true
    )
    
    if status() != 0 {
        print("No errors found (or file missing)")
    } else {
        print("Error count: " & trim(result))
    }
}

4. Exit Status Rules

How status() works after a pipeline

After a pipeline, status() reflects the last command’s exit code (like Bash default behavior).

func main() {
    # Pipeline: true | false
    run("true") | run("false")
    print("Status: " & status())  # Prints: 1 (from false)
}

This is important: if grep finds nothing (exit 1) but wc succeeds (exit 0), the overall status is 0, and the grep failure is silently masked.
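A quick sketch of that case, in the spirit of the true | false example above (assumes an app.log with no ERROR lines):

```
func main() {
    # grep exits 1 (no match), but wc is the last stage and exits 0
    run("grep", "ERROR", "app.log") | run("wc", "-l")
    print("Status: " & status())  # Prints: 0 (the grep failure is masked)
}
```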

Pattern: if pipeline fails, print error and exit

func main() {
    let output = capture(
        run("curl", "-s", "https://api.example.com/data")
        | run("jq", ".items"),
        allow_fail=true
    )
    
    if status() != 0 {
        print_err("Pipeline failed (curl or jq)")
        exit(1)
    }
    
    print(output)
}

When you need per-stage error checking

Break the pipeline into steps:

func main() {
    let raw = capture(run("curl", "-sf", "https://api.example.com/data"), allow_fail=true)
    if status() != 0 {
        print_err("curl failed")
        exit(1)
    }
    
    # Write to temp file for jq
    write_file("/tmp/api_data.json", raw)
    
    let parsed = capture(run("jq", ".items", "/tmp/api_data.json"), allow_fail=true)
    if status() != 0 {
        print_err("jq failed: invalid JSON?")
        exit(1)
    }
    
    print(parsed)
}

5. Real Examples

Example 1: Parse w output to list unique usernames

Bash:

w -h | awk '{print $1}' | sort -u

sh2:

func main() {
    let users = capture(
        run("w", "-h")
        | run("awk", "{print $1}")
        | run("sort", "-u")
    )
    
    print("Logged-in users:")
    for user in lines(users) {
        if user != "" {
            print("  " & user)
        }
    }
}

What got clearer:

  • Intent is explicit: “get users, then iterate”
  • Adding per-user logic is easy

What got worse:

  • More lines than the Bash one-liner

Example 2: Extract and count shells from /etc/passwd

Bash:

cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn | head -5

sh2:

func main() {
    print("Top 5 shells by user count:")
    
    run("cut", "-d:", "-f7", "/etc/passwd")
    | run("sort")
    | run("uniq", "-c")
    | run("sort", "-rn")
    | run("head", "-n", "5")
}

What got clearer:

  • Arguments are unambiguous: "-d:" is passed as-is, with no shell quoting to get wrong
  • Easy to modify one stage

What got worse:

  • Slightly more verbose

Example 3: JSON API with error handling (curl | jq)

Bash:

set -o pipefail
result=$(curl -sf "$API_URL" | jq -r '.name') || { echo "Failed"; exit 1; }
echo "Name: $result"

sh2:

func main() {
    let url = "https://api.github.com/repos/siu-mak/sh2lang"
    
    let data = capture(
        run("curl", "-sf", url)
        | run("jq", "-r", ".name"),
        allow_fail=true
    )
    
    if status() != 0 {
        print_err("Failed to fetch or parse API response")
        exit(1)
    }
    
    print("Repo name: " & trim(data))
}

What got clearer:

  • Error handling is explicit
  • URL is a variable, not inline with quoting risk

What got worse:

  • More lines for the same result

Example 4: Grep logs and summarize error counts

Bash:

grep -c "ERROR" ./*.log /dev/null 2>/dev/null | awk -F: '{sum+=$2} END {print sum}'

sh2:

func main() {
    # Find all log files and count ERROR lines
    let logs = capture(run("find", ".", "-name", "*.log", "-type", "f"), allow_fail=true)
    
    for log in lines(logs) {
        if log != "" {
            let count = trim(capture(
                run("grep", "-c", "ERROR", log),
                allow_fail=true
            ))
            
            if status() == 0 {
                if count != "" {
                    if count != "0" {
                        print(log & ": " & count & " errors")
                        # Note: summing these would require string-to-int conversion; per-file counts are printed instead
                    }
                }
            }
        }
    }
    
    print("See above for per-file counts")
}

What got clearer:

  • Works correctly even if no .log files exist
  • Handles files with spaces in names

What got worse:

  • Significantly more verbose
  • This is a case where Bash’s terseness wins

Example 5: CSV formatting with awk

For complex text transformations, awk shines. Here’s a case where using sh(...) is reasonable:

Bash:

awk -F, '{printf "%-20s %s\n", $1, $2}' data.csv

sh2:

func main() {
    # Use run() for simple awk patterns
    run("awk", "-F,", "{printf \"%-20s %s\\n\", $1, $2}", "data.csv")
}

Note: Even awk expressions work as run() arguments when they’re simple. For complex multi-line awk scripts, consider keeping them in a separate .awk file and calling run("awk", "-f", "script.awk", "data.csv").


6. Where Bash Still Wins

Be honest: sh2 is not always the best choice.

Category 1: Dense awk/sed one-liners

awk '{gsub(/foo/,"bar"); print}' file.txt
sed -n '/START/,/END/p' log.txt

These are compact, well-tested patterns. Wrapping them in sh2 adds verbosity without adding safety.
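Wrapped in sh2, the sed one-liner looks like this; the quoting is no easier and nothing is safer, which is the point:

```
func main() {
    # Same range-print, one stage; nothing gained over plain Bash
    run("sed", "-n", "/START/,/END/p", "log.txt")
}
```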

Category 2: Process substitution

diff <(sort file1.txt) <(sort file2.txt)

sh2 has no equivalent. You’d need a workaround with temp files.
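A temp-file workaround, sketched with write_file(...) and capture(...) from earlier sections:

```
func main() {
    let a = capture(run("sort", "file1.txt"))
    let b = capture(run("sort", "file2.txt"))
    write_file("/tmp/a.sorted", a)
    write_file("/tmp/b.sorted", b)
    
    # diff exits 1 when the files differ, so allow it to fail
    run("diff", "/tmp/a.sorted", "/tmp/b.sorted", allow_fail=true)
}
```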

Category 3: Job control

long_task &
pid=$!
# ... do other work ...
wait $pid

Bash uses the terse & operator for quick background jobs. sh2 prefers explicit, structured job control via spawn(run(...)) and wait(pid).
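A sketch of the structured form, assuming spawn(...) returns a pid that wait(...) blocks on:

```
func main() {
    let pid = spawn(run("long_task"))
    # ... do other work ...
    wait(pid)
}
```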

The honest truth

sh2 is great for structured glue—scripts that run commands, check status, branch, and log. It’s not trying to replace Bash’s text-processing DSL.


7. Rule of Thumb + Decision Table

Your pipeline…                                                 Prefer
---------------------------------------------------------------------
Is a quick one-liner you’ll run once                           Bash
Uses complex awk/sed patterns                                  Bash (or external script)
Needs error handling per stage                                 sh2
Will be reviewed by others                                     sh2
Mixes commands with sudo, confirm, logging                     sh2
Uses process substitution or interactive job control (fg/bg)   Bash
Has 2–4 stages with standard tools                             Either works

Quick decision flow

  1. Will this be reviewed? → Lean toward sh2
  2. Is it mostly text transformation? → Lean toward Bash
  3. Do I need per-command error handling? → sh2
  4. Is it a one-off command? → Bash
  5. Does it use sudo, confirm, or file logging? → sh2

8. The Escape Hatch: sh(...)

When you genuinely need shell syntax, use sh(...). But be explicit about why:

func main() {
    # ❌ This won't work: run() passes "<(...)" as literal text, not process substitution
    run("diff", "<(sort file1.txt)", "<(sort file2.txt)", allow_fail=true)
    
    # Correct approach: sort into temp files first, or accept sh(...)
}

If you must use complex pipelines with shell features:

func main() {
    # sh(...) because: multi-stage pipeline with subshell grouping
    # Note: No user input is interpolated here; pipeline is static
    let result = capture(
        sh("cat *.log 2>/dev/null | grep ERROR | wc -l"),
        allow_fail=true
    )
    )
    
    if status() == 0 {
        print("Total errors: " & trim(result))
    }
}

Safety rule: Never interpolate user input into sh(...) commands. Always validate first.
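One way to validate first, using only pipeline features from this tutorial (the allow-list regex is illustrative, not a rule to copy blindly):

```
func main() {
    let pattern = "ERROR"   # imagine this value arrived from user input
    
    # Allow-list check: printf feeds the value to grep, which
    # accepts word characters and dashes only
    run("printf", "%s", pattern)
    | run("grep", "-qE", "^[A-Za-z0-9_-]+$", allow_fail=true)
    
    if status() != 0 {
        print_err("Unsafe pattern rejected")
        exit(1)
    }
    
    # Only now is the value safe to use in a shell string
}
```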


Next Steps

You now understand when sh2 pipelines help and when Bash is the right tool.



Happy piping! 🔧