
Pipelines and Text Processing

This tutorial teaches you how to build pipelines in sh2—and when to let Bash do the heavy lifting instead.

What you’ll learn:

  • Write readable sh2 pipelines with run(...) | run(...)
  • Capture pipeline output safely
  • Understand how status() works with pipelines
  • Know when sh2 helps vs when Bash is simpler
  • Use sh(...) as an escape hatch (rarely)

Prerequisites: Complete Tutorial 01: Getting Started and Tutorial 03: Error Handling.


1. Why Pipelines Feel Easy in Bash—And Why They Become Hard

Bash pipelines are concise and powerful:

ps aux | grep nginx | grep -v grep | awk '{print $2}' | xargs kill -9

This “works” for quick commands. But as pipelines grow, problems emerge:

Problem 1: Quoting inside awk/sed

# Which quotes go where? Easy to break.
grep "error" log.txt | awk -F: '{print $1 " had " $2 " errors"}'

Problem 2: Status handling

set -o pipefail
result=$(curl -s "$url" | jq '.data')
# Did curl fail? Did jq fail? $? only tells you "something failed"

Problem 3: Readability at scale

find . -name "*.log" -mtime +7 -print0 | \
    xargs -0 grep -l "ERROR" | \
    sort -u | \
    head -20 | \
    while read f; do echo "Problem: $f"; done

This is hard to review. What happens if find fails? What if a filename has special characters? The answers require deep Bash knowledge.


2. sh2 Structured Pipelines

sh2 pipelines use | between run(...) calls. Each argument is safely quoted.

Simple example

func main() {
    run("printf", "a\nb\nc\n") | run("wc", "-l")
}

Output: 3

Slightly longer: count unique shells

func main() {
    # Get unique shells from /etc/passwd
    let shells = capture(
        run("cut", "-d:", "-f7", "/etc/passwd")
        | run("sort")
        | run("uniq", "-c")
        | run("sort", "-rn")
    )
    
    print("Shell usage:")
    print(shells)
}

What’s clearer:

  • Each stage is a distinct run(...) call
  • Arguments don’t need quoting gymnastics
  • The pipeline structure is visually obvious

3. Capturing Pipeline Output

Use capture(...) to store pipeline output in a variable:

func main() {
    let line_count = capture(
        run("find", ".", "-name", "*.txt")
        | run("wc", "-l")
    )
    
    print("Found " & trim(line_count) & " text files")
}

Use trim(...) to remove trailing whitespace

Many commands (like wc) output extra whitespace. Use trim():

let count = trim(capture(run("ls", "-1") | run("wc", "-l")))

Handling failures with allow_fail=true

For pipelines that might fail:

func main() {
    let result = capture(
        run("grep", "ERROR", "app.log")
        | run("wc", "-l"),
        allow_fail=true
    )
    
    if status() != 0 {
        print("No errors found (or file missing)")
    } else {
        print("Error count: " & trim(result))
    }
}

4. Exit Status Rules

How status() works after a pipeline

After a pipeline, status() reflects the last command’s exit code (like Bash default behavior).

func main() {
    # Pipeline: true | false
    run("true") | run("false")
    print("Status: " & status())  # Prints: 1 (from false)
}

This is important: if grep finds nothing (exit 1) but wc succeeds (exit 0), the overall status is 0, and the grep failure is silently masked.
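A quick sketch of that case, in the spirit of the true | false example above (assumes an app.log with no ERROR lines):

```
func main() {
    # grep exits 1 (no match), but wc is the last stage and exits 0
    run("grep", "ERROR", "app.log") | run("wc", "-l")
    print("Status: " & status())  # Prints: 0 (the grep failure is masked)
}
```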

Pattern: if pipeline fails, print error and exit

func main() {
    let output = capture(
        run("curl", "-s", "https://api.example.com/data")
        | run("jq", ".items"),
        allow_fail=true
    )
    
    if status() != 0 {
        print_err("Pipeline failed (curl or jq)")
        exit(1)
    }
    
    print(output)
}

When you need per-stage error checking

Break the pipeline into steps:

func main() {
    let raw = capture(run("curl", "-sf", "https://api.example.com/data"), allow_fail=true)
    if status() != 0 {
        print_err("curl failed")
        exit(1)
    }
    
    # Write to temp file for jq
    write_file("/tmp/api_data.json", raw)
    
    let parsed = capture(run("jq", ".items", "/tmp/api_data.json"), allow_fail=true)
    if status() != 0 {
        print_err("jq failed: invalid JSON?")
        exit(1)
    }
    
    print(parsed)
}

5. Real Examples

Example 1: Parse w output to list unique usernames

Bash:

w -h | awk '{print $1}' | sort -u

sh2:

func main() {
    let users = capture(
        run("w", "-h")
        | run("awk", "{print $1}")
        | run("sort", "-u")
    )
    
    print("Logged-in users:")
    for user in lines(users) {
        if user != "" {
            print("  " & user)
        }
    }
}

What got clearer:

  • Intent is explicit: “get users, then iterate”
  • Adding per-user logic is easy

What got worse:

  • More lines than the Bash one-liner

Example 2: Extract and count shells from /etc/passwd

Bash:

cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn | head -5

sh2:

func main() {
    print("Top 5 shells by user count:")
    
    run("cut", "-d:", "-f7", "/etc/passwd")
    | run("sort")
    | run("uniq", "-c")
    | run("sort", "-rn")
    | run("head", "-n", "5")
}

What got clearer:

  • Arguments are unambiguous: "-d:" is passed as-is, with no shell quoting to get wrong
  • Easy to modify one stage

What got worse:

  • Slightly more verbose

Example 3: JSON API with error handling (curl | jq)

Bash:

set -o pipefail
result=$(curl -sf "$API_URL" | jq -r '.name') || { echo "Failed"; exit 1; }
echo "Name: $result"

sh2:

func main() {
    let url = "https://api.github.com/repos/siu-mak/sh2lang"
    
    let data = capture(
        run("curl", "-sf", url)
        | run("jq", "-r", ".name"),
        allow_fail=true
    )
    
    if status() != 0 {
        print_err("Failed to fetch or parse API response")
        exit(1)
    }
    
    print("Repo name: " & trim(data))
}

What got clearer:

  • Error handling is explicit
  • URL is a variable, not inline with quoting risk

What got worse:

  • More lines for the same result

Example 4: Grep logs and summarize error counts

Bash:

grep -c "ERROR" ./*.log /dev/null 2>/dev/null | awk -F: '{sum+=$2} END {print sum}'

sh2:

func main() {
    # Find all log files and count ERROR lines
    let logs = capture(run("find", ".", "-name", "*.log", "-type", "f"), allow_fail=true)
    
    for log in lines(logs) {
        if log != "" {
            let count = trim(capture(
                run("grep", "-c", "ERROR", log),
                allow_fail=true
            ))
            
            if status() == 0 {
                if count != "" {
                    if count != "0" {
                        print(log & ": " & count & " errors")
                        # Note: summing these would require string-to-int conversion; per-file counts are printed instead
                    }
                }
            }
        }
    }
    
    print("See above for per-file counts")
}

What got clearer:

  • Works correctly even if no .log files exist
  • Handles files with spaces in names

What got worse:

  • Significantly more verbose
  • This is a case where Bash’s terseness wins

Example 5: CSV formatting with awk

For complex text transformations, awk shines. Here’s a case where using sh(...) is reasonable:

Bash:

awk -F, '{printf "%-20s %s\n", $1, $2}' data.csv

sh2:

func main() {
    # Use run() for simple awk patterns
    run("awk", "-F,", "{printf \"%-20s %s\\n\", $1, $2}", "data.csv")
}

Note: Even awk expressions work as run() arguments when they’re simple. For complex multi-line awk scripts, consider keeping them in a separate .awk file and calling run("awk", "-f", "script.awk", "data.csv").


6. Where Bash Still Wins

Be honest: sh2 is not always the best choice.

Category 1: Dense awk/sed one-liners

awk '{gsub(/foo/,"bar"); print}' file.txt
sed -n '/START/,/END/p' log.txt

These are compact, well-tested patterns. Wrapping them in sh2 adds verbosity without adding safety.
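Wrapped in sh2, the sed one-liner looks like this; the quoting is no easier and nothing is safer, which is the point:

```
func main() {
    # Same range-print, one stage; nothing gained over plain Bash
    run("sed", "-n", "/START/,/END/p", "log.txt")
}
```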

Category 2: Process substitution

diff <(sort file1.txt) <(sort file2.txt)

sh2 has no equivalent. You’d need a workaround with temp files.
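A temp-file workaround, sketched with write_file(...) and capture(...) from earlier sections:

```
func main() {
    let a = capture(run("sort", "file1.txt"))
    let b = capture(run("sort", "file2.txt"))
    write_file("/tmp/a.sorted", a)
    write_file("/tmp/b.sorted", b)
    
    # diff exits 1 when the files differ, so allow it to fail
    run("diff", "/tmp/a.sorted", "/tmp/b.sorted", allow_fail=true)
}
```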

Category 3: Job control

long_task &
pid=$!
# ... do other work ...
wait $pid

Bash uses the terse & operator for quick background jobs. sh2 prefers explicit, structured job control via spawn(run(...)) and wait(pid).
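A sketch of the structured form, assuming spawn(...) returns a pid that wait(...) blocks on:

```
func main() {
    let pid = spawn(run("long_task"))
    # ... do other work ...
    wait(pid)
}
```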

The honest truth

sh2 is great for structured glue—scripts that run commands, check status, branch, and log. It’s not trying to replace Bash’s text-processing DSL.


7. Rule of Thumb + Decision Table

Your pipeline…                                                 Prefer
---------------------------------------------------------------------
Is a quick one-liner you’ll run once                           Bash
Uses complex awk/sed patterns                                  Bash (or external script)
Needs error handling per stage                                 sh2
Will be reviewed by others                                     sh2
Mixes commands with sudo, confirm, logging                     sh2
Uses process substitution or interactive job control (fg/bg)   Bash
Has 2–4 stages with standard tools                             Either works

Quick decision flow

  1. Will this be reviewed? → Lean toward sh2
  2. Is it mostly text transformation? → Lean toward Bash
  3. Do I need per-command error handling? → sh2
  4. Is it a one-off command? → Bash
  5. Does it use sudo, confirm, or file logging? → sh2

8. The Escape Hatch: sh(...)

When you genuinely need shell syntax, use sh(...). But be explicit about why:

func main() {
    # ❌ This won't work: run() passes "<(...)" as literal text, not process substitution
    run("diff", "<(sort file1.txt)", "<(sort file2.txt)", allow_fail=true)
    
    # Correct approach: sort into temp files first, or accept sh(...)
}

If you must use complex pipelines with shell features:

func main() {
    # sh(...) because: multi-stage pipeline with subshell grouping
    # Note: No user input is interpolated here; pipeline is static
    let result = capture(
        sh("cat *.log 2>/dev/null | grep ERROR | wc -l"),
        allow_fail=true
    )
    )
    
    if status() == 0 {
        print("Total errors: " & trim(result))
    }
}

Safety rule: Never interpolate user input into sh(...) commands. Always validate first.
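One way to validate first, using only pipeline features from this tutorial (the allow-list regex is illustrative, not a rule to copy blindly):

```
func main() {
    let pattern = "ERROR"   # imagine this value arrived from user input
    
    # Allow-list check: printf feeds the value to grep, which
    # accepts word characters and dashes only
    run("printf", "%s", pattern)
    | run("grep", "-qE", "^[A-Za-z0-9_-]+$", allow_fail=true)
    
    if status() != 0 {
        print_err("Unsafe pattern rejected")
        exit(1)
    }
    
    # Only now is the value safe to use in a shell string
}
```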


Next Steps

You now understand when sh2 pipelines help and when Bash is the right tool.



Happy piping! 🔧