@ernstki
Last active October 20, 2025 01:14
Multi-line replacement in sed - https://stackoverflow.com/q/75220171

Multi-line text replacement with sed

Original question: "Catch an entire function with grep" on Stack Overflow

It is possible to use (GNU) grep to do multiline matching [1], but you're probably making more work for yourself that way, because sed can do basically everything grep can, plus make replacements.

Using grep -zoP to do multiline matching is fundamentally the same trick as what's proposed below: turning your file into one long string, separated by non-printable ASCII characters. There are ways to accomplish this in other shells, but for convenience, I assume Bash or Z shell, for their ANSI-C quoting and command substitution features.
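
For comparison, here is a minimal sketch of what the grep -zoP approach can look like. It assumes GNU grep built with PCRE support (the -P option); the sample.cpp file, its contents, and the match_function helper are made up for illustration.

```bash
#!/bin/bash
# Assumes GNU grep with PCRE support (-P); sample.cpp is a hypothetical file.
workdir=$(mktemp -d)
cat > "$workdir/sample.cpp" <<'EOF'
/* some other code */
void Component::initialize()
{
    my_component = new ComponentClass();
}
EOF

# -z: treat the whole file as one NUL-terminated "line", so \n can be matched
# -o: print only the part that matched
# -P: use Perl-compatible regular expressions
match_function() {
    grep -zoP 'void Component::initialize\(\)\n\{[^}]*\}' "$1" | tr '\0' '\n'
}

match_function "$workdir/sample.cpp"
```

This only finds the function; replacing it still needs another tool, which is why the rest of this write-up uses sed.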

The short answer is:

SOH=$'\001'
STX=$'\002'

<File1.cpp tr \\n $STX \
  | sed "s$SOH$(tr \\n $STX <File1.cpp)$SOH$(tr \\n $STX <File2.cpp)$SOH" \
  | tr $STX \\n

The demo.sh and multised scripts from this gist demonstrate this in more detail. Also see ascii(7), probably available as man ascii on your system.

What this does in English is:

  1. converts linefeeds in the C++ source files into the ASCII STX (start of text) character, so each file becomes one long line, then
  2. performs the replacement, using the ASCII SOH (start of heading) character as the delimiter for sed's s command, and
  3. finally, converts the STXs back to linefeeds.

There is no magic to the order of the characters, or the specific characters I chose. They should just be two characters that don't appear in the files you're processing, that's all, and it's quite unlikely that these two will, in either source code or prose.
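
If you want to verify that assumption before processing anything, a small guard along these lines might do. This is only a sketch: safe_for_multiline_sed is a hypothetical helper name, not something from the gist's scripts.

```bash
#!/bin/bash
SOH=$'\001'
STX=$'\002'

# Hypothetical helper: succeeds only if none of the named files
# contains an SOH or STX byte.
safe_for_multiline_sed() {
    local f
    for f in "$@"; do
        # LC_ALL=C makes grep compare raw bytes
        if LC_ALL=C grep -q "[$SOH$STX]" "$f"; then
            echo "'$f' contains an SOH or STX byte; pick other placeholders." >&2
            return 1
        fi
    done
}
```

A possible use is `safe_for_multiline_sed File1.cpp File2.cpp File3.cpp || exit 1` ahead of the tr | sed | tr pipeline.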

The s command in sed is normally delimited by slashes (/), but that's problematic here, because slashes are highly likely to appear in the input. Even though the man page might not explicitly mention it, you can use other characters besides / (such as SOH) as the delimiter for s in both the GNU/Linux and *BSD (which includes macOS) versions of sed; POSIX allows any character other than a backslash or a newline.
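
As a quick illustration of alternative delimiters, the three commands below perform the same substitution; only the delimiter changes. The path is an arbitrary example value.

```bash
#!/bin/bash
SOH=$'\001'
path=/usr/local/bin    # arbitrary example input

echo "$path" | sed 's/\/usr\/local/\/opt/'             # "/" delimiter: slashes need escaping
echo "$path" | sed 's|/usr/local|/opt|'                 # "|" delimiter: no escaping needed
echo "$path" | sed "s${SOH}/usr/local${SOH}/opt${SOH}"  # SOH delimiter, as used above
```

Each line prints /opt/bin.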

Potential pitfalls

One downside I can think of offhand is that sed treats the contents of File1.cpp as a regular expression, so you need to know sed's regular expression syntax well enough to escape any metacharacters that would otherwise prevent a reliable match. That's left as an exercise for the reader.
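
One possible way to approach that exercise is sketched below. It assumes sed's default basic regular expressions (no -E), and the helper names escape_pattern, escape_replacement, and literal_replace are hypothetical, not part of the gist's scripts. The pattern file has its BRE metacharacters escaped, and the replacement file has backslashes and ampersands escaped, before either is handed to the s command.

```bash
#!/bin/bash
SOH=$'\001'
STX=$'\002'

# Escape the characters that are special in a basic regular expression: [ \ . * ^ $
escape_pattern() { sed 's/[[\.*^$]/\\&/g'; }

# Escape the characters that are special in the replacement part of s: \ and &
escape_replacement() { sed 's/[\&]/\\&/g'; }

# usage: literal_replace EXPECT_FILE REPLACE_FILE < TARGET
literal_replace() {
    local pat rep
    pat=$(escape_pattern <"$1" | tr '\n' "$STX")
    rep=$(escape_replacement <"$2" | tr '\n' "$STX")
    tr '\n' "$STX" | sed "s$SOH$pat$SOH$rep$SOH" | tr "$STX" '\n'
}
```

Escaping is done line by line before the linefeeds are converted, so the STX placeholders themselves are never touched.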

Aside: manually inputting non-printable ASCII characters in the terminal

It's clearer to others when you define the unprintable characters with mnemonic variable names, as I did above, but if you're just doing one-liners in the shell, you can enter an ASCII SOH or STX in most terminals I'm aware of by pressing Ctrl+V, Ctrl+A or Ctrl+V, Ctrl+B. This is also how you enter a literal tab character in the terminal, since Tab normally triggers tab-completion: Ctrl+V, Tab or Ctrl+V, Ctrl+I.

Do you see any pattern with the letters there? ;)

The ASCII BEL is also a fun character to use for tricks like this, because it sounds the terminal bell when printed, and it's very easy to type in Bash or Z shell. Try printf \\a and then refer to man ascii and man printf for why this works.
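
For the curious, the pattern with the letters can be worked out in the shell itself. The sketch below uses a hypothetical ctrl_code helper and relies on the convention that Ctrl plus a letter sends that letter's ASCII code with only the low five bits kept.

```bash
#!/bin/bash
# Ctrl+<letter> sends (ASCII code of the letter) & 0x1F,
# so Ctrl+A is 1 (SOH), Ctrl+B is 2 (STX), and so on.
ctrl_code() {
    local ascii
    ascii=$(printf '%d' "'$1")    # a leading ' makes printf print the character's code
    echo $(( ascii & 0x1F ))
}

for key in A B G I; do
    printf 'Ctrl+%s -> %d\n' "$key" "$(ctrl_code "$key")"
done
```

This prints 1, 2, 7, and 9 for A, B, G, and I, which are SOH, STX, BEL, and the horizontal tab in ascii(7).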

The more you know! 🌠

Footnotes

  1. See this Unix & Linux answer, for example.

# editor detritus
*~
.*sw?
# test files
*.new
File1.cpp: void Component::initialize()
File1.cpp: {
File1.cpp: my_component = new ComponentClass();
File1.cpp: }
File2.cpp: void Component::initialize()
File2.cpp: {
File2.cpp: if (doInit)
File2.cpp: {
File2.cpp: my_component = new ComponentClass();
File2.cpp: }
File2.cpp: else
File2.cpp: {
File2.cpp: my_component.ptr = null;
File2.cpp: }
File2.cpp: }
File3.cpp: /* A third file, containing the function to be replaced */
File3.cpp: void Component::initialize()
File3.cpp: {
File3.cpp: my_component = new ComponentClass();
File3.cpp: }
File1.cpp with STXs for LFs:
void Component::initialize()^B{^B my_component = new ComponentClass();^B}^B
Replacement made:
/* A third file, containing the function to be replaced */^Bvoid Component::initialize()^B{^B if (doInit)^B {^B my_component = new ComponentClass();^B }^B else^B {^B my_component.ptr = null;^B }^B}^B
Final result, STXes converted back to LFs:
/* A third file, containing the function to be replaced */
void Component::initialize()
{
if (doInit)
{
my_component = new ComponentClass();
}
else
{
my_component.ptr = null;
}
}
#!/bin/bash
SOH=$'\001'
STX=$'\002'

# show each input file, prefixing every line with the file's name
for f in File{1..3}.cpp; do
    sed "s/^/$f: /" $f
    echo
done

echo File1.cpp with STXs for LFs:
<File1.cpp tr \\n $STX | cat -A
echo; echo

echo "Replacement made:"
<File3.cpp tr \\n $STX \
  | sed "s$SOH$(tr \\n $STX <File1.cpp)$SOH$(tr \\n $STX <File2.cpp)$SOH" \
  | cat -A

echo -e "\n\nFinal result, STXes converted back to LFs:"
<File3.cpp tr \\n $STX \
  | sed "s$SOH$(tr \\n $STX <File1.cpp)$SOH$(tr \\n $STX <File2.cpp)$SOH" \
  | tr $STX \\n
void Component::initialize()
{
my_component = new ComponentClass();
}
void Component::initialize()
{
if (doInit)
{
my_component = new ComponentClass();
}
else
{
my_component.ptr = null;
}
}
/* A third file, containing the function to be replaced */
void Component::initialize()
{
my_component = new ComponentClass();
}
#!/bin/bash
##
## perform multi-line replacement of one file with another, within a third*
##
## *or fourth, fifth…
##
## Author: Kevin Ernst <ernstki -at- mail.uc.edu>
## Date: 19 Oct 2025
## License: MIT or ISC, at your option
## Homepage: https://gist.github.com/ernstki/bd724543445e293a0a997b342f2087ab
## See also: https://stackoverflow.com/q/75220171
##
# set TRACE=1 in the environment to enable execution tracing
(( TRACE )) && set -x
set -euo pipefail
# see ascii(7); there is nothing magic about these two characters except that
# they're highly unlikely to appear in a repository of (plaintext) source code
SOH=$'\001'
STX=$'\002'
ME=${BASH_SOURCE##*/}
HOMEPAGE='https://gist.github.com/ernstki/bd724543445e293a0a997b342f2087ab'
expect=
replace=
quiet=
clobber=
flatten=
dryrun=
targets=()
endofopts=
while (( $# )); do
    case $1 in
        -h|-\?|--h*|--flags)
            echo "
$ME - replaces multi-line string in EXPECT file with contents of REPLACE file
${ME//?/ } for each named TARGET file (or standard input)
usage:
$ME [-h|--help] [-q|--quiet|--silent] [-o|--overwrite|-y|--yes|--confirm]
${ME//?/ } [--flatten] [-n|--dry-run|-s|--simulate] EXPECT REPLACE TARGET [TARGET...]
examples:
# read from TARGET, write to stdout
$ME EXPECT REPLACE TARGET -
# read from stdin, write to stdout (subsequent arguments ignored)
$ME EXPECT REPLACE [-]
# read files from a nested hierarchy and write all to c.w.d. (\"flattened\")
find /other/path -name '*.cpp' | $ME EXPECT REPLACE --flatten
having problems?
Leave a comment at $HOMEPAGE
"
            exit
            ;;
        -q|--quiet|--silent)
            quiet=1
            ;;
        -o|--overwrite|-y|--yes|--confirm)
            clobber=1
            ;;
        --flatten)
            flatten=1
            ;;
        -n|--dry*|-s|--simulate)
            dryrun=1
            ;;
        --)
            endofopts=1
            ;;
        -?*)
            # '-?*' rather than '-*', so a lone '-' (stdin) is not taken for an option
            if (( !endofopts )); then
                echo "Unknown option '$1'." >&2
                exit 1
            fi
            ;& # fall through
        *)
            if [[ -z $expect ]]; then
                if ! [[ -r "$1" ]]; then
                    echo "Unreadable expected (pattern) file '$1'." >&2
                    exit 1
                fi
                expect=$1
            elif [[ -z $replace ]]; then
                if ! [[ -r "$1" ]]; then
                    echo "Unreadable replacement file '$1'." >&2
                    exit 1
                fi
                replace=$1
            else
                if [[ $1 != - && ! -r "$1" ]]; then
                    echo "Unreadable target file '$1'." >&2
                    exit 1
                fi
                targets+=("$1")
            fi
            ;;
    esac
    shift
done
if [[ -z $expect || -z $replace ]]; then
    echo "Both EXPECT and REPLACE filenames are required; try '--help'." >&2
    exit 1
fi
# shellcheck disable=SC2086
replace() {
    tr \\n $STX \
      | sed "s$SOH$(tr \\n $STX <"$expect")$SOH$(tr \\n $STX <"$replace")$SOH" \
      | tr $STX \\n
}
if [[ -z ${targets:-} || ${targets[0]} == - ]]; then
    if (( dryrun )); then
        echo "Reading from stdin, ignoring '-n' / '--dry-run' option." >&2
    elif [[ -t 0 ]]; then
        echo "Reading from stdin…" >&2
    fi
    # hope there's something on stdin or do the typical Unix thing & just stare
    cat | replace
    exit
fi

(( quiet )) ||
    echo "Replacing the contents of '$expect' with those of '$replace' in target files…" >&2

failures=0
for infile in "${targets[@]}"; do
    if (( flatten )); then
        outfile=${infile##*/}
        (( quiet )) || echo "Writing '$infile' to './$outfile'…" >&2
    else
        outfile=$infile.new
        (( quiet )) || echo "Writing '$infile' to '$outfile'…" >&2
    fi
    if [[ -s $outfile ]] && (( !clobber )); then
        echo "File '$outfile' exists; refusing to clobber without '-o' / '--overwrite'." >&2
        failures=$(( failures + 1 ))
        continue
    fi
    if (( dryrun )); then
        echo DRY RUN: \< "'$infile'" replace \> "'$outfile'"
    else
        < "$infile" replace > "$outfile"
    fi
done

# the exit status is the number of targets that were skipped
exit $failures
# multised