Original question: "Catch an entire function with grep" on Stack Overflow
It is possible to use (GNU) grep to do multiline matching [1], but you're
probably making more work for yourself that way, because sed can do basically
everything grep can plus make replacements.
Using grep -zoP to do multiline matching is fundamentally the same trick as
what's proposed below: turning your file into one long string, separated by
non-printable ASCII characters. There are ways to accomplish this in other
shells, but for convenience, I assume Bash or Z shell, for their
ANSI-C quoting and command substitution features.
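For comparison, here is a minimal sketch of the grep -zoP approach mentioned above. It assumes GNU grep built with PCRE support; the sample file and the function in it are made up for illustration.

```shell
# Hypothetical sample file, made up for illustration.
src=$(mktemp)
printf 'int add(int a, int b) {\n  return a + b;\n}\n\nint main() {\n  return add(1, 2);\n}\n' >"$src"

# -z: treat the whole file as one NUL-terminated "line" (the file has no NULs)
# -o: print only the matched text
# -P: Perl-compatible regex; (?s) lets . match newlines, .*? is non-greedy
match=$(grep -zoP '(?s)int add\(.*?\n\}' "$src" | tr -d '\0')

printf '%s\n' "$match"
rm "$src"
```

This should print only the three lines of add. The tr -d '\0' is there because, with -z, grep terminates each match with a NUL instead of a newline.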
The short answer is:
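SOH=$'\001'  # ASCII start of header: the delimiter for sed's s command
STX=$'\002'  # ASCII start of text: stands in for linefeeds
<File1.cpp tr \\n $STX \
  | sed "s$SOH$(tr \\n $STX <File1.cpp)$SOH$(tr \\n $STX <File2.cpp)$SOH" \
  | tr $STX \\n

The demo.sh and multised scripts from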
this gist demonstrate this in more detail. Also see ascii(7),
probably available as man ascii on your system.
What this does in English is:
- converts linefeeds in the C++ source files into the ASCII STX (start of text) character
- then performs the replacement using the ASCII SOH (start of header) character as the delimiter for sed's s command
- finally, converts the STXs back to linefeeds
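Here is a sketch of those three steps run against tiny throwaway files. The file names and contents are hypothetical, chosen only so the result is easy to check by eye.

```shell
# Hypothetical demo files; the names and contents are made up for illustration.
workdir=$(mktemp -d)
printf 'int a;\nint b;\nint c;\n' >"$workdir/whole.cpp"
printf 'int b;\n' >"$workdir/old.cpp"
printf 'long b;\nlong bb;\n' >"$workdir/new.cpp"

SOH=$'\001'
STX=$'\002'

# Step 1: flatten the search and replacement text, with STX standing in for linefeeds.
flat_old=$(tr '\n' "$STX" <"$workdir/old.cpp")
flat_new=$(tr '\n' "$STX" <"$workdir/new.cpp")

# Steps 2 and 3: substitute with SOH as the s delimiter, then restore the linefeeds.
result=$(tr '\n' "$STX" <"$workdir/whole.cpp" \
  | sed "s${SOH}${flat_old}${SOH}${flat_new}${SOH}" \
  | tr "$STX" '\n')

printf '%s\n' "$result"
rm -r "$workdir"
```

The middle line of whole.cpp should come back replaced by the two lines of new.cpp, with the lines around it untouched.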
There is no magic to the order of the characters, or the specific characters I chose. They should just be two characters that don't appear in the files you're processing, that's all, and it's quite unlikely that these two will, in either source code or prose.
The s command in sed is normally delimited by slashes (/), but that's problematic,
because slashes are highly likely to appear in the input. Even though the man
page might not explicitly mention it, you should be able to use other
characters besides / (or SOH) as the delimiter for s in either the
GNU/Linux or *BSD (which includes macOS) versions of sed.
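As a quick check of the delimiter claim, the sketch below runs the same substitution with three different delimiters. The path is just an example value; as far as I know, POSIX allows any character other than a backslash or newline here.

```shell
# The path contains slashes, so "/" would have to be escaped as a delimiter.
path=/usr/local/bin/tool

# Same substitution with three different delimiters.
with_hash=$(printf '%s\n' "$path" | sed 's#/usr/local#/opt#')
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')
with_comma=$(printf '%s\n' "$path" | sed 's,/usr/local,/opt,')

printf '%s\n' "$with_hash" "$with_pipe" "$with_comma"
```

All three should print /opt/bin/tool.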
One downside I can think of offhand is you need to treat the contents of
File1.cpp as a regular expression, which implies a knowledge of the regular
expression syntax used by sed, in terms of what metacharacters
you'd need to escape to ensure a reliable match. That's left as an exercise for
the reader.
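For readers who'd rather not do the exercise, here is one possible sketch. It assumes sed's default basic regular expression syntax, and both helper names are hypothetical. The second helper covers the replacement side, where & and \ are special, which matters for C++ because & shows up in references.

```shell
# Hypothetical helper: backslash-escape the characters that are special in a
# basic regular expression (\ . * [ ] ^ $) so the text matches itself literally.
escape_bre() {
  printf '%s' "$1" | sed 's/[][\.*^$]/\\&/g'
}

# Hypothetical helper: backslash-escape \ and &, which are special in the
# replacement half of the s command.
escape_replacement() {
  printf '%s' "$1" | sed 's/[\&]/\\&/g'
}

SOH=$'\001'
pattern=$(escape_bre 'a[i] * 2')
replacement=$(escape_replacement 'b & 1')

replaced=$(printf 'x = a[i] * 2;\n' | sed "s${SOH}${pattern}${SOH}${replacement}${SOH}")
printf '%s\n' "$pattern" "$replaced"
```

Without the escaping, [i] would be read as a bracket expression and " *" as "zero or more spaces", so the pattern would not match the text it came from. The sketch still assumes the delimiter (SOH here) never appears in either string.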
It's clearer to others when you define the unprintable characters with mnemonic
variable names, as I did above, but if you're just doing one-liners in the
shell, you can enter an ASCII SOH or STX in most terminals I'm aware of by
pressing Ctrl+V, Ctrl+A or
Ctrl+V, Ctrl+B. This is also how
you enter a literal tab character in the terminal, since Tab
normally triggers tab-completion: Ctrl+V, Tab
or Ctrl+V, Ctrl+I.
Do you see any pattern with the letters there? ;)
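If you'd rather check the pattern than guess it, the sketch below prints the ASCII code produced by Ctrl plus a letter. The helper name is hypothetical, and it assumes an uppercase ASCII letter as input.

```shell
# Hypothetical helper: print the ASCII code that Ctrl+<letter> produces,
# which is the uppercase letter's own code minus 64.
ctrl_code() {
  echo $(( $(printf '%d' "'$1") - 64 ))
}

ctrl_code A   # prints 1, SOH
ctrl_code B   # prints 2, STX
ctrl_code I   # prints 9, horizontal tab
```

Compare the numbers with the table in man ascii.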
The ASCII BEL is also a fun character to use for tricks like this, because it
sounds the terminal bell when printed, and it's very easy to type in Bash or Z
shell. Try printf \\a and then refer to man ascii and man printf for why
this works.
The more you know! 🌠
Footnotes
[1] See this Unix & Linux answer, for example.