To quote from @SnoringFrog's topic-creation request:
"One of the biggest gotchas using sed is scripts that fail (or succeed in an unexpected way) because they were written for one and not the other. Simple run-down of the more major differences would be good."
macOS uses the BSD version of sed [1], which differs in many respects from the GNU sed version that comes with Linux distros.
Their common denominator is the functionality decreed by POSIX: see the POSIX sed spec.
The most portable approach is to use POSIX features only, which, however, limits functionality:
- Notably, POSIX specifies support only for basic regular expressions, which have many limitations (e.g., no support for | (alternation) at all, no direct support for + and ?) and different escaping requirements.
- Caveat: GNU sed (without -r) does support \|, \+ and \?, which is NOT POSIX-compliant; use --posix to disable (see below).
To use POSIX features only:
- (both versions): use only the -n and -e options (notably, do not use -E or -r to turn on support for extended regular expressions)
- GNU sed: add option --posix to ensure POSIX-only functionality (you don't strictly need this, but without it you could end up inadvertently using non-POSIX features without noticing; caveat: --posix itself is not POSIX-compliant)
- Using POSIX-only features means stricter formatting requirements (forgoing many conveniences available in GNU sed):
  - Control-character escape sequences such as \n and \t are generally NOT supported.
  - Labels and branching commands (e.g., b) must be followed by an actual newline or continuation via a separate -e option.
However, both versions implement extensions to the POSIX standard:
- Which extensions they implement differs (GNU sed implements more).
If you need to support BOTH platforms (discussion of differences):
Incompatible features:
- Use of the -i option without an argument (in-place updating without backup) is incompatible:
  - BSD sed: MUST use -i ''
  - GNU sed: MUST use just -i (equivalent: -i'') - using -i '' does NOT work.
- -i sensibly turns on per-input-file line numbering in GNU sed and recent versions of BSD sed (e.g., on FreeBSD 10), but does NOT on macOS as of 10.12. Note that in the absence of -i all versions number lines cumulatively across input files.
- If the last input line does not have a trailing newline (and is printed):
  - BSD sed: always appends a newline on output, even if the input line doesn't end in one.
  - GNU sed: preserves the trailing-newline status, i.e., it appends a newline only if the input line ended in one.
Common features:
- If you restrict your sed scripts to what BSD sed supports, they will generally work in GNU sed too - with the notable exception of using platform-specific extended regex features with -E. Obviously, you'll also forgo extensions that are specific to the GNU version. See next section.
Guidelines for cross-platform support (OS X/BSD, Linux), driven by the stricter requirements of the BSD version:
Note that the shorthands macOS and Linux are occasionally used below to refer to the BSD and GNU versions of sed, respectively, because they are the stock versions on each platform. However, it is possible to install GNU sed on macOS, for instance, using Homebrew with brew install gnu-sed.
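Because either implementation may turn out to be the one on the PATH, a script can check at runtime which one it is dealing with. The following is a minimal sketch, assuming that only GNU and BSD sed are in play: it relies on GNU sed accepting --version while BSD sed rejects it as an unknown option. It also shows a commonly used way around the -i incompatibility described earlier: a non-empty backup suffix attached directly to -i (here -i.bak, which is itself not POSIX) is accepted by both versions, and the backup file can be removed afterwards. The names sed_flavor, tmpfile and result are made up for illustration.

```shell
# Heuristic: GNU sed understands --version; BSD sed exits with an error.
if sed --version >/dev/null 2>&1; then
  sed_flavor=GNU
else
  sed_flavor=BSD
fi
echo "sed flavor: $sed_flavor"

# In-place editing that works with both versions: attach a backup
# suffix directly to -i (no space), then delete the backup file.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/sed-demo.XXXXXX")
printf 'hello\n' > "$tmpfile"
sed -i.bak 's/hello/world/' "$tmpfile" && rm -f "$tmpfile.bak"
result=$(cat "$tmpfile")
echo "$result"
rm -f "$tmpfile"
```

Other implementations may answer --version differently, so treat the detection as a heuristic rather than a guarantee.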
Note: Except when the -r and -E flags are used (extended regexes), the instructions below amount to writing POSIX-compliant sed scripts.
For POSIX compliance, you must restrict yourself to POSIX BREs (basic regular expressions), which are, unfortunately, as the name suggests, quite basic.
- Caveat: do not assume that \|, \+ and \? are supported: While GNU sed supports them (unless --posix is used), BSD sed does not - these features are not POSIX-compliant.
- While \+ and \? can be emulated in POSIX-compliant fashion (\{1,\} for \+, \{0,1\} for \?), \| (alternation) cannot, unfortunately.
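The interval-expression emulations above can be sketched as follows. The sample inputs and variable names are made up for illustration. The last command shows one possible workaround for the missing alternation; it is an assumption of this sketch, not part of the guideline itself, and only applies when every alternative gets the same replacement.

```shell
# \{1,\} stands in for GNU's \+ (one or more)
plus=$(printf 'baaad\n' | sed 's/a\{1,\}/X/')
echo "$plus"      # -> bXd

# \{0,1\} stands in for GNU's \? (zero or one)
opt=$(printf 'color colour\n' | sed 's/colou\{0,1\}r/C/g')
echo "$opt"       # -> C C

# No BRE alternation: run one s command per alternative instead
alt=$(printf 'cat dog bird\n' | sed -e 's/cat/pet/g' -e 's/dog/pet/g')
echo "$alt"       # -> pet pet bird
```

Note that the s commands run in sequence, so a later command also sees text produced by an earlier one.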
For more powerful regular expressions, use -E (rather than -r) to support EREs (extended regular expressions) (older GNU sed releases don't document -E, but it does work there as an alias of -r; newer versions of BSD sed, such as on FreeBSD 10, now also support -r, but the macOS version as of 10.12 does not).
- Caveat: Even though use of -r / -E means that your command is by definition not POSIX-compliant, you must still restrict yourself to POSIX EREs (extended regular expressions). Sadly, this means that you won't be able to use several useful constructs, notably:
  - word-boundary assertions, because they're platform-specific (e.g., \< on Linux, [[:<:]] on OS X).
  - back-references inside regular expressions (as opposed to the "back-references" to capture-group matches in the replacement string of s function calls), because BSD sed doesn't support them in extended regexes (but, curiously, does so in basic ones, where they are POSIX-mandated).
Control-character escape sequences such as \n and \t:
- In regexes (both in patterns for line selection and the first argument to the s function), assume that only \n is recognized as an escape sequence (it is rarely used, since the pattern space is usually a single line without the terminating \n), and that it is not recognized inside a character class, so that, e.g., [^\n] doesn't work. If your input contains no control chars. other than \t, you can emulate [^\n] with [[:print:][:blank:]]; otherwise, splice control chars. in as literals [2]. Generally, include control characters as literals, either via spliced-in ANSI C-quoted strings (e.g., $'\t') in shells that support them (bash, ksh, zsh), or via command substitutions using printf (e.g., "$(printf '\t')").
  - Linux only: sed 's/\t/-/' <<<$'a\tb' # -> 'a-b'
  - macOS and Linux:
    sed 's/'$'\t''/-/' <<<$'a\tb' # ANSI C-quoted string
    sed 's/'"$(printf '\t')"'/-/' <<<$'a\tb' # command subst. with printf
- In replacement strings used with the s command, assume that NO control-character escape sequences are supported, so, again, include control chars. as literals, as above.
  - Linux only: sed 's/-/\t/' <<<$'a-b' # -> 'a<tab>b'
  - macOS and Linux:
    sed 's/-/'$'\t''/' <<<'a-b'
    sed 's/-/'"$(printf '\t')"'/' <<<'a-b'
- Ditto for the text arguments to the i and a functions: do not use control-character sequences - see below.
Labels and branching: labels as well as the label-name argument to the b and t functions must be followed either by a literal newline or by a spliced-in $'\n'. Alternatively, use multiple -e options and terminate each right after the label name.
- Linux only: sed -n '/a/ bLBL; d; :LBL p' <<<$'a\nb' # -> 'a'
- macOS and Linux:
  - EITHER (actual newlines):
    sed -n '/a/ bLBL
    d; :LBL
    p' <<<$'a\nb'
  - OR (spliced-in $'\n' instances): sed -n '/a/ bLBL'$'\n''d; :LBL'$'\n''p' <<<$'a\nb'
  - OR (multiple -e options): sed -n -e '/a/ bLBL' -e 'd; :LBL' -e 'p' <<<$'a\nb'
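As a further illustration of the multiple -e form, here is a minimal sketch of the widely used loop that joins all input lines into one. The label name a, the variable name joined and the sample input are arbitrary choices. Each -e chunk ends right after a label name, and \n appears only inside the regex, not in the replacement, in line with the guidelines above.

```shell
# :a    defines label a (its -e chunk ends right after the name)
# N     appends the next input line to the pattern space
# $!ba  branches back to a unless the last line has been read
# s///  then replaces all embedded newlines with spaces
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
echo "$joined"    # -> a b c
```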
Functions i and a for inserting/appending text: follow the function name by \, followed either by a literal newline or a spliced-in $'\n' before specifying the text argument.
- Linux only: sed '1 i new first line' <<<$'a\nb' # -> 'new first line<nl>a<nl>b'
- macOS and Linux: sed -e '1 i\'$'\n''new first line' <<<$'a\nb'
- Note:
  - Without -e, the text argument is inexplicably not newline-terminated on output on macOS (bug?).
  - Do not use control-character escapes such as \n and \t in the text argument, as they're only supported on Linux.
  - If the text argument has actual interior newlines, \-escape them.
  - To place additional commands after the text argument, terminate it with an unescaped newline (literal or spliced in) or continue with a separate -e option (this is a general requirement that applies to all versions).
Inside function lists (multiple function calls enclosed in {...}), be sure to also terminate the last function, before the closing }, with ;.
- Linux only: sed -n '1 {p;q}' <<<$'a\nb' # -> 'a'
- macOS and Linux: sed -n '1 {p;q;}' <<<$'a\nb'
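Several of the guidelines above can be combined in one command. The following is a sketch with made-up sample data: it uses -E for alternation, a literal tab obtained via printf for the replacement, and a function list whose last function is terminated with ; before the closing }. The variable names tab and out are illustrative.

```shell
tab=$(printf '\t')   # literal tab; \t in the replacement is not portable

# For lines starting with cat- or dog-, turn the first - into a tab and print.
out=$(printf 'cat-1\ndog-2\nbird-3\n' |
  sed -E -n "/^(cat|dog)-/ { s/-/${tab}/; p; }")
printf '%s\n' "$out"
```

The script is double-quoted here only so that ${tab} expands; it contains no other characters that the shell would interpret.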
GNU sed-specific features missing from BSD sed altogether:
GNU features you'll miss out on if you need to support both platforms:
- Various regex-matching and substitution options (both in patterns for line selection and the first argument to the s function):
  - The I option for case-INsensitive regex matching (incredibly, BSD sed doesn't support this at all).
  - The M option for multi-line matching (where ^ / $ match the start / end of each line)
  - For additional options specific to the s function, see https://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command
- Escape sequences:
  - Substitution-related escape sequences such as \u in the replacement argument of the s/// function that allow substring manipulation, within limits; e.g., sed 's/^./\u&/' <<<'dog' # -> 'Dog' - see http://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command
  - Control-character escape sequences: in addition to \n, \t, ..., codepoint-based escapes; for instance, all of the following escapes (hex., octal, decimal) represent a single quote ('): \x27, \o047, \d039 - see https://www.gnu.org/software/sed/manual/sed.html#Escapes
- Address extensions, such as first~step to match every step-th line starting with line first, addr,+N to match N lines following addr, ... - see http://www.gnu.org/software/sed/manual/sed.html#Addresses
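Some of these GNU conveniences can be approximated with portable constructs. The following is only a sketch of two possible workarounds, neither of which is taken from the GNU manual: a bracket expression per letter in place of the I option, and the p;n function pair in place of the address 1~2 (every second line, starting with the first). The sample inputs and variable names are made up.

```shell
# Instead of GNU's s/dog/cat/gI: spell out both cases of each letter
ci=$(printf 'Dog DOG dog\n' | sed 's/[Dd][Oo][Gg]/cat/g')
echo "$ci"        # -> cat cat cat

# Instead of GNU's sed -n '1~2p': print a line, then skip the next one
odd=$(printf '1\n2\n3\n4\n5\n' | sed -n 'p;n')
echo "$odd"       # prints 1, 3 and 5 on separate lines
```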
[1] The macOS sed version is older than the version on other BSD-like systems such as FreeBSD and PC-BSD. Unfortunately, this means that you cannot assume that features that work in FreeBSD, for instance, will work [the same] on macOS.
[2] The ANSI C-quoted string $'\001\002\003\004\005\006\007\010\011\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177' contains all ASCII control characters except \n (and NUL), so you can use it in combination with [:print:] for a pretty robust emulation of [^\n]:
'[[:print:]'$'\001\002\003\004\005\006\007\010\011\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177'']'
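A usage sketch for the emulation in footnote [2], written with printf instead of an ANSI C-quoted string so that it does not depend on shell support for $'...'. The variable names ctrl and out and the sample input are illustrative. N joins two input lines so that the pattern space actually contains a newline; the bracket expression then matches every other character, including the tab, but leaves the newline alone.

```shell
# All ASCII control characters except newline (and NUL), as literal bytes
ctrl=$(printf '\001\002\003\004\005\006\007\010\011\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177')

# Pattern space after N: a<tab>b<newline>c
# Every character except the newline is replaced with X.
out=$(printf 'a\tb\nc\n' | sed -e 'N' -e "s/[[:print:]${ctrl}]/X/g")
printf '%s\n' "$out"
```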