Assumptions:
- As we read a new line we need to test for overlaps against all previous non-overlapping lines
- If a new line does not overlap with any of the previous non-overlapping lines then ...
- a) we save the new line as a new member of the group of non-overlapping lines and
- b) print the new line to stdout
One awk idea:
awk '
BEGIN { FS=OFS="\t" }
      { for (i=1; i<=cnt; i++)                      # loop through array of previous non-overlapping lines
            if ( $2 <= end[i] && $3 >= start[i] )   # closed ranges overlap when each one starts no later than the other ends;
                                                    # this also catches a current line that fully contains a previous line
                next                                # if there is an overlap then skip this line and process the next line of input
        start[++cnt] = $2                           # we have a new non-overlapping line so save the start and end points
        end[cnt]     = $3
        print                                       # print current line to stdout
      }
' file.txt
This generates:
Name1 1 3
Name2 7 9
Name4 5 6
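As a quick check of the overlap test, here is a minimal sketch run on made-up data (the five tab-separated rows are my own assumption, not the asker's file). It uses the compact closed-range test `$2 <= end[i] && $3 >= start[i]`, which is true whenever the two ranges share a point, including the case where the new range completely contains an earlier one (Name5 below contains both 7-9 and 5-6).

```shell
# Hypothetical sample: name, start, end (tab-separated). Not the asker's real data.
result=$(printf 'Name1\t1\t3\nName2\t7\t9\nName3\t2\t8\nName4\t5\t6\nName5\t4\t10\n' |
awk '
BEGIN { FS=OFS="\t" }
      { for (i=1; i<=cnt; i++)                      # compare against every line kept so far
            if ( $2 <= end[i] && $3 >= start[i] )   # the two closed ranges share at least one point
                next                                # overlap found: drop this line
        start[++cnt] = $2                           # no overlap: remember the range ...
        end[cnt]     = $3
        print                                       # ... and print the line
      }
')

printf '%s\n' "$result"
```

Name3 (2-8) and Name5 (4-10) are dropped because they overlap lines that were already kept; Name1, Name2 and Name4 are printed.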
You can store each accepted range in an awk array, with the start position as the index and the end position as the value. For each record, iterate over the array and skip the record if its range overlaps any stored one; otherwise store it and print it. Array indices come back from for (s in a) as strings, so s+0 forces the stored start to a number before it is compared:
awk '-F\t' '{for(s in a)if($2<=a[s]&&s+0<=$3)next;a[$2]=$3}1' file.txt
Demo: https://awk.js.org/?snippet=REkxdr
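To see why the type of the array index matters, here is a small sketch on made-up multi-digit data (the rows are my own assumption, not the asker's file). As far as I know, awk hands array indices back from `for (s in a)` as strings, so a bare `s <= $3` would compare "10" and "9" as text; adding `s+0` makes the comparison numeric.

```shell
# Hypothetical sample with multi-digit positions (tab-separated). Not the asker's real data.
result=$(printf 'Name1\t10\t30\nName2\t70\t90\nName3\t5\t9\nName4\t100\t120\nName5\t20\t80\n' |
awk -F'\t' '
{ for (s in a)                        # s = stored start (a string index), a[s] = stored end
      if ($2 <= a[s] && s+0 <= $3)    # s+0 forces a numeric comparison of the stored start
          next                        # overlap: skip this record, nothing is printed
  a[$2] = $3                          # no overlap: store start -> end
}
1                                     # print every record that was not skipped
')

printf '%s\n' "$result"
```

Only Name5 (20-80) is dropped, since it overlaps 10-30. Name3 (5-9) is kept because 10 <= 9 is false as a numeric comparison.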
Another answer:
Thank you to both @markp-fuso and @blhsing.
When modified to fit my file, both solutions work exactly as expected!
First Solution (provided by @markp-fuso)
awk '
BEGIN { FS=OFS="\t" }
      { for (i=1; i<=cnt; i++)                      # loop through array of previous non-overlapping lines
            if ( $2 <= end[i] && $3 >= start[i] )   # closed ranges overlap when each one starts no later than the other ends;
                                                    # this also catches a current line that fully contains a previous line
                next                                # if there is an overlap then skip this line and process the next line of input
        start[++cnt] = $2                           # we have a new non-overlapping line so save the start and end points
        end[cnt]     = $3
        print                                       # print current line to stdout
      }
' file.txt
Second Solution (provided by @blhsing)
awk '-F\t' '{for(s in a)if($2<=a[s]&&s+0<=$3)next;a[$2]=$3}1' file.txt