
Eliminating overlapping entries based on start / end values in bash

Assumptions:

  • As we read a new line we need to test for overlaps against all previous non-overlapping lines
  • If a new line does not overlap with any of the previous non-overlapping lines then:
      a) we save the new line as a new member of the group of non-overlapping lines, and
      b) we print the new line to stdout

One awk idea:

awk '
BEGIN { FS=OFS="\t" }
      { for (i=1; i<=cnt; i++)                            # loop through array of previous lines
            if ( ( $2 >= start[i] && $2 <= end[i] ) ||    # does current "start" overlap with a previous line?
                 ( $3 >= start[i] && $3 <= end[i] )    )  # does current "end" overlap with a previous line?
               next                                       # if there is an overlap then skip this line and process the next line of input

        start[++cnt] = $2                                 # we have a new non-overlapping line so save the start and end points
        end[cnt]     = $3
        print                                             # print current line to stdout
      }
' file.txt

This generates:

Name1	1	3
Name2	7	9
Name4	5	6

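One edge case worth checking against your own data: the test above asks only whether the current start or the current end falls inside an earlier range, so a row whose range completely encloses an earlier one passes both checks and is kept. A minimal sketch of a symmetric test that also catches that case follows. The function name `filter_overlaps` and the sample rows are hypothetical and not taken from the original question; the input is assumed to be tab-separated with start in column 2 and end in column 3.

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose [start,end] range ($2,$3)
# does not overlap any previously kept row. Reads tab-separated stdin.
filter_overlaps() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        for (i = 1; i <= cnt; i++)
            # two closed ranges overlap when each one starts before the other ends
            if ($2 <= end[i] && start[i] <= $3)
                next
        start[++cnt] = $2    # remember the newly accepted range
        end[cnt]     = $3
        print
    }'
}

# Hypothetical sample: Big (4-10) encloses Small (5-6), yet neither of
# its endpoints lies inside 5-6.
printf 'Small\t5\t6\nBig\t4\t10\nLater\t11\t12\n' | filter_overlaps
```

With this sample the `Big` row is dropped because it encloses `Small`. If enclosing ranges cannot occur in your file, the endpoint-only test behaves the same.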
Another answer:

You can keep the start and end positions of the previously accepted (non-overlapping) records as the indices and values of an awk array. For each record, iterate over that array to test whether the current start and end positions overlap any stored range, and skip the record if they do:

awk '-F\t' '{for(s in a)if($2<=a[s]&&s<=$3)next;a[$2]=$3}1' file.txt

Demo: https://awk.js.org/?snippet=REkxdr
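
For readability, the one-liner can be unpacked roughly as in the sketch below. It assumes tab-separated input with start in column 2 and end in column 3; the function name `filter_overlaps_by_key` and the sample rows are hypothetical. One deliberate difference from the one-liner: `s + 0` forces a numeric comparison, because awk array subscripts are strings and `s <= $3` may otherwise be compared as text in some implementations (as text, "9" sorts after "12").

```shell
#!/bin/sh
# Hypothetical helper: same idea as the one-liner, spelled out.
filter_overlaps_by_key() {
    awk -F'\t' '
    {
        # a[start] = end for every row kept so far
        for (s in a)
            if ($2 <= a[s] && s + 0 <= $3)   # closed ranges overlap
                next                          # skip this row
        a[$2] = $3                            # remember the new range
        print
    }'
}

# Hypothetical sample with multi-digit positions: B (10-12) lies inside A (9-15).
printf 'A\t9\t15\nB\t10\t12\nC\t16\t20\n' | filter_overlaps_by_key
```

Keying the array by start position is safe here because two kept rows can never share a start: the second one would overlap the first and be skipped.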

Another answer:

Thank you to both @markp-fuso and @blhsing

When modified to fit my file, both solutions work exactly as expected!

First Solution (provided by @markp-fuso)

awk '
BEGIN { FS=OFS="\t" }
      { for (i=1; i<=cnt; i++)                            # loop through array of previous lines
            if ( ( $2 >= start[i] && $2 <= end[i] ) ||    # does current "start" overlap with a previous line?
                 ( $3 >= start[i] && $3 <= end[i] )    )  # does current "end" overlap with a previous line?
               next                                       # if there is an overlap then skip this line and process the next line of input

        start[++cnt] = $2                                 # we have a new non-overlapping line so save the start and end points
        end[cnt]     = $3
        print                                             # print current line to stdout
      }
' file.txt

Second Solution (provided by @blhsing)

awk '-F\t' '{for(s in a)if($2<=a[s]&&s<=$3)next;a[$2]=$3}1' file.txt
