Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Is it possible to remove duplicate lines from a text file? If so, how?

1 Answer

Sure can, but like most text file processing with batch, it is not pretty, and it is not particularly fast.

This solution ignores case when looking for duplicates, and it sorts the lines. The name of the file is passed in as the 1st and only argument to the batch script.

@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
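::Sort the file (SORT ignores case) so that duplicate lines become adjacent
sort "%file%" >"%sorted%"
::Write each line only if it differs (ignoring case) from the previous line
>"%deduped%" (
  set "prev="
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    if /i "!ln!" neq "!prev!" (
      endlocal
      (echo(%%A)
      set "prev=%%A"
    ) else endlocal
  )
)
::Replace the original file with the result and clean up
>nul move /y "%deduped%" "%file%"
del "%sorted%"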

This solution is case sensitive, and it preserves the original line order (apart from the removed duplicates, of course). Again, the name of the file is passed in as the 1st and only argument.

@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "line=%file%.line"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


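::The 2 blank lines above are critical, do not remove
::Append each line only if an exact, case-sensitive match is not already in the output.
::Backslashes are doubled because FINDSTR treats \ as an escape character in search strings.
>"%deduped%" (
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    >"%line%" (echo(!ln:\=\\!)
    >nul findstr /xlg:"%line%" "%deduped%" || (echo(!ln!)
    endlocal
  )
)
::Replace the original file with the result and clean up
>nul move /y "%deduped%" "%file%"
2>nul del "%line%"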


EDIT

Both solutions above strip blank lines, because FOR /F always skips empty lines. I didn't think blank lines were worth preserving when talking about distinct values.

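I've modified both solutions to disable the FOR /F "EOL" option so that all non-blank lines are preserved, regardless of what the 1st character is. By default, FOR /F skips any line that begins with a semicolon (the default EOL character). The modified code sets the EOL option to a linefeed character instead, which can never appear within a line, so no line is treated as a comment.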


New solution 2016-04-13: JSORT.BAT

You can use my JSORT.BAT hybrid JScript/batch utility to efficiently sort and remove duplicate lines with a simple one-liner using the /U option (plus a MOVE to overwrite the original file with the final result). JSORT is pure script that runs natively on any Windows machine from XP onward. When running it from within another batch script, invoke it with CALL so that control returns to the caller and the MOVE is executed:

@call jsort file.txt /u >file.txt.new
@move /y file.txt.new file.txt >nul
