I have tried numerous scripts and approaches to clean a large text file before importing it into MS Access.
The text file is 500k+ lines long. Some lines contain 'carriage returns' or 'line breaks' inside fields, which Notepad displays as square symbols. (Interestingly, on Windows XP they appear as squares, but on Windows 2003 Notepad does not show them; instead it breaks the text onto the next line/row.)
No field should contain any of these characters, so I need a way of removing all of them from within the fields.
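Before choosing a fix it may help to confirm which characters are involved. A likely explanation, assumed here rather than confirmed, is that the squares are bare LF (code 10) or bare CR (code 13) characters, since older versions of Notepad only treat the CRLF pair as a line break. This Python sketch (the name `count_line_endings` is hypothetical) reads raw bytes and tallies CRLF pairs, bare CRs and bare LFs separately:

```python
import re
from collections import Counter

# Readable labels for each kind of line terminator.
LABELS = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}

def count_line_endings(data: bytes) -> Counter:
    """Tally CRLF pairs, bare CRs and bare LFs in raw file bytes."""
    counts = Counter()
    # CRLF comes first in the alternation, so a CR followed by an LF is
    # counted once as a pair rather than as a bare CR plus a bare LF.
    for match in re.finditer(rb"\r\n|\r|\n", data):
        counts[LABELS[match.group()]] += 1
    return counts

sample = (b"John|He likes food|1002\r\n"
          b"Jake|He eats food and \n likes swimming|1003\r\n")
print(count_line_endings(sample))  # two CRLF pairs, one bare LF
```

For the real file, something like `with open("export.txt", "rb") as f: print(count_line_endings(f.read()))` would work, where `export.txt` is a placeholder name. A large count of bare LFs or CRs would support the assumption above; if the embedded breaks turn out to be full CRLF pairs, line endings alone cannot tell them apart from record terminators.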
Example of text file contents:
FIELD_NAME1|FIELD_NAME2 |FIELD_NAME3
John |He likes food |1002
Jake |He eats food |1004
Jake |He eats food and [][] likes swimming|1003
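In the third example row the [][] stands for control characters sitting inside FIELD_NAME2. Assuming a record has already been read in whole, with its embedded breaks still inside it, a find-and-replace that collapses each run of tab/CR/LF (plus any spaces next to it) into a single space could look like this. The pattern and the name `clean_record` are my own, hypothetical choices:

```python
import re

# Hypothetical rule: a run of spaces/tabs/CRs/LFs that contains at least one
# tab, CR or LF becomes a single space. Plain space padding with no control
# character in it is left untouched.
CONTROL_RUN = re.compile(r"[ \t\r\n]*[\t\r\n][ \t\r\n]*")

def clean_record(record: str) -> str:
    """Replace embedded tab/CR/LF runs in one logical record with a space.

    `record` must not include its own terminating line break, otherwise
    that break would also be turned into a trailing space.
    """
    return CONTROL_RUN.sub(" ", record)

print(clean_record("Jake |He eats food and \r\n\r\n likes swimming|1003"))
```

The pattern never matches `|`, so it cannot merge two fields, and the existing padding such as `John |` survives unchanged.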
1) One solution was to read through the file and repair the rows. However, I had difficulty getting this to work: typically you only realise a row is broken because of errors in the following rows.
2) Another is to split the text file into smaller pieces, then use find and replace on each piece. Once they are cleaned, stick them back together in MS Access.
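Option 1 can be sketched by counting delimiters: every complete record should contain as many `|` characters as the header line, so physical lines are joined until that count is reached. This is a minimal sketch, assuming the header is intact and that no field value ever contains `|`; the name `repair_records` is hypothetical. It streams the file in one pass, so the file would not need splitting:

```python
from typing import Iterable, Iterator, List

def repair_records(lines: Iterable[str], delimiter: str = "|") -> Iterator[str]:
    """Rejoin physical lines into logical records by counting delimiters.

    Assumptions: the first line is an intact header, and no field value
    ever contains the delimiter itself.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input
    header = header.rstrip("\r\n")
    expected = header.count(delimiter)  # delimiters in a complete record
    yield header

    pending: List[str] = []  # fragments of the record being rebuilt
    for line in it:
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # blank physical lines carry no field data
        pending.append(text)
        record = " ".join(pending)  # embedded break becomes one space
        if record.count(delimiter) >= expected:
            yield record
            pending = []
    if pending:
        # Leftover fragment at end of file: probably needs a manual look.
        yield " ".join(pending)

raw = [
    "FIELD_NAME1|FIELD_NAME2|FIELD_NAME3\r\n",
    "John|He likes food|1002\r\n",
    "Jake|He eats food and\r\n",
    "\r\n",
    "likes swimming|1003\r\n",
]
for record in repair_records(raw):
    print(record)
```

On a real file, `with open("export.txt", encoding="latin-1") as f:` followed by `for record in repair_records(f):` would feed it line by line; the file name and encoding are placeholders. Python's default text mode also splits on bare CR, so those fragments get rejoined too. One limitation matches the symptom described in option 1: a break inside the last field cannot be detected, because the fragment before it already has the full delimiter count. The tail then gets glued onto the start of the next record, which is where the error surfaces.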
Is there a simple way to do this?
This task only has to be run a couple of times, so automation is not crucial.
Analysis output added by dmuk and then edited by Tony Dallimore
See my (Tony Dallimore) answer for an explanation of this analysis output. I had not expected such long strings of control characters (caused by, for example, 44 blank lines) to be found. I have wrapped these long strings in column 1 to improve readability.
String         | File | Line | File | Line
13 10          |    1 |    1 |  376 |  626
9              |    1 | 2299 |  375 | 3524
9 9            |    3 | 6106 |   67 | 6111
9 9 9 9        |    6 | 1916 |   53 | 1492
9 9 9          |    6 | 1917 |   53 | 1493
9 9 9 9 9      |   42 | 1266 |   42 | 1266
10             |   69 | 1524 |  240 | 4885
10 10          |   69 | 3577 |  222 | 4651
13 10 13 10    |   71 | 3697 |  374 | 3258
13 10 10       |   80 | 5440 |  240 | 4166
13 10 13 10 13 |   81 | 2657 |  290 | 2094
10 13 10       |      |      |      |
13 10 13 10 13 |   81 | 2662 |  215 | 1802
10             |      |      |      |
13 10 13 10 10 |   86 | 2082 |   86 | 6914
10 10 10       |   88 | 1314 |  221 | 4754
9 10           |   94 |  246 |   94 |  246
13 10 13 10 13 |  126 | 1699 |  126 | 1699
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |  143 | 2078 |  143 | 2078
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10    |      |      |      |
10 10 10 10    |  182 | 1846 |  188 | 2663
10 10 10 10 10 |  195 | 3320 |  195 | 3320
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10    |      |      |      |
13 10 13 10 13 |  198 | 4223 |  198 | 4223
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10       |      |      |      |
10 10 10 10 10 |  213 | 5449 |  213 | 5449
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10 10 10 10 10 |      |      |      |
10             |      |      |      |
13 10 13 10 13 |  278 |  788 |  278 |  788
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10 13 |      |      |      |
10 13 10 13 10 |      |      |      |
13 10 13 10    |      |      |      |
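A report in the spirit of the table above can be produced by scanning for runs of tab (9), LF (10) and CR (13) characters. I read the two File/Line pairs as the first and last place each string was seen; that reading is an assumption, and this sketch works on a single text, so it reports an occurrence count plus first and last line numbers rather than file numbers. The name `control_run_report` is hypothetical:

```python
import re
from typing import Dict, Tuple

# Runs made only of tab (9), LF (10) and CR (13) characters.
CONTROL_RUN = re.compile(r"[\t\n\r]+")

def control_run_report(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each distinct control-character run to (count, first_line, last_line).

    The key is the run written as decimal character codes, e.g. "13 10".
    Line numbers are 1-based and count the LF characters before the run.
    """
    report: Dict[str, Tuple[int, int, int]] = {}
    line, pos = 1, 0
    for match in CONTROL_RUN.finditer(text):
        # Advance the line counter over everything since the previous run
        # started, including the LFs inside that previous run.
        line += text.count("\n", pos, match.start())
        pos = match.start()
        key = " ".join(str(ord(ch)) for ch in match.group())
        count, first, _ = report.get(key, (0, line, line))
        report[key] = (count + 1, first, line)
    return report

sample = "A|b\tc|1\r\nB|d\r\n\r\ne|2\r\n"
for key, (count, first, last) in control_run_report(sample).items():
    print(f"{key:<14} | {count:>5} | {first:>5} | {last:>5}")
```

When reading the real file, `open(path, encoding="latin-1", newline="")` keeps CR characters as they are; with the default `newline=None`, Python would translate CR and CRLF to LF and the 13s would disappear from the report. The path and encoding are placeholders.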