Hi,
I have some large CSV files, some above 700 megabytes, and I need to find and replace part of a string. I just read about ‘sed’ and its simple command syntax, but how do I type the command if I want to replace “/Category” with “old/Cat egory/” (without the quotes)? The / needs to be escaped somehow so that ‘sed’ doesn’t take it as a delimiter, right?
If you guys have some other suggestions on how to edit these large csv files please post them here.
LibreOffice works for files under 300 megabytes, but above that it becomes unresponsive.
Thanks!
On Mon 16 Sep 2013 11:16:02 PM CDT, robertot5 wrote:
Hi,
I have some large csv files, some above 700 megabytes and i need to find
and replace a part of a string. I just read about ‘sed’ and it’s simple
command but how do I type the command if I want to replace “/Category”
with “old/Cat egory/” without quotes. The / needs to be escaped somehow
so that ‘sed’ doesn’t take it as a parameter delimiter right?
If you guys have some other suggestions on how to edit these large csv
files please post them here.
Libre office works for files under 300 megabytes, but above that it
becomes unresponsive.
Thanks!
Hi
Here you go;
find ./ -type f -print -exec sed -i 's/\/Category/old\/Cat egory\//g' {} \;
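Note that the command above edits every regular file under the current directory, not only CSV files. A minimal sketch of a narrower variant, assuming GNU sed (for -i) and that only files named *.csv should be touched; the scratch directory and file names are made up for the demonstration:

```shell
# Scratch directory with one CSV and one unrelated file (hypothetical names).
demo_dir=$(mktemp -d)
printf 'a,/Category,b\n' > "$demo_dir/data.csv"
printf 'see /Category\n' > "$demo_dir/notes.txt"

# -name '*.csv' limits the match; '{} +' hands many files to one sed call.
find "$demo_dir" -type f -name '*.csv' -exec sed -i 's/\/Category/old\/Cat egory\//g' {} +

# Only the CSV file should have changed.
csv_result=$(cat "$demo_dir/data.csv")
txt_result=$(cat "$demo_dir/notes.txt")
printf '%s\n%s\n' "$csv_result" "$txt_result"
rm -rf "$demo_dir"
```

Trying the command on a copy first is a cheap safeguard, since -i overwrites the files it touches.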
–
Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
openSUSE 12.3 (x86_64) GNOME 3.8.4 Kernel 3.7.10-1.16-desktop
You can use any character as delimiter, so
sed -e 's@/Category@old/Cat egory/@'
will work fine.
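A quick way to convince yourself that the delimiter choice does not change the result is to run both forms on one sample line. This is only a sketch; the sample line is invented:

```shell
# Both commands perform the same substitution; only the delimiter differs.
input='id,/Category,name'

# '/' as delimiter: every '/' in the pattern and replacement must be escaped.
with_slash=$(printf '%s\n' "$input" | sed -e 's/\/Category/old\/Cat egory\//')

# '@' as delimiter: the slashes are ordinary characters, no escaping needed.
with_at=$(printf '%s\n' "$input" | sed -e 's@/Category@old/Cat egory/@')

printf '%s\n%s\n' "$with_slash" "$with_at"
```

Add the g flag after the last delimiter if a line can contain more than one match.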
Sorry, I typed it wrong,
I need to replace instances of “Category” with “Old\Category”, so it is a backslash “\”.
I’m getting
sed: -e expression #1, char 40: unterminated `s' command
for
sed -e 's@Category\@Old\Category\@'
In that case you can just use ‘/’ as separator, but that doesn’t matter.
But you’ll have to escape ‘\’ itself (because it’s the escape character) by using ‘\\’.
So this should work:
sed -e 's/Category\\/Old\\Category\\/'
or
sed -e 's@Category\\@Old\\Category\\@'
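The failing command broke because ‘\@’ tells sed to treat the ‘@’ as a literal character, so it never finds the closing delimiters. A small sketch of the corrected form on a made-up sample line:

```shell
# Sample line containing a literal backslash after "Category" (invented data).
line='1,Category\Tools,9'

# \\ in the pattern matches one backslash; \\ in the replacement emits one.
fixed=$(printf '%s\n' "$line" | sed -e 's@Category\\@Old\\Category\\@')
printf '%s\n' "$fixed"
```

The single quotes matter: inside double quotes the shell would process the backslashes before sed sees them.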
Bingo, that was it !
I ended up with this command:
sed -ie 's@Category\\@Old\\Category\\@g' bigfile.csv
And it does the work in just a few seconds. Absolutely amazing; it took me hours to find and replace with other text editors.
LINUX ROCKS !
Thank You very much.
Later edit: I understand the “-i” parameter means “in-place”. That means the modifications are made in the original file, correct?
I do not fully understand the “-e” parameter. Can you please explain it to me in a few words? I noticed a .csve file was created in the same folder as my big.csv file. Is that the result of the “-e” parameter?
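On the two options: -i does edit the file in place, and -e simply introduces the script expression (it is optional when there is only one). With GNU sed, writing them together as “-ie” is read as -i with the backup suffix “e”, which is most likely where the .csve file came from: it should be a copy of the original. A sketch that reproduces this in a scratch directory, assuming GNU sed and made-up file names:

```shell
# Scratch directory with a tiny stand-in for the big CSV (hypothetical names).
work_dir=$(mktemp -d)
printf 'Category\\A\n' > "$work_dir/big.csv"

# "-ie" is parsed as -i with suffix "e": the original is kept as big.csve.
sed -ie 's@Category\\@Old\\Category\\@g' "$work_dir/big.csv"
edited=$(cat "$work_dir/big.csv")
backup=$(cat "$work_dir/big.csve")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

# Written as separate options, the file is edited in place with no backup.
printf 'Category\\A\n' > "$work_dir/other.csv"
sed -i -e 's@Category\\@Old\\Category\\@g' "$work_dir/other.csv"
other_backup_exists=no
[ -e "$work_dir/other.csve" ] && other_backup_exists=yes
printf 'backup for other.csv: %s\n' "$other_backup_exists"
rm -rf "$work_dir"
```

If a backup is wanted on purpose, something like -i.bak makes the intent clearer than the accidental “e” suffix.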
Oops,
I hit a CSV in which I need to replace instances of “CITROËN” with “Old\CITROEN”, but the character “Ë” gives me headaches. sed seems to ignore it and nothing is replaced.
I tried adding the character from the character selector tool, but it does not work. How can I make sed understand that “Ë”?
Well, in this case it seems to be a charset issue.
I would just try a ‘.’ instead of ‘Ë’ in the search query; it matches any single character. I guess there won’t be too many other “CITRO.N” (where ‘.’ != ‘Ë’) strings in there…
Only if the document’s character set is compatible with the current locale.
On 2013-09-20 14:06, arvidjaar wrote:
>
> wolfi323;2586356 Wrote:
>> I would just try a ‘.’ instead of ‘Ë’ in the search query, that matches
>> any character.
>
> Only if the document’s character set is compatible with the current locale.
Is the source file encoded in UTF-8?
–
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)
Well, I thought it would work if the system’s charset is UTF-8 and the document is in a Latin variant, but it doesn’t. I just tried.
So the best approach would probably be to convert the document to UTF-8 first using iconv (see “man iconv”).
Or override the locale for the sed call so that the input is treated as plain bytes, e.g. try:
LANG=C sed -e 's@CITRO.N@Old\\CITROEN@' test.csv
This line should work if the document is in Latin-1.
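Both suggestions can be tried on a small Latin-1 sample before touching the real file. The sketch below assumes GNU sed and an iconv that knows ISO-8859-1; the sample data is invented. It uses LC_ALL=C rather than LANG=C because LC_ALL takes precedence if the environment already sets LC_ALL or LC_CTYPE:

```shell
# Build a Latin-1 sample: octal 313 is the single byte for Ë in ISO-8859-1.
sample=$(mktemp)
printf 'CITRO\313N,C3\n' > "$sample"

# Option 1: byte-wise matching; '.' matches the one Latin-1 byte.
bytewise=$(LC_ALL=C sed -e 's@CITRO.N@Old\\CITROEN@' "$sample")

# Option 2: convert to UTF-8 first, then match the literal character.
converted=$(iconv -f ISO-8859-1 -t UTF-8 "$sample" | sed -e 's@CITROËN@Old\\CITROEN@')

printf '%s\n%s\n' "$bytewise" "$converted"
rm -f "$sample"
```

Option 2 changes the file's encoding as a side effect, so it only fits if whatever reads the CSV afterwards accepts UTF-8.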
file -bi database.csv
returned
text/html; charset=iso-8859-1
Yes, that is Latin-1, so try prefixing the sed line with "LANG=C " and replacing the ‘Ë’ with a ‘.’.
That did it !
Thanks !