After waiting for Google Fusion tables to upload a largeish file (70MB) with a csv suffix, I noticed that the file was a tab separated file (Hint: head filename.csv previews the first 10 rows on the Unix command line, and head -n 20 filename.csv > testfile.csv ) and that, at the moment, Google fusion tables only likes comma separated/CSV text file uploads.

So I was wondering: what's an easy way of converting data in a tab separated file to a comma separated file? Note that the data cells may contain commas, and may also contain control characters etc.

asked 02 Jun '11, 13:11

psychemedia's gravatar image

psychemedia ♦♦
1.1k323961
accept rate: 11%


TSV to CSV is less of a misery than most conversions but the solution depends on how irritating your data is. Sed is definitely the way to go on this:

  • Try:

    sed 's/,/\\,/g' | sed 's/t/,/g'
    You might get lucky that Google Fusion's CSV parser will see the comma escape.

  • No luck... then like you I'd go:

    sed 's/,/\\,/g' | sed 's/t/","/g' | sed 's/$/"/g' | sed 's/^/"/g'

  • This will be dead if you have quotation marks in the data. In that case add

    sed 's/"/\\"/g'
    to the start of the chain.

  • You're dead now if you have " in the data. And here, the escaping becomes quite painful.

  • With luck the already escaped \t bits in the TSV data should be fine on the CSV importer. But if not you'll want a late

    sed 's/\t/\t/g'

  • To be honest, I really don't know how \, is likely to turn out inside quoted CSV. Your CSV importer may or may not need it.

And at this point you're so far down the rabbit hole that you probably want to start over.

Tab problem: You using a Mac? I despise their sed.

--

link

answered 02 Jun '11, 16:49

gudaithi's gravatar image

gudaithi
5613
accept rate: 100%

edited 02 Jun '11, 16:52

Thanks for that; just one note - on my mac (;-) I needed to escape the escape when escaping the ": sed 's/"/three_backslashes"/g'

(03 Jun '11, 12:39) psychemedia ♦♦

To answer my own question:

  • I tried using Python and the csv module but it threw an error: _csv.Error: line contains NULL byte

  • I also had a go with sed: sed 's/^/"/; s/$/"/; s/ /","/g;' origFile.tsv > newFile.csv

Originally, i tried the sed expression 's/^/"/; s/$/"/; s/\t /","/g;' but my version of sed didnlt like it; instead I replaced the t tab with ctrl-v TAB (from http://www.unix.com/unix-dummies-questions-answers/71183-why-tab-could-not-recognized-grep-sed.html )

PS By the by, whilst looking around for a solution, I came across the xlrd Python library that looks as if it can be used to read Excel spreadsheets.

link

answered 02 Jun '11, 14:02

psychemedia's gravatar image

psychemedia ♦♦
1.1k323961
accept rate: 11%

edited 02 Jun '11, 14:03

Doh - also need to escape any pre-existing " before adding more...

(03 Jun '11, 12:42) psychemedia ♦♦

Something like this, perhaps?: https://gist.github.com/1006465

(PHP, converts between two formats using an actual CSV parser)

link

answered 03 Jun '11, 16:01

hubdata's gravatar image

hubdata
1
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×4
×1
×1
×1

Asked: 02 Jun '11, 13:11

Seen: 1,723 times

Last updated: 14 Jun, 04:03

powered by OSQA