curl-users

Smart strategy to write millions of files in OS X with bash and curl?

From: Rodrigo Zanatta Silva <rodrigozanattasilva_at_gmail.com>
Date: Mon, 5 Jan 2015 14:15:22 -0200

Hi. I am using curl from bash and trying to push it to the maximum capacity
possible.

So I create about 150 bash scripts (or more if possible), each with a list
of curl commands, and start all of them at the same time (so I have 150
threads working). It works. Some downloads or file writes may fail, but I
can run another script that checks whether each file exists and downloads
it again.
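
A minimal sketch of such a check-and-retry pass, under stated assumptions:
the function name fetch_missing and the list format (one "URL FILENAME" pair
per line) are hypothetical and not taken from my real scripts. It skips
every file that already exists and is non-empty, and downloads the rest
again.

```shell
#!/usr/bin/env bash
# Re-download every entry whose output file is missing or empty.
# Assumed input: a text file with one "URL FILENAME" pair per line.
fetch_missing() {
    local list_file="$1"
    local out_dir="$2"
    local url name
    mkdir -p "$out_dir"
    while read -r url name; do
        # Skip blank lines.
        [ -n "$url" ] || continue
        # -s is true only if the file exists and is not empty.
        if [ ! -s "$out_dir/$name" ]; then
            # --fail: do not save HTTP error pages as if they were content.
            # --silent: no progress meter. --location: follow redirects.
            curl --fail --silent --location \
                 --output "$out_dir/$name" "$url" \
                || echo "failed: $url" >&2
        fi
    done < "$list_file"
}
```

It could be called as `fetch_missing urls.txt downloads`, where both names
are placeholders. Running it repeatedly is harmless, because files that are
already complete are left untouched.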

But sometimes I only download small HTML files (about 1 KB each), and a
really large number of them.

My problem is that this can really make a mess of my system. I DAMAGED the
partition of an HFS hard disk when I was working on my old MacBook with
OS X Lion (10.7.5). (I had to format the disk because the Mac program
couldn't fix it, but it doesn't look like a hardware problem because the
disk is working now.)

I thought a newer OS X might be better, so I am now using my main computer
with OS X Yosemite (10.10.1).

After I wrote about 1 million files, the Finder became really slow across
the WHOLE system (not only in the folder with the files). I have disabled
indexing for this folder.

Now I don't know what strategy to use. These are some that I have been
thinking about, though I don't know how to do them:

   - Write all results to one file. (Some time ago I tried to make bash
   write to one file from several threads and it failed miserably; it needs
   a more complex strategy with file locking.)
   - Write all the output of one thread to one file (so it will create 150
   files).
      - With this strategy, how can I write "<filename>content
      <otherfilename>content..."?
   - Write every file to disk, but use some tool to keep it from affecting
   the system.
   - Buffer in memory and write to disk from time to time.
      - Is there an easy way to do this?
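
To make the second idea concrete, here is a rough sketch of one archive file
per thread. Everything named here is an assumption for illustration: the
functions fetch_to_archive and run_workers, the "URL FILENAME" list format,
and the record layout (a header line "FILENAME BYTE-COUNT", then exactly
that many bytes of content, then a newline). Because each worker appends
only to its own archive, no file locking is needed, and the disk ends up
with one file per worker instead of one file per download.

```shell
#!/usr/bin/env bash
# Download every "URL FILENAME" pair in list_file and append each result
# to a single archive as: "FILENAME BYTE-COUNT\n" + content + "\n".
fetch_to_archive() {
    local list_file="$1"
    local archive="$2"
    local tmp url name size
    tmp=$(mktemp) || return 1
    while read -r url name; do
        [ -n "$url" ] || continue
        # Empty the scratch file so stale data is never reused.
        : > "$tmp"
        if curl --fail --silent --location --output "$tmp" "$url"; then
            # wc pads its output with spaces on some systems; strip them.
            size=$(wc -c < "$tmp" | tr -d ' ')
            {
                printf '%s %s\n' "$name" "$size"
                cat "$tmp"
                printf '\n'
            } >> "$archive"
        else
            echo "failed: $url" >&2
        fi
    done < "$list_file"
    rm -f "$tmp"
}

# Start one background worker per chunk file and wait for all of them.
# Each worker writes to its own archive, named after its chunk file.
run_workers() {
    local out_dir="$1"
    shift
    local chunk
    mkdir -p "$out_dir"
    for chunk in "$@"; do
        fetch_to_archive "$chunk" "$out_dir/$(basename "$chunk").archive" &
    done
    wait
}
```

A call such as `run_workers archives chunk_*` (placeholder names) would
start one worker per chunk file. The byte count in each header lets a later
reader skip over content it does not need, and it keeps the format safe even
when the downloaded HTML itself contains lines that look like headers. Each
worker reuses one scratch file, so the per-download file creation that
caused the trouble is avoided.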

Any ideas? Maybe the internet connection is the slowest part of the system,
so even if I lose some time on writing, the cost should not be very big.

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-01-05