Speeding up series of complex INSERTS 2005-05-13 - By KEVIN ZEMBOWER
Back This is my first experience working with a large database. I'm trying to insert over 320,000 record of a bibliographic database of medical articles. If anyone would like to test my system, the program is available at http://www.jhuccp.org /temp/KZ/pop2sql.pl, the SQL statements to set up the database are at http://www .jhuccp.org/temp/KZ/popline.sql and a file consisting of a header record and 1000 data records is at http://www.jhuccp..org/temp/KZ/popline.dmp.gz and must be decompressed before use. The program is run by specifying the name of the data file to be processed on the command line, such as "./pop2sql.pl popline .dmp". The only unusual module used by the program is Text::xSV.
When testing this program with 1000 data records, I was processing about 570 records per minute. This would have processed all 320,000 records in about 9 hours, which was acceptable to me. However, I started it running yesterday at about 15:40 and it's still running now, almost 20 hours later. Right now, this is the condition of the database: Database popline running on localhost Table Records Type Size abstract 220,038 MyISAM 517.9 MB abstractword 0 MyISAM 1.0 KB abstractword_article 0 MyISAM 1.0 KB article 243,817 MyISAM 66.5 MB author 164,213 MyISAM 7.2 MB author_article 418,227 MyISAM 3.6 MB clientgroup 0 MyISAM 1.0 KB community 0 MyISAM 1.0 KB corpname 2,750 MyISAM 376.7 KB corpname_article 20,477 MyISAM 181.0 KB country 0 MyISAM 1.0 KB findingstype 0 MyISAM 1.0 KB findingstype_article 0 MyISAM 1.0 KB journal 45,032 MyISAM 5.2 MB journal_article 149,919 MyISAM 1.3 MB keyword 2,109 MyISAM 73.8 KB keyword_article 2,393,673 MyISAM 20.5 MB languguage 3 MyISAM 4.1 KB popreporttopic 0 MyISAM 1.0 KB popreporttopic_article 0 MyISAM 1.0 KB 20 table(s) 3,660,258 -- 622.8 MB
As you can see, only 243,817 articles (records) have been inserted. It's currently inserting records at the rate of 104 records per minute. At this rate , it'll take another 13 hours to complete this run.
I read the article in the reference manual at 7.2.14 on the Speed of INSERT Statements. I only use INSERT and SELECT statements. I don't know which of these techniques I can apply to my program. I don't think I can insert multiple rows at a time; I need to process the input file one row at a time. Would INSERT DELAYED help here? Would it be worthwhile to rewrite my program completely to generate a series of input files that I could use LOAD DATA INFILE with? I'm guessing that the most probable speed-up might occur if I disable the indexes, load the data, then recreate the indexes with myisamchk (I can't use the DISABLE/ENABLE KEYS of MySQL 4.0 as I'm running 3.23.49). Would it be worthwhile to rewrite my program to use more than one thread at a time?
Any suggestions on what I can try? Any advice on diagnostics I can run to determine the bottleneck?
Thanks for all your help and suggestions on my problem.
-Kevin Zembower
-- -- E. Kevin Zembower Internet Systems Group manager Johns Hopkins University Bloomberg School of Public Health Center for Communications Programs 111 Market Place, Suite 310 Baltimore, MD 21202 410-659-6139
-- MySQL Perl Mailing List For list archives: http://lists.mysql.com/perl To unsubscribe: http://lists.mysql.com/perl?unsub=mysql@(protected)
|
|