Mailing List
Home
Forum Home
MySQL General - General MySQL discussion
MaxDB - Everything about MaxDB, formerly known as SAP DB
MySQL++ - Programming with the C++ API to MySQL
ODBC - ODBC with the MySQL Connector/ODBC driver
MySQL on Win32 - Runing MySQL on Windows 9x/Me/NT/2000/XP
Java Help - Mostly related to the MySQL Connector/J driver
Perl - Perl support for MySQL with DBI and DBD::mysql
GUI - MySQL GUI Tools
Announcement
Subjects
mysql openssl Question
mysql:it 's a db not a dbms how it 's possible?!
Does the binary log enabling affect the MySQL performances?
Strange behavior, Table Level Permission
FULLTEXT query format question
Preventing Duplicate Entries
Comparing and writing out BLOBS
Executing MySQL Commands From Within C Program
Can 't access mysql after kernel upgrade
Mysql 4 0 Oracle Stored Procedure Trigger Conversion
Downgrade Mysql from 4 to 3 23
MySQL Cluster Software
mysql test 4 1 fails with the gis test
ERROR 2002: Can 't connect to local MySQL server through socket
Getting Identity after INSERT
Update one field with more fields from another table
ERROR 1045: Access denied for user: 'root@localhost ' (Using
password: NO)
mysql have same function mthod as Oracle decode()
 
Speeding up series of complex INSERTS

Speeding up series of complex INSERTS

2005-05-13       - By KEVIN ZEMBOWER

 Back
Reply:     1     2     3  

This is my first experience working with a large database. I'm trying to insert
over 320,000 record of a bibliographic database of medical articles. If anyone
would like to test my system, the program is available at http://www.jhuccp.org
/temp/KZ/pop2sql.pl, the SQL statements to set up the database are at http://www
.jhuccp.org/temp/KZ/popline.sql and a file consisting of a header record and
1000 data records is at http://www.jhuccp..org/temp/KZ/popline.dmp.gz and must
be decompressed before use. The program is run by specifying the name of the
data file to be processed on the command line, such as "./pop2sql.pl popline
.dmp". The only unusual module used by the program is Text::xSV.

When testing this program with 1000 data records, I was processing about 570
records per minute. This would have processed all 320,000 records in about 9
hours, which was acceptable to me. However, I started it running yesterday at
about 15:40 and it's still running now, almost 20 hours later. Right now, this
is the condition of the database:
Database popline running on localhost
Table   Records Type Size
abstract    220,038   MyISAM    517.9 MB
abstractword    0   MyISAM    1.0 KB
abstractword_article    0   MyISAM    1.0 KB
article    243,817   MyISAM    66.5 MB
author    164,213   MyISAM    7.2 MB
author_article    418,227   MyISAM    3.6 MB
clientgroup    0   MyISAM    1.0 KB
community    0   MyISAM    1.0 KB
corpname    2,750   MyISAM    376.7 KB
corpname_article    20,477   MyISAM    181.0 KB
country    0   MyISAM    1.0 KB
findingstype    0   MyISAM    1.0 KB
findingstype_article    0   MyISAM    1.0 KB
journal    45,032   MyISAM    5.2 MB
journal_article    149,919   MyISAM    1.3 MB
keyword    2,109   MyISAM    73.8 KB
keyword_article    2,393,673   MyISAM    20.5 MB
languguage    3   MyISAM    4.1 KB
popreporttopic    0   MyISAM    1.0 KB
popreporttopic_article    0   MyISAM    1.0 KB
20 table(s)   3,660,258  --  622.8 MB

As you can see, only 243,817 articles (records) have been inserted. It's
currently inserting records at the rate of 104 records per minute. At this rate
, it'll take another 13 hours to complete this run.

I read the article in the reference manual at 7.2.14 on the Speed of INSERT
Statements. I only use INSERT and SELECT statements. I don't know which of
these techniques I can apply to my program. I don't think I can insert multiple
rows at a time; I need to process the input file one row at a time. Would
INSERT DELAYED help here? Would it be worthwhile to rewrite my program
completely to generate a series of input files that I could use LOAD DATA
INFILE with? I'm guessing that the most probable speed-up might occur if I
disable the indexes, load the data, then recreate the indexes with myisamchk (I
can't use the DISABLE/ENABLE KEYS of MySQL 4.0 as I'm running 3.23.49). Would
it be worthwhile to rewrite my program to use more than one thread at a time?

Any suggestions on what I can try? Any advice on diagnostics I can run to
determine the bottleneck?

Thanks for all your help and suggestions on my problem.

-Kevin Zembower

-- --
E. Kevin Zembower
Internet Systems Group manager
Johns Hopkins University
Bloomberg School of Public Health
Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD  21202
410-659-6139

--
MySQL Perl Mailing List
For list archives: http://lists.mysql.com/perl
To unsubscribe:    http://lists.mysql.com/perl?unsub=mysql@(protected)