2007-10-08 - By Baron Schwartz
Frank,
Frank Bottone wrote:
> I've been having trouble with my master/slave server - recently I was
> having a few repeated issues where the mysql slave would stop due to
> "invalid sql syntax", but the queries executed fine on the master. I
> would have to manually dig through the logs and then find the query to
> manually execute on the slave, then use skip_counter to resume the
> replication skipping the corrupted statement on the slave. I thought it
> might be hardware related since it was only affecting the slave, so I
> moved it to a different blade (both the servers are blades).
>
> However, today I was greeted with a nagios alert that the slave had
> stopped again. This time, it seems like the relay log is definitely
> corrupt. I was able to run mysqlbinlog > /dev/null on all the master
> logs, none are corrupt (including the one it had read up to on the
> slave). The relay log on the slave is though - it reports
> "[root@(protected) mysql]# mysqlbinlog mysql02-relay-bin.010923 > /dev/null
> ERROR: Error in Log_event::read_log_event(): 'read error', data_len:
> 38210134, event_type: 0
> Could not read entry at offset 618730:Error in log format or read error"
>
> Nothing too much different in the logs either:
>
> 071006 11:18:52 [Note] Slave I/O thread: connected to master
> 'replica@(protected)
> 4:3306', replication started in log 'mysql-bin.000104' at position
> 906124600
> 071008 9:07:12 [ERROR] Error reading packet from server: Lost
> connection to MySQL server during query ( server_errno=2013)
> 071008 9:07:13 [Note] Slave I/O thread: Failed reading log event,
... snip ...
> their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
> 071008 12:15:33 [ERROR] Error running query, slave SQL thread aborted.
> Fix the problem, and restart the slave SQL thread with "SLAVE START". We
> stopped
> at log 'mysql-bin.000105' position 893425700
>
>
> Any help or ideas tracking this down would be appreciated - I think we
> are going to have to take down the production database to resync the two
> and get replication going again. We mainly use the replica for backup
> purposes in order to avoid downtime during the backup and in the event
> of a hardware issue with the master.
No need to take down the master or re-initialize the slave, given what I've seen so far. Just tell the slave to throw away its relay logs and re-fetch them from the master. Going by the output you showed, that would be:
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000105', MASTER_LOG_POS=893425700;
This will discard the relay logs and re-fetch them. As long as that master log hasn't been purged on the master, you might be OK.
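If you want to script this, here is a minimal Python sketch that pulls the file and position out of the "We stopped at log ... position ..." message and prints the statements to run on the slave. The function name and the regular expression are my own illustration, not part of any MySQL tool, and the surrounding STOP SLAVE / START SLAVE are an assumption on my part: as far as I know the slave threads have to be stopped before CHANGE MASTER TO is accepted. It expects the raw error-log text, without the "> " quote markers.

```python
import re

# Matches the tail of the slave error-log message, for example:
#   ... We stopped at log 'mysql-bin.000105' position 893425700
# \s+ is used throughout so the match survives line wrapping.
STOPPED_AT = re.compile(
    r"stopped\s+at\s+log\s+'(?P<file>[\w.\-]+)'\s+position\s+(?P<pos>\d+)"
)


def recovery_statements(error_log_text):
    """Build the statements that re-point the slave at the last position
    its SQL thread reported, based on the most recent matching message."""
    matches = list(STOPPED_AT.finditer(error_log_text))
    if not matches:
        raise ValueError("no \"stopped at log '<file>' position <n>\" message found")
    last = matches[-1]  # the most recent stop is the one that matters
    log_file = last.group("file")
    log_pos = int(last.group("pos"))
    return [
        "STOP SLAVE;",  # assumption: stop both slave threads first
        "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;"
        % (log_file, log_pos),
        "START SLAVE;",
    ]


if __name__ == "__main__":
    sample = (
        "071008 12:15:33 [ERROR] Error running query, slave SQL thread "
        "aborted. We stopped at log 'mysql-bin.000105' position 893425700"
    )
    for statement in recovery_statements(sample):
        print(statement)
```

Before running the generated statements, it is worth comparing the file and position against Relay_Master_Log_File and Exec_Master_Log_Pos in SHOW SLAVE STATUS, since those are the authoritative values for where the SQL thread stopped.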
You might want to take a look at mysql-table-checksum. Your data could be fine, but it might also be different on the slave. But there's no need to worry about it until you prove it:
http://mysqltoolkit.sourceforge.net/
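To illustrate the idea of "proving it", here is a simplified Python sketch that compares per-table checksums collected from the master and the slave. This is not how mysql-table-checksum works internally; it only assumes you have gathered a table-to-checksum mapping from each server yourself (for instance from CHECKSUM TABLE output), and the function name is hypothetical.

```python
def differing_tables(master, slave):
    """Return {table: (master_checksum, slave_checksum)} for every table
    whose checksum differs, or that exists on only one server (None marks
    the side where the table is missing)."""
    diffs = {}
    for table in sorted(set(master) | set(slave)):
        m = master.get(table)
        s = slave.get(table)
        if m != s:
            diffs[table] = (m, s)
    return diffs


if __name__ == "__main__":
    # Made-up checksums, purely for illustration.
    master = {"db.orders": 1111, "db.users": 2222}
    slave = {"db.orders": 1111, "db.users": 3333}
    for table, (m, s) in differing_tables(master, slave).items():
        print("%s differs: master=%s slave=%s" % (table, m, s))
```

The comparison only means something if both sets of checksums correspond to the same point in the replication stream; checksums taken while the slave is lagging will show differences that are not corruption.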
Your corruption in the relay logs could be caused by any number of things -- bad network, bad hardware, software bug... You could add your voice to an outstanding bug request:
http://bugs.mysql.com/bug.php?id=25737
Hope that helps,
Baron
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=mysql@(protected)