I lost a WU but I don't know why

This forum is for discussions about the Motherboards.org Folding team. What is folding? Venture on in for a look.

Postby Daft Ada » Sat Feb 16, 2008 11:22 am

This is an excerpt from my log:

[18:10:25] Writing local files
[18:10:27] Completed 114714 out of 400000 steps (29)
[18:10:28] Extra SSE boost OK.
[18:30:08] Writing local files
[18:30:09] Completed 116000 out of 400000 steps (29)
[19:37:44] Writing local files
[19:37:44] Completed 120000 out of 400000 steps (30)

--- Opening Log file [February 15 20:21:57]

# Windows Console Edition #####################################################
###############################################################################

Folding@Home Client Version 6.01beta2

http://folding.stanford.edu

###############################################################################
###############################################################################

[20:21:57] - Ask before connecting: No
[20:21:57] - User name: Its_Only_Me (Team 33258)
[20:21:57] - User ID: E0CC02E41972C48
[20:21:57] - Machine ID: 1
[20:21:57]
[20:21:58] Loaded queue successfully.
[20:21:58]
[20:21:58] + Processing work unit
[20:21:58] Core required: FahCore_78.exe
[20:21:58] Core found.
[20:21:58] Working on Unit 02 [February 15 20:21:58]
[20:21:58] + Working ...
[20:22:03]
[20:22:03] *------------------------------*
[20:22:03] Folding@Home Gromacs Core
[20:22:03] Version 1.90 (March 8, 2006)
[20:22:03]
[20:22:03] Preparing to commence simulation
[20:22:03] - Ensuring status. Please wait.
[20:22:20] - Looking at optimizations...
[20:22:20] - Working with standard loops on this execution.
[20:22:20] - Previous termination of core was improper.
[20:22:20] - Files status OK
[20:22:26] - Expanded 1625539 -> 8365857 (decompressed 514.6 percent)
[20:22:27] - Checksums don't match (work/wudata_02.xtc)
[20:22:29] - Starting from initial work packet

[20:22:29]
[20:22:29] Project: 2451 (Run 165, Clone 8, Gen 2)
[20:22:29]
[20:22:30] Entering M.D.
[20:22:43] Protein: p2451_Fragment-Receptor HDQ
[20:22:43]
[20:22:43] Writing local files
[20:22:48] Writing local files
[20:22:48] Completed 0 out of 400000 steps (0)
[23:37:07] Writing local files
[23:37:09] Completed 4000 out of 400000 steps (1)
[03:09:07] Writing local files
[03:09:07] Completed 8000 out of 400000 steps (2)


It was shut down (by me) for about an hour, but that's nothing abnormal :?

Am I right to suspect the bolded bit in the log as the culprit? If so, is this common? :(
Daft Ada
Black Belt 1st Degree
Posts: 1358
Joined: Sun Nov 12, 2006 11:03 am

Postby evasive » Sat Feb 16, 2008 3:38 pm

Previous termination of core was improper.
leaves nothing to the imagination here...
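For anyone who wants to spot these lines without reading the whole log by eye, here is a minimal sketch in Python. It assumes the log keeps the "[hh:mm:ss] message" layout shown in the first post; the marker strings are copied from that excerpt, and the function name find_restart_clues is made up for illustration, not part of the Folding@Home client.

```python
import re

# Marker strings copied from the log excerpt in the first post.
MARKERS = (
    "Previous termination of core was improper",
    "Checksums don't match",
    "Starting from initial work packet",
)

# Matches the "[hh:mm:ss] message" layout used in the log.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(.*)$")


def find_restart_clues(log_text):
    """Return (timestamp, message) pairs for lines containing a known marker."""
    clues = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # banner lines, blank lines, etc.
        timestamp, message = match.groups()
        if any(marker in message for marker in MARKERS):
            clues.append((timestamp, message))
    return clues


SAMPLE = """\
[20:22:20] - Working with standard loops on this execution.
[20:22:20] - Previous termination of core was improper.
[20:22:20] - Files status OK
[20:22:27] - Checksums don't match (work/wudata_02.xtc)
[20:22:29] - Starting from initial work packet
"""

for timestamp, message in find_restart_clues(SAMPLE):
    print(timestamp, message)
```

Run against the excerpt, this picks out the improper-termination line, the checksum mismatch and the restart from the initial work packet, which together tell the story of the lost progress.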
We hate rut, but we fear change.
********************************
System error, strike any user to continue...
evasive
Mobo-fu Master
Posts: 37389
Joined: Sun May 06, 2001 12:01 am
Location: Netherlands

Postby Daft Ada » Sun Feb 17, 2008 3:59 am

evasive wrote:
Previous termination of core was improper.
leaves nothing to the imagination here...


Thanks, I didn't see that before.

But I'd still like to know what went wrong. It took about 2 days to get to 30%, then it just died :?
Daft Ada
Black Belt 1st Degree
Posts: 1358
Joined: Sun Nov 12, 2006 11:03 am

Postby Pette Broad » Sun Feb 17, 2008 5:53 am

It happens from time to time for a number of reasons. Mainly it's because the machine was shut down incorrectly or lost power, though not in this case. It also happens if you close folding by using the X; it needs to be closed with Ctrl-C... and sometimes it just happens for no apparent reason. Looking at your log, you need to close down folding and restart, as you are in standard loops and it'll take an age like that :( . I use the -forceasm flag, which forces the assembly methods, bypassing the check that folding does on restart. I don't know which file system you use, but NTFS handles improper shutdowns much better than FAT32: probably over 90% recover O.K., while with FAT32 it's likely to be less than 20%.
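To put a rough number on the standard-loops slowdown, the timestamps in the posted log can be compared directly. This is only a back-of-the-envelope sketch: it assumes the machine was folding continuously during both intervals and that nothing else was loading the CPU. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta


def elapsed_seconds(start, end):
    """Seconds between two hh:mm:ss log stamps, allowing one midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # the log wrapped past midnight
    return delta.total_seconds()


def seconds_per_step(start, end, steps):
    """Average wall-clock seconds spent on each simulation step."""
    return elapsed_seconds(start, end) / steps


# Before the restart ("Extra SSE boost OK"): 116000 -> 120000 steps.
fast = seconds_per_step("18:30:09", "19:37:44", 4000)
# After the restart ("standard loops"): 0 -> 4000 steps.
slow = seconds_per_step("20:22:48", "23:37:09", 4000)

print(f"SSE loops:      {fast:.2f} s/step")
print(f"standard loops: {slow:.2f} s/step")
print(f"slowdown:       {slow / fast:.1f}x")
```

On the figures in the log this works out to about 1.0 second per step before the restart against about 2.9 seconds per step after it, so the same WU would take close to three times as long in standard loops.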

Pete
Pette Broad
Black Belt 5th Degree
Posts: 5491
Joined: Tue Jul 10, 2001 12:01 am
Location: Flintshire, U.K

Postby Karlsweldt » Sun Feb 17, 2008 5:56 am

A WU or any other intensive process has to be shut down in stages.. first the process is halted or paused, then the computations for that stage are finalized.. then the data is cached. Then the final process steps are written to disk as a file. Only then can the process be terminated entirely.

Likely a very brief "blip" on the power mains caused a restart.. and any active processes were not recorded. Has happened to me many times!
The PSU in any computer has less than 1/2 second of reserve storage power. Where most electronic devices may ignore this brief "blip", a computer would be affected by it.. and likely restart.
It would also occur if you did a hard reset after a freeze or BSOD.
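The staged shutdown described above, and the "Checksums don't match" fallback in the log, can be illustrated with a small checkpoint routine. This is a generic sketch of the technique (write to a temporary file, flush it to disk, then swap it into place, with a checksum to detect damage). It is not how the Folding@Home core actually stores its files, and the function names and file layout are assumptions for illustration only.

```python
import hashlib
import os
import tempfile


def save_checkpoint(path, data):
    """Write data plus a SHA-256 digest, then swap the file into place."""
    payload = hashlib.sha256(data).hexdigest().encode("ascii") + b"\n" + data
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        # Replace the old checkpoint in one step, so a crash leaves either
        # the old file or the new one, not a half-written mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved data, or None if the file is missing or corrupt."""
    try:
        with open(path, "rb") as handle:
            digest, _, data = handle.read().partition(b"\n")
    except FileNotFoundError:
        return None
    if hashlib.sha256(data).hexdigest().encode("ascii") != digest:
        return None  # checksum mismatch: caller has to start from scratch
    return data
```

If the power drops partway through a write, a scheme like this either keeps the previous good checkpoint or fails the checksum on load, and a failed checksum is the situation where a program has no choice but to begin again from its initial data.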
F@H.. to solve mankind's maladies.. in our lifetimes!
Karlsweldt
Mobo-fu Master
Posts: 20690
Joined: Wed Nov 12, 2003 11:57 am
Location: 07438

Postby Daft Ada » Sun Feb 17, 2008 6:12 am

Pette Broad wrote:It happens from time to time for a number of reasons. Mainly it's because the machine was shut down incorrectly or lost power, though not in this case. It also happens if you close folding by using the X; it needs to be closed with Ctrl-C... and sometimes it just happens for no apparent reason. Looking at your log, you need to close down folding and restart, as you are in standard loops and it'll take an age like that :( . I use the -forceasm flag, which forces the assembly methods, bypassing the check that folding does on restart. I don't know which file system you use, but NTFS handles improper shutdowns much better than FAT32: probably over 90% recover O.K., while with FAT32 it's likely to be less than 20%.

Pete


I'm running Windows Vista with NTFS. Folding is running as a service. I'll have to check on the -forceasm setting, I'm not sure about that at the moment.
Daft Ada
Black Belt 1st Degree
Posts: 1358
Joined: Sun Nov 12, 2006 11:03 am

Postby wullger » Mon Feb 18, 2008 1:04 pm

Thanks for bringing up that tip :) I noticed my SMP machines had been rebooted, so I went ahead and added that flag before I restarted; every little performance gain helps!
wullger
Initiate
Posts: 60
Joined: Wed Oct 17, 2007 3:36 pm

Postby Karlsweldt » Tue Feb 19, 2008 5:51 am

There is a termination process that closes all programs and services whenever a reboot or shutdown is initiated.. but it does not run when a power interruption occurs.

If a system is rebooted manually, or automatically by update processes, then there should be no problem with files/programs becoming corrupt. But when very minor power glitches enter the equation, the system will reboot due to loss of stable power.. which only a 'line-tamer' or UPS can alleviate.

It is uncommon to see a power "blip" during heavy-use periods.. the lines are loaded, and surges are absorbed into the system (mains). But during light-load periods, very brief surges may cause voltage levels to fall very briefly.. by 20% below normal, which can cause the PSU to lose its power conversion process.. and with it mobo power.
If the problem is frequent, then notify the utility company of the incidents.. and request a monitoring recorder on your lines, to gather evidence of those surges that shouldn't be present.

The primary root of the problem isn't from the power mains.. it is in the storage capacity of the PSU's main capacitors. If doubled in capacity, they should have at least one full second of reserve.. virtually eliminating most minor mains glitches.
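The effect of capacitor size on reserve time can be sketched with the usual stored-energy formula, E = 1/2 x C x V^2. Every number below (capacitance, voltages, load) is an illustrative assumption rather than a measurement of any real PSU, so the absolute times mean little; the only point is how the reserve scales when the capacitance is doubled.

```python
def holdup_seconds(capacitance_f, v_start, v_min, load_w, efficiency=1.0):
    """Time a bulk capacitor can carry the load while sagging from v_start to v_min."""
    # Usable energy is the difference in stored energy between the two voltages.
    usable_joules = 0.5 * capacitance_f * (v_start**2 - v_min**2)
    return usable_joules * efficiency / load_w


# All values below are assumed for illustration, not taken from a datasheet.
base = holdup_seconds(470e-6, 380.0, 300.0, 300.0)
doubled = holdup_seconds(2 * 470e-6, 380.0, 300.0, 300.0)
print(f"{base * 1000:.1f} ms -> {doubled * 1000:.1f} ms")
```

Under these assumptions the reserve time grows in direct proportion to the capacitance, so doubling the capacitors doubles the reserve, whatever the starting figure for a given PSU happens to be.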
F@H.. to solve mankind's maladies.. in our lifetimes!
Karlsweldt
Mobo-fu Master
Posts: 20690
Joined: Wed Nov 12, 2003 11:57 am
Location: 07438

