[H] Ubuntu Folding Appliance - installation trouble

jimmiejoe

n00b
Joined
Oct 25, 2013
Messages
20
Hello.

I am trying to install the appliance on a Samsung 830 64GB SSD. It had Win 7 on it. When I initially ran RMPrepUSB, there was an error. I ran it a second time and it completed without an error.
I have connected the SSD to a Supermicro X8STi-3F with Intel 5639 and 48GB of memory.
When the script gets to init-bottom the script stalls, indicates some errors and then I get a blank screen.

Any ideas how to get the install to complete?

I had a successful experience on the H8SCM-F with 4180 using a new Patriot Supersonic Pulse USB stick.
 
Did you try a reboot after the black screen? Did you get the same problem? Also, have you had Ubuntu 12.04 running on this motherboard before?
 
Rebooted 3 times and got the same result. I have never used Linux before.

Screenshot below:
 
1383623029.5505.iPicit.jpg
 
How do I delete this reply? Edit/delete does not offer a delete option.
 
Moved to the more appropriate area. Hope you can get some good help; I'm of no use in helping with the folding appliance.

Also the delete feature is reserved for moderators and admin staff.
 
Hmm, do you remember the error the initial run of RMPrepUSB returned? [not that it matters, but it would be
nice to have it]

Also, the boot error you've seen -- did it happen at the _very_ _first_ boot? [see step 14 -- http://hardforum.com/showpost.php?p=1040001858&postcount=1].

One of the first-boot tasks is generation of unique filesystem and swap UUIDs -- I'd like to know if
the error happened before or after UUID generation (before or after the auto-reboot mentioned in
step 14).
In other words, did you see the box auto-reboot at first boot or not?
 
Unfortunately I do not remember the RMPrepUSB error clearly. I think it was something about the disk; I assumed it was because the disk was not cleared.
The second time it was clear that the disk was unmounted.

I did not see the system reboot on its own, so I do not think it made it to the auto-reboot. I rebooted it after it appeared to be stalled. It appears to stall in the same place each time, and I believe the first boot also stalled in the same place, but maybe with different or more errors.

I have not seen anything after the init-bottom section in the script. Is that the last section?
 
What you've seen is only the initial part of the boot process -- it should normally keep going.

The fact that the error is reported 5 minutes into the boot process suggests a media
access/wear issue [though I'm unsure why we're not seeing more verbose logs].

It would be good to confirm whether the box makes it to auto-reboot point (you'd need to
prepare the drive afresh). It would also make sense to check drive's health, ideally
using vendor tools...
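
For a vendor-neutral first look, smartmontools can report the drive's own health verdict. A minimal sketch, assuming smartmontools is installed and the SSD shows up as /dev/sdX (a placeholder, not your actual device name); the helper name smart_health_ok is made up for illustration:

```shell
# Hypothetical helper: reads `smartctl -H` output on stdin and succeeds only
# if the overall health self-assessment line reports PASSED.
smart_health_ok() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Usage (needs root; /dev/sdX is a placeholder for the real device):
#   sudo smartctl -H /dev/sdX | smart_health_ok && echo "drive reports healthy"
#   sudo smartctl -A /dev/sdX    # raw attribute table, including wear-related counters
```

A PASSED verdict does not rule out wear, so vendor tools remain the better check.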

I'm sorry I can't offer straightforward advice but we haven't seen this kind of error in the past.
 
Got it. Thank you :)

Your suspicions were right. It seems to hang long before first-boot tasks are executed.

I think it's (in order of increasing likelihood):
(a) some weird interaction/incompatibility between Linux/SATA controller/your drive
(b) signal integrity issues (bad cable or bad connector in either: the board or the drive)
(c) worn drive

In case of (a) -- flipping AHCI on/off (not sure if you have it) in the BIOS or installing a newer
spin of Linux could potentially help. I could go about prepping a test image based on Ubuntu
13.10 if you like.

In case of (b) -- replacing the cable and switching to different (less used?) SATA port on
the board could help.

In case of (c) -- not too many choices (drive needs to be replaced)... would probably need
to run additional tests/diags to confirm.
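
For (a) and (b), the kernel log is usually the quickest place to look once some Linux environment boots with the drive attached. A rough sketch, assuming dmesg is readable on that system; the helper name sata_trouble and its keyword list are my own guess at what is worth flagging, not an exhaustive set:

```shell
# Hypothetical filter: reads kernel log text on stdin and prints lines where
# an ATA port (ata1, ata2.00, ...) mentions a reset, error, failure,
# timeout or exception.
sata_trouble() {
    grep -i -E 'ata[0-9]+(\.[0-9]+)?: .*(reset|error|fail|timeout|exception)'
}

# Usage on a booted system (e.g. from a live session):
#   dmesg | sata_trouble
```

Repeated link resets on the same port would point more towards cabling or the port itself; no output at all says nothing either way.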
 
I installed the Samsung software on the host machine and found that it would not accept the Samsung drive as a Samsung drive. I also noticed that the controller was set to RAID instead of AHCI, which the Samsung software does not like.
Would this affect RMPrepUSB when it is putting the image on the drive?
I will need to reinstall Win7 after changing the controller to AHCI, as it will not recover.
I will need to validate the controller setting on the target machine, but I believe it is set to AHCI already.
 
To my knowledge, the mode on source machine shouldn't matter...

Any possibility of using some other Windows box to examine the drive?
Reinstalling Windows just to test the drive seems a bit ... drastic.

I'm working to spin a test 13.10 version of the image for you to try (if you're interested).
 
Thank you.
I will probably need to wait until Monday to play with this more.
I will try both as well as check the drive.
Currently waiting for work units to complete so I can reconfigure everything and continue testing.
 
Got home early today, will be away from the system for a week.
I put the SSD in another host system, ran RMPrepUSB with the current 12.04 image and then put the drive back in the X8STi-3F system, and it appears to be behaving normally. So it is either the cable, the controller, or the controller mode on the original host system.

If you like, I can put the 13.10 image on the X8STi-F system, which also has the L5639 CPU but only 24GB of memory. If you think the hardware is close enough, I can let them run for a comparison of 12.04 vs 13.10.
 
Was running RMPrepUSB on my second Samsung SSD and got what I think is the same error as the first SSD:

1383970587.2948.iPicit.jpg
 
Loaded 13.10 image.
The install seems slower than 12.04, and the GUI is very sluggish.
 
jimmiejoe said:
I put the SSD in another host system, ran RMPrepUSB with the current 12.04 image and then put the drive back in the X8STi-3F system, and it appears to be behaving normally. So it is either the cable, the controller, or the controller mode on the original host system.
Understood...
Who knows, perhaps there was some sort of silent corruption going on on the host system -- I can't be sure.

jimmiejoe said:
If you like, I can put the 13.10 image on the X8STi-F system, which also has the L5639 CPU but only 24GB of memory. If you think the hardware is close enough, I can let them run for a comparison of 12.04 vs 13.10.
Not sure if there's a lot of value in it, given that same (12.04) image eventually worked
in the problematic system. I'm still puzzled though.

jimmiejoe said:
Was running RMPrepUSB on my second Samsung SSD and got what I think is the same error as the first SSD:
Interesting... is that on the original host system? Or the one you used to prepare the
"successful" SSD (to use in X8STI-3F)? The reason I'm asking is to determine if we can
definitely isolate the issue to the original host system.

jimmiejoe said:
Loaded 13.10 image.
The install seems slower than 12.04, and the GUI is very sluggish.
MM, yes.. these new UIs are becoming too heavy for server hardware. I've seen the
same thing happen to Fedora... We may need to put the UI on a weight loss program or
replace it completely with something lighter [when we switch to 14.04].
What is the host system you prepared that drive in and which system did you try the drive in?
 
The second SSD was prepped on the second host, but the error looks the same as on the first host. I am guessing that it is due to both SSDs being Win7-formatted drives to start with; the error appears in the same sector area, so it could be the boundary of the hidden system partition. RMPrepUSB saw both drives as mounted on the first attempt and unmounted on the second attempt. The second attempt completed without error in both cases.

The first host was the X8STi-F Intel chipset system; the second host is the H8SGL-F AMD chipset system with a 6128 and 24GB.

The first SSD was from the X8STi-3F, the second SSD was from the X8STi-F. The SSDs were returned to their original systems: the SSD from the X8STi-3F went back into the X8STi-3F, and the SSD from the X8STi-F went back into the X8STi-F after being prepped in the H8SGL-F instead of the X8STi-F.

12.04 is on the X8STi-3F and 13.10 is on the X8STi-F.
 
I have started to set up the H8DCL-6F with 4284s and 24GB with the 12.04 image. It appears to take much longer to run through the start-up. Is this due to it being a 2P, because it is trying to OC, or some other reason?
I am definitely seeing more lines of text than on the other 3 installs. The drive is connected to the embedded LSI SAS controller and backplane instead of the AMD or Intel chipset as on the other two hard-drive-based installs.
 
Understood. I suppose it's then reasonable to hope that the "boot-up hang" issue has been
mitigated by switching to another host system.

jimmiejoe said:
I have started to set up the H8DCL-6F with 4284s and 24GB with the 12.04 image. It appears to take much longer to run through the start-up. Is this due to it being a 2P, because it is trying to OC, or some other reason?
I am definitely seeing more lines of text than on the other 3 installs. The drive is connected to the embedded LSI SAS controller and backplane instead of the AMD or Intel chipset as on the other two hard-drive-based installs.
Ubuntu startup is designed to be asynchronous whenever possible (ordering is applied only
on as-needed basis -- if two items have no dependencies, they start in parallel).

When X (GUI) startup (path) takes longer to execute you'll end up seeing more messages
in the console.

Does the H8DCL-6F system also run on an SSD or regular HDD?
 
Looks like my reply a few days ago did not stick.

The H8DCL-6F is using a 3.5" hard drive.
 
Looks like all of my rigs regardless of core count are getting A3 WU.

When FAH -clientonly loaded I only saw -Verbosity 9 -smp as additional options. I was under the impression that the install would automatically add the setting for bonus work. Did I miss something?

What is the latest core count required for bonus WU? I am currently running from 6 - 16 cores and will bring on the 24 core machine later.
 
you can edit the config file to add in the -bigadv tag, also change the package size to 'big'

however, you'd want to make sure the systems can turn in 8101 in time. i'm not sure 2 L5639's would cut it at stock clocks. not sure about the 4284's, but 16 cores of socket F need minimum 2.8ghz, so you might be fine with 16 cores at 3.0ghz. i'm seriously concerned about the L5639's though.
 
two L5639's at stock will not make 8101, though it would be close. last I knew you need dual hex (for 1366) at ~2.8 GHz (all 24 threads) to complete bigadv within time.
 
The 24 core is a H8DGU-F with 6174s.

Is there a good gauge to determine if the bonus WU will complete based on A3 WU, or do you need to run the A5 or A6 themselves to determine if it will make it or not?
 
i once heard that if your machine can do ~60k on SMP WU that you should be able to do bigadv, not sure how accurate that is though.

if you jumped in the IRC, there are people that could help you set up a test WU so you could find out real easy if it would make deadline or not.

with my first bigadv machine, i just did it and it grabbed an 8101 right off the bat. after 3% i was sitting at just north of 30mins so it was good to go. so if you don't feel like joining IRC (don't listen to anything m33pm33p says) you could just configure for bigadv, fire it up and see what happens.
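
For a rough sanity check of that kind of number: TPF is the time per 1% of the unit, so the whole unit takes about 100 x TPF. A small sketch, assuming 100 frames per unit; the function names are made up, and the deadline is something you would have to look up for the project (it is not given in this thread):

```shell
# Estimate total completion time in days from time-per-frame in minutes,
# assuming 100 frames per work unit (one frame = 1% progress).
wu_days() {
    awk -v tpf="$1" 'BEGIN { printf "%.2f\n", tpf * 100 / 1440 }'
}

# Succeeds if the estimate fits within a deadline given in days.
# The deadline is supplied by the caller; no real deadline is hard-coded here.
fits_deadline() {
    awk -v tpf="$1" -v dl="$2" 'BEGIN { exit !(tpf * 100 / 1440 <= dl) }'
}

# Usage:
#   wu_days 30                 # a 30-minute TPF works out to roughly 2.08 days
#   fits_deadline 30 "$DAYS"   # DAYS = the deadline you looked up
```

This ignores download/upload time and any pauses, so leave some margin.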
 
(don't listen to anything m33pm33p says)


Truer words have never been spoken :-P ..... But ya, see if you can join the IRC, that m33p fella will hook ya up and talk you into it regardless.
 
The installation is supposed to configure -bigadv flag automatically...

Is it possible the hard drive was first booted in a machine with less than 24 cores/threads
and then was moved to current machine? (that would explain the lack of flag)

EDIT: can you see how many CPUs are reported on the dual 6174 box when you run
Code:
grep -c '^cpu[0-9]' /proc/stat
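
If you want the check to be a bit more self-explanatory, here is a sketch wrapping the same count. The function name and the 24-thread cutoff are assumptions on my part (the cutoff is only inferred from the question above, not taken from the installer):

```shell
# Count per-CPU lines (cpu0, cpu1, ...) in a /proc/stat-style file.
# The aggregate "cpu" line is skipped because it has no digit after "cpu".
count_cpus() {
    grep -c '^cpu[0-9]' "${1:-/proc/stat}"
}

# Usage:
#   n=$(count_cpus)
#   if [ "$n" -ge 24 ]; then
#       echo "$n CPUs reported: meets the assumed 24-thread cutoff"
#   else
#       echo "$n CPUs reported: below the assumed 24-thread cutoff"
#   fi
```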

In any event, adding the -bigadv flag by means of ./fah6 -configonly (be sure to stop the client
first -- pkill fah6) should get you going.

To answer your question, dual 6174 should be able to make preferred (bonus) deadline
of the most demanding BA unit (8101) (estimating TPF at sub-30 minutes)

Hope this helps :)
 
I finally got around to doing the pkill and adding the -bigadv flag. However, I got an error message after the restart, though it looks like it is still running. Do I follow the instructions in the error or ignore it?

I tried to post a pic of the error but do not seem to have the ability any longer.

thekraken: Warning: looks like current WU failed to start
thekraken: please stop the client, delete machinedependent.dat, queue.dat and work/ directory,
 
This is a common symptom of missing -smp parameter. Is it possible you didn't retain it
while configuring the client?

See a transcript of a sample reconfiguration session below. Stuff in bold was entered by me
from keyboard:

Code:
lappy 06:35 [2502]:~/fah$ [b]./fah6 -configonly[/b]

Note: Please read the license agreement (fah6 -license). Further 
use of this software requires that you have read and accepted this agreement.

Folding@Home User Configuration



--- Opening Log file [December 31 05:36:01 UTC] 


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/testuser/fah
Executable: ./fah6
Arguments: -configonly -verbosity 9 -smp

[05:36:01] - Ask before connecting: No
[05:36:01] - Proxy: 127.0.0.1:8880
[05:36:01] - User name: tear (Team 33)
[05:36:01] - User ID: XXXXXXXXXXXXXXXX
[05:36:01] - Machine ID: 1
[05:36:01] 
[05:36:01] Configuring Folding@Home...

User name [tear]? 
Team Number [33]? 
Passkey [xxxxxxxxxxxxxxxx]? 
Ask before fetching/sending work (no/yes) [no]? 
Use proxy (yes/no) [yes]? 
Proxy Name [127.0.0.1]? 
Proxy Port [8880]? 
Use username & password with proxy [no]? 
Acceptable size of work assignment and work result packets (bigger units
 may have large memory demands) -- 'small' is <5MB, 'normal' is <10MB, and
 'big' is >10MB (small/normal/big) [big]? 
Change advanced options (yes/no) [no]? [b]yes[/b]
Core Priority (idle/low) [idle]? 
Disable highly optimized assembly code (no/yes) [no]? 
Interval, in minutes, between checkpoints (3-30) [15]? 
Memory, in MB, to indicate (7800 available) [7800]? 
Set -advmethods flag always, requesting new advanced
  scientific cores and/or work units if available (no/yes) [yes]? 
Ignore any deadline information (mainly useful if
 system clock frequently has errors) (no/yes) [yes]? 
Machine ID (1-16) [1]? 
The following options require you to restart the client before they take effect
Disable CPU affinity lock (no/yes) [no]? 
Additional client parameters. Use space to clear. [-verbosity 9 -smp]? [b]-verbosity 9 -bigadv -smp[/b]
IP address to bind core to (for viewer) []? 

[05:36:41] - Ask before connecting: No
[05:36:41] - Proxy: 127.0.0.1:8880
[05:36:41] - User name: tear (Team 33)
[05:36:41] - User ID: XXXXXXXXXXXXXXXX
[05:36:41] - Machine ID: 1
[05:36:41] 
[05:36:41] -configonly flag given, so exiting.
Terminated
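
To double-check what the client actually picked up, compare against the Arguments: line it prints at startup (as in the transcript above). A small sketch; has_flag is a made-up helper, and the params string is simply the value entered in the transcript:

```shell
# Hypothetical helper: succeeds if the flag ($2) appears as a whole word
# in the space-separated parameter string ($1).
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Paste the parameters from your own "Arguments:" line here.
params="-verbosity 9 -bigadv -smp"
for flag in -smp -bigadv; do
    if has_flag "$params" "$flag"; then
        printf '%s present\n' "$flag"
    else
        printf '%s MISSING\n' "$flag"
    fi
done
```

Matching on whole words keeps a flag like -smp from being confused with a longer one that merely starts the same way.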
 
Hello.

I have started running the server again now that it has cooled off.

I am getting a consistent failure returning work after it has completed. I waited for 28 failures before restarting the server, at which point it appeared to send the work without issue and pull new work.

Any ideas? This is on the H8DGU-F only; all of the other servers seem to run without issue.
 
Have you enabled Langouste/proxy?

If so, then failures are expected. Langouste makes the client think upload failed while
performing actual upload in the background. Can you check EOC stats if you were credited
for these 28 or so units?

In any event, can you post relevant part of the log? (at least one unit from download through
failed upload(s)).

Also, getting output of:
Code:
ls -l /dev/shm/langouste-*/
will shed add'l light on the matter -- it will tell us if Langouste has been active since last boot.
 