FAQ

From PRObE's Wiki

Below are some Frequently Asked Questions and their answers. If you feel something is missing from this page or require further assistance, please send an email with your suggestion to [email protected].

About PRObE Projects

What type of projects are the PRObE clusters for?

The PRObE systems are dedicated to systems research, in particular to projects that need a large-scale testbed. Smaller testbeds are easy to come by, but large clusters of 100 machines or more (in our case over 2020) on which you can do whatever you need are not that common.

The committee will review proposals to make sure that:

  • They are systems research related, and not just an application trying to get cycles to compute some result.
  • They have no dependence on communication with any computers outside the cluster, except for uploading code and input data before the cluster allocation and downloading results after the cluster experiment is complete.

For some examples, see our Current Projects page.

Do you have examples of projects that run on PRObE?

Yes, our Current Projects page lists all current and past projects.

Can our students use PRObE clusters for class projects?

Yes, class projects can be done on Marmot and on Nome when idle nodes are available.

Getting started and about the systems

How do I get access to PRObE machines?

In general, a Principal Investigator, e.g., a faculty member leading a research team, will need to apply for a project on Nome, Susitna, or Marmot. After the project has been approved, other collaborators can request an account and associate themselves with the specific project.

Which machines can I use?

PRObE has several clusters, each of which runs a separate instance of Emulab. Marmot and Nome accounts will be granted as long as the requester for the project is qualified and the project involves the type of research supported by the PRObE project.

I got access. What's this Emulab thing, and how do I use it?

Here is information about how to get started on Nome, Susitna, and Marmot. Here are Emulab getting started and advanced tutorials. Much of the Emulab main site documentation applies to Nome, Susitna, and Marmot, including the FAQ (from which comes much of the information in this FAQ) and the Emulab knowledge base.

What kind of computers are the PRObE machines?

Here are hardware overviews for Nome, Susitna, and Marmot.

Using the systems

How many nodes can I ask for?

You can ask for as many nodes as are currently available. You can click on the "Node Status" link in the "Experimentation" drop-down menu to see how many nodes are currently free. If you ask for more nodes than are currently available, your experiment will be rejected (you will receive email notification shortly after you submit your NS file to the web interface).

We urge all new Emulab users to begin with a small 3-4 node experiment so that you will become familiar with NS syntax and the practical aspects of Emulab operation.
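As a starting point, here is a minimal sketch in Python that writes out the text of a small ns file. The ns lines it emits (`set ns [new Simulator]`, `source tb_compat.tcl`, `[$ns node]`, `$ns run`) follow the general Emulab ns conventions; the helper name `make_ns_file` and the node names are made up for illustration. Treat the sample ns file on the Getting Started page as the authoritative version.

```python
def make_ns_file(num_nodes: int = 3) -> str:
    """Return the text of a minimal Emulab-style ns file with `num_nodes` nodes.

    No operating system is specified and no links are declared, so this
    only requests bare nodes. The node names (node0, node1, ...) are
    illustrative choices, not a PRObE convention.
    """
    if num_nodes < 1:
        raise ValueError("an experiment needs at least one node")

    lines = [
        "set ns [new Simulator]",   # create the simulator object
        "source tb_compat.tcl",     # load the Emulab tb-* extensions
        "",
    ]
    for i in range(num_nodes):
        lines.append(f"set node{i} [$ns node]")  # one line per requested node
    lines += ["", "$ns run", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_ns_file(3))
```

Save the printed text to a file and submit it through the web interface when you create the experiment.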

How long will it take to swap in an experiment?

A large experiment (hundreds of nodes) takes about 10 minutes to swap in if you do not specify the node operating system in the ns file. There are two ways to load an operating system: run the os_load command in ops after the experiment is swapped in, or specify the operating system in the ns file with the tb-set-node-os command. Specifying the operating system in the ns file can significantly increase the swap-in time, and the swap-in will fail if even one node fails to load the operating system image, unless you use the tb-set-node-failure-action option in your ns file. We suggest omitting tb-set-node-os from the ns file and using os_load in ops after the experiment is swapped in to load the desired OS image(s) on the nodes.
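If you do decide to specify the operating system in the ns file, the following Python sketch shows what the per-node directives might look like. The argument forms (`tb-set-node-os $node IMAGE` and `tb-set-node-failure-action $node "nonfatal"`) are assumptions based on the main Emulab ns documentation, so check them there before relying on them. The helper name `os_directives` and the image name `MY-IMAGE` are hypothetical.

```python
def os_directives(node_names, image, nonfatal=True):
    """Return ns lines that set an OS image on each node.

    Assumption: the directive syntax below matches the Emulab ns
    extensions. `image` is a placeholder; use an image ID that is
    actually listed under "List ImageIDs".
    """
    lines = []
    for name in node_names:
        # Ask Emulab to load `image` on this node during swap-in.
        lines.append(f"tb-set-node-os ${name} {image}")
        if nonfatal:
            # Assumed form: keep swapping in even if this node fails to load.
            lines.append(f'tb-set-node-failure-action ${name} "nonfatal"')
    return lines


if __name__ == "__main__":
    for line in os_directives(["node0", "node1"], "MY-IMAGE"):
        print(line)
```

These lines would go after the node declarations and before `$ns run`. The recommended route is still to leave them out and use os_load in ops.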

I need to install package X but can't access the Internet from my experiment nodes

For security reasons, PRObE nodes do not have direct Internet access. The NMC keeps some common software repositories mirrored locally, and a web proxy is available for installing programs from external repositories. If you need to transfer files to the nodes from other locations, do so through the ops nodes and share the files with your nodes through your home directory. Email [email protected] if you have problems installing programs on the nodes or transferring data, or if you need to use the web proxy to install programs from external repositories.

How do I install operating systems on the nodes?

We recommend using the os_load command from a shell in ops rather than specifying the operating system in your ns file. See a list of available operating system images by clicking "List ImageIDs" in the "Experimentation" drop-down menu or by running "os_load -l" in ops.

Can I install my own operating system on the nodes?

Yes. The path of least resistance by far is to customize an existing operating system image and save the customized image as your own. This procedure is described in the main Emulab tutorial.

What is experiment swapping?

Swapping is when you (or we, or the Emulab system) temporarily swap out your experiment, releasing all of the nodes in the experiment. Your experiment is still resident in the Emulab database, and you can see its status in the web interface, but no nodes are allocated. Once an experiment is swapped out, you can swap it back in via the web interface by going to the Experiment Information page for your experiment and clicking on the swapin option. You can also modify the experiment while it is swapped out.

The idle-swap checkbox in the Begin Experiment web page is used to determine what experiments can be automatically swapped by the testbed scheduling system. Note that all experiments are capable of being swapped; even if you do not check the idle-swap box, you are free to swap your own experiments as you like. The only difference is that the testbed scheduling system will not consider your experiment when looking for experiments to swap out.

Be aware that we do not currently save any files that you may have placed on your nodes. When your experiment is swapped back in, you will likely get different nodes, with fresh copies of the disk images. For that reason, you should not swap your experiment out unless you make arrangements to save and restore any state you need. Files in your home directory will persist but files on the local disks in the nodes will be lost if your experiment is swapped out.
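One way to save and restore state is to archive the node-local directory into your project directory before swapping out and unpack it after swapping back in. The Python sketch below uses only the standard library; the function names and the example paths are hypothetical, and it assumes your state lives in a single directory tree.

```python
import tarfile
import time
from pathlib import Path


def save_state(local_dir: str, proj_dir: str, label: str = "state") -> Path:
    """Archive `local_dir` into a timestamped .tar.gz under `proj_dir`."""
    src = Path(local_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    dest_dir = Path(proj_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{label}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store paths relative to the directory name, not the absolute path.
        tar.add(str(src), arcname=src.name)
    return archive


def restore_state(archive: str, target_dir: str) -> None:
    """Unpack an archive made by save_state() under `target_dir`.

    Only use this on archives you created yourself.
    """
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(target_dir)
```

For example, `save_state("/local/results", "/proj/FOO/saved")` before swapping out, and `restore_state(path_to_archive, "/local")` on the new nodes, where `/local/results` stands in for wherever your experiment keeps its data.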

I need more nodes than are free, what should I do?

Please send us email at [email protected] if you are not able to get the number of nodes you need for your experiment, and we will help free up nodes if we can.

Or, you can use the Batch System to queue an interactive job. By submitting your experiment as a batch job, but without any tb-set-node-startcmd directives in your ns file, the job will be queued until nodes are available. For most experiments, this means just using your regular ns file, and checking the Batch Mode Experiment box when you create the experiment.

When your queued job is swapped in, you will be sent email to inform you, and you can start working. Please note that the experiment will be idle when it is swapped in, and it will be idle-swapped if you do not get things running on the nodes within a short period of time. If your experiment does get swapped out before you can get to it, you can always visit the experiment's information page and try again by using the Queue Batch Experiment menu item.

Do I get root access on my nodes?

Yes. Project leaders get root access to all of the nodes in all of the experiments that are running in their project. Project members get root if their project leader grants them root access when the leader approves the group membership request. Root privileges are granted via the sudo command and root passwords are available for all nodes. The tutorial describes this in more detail.

Can I reboot (power cycle) my nodes?

Yes. Each of the nodes is independently power controlled. If your node hangs or is otherwise unresponsive, you can use the node_reboot or rpower commands in ops.

Where do I store files needed by my experiment, what storage is available?

Each project has its own directory, rooted at /proj, which is available via NFS to all of the nodes in experiments running in that project. For example, when the "FOO" project was created, a directory called /proj/FOO was also created. This directory is owned by the project creator and belongs to the Unix group "FOO". Project members are encouraged to store any files needed by their experiments in the corresponding /proj directory.

If your experiment requires multiple gigabytes of data to operate, you should consider loading the data onto the node-local disks. Each node in Nome has one 1 TB disk and each node in Marmot has one 2 TB disk. You are free to use the unused space on each disk in any way you like. Be aware that the node-local disks are erased when an experiment is swapped out.

For long term storage, you should store items in your project's /proj directory or in your home directory. The /proj directory and your home directory are available in ops and in all of the nodes.
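The Python sketch below illustrates staging a dataset from the project directory onto a node-local disk, with a free-space check first. It uses only the standard library (Python 3.8 or later for `dirs_exist_ok`). The function names are hypothetical, and the local mount point is an assumption: substitute wherever the unused disk space is mounted on your nodes.

```python
import shutil
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def stage_to_local(src: str, local_root: str) -> Path:
    """Copy the directory `src` (e.g. under /proj) beneath `local_root`.

    Raises OSError if the local filesystem does not have enough free
    space for the copy. Returns the path of the local copy.
    """
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"{src_path} is not a directory")
    root = Path(local_root)
    root.mkdir(parents=True, exist_ok=True)

    needed = dir_size(src_path)
    free = shutil.disk_usage(str(root)).free
    if needed > free:
        raise OSError(f"need {needed} bytes but only {free} are free on {root}")

    dest = root / src_path.name
    shutil.copytree(str(src_path), str(dest), dirs_exist_ok=True)
    return dest
```

A call might look like `stage_to_local("/proj/FOO/dataset", "/mnt/local")`, where `/mnt/local` is a placeholder. Remember that the local copy disappears when the experiment is swapped out, so keep the original in /proj.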

Can I modify my experiment after creating it?

Yes. On the experiment view page, choose "Modify this Experiment". This lets you modify an experiment, whether it is swapped out or swapped in, by editing its ns file.

If the experiment is swapped-out, Experiment Modify will simply replace its topology with the newly specified one; this new topology will be mapped when the experiment is swapped in.

If the experiment is already swapped-in, Modify will change the topology and map in the portions which have been changed. This allows dynamic addition, subtraction, and replacement of an experiment's nodes and links.

Can I schedule programs to run automatically when a node boots?

Yes. You can arrange to run a single program or script when your node boots. The script is run as the UID of the experiment creator, and is run after all other node configuration (including RPM installation) has completed. The exit status of the script (or program) is reported back and can be viewed via the Experiment Information link in the menu on the left. The Emulab ns extension tb-set-node-startcmd is used in the ns file to specify the path of the script (or program) to run. You may specify a different program for each node in the experiment.
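Because the exit status is what gets reported back, it helps if the startup script stops at the first failing step and returns that step's status. Here is a minimal Python sketch of such a script; the function name `run_steps` and the example step are hypothetical, and it assumes a Python interpreter is present in the OS image you load.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order.

    Returns 0 if every command succeeds, otherwise the exit status of
    the first command that fails (later commands are not run).
    """
    for cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


# Illustrative step list: each entry is an argument vector.
STEPS = [
    [sys.executable, "-c", "print('node setup step')"],
]

# At the bottom of the real script, hand the status back to the caller:
#     sys.exit(run_steps(STEPS))
```

Placing the script in your /proj directory makes the same file visible on every node, so the path you give to tb-set-node-startcmd resolves everywhere.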

Can I get console access to my nodes?

Yes. Use the rcon program in ops:

no-ops> rcon no256
Connecting to serial console:
  no256
  console: con-b-no06:7016
  console ip: 10.53.0.85
Trying 10.53.0.85...
Connected to con-b-no06.0.53.10.in-addr.arpa.
Escape character is '^]'.

CentOS release 6.5 (Final)
Kernel 2.6.32-431.23.3.el6.x86_64 on an x86_64

no256.hwdown.emulab-ops.nome.nx login: 
telnet> Connection closed.
no-ops> 

Press Ctrl-] to disconnect and Ctrl-d to exit the telnet program.

Troubleshooting and common problems

How do I get help with the PRObE systems?

Send email to [email protected].

My experiment setup failed, what did I do wrong?

Please read this Emulab knowledge base entry.

I think a node is broken. What do I do?

Emulab has some automated features to detect bad nodes, but it cannot always do so. If a node has problems, we may need to check the hardware manually. Please report such problems to [email protected]. In the email, please say which node you are having issues with. It also helps if you explain how the problems manifest themselves and whether they have occurred more than once. It is important to let us know so we can remove problem nodes from the system for analysis and repair, and so they don't get used in experiments until they are functioning properly.

I am getting "Temp Resource Shortage" in Portal, what did I do wrong?

Portal has no computers available for experiments. You must run experiments in Nome.

To log in to Nome, log in to the Portal web interface at http://portal.nmc-probe.org and choose ProbeNome in the left side drop-down menu. Or, you can log in to Nome (http://nome.nmc-probe.org) directly. If you have never logged in directly to Nome, you will first need to reset your password in that cluster because your password does not transfer from Portal to Nome. Log in to Susitna (http://susitna.nmc-probe.org) and Marmot (http://marmot.nmc-probe.org) directly.

Swap in took a very long time, one node failed, so the experiment automatically swapped out. Can this be avoided?

Yes. Omit tb-set-node-os from your experiment ns file and instead, after the swap-in, use os_load in ops to load the desired OS image. A sample ns file is available on our PRObE-specific Getting Started page (section Creating your first experiment).

About the facility

Can I come and do work at the PRObE computer facility?

Yes. If you have funding to travel, the NMC has office space reserved for visitors who would like to come and spend time at the facility during their assigned time on the large cluster. Information about travelling to the NMC and staying in Los Alamos can be found on the Visiting the NMC page. For further inquiries, or to arrange a visit, please contact [email protected].
