Next: 6 Initialization Up: 2 Detailed description of Previous: 4 Files and Program Contents

Subsections

5 Shell scripts for running programs

1 Job control (c-shell scripts)

In order to run WIEN2k several c-shell scripts are provided which link the individual programs to specific tasks.

All available (user-callable) commands have the ending _lapw so you can easily get a list of all commands using

ls $WIENROOT/_lapw

in the directory of the WIEN2k executables. (Note: all of the more important commands have a link to a short name omitting ``_lapw''.) All these commands have at least one option, -h, which will print a small help indicating purpose and usage of this command.

1 Main execution script (x_lapw)

The main WIEN2kscript, x_lapw or x, executes a single WIEN2kprogram. First it creates the corresponding program.def-file, where the connection between Fortran I/O-units and filenames are defined. One can modify its functionality with several switches, modifying file definitions in case of spin-polarized or complex calculations or tailoring special behaviour. All options are listed with the help switch

x -h or x_lapw -h

With some of the options the corresponding input files may be changed temporarely, but are set back to the original state upon completion.

USAGE:  x PROGRAMNAME [flags] 

PURPOSE:runs WIEN executables:  afminput,aim,arrows,broadening,cif2struct,
      clmaddsub,clmcopy,clminter,dipan,dstart,eosfit,eosfit6,filtvec,init_xspec,
      hex2rhomb,irrep,joint,kgen,kram,lapw0,lapw1,lapw2,lapw3,lapw5,lapw7,      
      lapwdm,lapwso,lcore,lorentz,lstart,mini,mixer,nn,pairhess,plane,qtl,      
      optic,optimize,orb,rhomb_in5,sgroup,spaghetti,struct_afm_check,sumpara,   
      supercell,symmetry,symmetso,telnes3,tetra,txspec,xspec, dmftproj          

FLAGS:
-f FILEHEAD ->  FILEHEAD for path of struct & input-files
-t/-T ->        suppress output of running time
-h/-H ->        help
-d    ->        create only the def-file 
-up   ->        runs up-spin
-dn   ->        runs dn-spin
-du   ->        runs up/dn-crossterm                     
-sc   ->        runs semicore calculation
-c    ->        complex calculation (no inversion symmetry present)
-p    ->        run lapw1/2/so in parallel (needs .machines file)
-orb  ->        runs lapw1 with LDA+U/OP or B-ext correction 
-it   ->        runs lapw1 with iterative diagonalization
-noHinv   ->    runs lapw1 with iterative diag. without  Hinv               
-noHinv0  ->    runs lapw1 with iterative diag. writing new Hinv            
-nohns->        runs lapw1 without HNS
-nmat_only->    runs lapw1 and yields only the matrixsize
-in1orig  ->    runs lapw2 but does not modify case.in1
-emin X ->      runs lapw2 with EMIN=X (in bin9_blaha.in2)
-all X Y ->     runs lapw2 with ALL and E-window X-Y (in bin9_blaha.in2)
-qtl  ->        calculates QTL in lapw2
-alm  ->        calculates ALM,BLM in lapw2
-almd  ->       calculates ALM,BLM in lapw2 for DMFT (Aichhorn/Georges/Biermann)
-qdmft ->       calculates charges including DMFT (Aichhorn/Georges/Biermann)
-help_files ->  creates case.helpXX files in lapw2
-vresp->        creates case.vrespval (for TAU/meta-GGA) in lapw2
-eece ->        for hybrid-functionals (lapw0,lapw2,mixer,orb,sumpara)
-grr  ->        lapw0 for grad rho/rho (using SRC.in0_grr with ixc=50)
-band ->        for bandstructures: unit 4 to 5 (in1), sets QTL and ROOT (in2)
-fermi->        calculates Fermi energy and weights in lapw2
-efg  ->        calculates lapw2 with EFG switch
-so   ->        runs lapw2 with def-file for spin-orbit calculation
-fbz   ->       runs kgen and generates a full mesh in the BZ
-fft  ->        runs dstart only up to case.in0_std creation
-super->        runs dstart and creates new_super.clmsum (and not case.clmsum)
-sel  ->        use reduced vector file in lapw7
-settol 0.000x -> run sgroup with different tolerance
-sigma->        run lstart with case.inst_sigma (autogenerated) for diff.dens.
-rxes->         run tetra using case.rxes weight file for RXES-spectroscopy.
-rxesw E1 E2->  run tetra and create case.rxes file for RXES for energies E1-E2
-delta->        run arrows program with difference between two structures
-lcore->        runs dstart with SRC.rsplcore (produces SRC.clmsc)
-copy ->        runs pairhess and copies .minpair to .minrestart and .minhess
-telnes  ->     run qtl after generating case.inq based on case.innes
USE: x -h PROGRAMNAME   for valid flags for a specific program

Note: To make use of a scratch file system, you may specify such a filesystem in the environment variable SCRATCH (it may already have been set by your system administrator). However, you have to make sure that there is enough disk-space in the SCRATCH directory to hold your case.vector* and case.help* files.

2 Job control for initialization (init_lapw)

In order to start a new calculation, one should make a new subdirectory and run all calculations from there. At the beginning one must provide at least one file (see 3), namely case.struct (see 4.3) (case.inst can be created automatically on the ``fly'', see 6.4.3), then one runs a series of programs using init_lapw. This script is described briefly in chapter 4.5) and in detail in ``Getting started'' for the example TiC (see chapter 3). You can get help with switch -h. All actions of this script are logged in short in :log and in detail in the file case.dayfile, which also gives you a ``restart'' option when problems occurred. In order to run init_lapw starting from a specific point on, specify -s PROGRAM.

Ignoring ERRORS and in many cases also WARNINGS during the execution of this script, most likely will lead to errors at a later stage. Neglecting warnings about core-leakage creates .lcore, which directs the scf-cycle to peform a superposition of core densities.

init_lapw supports switch -b, a ``batch'' mode (non-interactive) for trivial cases AND experienced users. You can supply various options and specify spin-polarization, XC-potential, RKmax, k-mesh or mixing. See init_lapw -h for more details. Changes to case.struct by nn will be accepted, but by sgroup will be neglected. Please check the terminal output for ERRORS and WARNINGS !!!

3 Job control for iteration (run_lapw or runsp_lapw)

In order to perform a complete SCF calculation, several types of scripts are provided with the distribution. For the specific flow of programs see chapter 4.5.

For non-spinpolarized calculations use: run_lapw,
for spin-polarized calculations use: runsp_lapw.
for antiferromagnetic calculations use: runafm_lapw
for FSM (fixed-spin moment) calculations use: runfsm_lapw
for a spin-polarized setup, where you want to constrain the moment to zero (e.g. for LDA+U calculations) use: runsp_c_lapw

Cases with/without inversion symmetry and with/without semicore or core states are handled automatically by these scripts. All activities of these scripts are logged in short in :log (appended) and in detail together with convergence information in case.dayfile (overwriting the old ``dayfile``). You can always get help on its usage by invoking these scripts with the -h flag.

run_lapw -h

PROGRAM:        /zeus/lapw/WIEN2k/bin/run_lapw

PURPOSE:        running the nonmagnetic scf-cycle in WIEN
                to be called within the case-subdirectory
                has to be located in WIEN-executable directory

USAGE:          run_lapw [OPTIONS] [FLAGS]

OPTIONS:
-cc LIMIT ->    charge convergence LIMIT (0.0001 e)
-ec LIMIT ->    energy convergence LIMIT (0.0001 Ry)
-fc LIMIT ->    force  convergence LIMIT (1.0 mRy/a.u.)
                default is -ec 0.0001; multiple convergence tests possible
-e PROGRAM ->   exit after PROGRAM ()
-i NUMBER ->    max. NUMBER (40) of iterations
-s PROGRAM ->   start with PROGRAM ()
-r NUMBER ->    restart after NUMBER (99) iterations (rm *.broyd*)
-nohns NUMBER ->do not use HNS for NUMBER iterations 
-in1new N ->    create "new" in1 file after N iter (write_in1 using scf2 info) 
-ql LIMIT ->    select LIMIT (0.05) as min.charge for E-L setting in new in1
-qdmft NP ->    including DMFT from Aichhorn/Georges/Biermann running on NP proc

FLAGS:
-h/-H ->        help
-I    ->        with initialization of in2-files to "TOT" 
-NI   ->        does NOT remove case.broyd*  (default: rm *.broyd* after 60 sec)
-p    ->        run k-points in parallel (needs .machine file [speed:name])
-it   ->        use iterative diagonalizations 
-it1  ->        use iterative diag. with recreating H_inv (after basis change)
-it2  ->        use iterative diag. with reinitialization (after basis change)
-noHinv   ->    use iterative diag. without H_inv
-vec2pratt ->   use vec2pratt instead of vec2old for iterative diag.
-so   ->        run SCF including spin-orbit coupling
-renorm->       start with mixer and renormalize density
-in1orig->      if present, use case.in1_orig file; do not modify case.in1

CONTROL FILES:
.lcore          runs core density superposition producing case.clmsc
.stop           stop after SCF cycle
.fulldiag       force full diagonalization
.noHinv         remove case.storeHinv files
case.inm_vresp  activates calculation of vresp files for meta-GGAs
case.in0_grr    activates a second call of lapw0 (mBJ pot., or E_xc analysis)

ENVIRONMENT VARIBLES:
SCRATCH         directory  where vectors and help files should go

Additional flags valid only for magnetic cases (runsp_lapw) include:

-dm   ->        calculate the density matrix (when -so is set, but -orb is not)
-eece ->        use "exact exchange+hybrid" methods
-orb  ->        use LDA+U, OP or B-ext correction
-orbc ->        use LDA+U correction, but with constant V-matrix

Calling run_lapw (after init_lapw) from the subdirectory case will perform up to 40 iterations (or what you specified with switch -i) unless convergence has been reached earlier. You can choose from three convergence criteria, -ec (the total energy convergence is the default and is set to 0.0001 Ry for at least 3 iterations), -fc (magnitude of force convergence for 3 iterations, ONLY if your system has ``free'' structural parameters!) or -cc (charge convergence, just the last iteration), and any combination can also be specified. Be careful with these criteria, different systems will require quite different limits (e.g. fcc Li can be converged to $\mu$ Ry, a large unit cell with heavy magnetic atoms only to 0.1 mRy). You can stop the scf iterations after the current cycle by generating an empty file .stop (use eg. touch .stop in the respective case-directory).

The scf-cycle creates case.broyd* files which contain the "charge-history". Once run_lapw has finished, you should usually "save_lapw" (see below) the results. When you continue with another run_lapw without "save_lapw" (because the previous run did not fulfill the convergence criteria or you want to specify a more strict criterium) the "broyden-files" will be deleted unless you specify -NI.

With -e PROGRAM you can run only part of one scf cycle (e.g. run lapw0, lapw1 and lapw2), with -s PROGRAM you can start at an arbitrary point in the scf cycle (e.g. after a previous cycle has crashed and you want to continue after fixing the problem) and continue to self-consistency. Before mixer is invoked, case.clmsum is copied to case.clmsum_old, and the final ``important`` files of the scf calculation are case.clmsum and case.scf.

Invoking

run_lapw -I -i 30 -fc 0.5

will first set in case.in2 the TOT-switch (if FOR was set) to save cpu time, then run up to 30 scf cycles till the force criterion of 0.5 mRy/a.u. is met (for 3 consecutive iterations). Then the calculation of all terms of the forces is activated (setting FOR in case.in2) for a final iteration.

By default the file case.in1 is updated after lapw2 and the current Fermi-energy is inserted. This will force lapw1 to use instead of the default energy parameters (0.30) an energy ``''. The switch -in1orig can be used to keep the present case.in1 file unmodified (or to copy case.in1_orig back after -in1new).

The switch -in1new N preserves for N iteration the current case.in1 file. After the first N iterations write_in1_lapw is called and a new case.in1 file is generated, where the energy parameters are set according to the :EPLxx and :EPHxx values of the last scf iteration and the -ql value (see sections 4.4 and 7.3). In this way you may select in some cases better energy-parameters and also additional LOs to improve the linearization may be generated automatically. Note, however, that this option is potentially unsave and dangerous, since it may set energy-parameters of LOs and APW+lo too close (leading to ghostbands) or in cases where you have a ``bad'' last iteration (or large changes from one scf iteration to the next. The original case.in1 file is saved in case.in1_orig and is used as template for all further scf-cycles.

Parallelization is described in Sec. 5.5.

Iterative diagonalization, which can significantly save computer time (in particular for cases with ``few electrons'' (like surfaces) and ``large matrices (larger than 2000)'' a factor 2-5 ! is possible), is described in Sec. 7.3. It needs the case.vector.old file from the previous scf-iteration (and this file is created from case.vector when the -it switch is set) and an inverse of a previous Hamiltonian () stored in case.storeHinv. When you change the Hamiltonian significantly (changing RKmax or local orbitals), reinitialize the iterative diagonalization either by ``touch .fulldiag'' (performs one full diagonalization) or ``touch .noHinv'' (recreates case.storeHinv files) or using the -it1|-it2 switch.

You can save computer time by performing the first scf-cycles without calculating the non-spherical matrix elements in lapw1. This option can be set for N iterations with the -nohns N switch.

The presence of the file .lcore directs the script to superpose the radial core densities using dstart and generating case.clmsc. It is created automatically during init_lapw when charge-leakage warnings are ignored. This option allows to reduce the number of semi-core states, but still keeping a good charge density. Unfortunately, dstart is not parallelized and thus can be slow for big cases.

The presence of the file case.in0_grr activates a second call of lapw0, which is necessary for modified Becke-Johnson potentials (see Section 4.5.8) or $E_{xc}$ analysis.

If you have a previous scf-calculation and changed lattice parameters or positions (volume optimization or internal positions minimization), one could use -renorm to renormalize the density prior to the first iteration., but the recommended way is to use clmextrapol_lapw.

For magnetic systems which are difficult to converge you can use the script runfsm_lapw -m M (see section 4.5.3) for the execution of fixed-spin moment (FSM) calculations.

2 Utility scripts

1 Save a calculation (save_lapw)

After self-consistency has been reached, the script

save_lapw head_of_save_filename

saves case.clmsum, case.scf, case.dmat, case.vorb and case.struct under the new name and removes the case.broyd* files. Now you are ready to modify structural parameters or input switches and rerun run_lapw, or calculate properties like charge densities (lapw5), total and partial DOS (tetra) or energy bandstructures (spaghetti).

For more complicated situations, where many parameters will be changed, we have extended save_lapw so that calculations can not only be saved under the head_of_save_filename but also a directory can be specified. If you use any of the possible switches (-a, -f, -d, -s) all input files will be saved as well (and can be restored using restore_lapw).

Options to save_lapw can be seen with

save_lapw -h

Currently the following options are supported

-h help

-a save all input files as well

-f force save_lapw to overwrite previous saves

-d directory save calculation in directory specified

-s silent operation (no output)

2 Restoring a calculation (restore_lapw)

To restore a calculation the script restore_lapw can be used. This script restores the struct, clmsum, vorb and dmat files as well as all input files. Note: The input files will only be restored when save_lapw -d was used, i.e. when you have saved a calculation in an extra directory.

After restore_lapw you can continue and either run an scf cycle (run_lapw) or recreate the scf-potential (x lapw0) and the corresponding eigenvectors (x lapw1) for further tasks (DOS, electron density,...).

Options to restore_lapw are:

-h help

-f force restore_lapw to overwrite previous files

-d directory restore calculation from directory specified

-s silent operation (no output)

-t only test which files would be restored

3 Remove unnecessary files (clean_lapw)

Once a case has been completed you can clean up the directory with this command. Only the most important files (scf, clmsum, struct, input and some output files) are kept. It is very important to use this command when you have finished a case, since otherwise the large vector and helpXX files will quickly fill up all your disk space.

4 Migrate a case to/from a remote computer (migrate_lapw)

This script migrates a case to a remote computer (to be called within the case-dir). Needs working ssh/scp without password; local and remote case-dir must have the same name.

Call it within the desired case-dir as:

migrate_lapw [FLAGS OPTIONS] [user@]host:path/case-dir

with the following options:

-put           -> transfer of files to a remote host (default)
-get           -> transfer of files from a remote host

-all           -> the complete directory is copied
-start         -> only files to start an scf cycle are copied (default for put)
-end           -> only new files resulting from an scf cycle are copied
                  (default for get)
-save savedir  -> "save_lapw -d save_dir" is issued and only save_dir is copied

FLAGS:
-h             -> help
-clean         -> a clean_lapw is issued before copying
-r             -> files in source directory are removed after copying
-R             -> source directory (and all files) are removed after copying
-s             -> do it silent (in batch mode)
-z             -> gzip files before scp (slow network)

5 Generate case.inst (instgen_lapw)

This script generates case.inst from a case.struct file. It is used automatically in init_lapw, if case.inst is not present. Using some options (see below) it allows to define the spin-state of all/certain atoms. Note: the label ``RMT'' is necessary in case.struct.

instgen_lapw [-h -s -up -dn -nm -ask]
   -h:   generate this message
   -s:   silent operation (do not ask)
   -up:  generates spin-up configuration for all atoms (default)
   -dn:  generates spin-dn configuration for all atoms
   -nm:  generates non-magnetic configuration for all atoms
   -ask: asks for each atom which configuration it should generate

6 Set R-MT values in your case.struct file (setrmt_lapw)

This perl-script executes x nn and uses its output to determine the atomic sphere radii (obeying recommended ratios for H, sp-, d- and f- elements). It is called automatically within init_lapw or you may call it separately using:

setrmt_lapw case [-r X ]

where case gives the head of the case.struct file. You may specify a reduction of the RMTs by X percent in order to allow for structural optimizations. It creates case.struct_setrmt.

7 Create case.int file (for DOS) (configure_int_lapw)

This script creates the input file case.int for the program tetra and allows to specifiy which partial DOS (atom, l and m) should be calculated. It was provided by Morteza Jamal (m_jamal57@yahoo.com).

You can specify interactively:

  total        (for plotting 'Total Dos')
  N            (to select atom N)
    s,p,d,...  (to select a set of PDOS for previously selected atom N)
                use labels as listed in the header of your case.qtl file)
  end          (for exit)

There is also a "batch" (non-interactive) mode:

configure_int_lapw -b  total 1 tot,d,d-eg,d-t2g  2 tot,s,p  end

which will prepare case.int (eg. for the TiC example) with:

tic             #Title
 -1.000   0.00250   1.200  0.003     #Emin, DE, Emax, Gauss-Broad
     8                               #Number of DOS
     0 1 total-DOS
     1 1 tot-Ti
     1 4 d-Ti
     1 5 d-eg-Ti
     1 6 d-t2g-Ti
     2 1 tot-C
     2 2 s-C
     2 3 p-C

8 Check for running WIEN jobs (check_lapw)

This script searches for .running.* files within the current directory (or the directory specified with ``-d full_path_directory'') and then performs a ps command for these processes. If the specified process has not been found, it removes the corresponding .running.* file after confirmation (default) or immediately (when ``-f'' has been specified).

9 Cancel (kill) running WIEN jobs (cancel_lapw)

This script searches for .running.* files within the current directory (or the directory specified with ``-d full_path_directory'') and then kills the corresponding process after confirmation (default) or immediately (when ``-f'' has been specified). It is particular usefull for killing ``k-point parallel'' jobs.

10 Extract critical points from a Bader analysis (extractaim_lapw)

This script extracts the critical points (CP) after a Bader analysis (x aim (-c)) from case.outputaim. It sorts them (according to the density), removes duplicate CPs, converts units into $\AA, e/\AA^3, ...$ and produces critical_points_ang.

It is used with: �extractaim_lapw case.outputaim

11 scfmonitor_lapw

This program was contributed by:

$\framebox{ \parbox[c]{12cm}{ Hartmut Enkisch \\ Institute of Physics E1b\\... ...ilinglist. If necessary, we will communicate the problem to the authors.} } }$

It produces a plot of some quantities as function of iteration number (a maximum of 6 quantities is possible at once) from the case.scf file as specified on the commandline using analyse_lapw and GNUPLOT. This plot is updated in regular intervals.

You can call scfmonitor_lapw using:

scfmonitor_lapw [-h] [-i n] [-f case.scf] [-p] arg1 [arg2 .. arg6]

-h             	help switch
-i n            show only the last n iterations
-f scf-file     use  "scf-file"  instead of the default  "case.scf"
-p             	produces file  "scfmonitor.png"  instead of X-window plot
arg1,...        arguments to monitor (like  ":ENE"  or  ":DIS" , see analyse_lapw )

The scfmonitor can also be called directly from w2web using the "Analyse" tool.

In order to have a reasonable behavior of scfmonitor the GNUPLOT window should stay in background. This can be achieved by putting a line into your .Xdefaults file like:

gnuplot*raise: off

Note: It does not make sense to start scfmonitor before the first cycle has finished because no case.scf exists at this point.

12 analyse_lapw

The script analyse_lapw is usually called from scfmonitor_lapw. It "greps" from an scf-file the specified arguments and produces analyse.out.

analyse_lapw is called using:

analyse_lapw [-h] scf-file arg1 [arg2 arg3 arg4 arg5 arg6]

-h              help switch
scf-file        "scf-file"  to analyse (there's no default  "case.scf" !)
arg1,...        arguments to analyse:
                atom independend:   :ENE :DIS :FER :MMT
                atom iii dependend: :CTOiii :CUPiii :CDNiii :NTOiii :NUPiii :NDNiii 
                                    :DTOiii :DUPiii :DDNiii :RTOiii :EFGiii :HFFiii
                                    :MMIiii
                vector quantities:  :FORiii[x/y/z] :POSiii[x/y/z] :FGLiii[x/y/z]
                             where     magnitude      z              z   is the default

For vector quantities like :FGLiii or :POSiii (usefull with case.scf_mini) one can specify the respective coordinate by adding x/y/z to the corresponding labels.

13 Check parallel execution (testpara_lapw)

testpara_lapw is a small script which helps you to determine an optimal selection for the file .machines for parallel calculations (see sec. 5.5).

14 Check parallel execution of lapw1 (testpara1_lapw)

testpara1_lapw is a small script which determines how far the execution of lapw1para has proceeded.

15 Check parallel execution of lapw2 (testpara2_lapw)

testpara2_lapw is a small script which determines how far the execution of lapw2para has proceeded.

16 grepline_lapw

Using

grepline_lapw :label 'filename*.scf' lines_for_tail or

grepline :label 'filename*.scf' lines_for_tail

you can get a list of a quantity ``:label'' (e.g. :ENE for the total energy) from several scf files at once.

17 initso_lapw

initso_lapw helps you to initialize the calculations for spin-orbit coupling. It helps together with make_inso_lapw (contributed by Morteza Jamal, m_jamal57@yahoo.com) to create/modify all required input files (case.inso, case.in1, case.in2c). In a spinpolarized case SO may reduce symmetry or equivalent atoms may become non-equivalent, and the script calls symmetso and will help you to find proper symmetries and setup the respective input files. It is called using

initso_lapw or

initso

and you should carefully follow the instructions and explanations of the script and the explanations for case.inso given in section 7.4.

18 vec2old_lapw

vec2old_lapw moves case.vector files to case.vector.old. Usually called automatically just before lapw1 when the iterative diagonalization (run_lapw -it) is specified. It also works for the k-parallel case including local $SCRATCH directories (add -p as first argument, uses hosts from .processes and requires commensurate k-point/number_of_processors) and spin-polarization (-up/-dn switches).

For runfsm_lapw the sequence had to be changed and the switches -updn or -dnup forces vec2old to COPY case.vectorup tocase.vectordn (and vice versa). In the runfsm_lapw case the corresponding case.vector*.old files are generated just AFTER lapw2/lapwdm and not BEFORE lapw1. Thus after runfsm_lapw has finished, the corresponding spin-up/dn vectors are case.vector*.old and NOT case.vector*.

The switches -p -local will copy $SCRATCH/case.vector* to case.vector*. It will be done automatically when you run x lapw2 -p -qtl.

An alternative script vec2pratt_lapw was provided by L.D.Marks (l-marks@northwestern.edu) which together with SRC_vecpratt mixes the last two vectors (Pratt mixing) to generate case.vector.old. It is activatd using the -vec2pratt switch in run_lapw.

19 clmextrapol_lapw

clmextrapol_lapw extrapolates the charge density (case.clmsum/up/dn) from old to new positions (or from old to new lattice parameters). It takes the density from the old positions (copied into old.clmsum) and subtracts an atomic superposition density (new_super.clmsum) fom the old positions and adds an atomic superposition density fom the new ones (generated by dstart). If new_super.clmsum (generated automatically by init_lapw) is not present, it will be generated and for the next geometry step an extrapolation will take place.

It is usually called from ``min_lapw '' after a geometry step has finished and a new struct file has been generated.

It can significantly reduce the number of scf-cycles for the new geometry step.

3 Structure optimization

1 Lattice parameters (Volume, c/a, lattice parameters)

1 Package optimize

The auxilliary program optimize (x optimize) generates from an existing case.struct (or case_initial.struct, which is generated at the first call of optimize) a series of struct files with various volumes (or c/a ratios, or other modified parameters) (depending on your input):

 [1]  VARY VOLUME with CONSTANT RATIO A:B:C
 [2]  VARY C/A RATIO with CONSTANT VOLUME (tetr and hex lattices)
 [3]  VARY C/A RATIO with CONSTANT VOLUME and B/A (orthorh lattice)
 [4]  VARY B/A RATIO with CONSTANT VOLUME and C/A (orthorh lattice)
 [5]  VARY A and C (2D-case) (tetragonal or hexagonal lattice)
 [6]  VARY A, B and C (3D-case) (orthorhombic lattice)
 [7]  VARY A, B, C and Gamma (4D-case) (monoclinic lattice)
 [8]  VARY C/A RATIO and VOLUME (2D-case) (tetr and hex lattices)

It also produces a shell-script optimize.job which looks similar to:

#!/bin/csh -f
 foreach i ( \
        tic_vol_-10.0  \
        tic_vol__-5.0  \
        tic_vol___0.0  \
        tic_vol___5.0  \
        tic_vol__10.0  \
 )
     cp  $i.struct tic.struct
 #    cp  $i.clmsum tic.clmsum
 #    x dstart
 #    run_lapw -ec 0.0001 -in1new 3  -renorm 
     run_lapw -ec 0.0001
     set stat = $status
     if ($stat) then
        echo "ERROR status in" $i
        exit 1
     endif
     save_lapw  ${i}
 #    save_lapw  -f -d XXX $i    
 end

You may modify this script according to your needs: use runsp_lapw or even min_lapw, or specify different convergence parameters; modify the save_lapw command and change the save-name or save into a directory to separate e.g. ``gga'' and ``lda'' results. Eventually you may activate the line `` cp $i.clmsum case.clmsum'' to use a previously saved clmsum file, e.g. from a calculation with smaller RKmax, ... and deactivate the "clmextrapol_lapw" lines, but usually the latter is so efficient that this is no longer recommended.

Note: You must have a case.clmsum file (either from init_lapw or from a previous scf calculation) in order to run optimize.job.

After execution of this script you should have a series of scf-files with energies corresponding to the modified parameters, which should allow you to find the corresponding equillibrium parameters. For the volume optimization an analysis tool is available, other tools are under development).

Using the script grepline (or the ``Analysis $\rightarrow$ Analyze multiple SCF-files'' menu of w2web) you get a summary of the total energy vs. volume (c/a). The file case.analysis can be used in eplot_lapw to find the minimum total energy and the equilibrium volume (c/a). Supported equation of states include the EOS2, Murnaghan and Birch-Murnaghan EOS.

grepline :ENE '*.scf' 1 > case.analysis
grepline :VOL '*.scf' 1 » case.analysis

Using such strategies also higher-dimensional optimizations (e.g. c/a ratio and volume) are possible in combination with the -d option of save_lapw.

For optimization of more degrees of freedom (2-4 lattice parameters), you can use the corresponding option and for analysis of the data the script parabolfit_lapw together with the program eosfit6. It performs a non-linear least squares fit, using a parabolic fit-function in your variables and get an analytic description of your energy surface. Please note, this is only a harmonic fit (no odd or higher terms) and the description may not be very good if your parameter range is large and/or the function is quite anharmonic, or you suffer from numerical noise.

For the determination of elastic constants see the description of ELAST in sec 8.14.

2 Package 2Doptimize

This program was contributed by:

This package performs a convenient 2D structure optimization (Volume and c/a, i.e. tetragonal or hexagonal spacegroups). After initialization of a case, one generates a set of structures and a job-file 2Doptimize.job using the command

set2D_lapw

This calls setup2D and you have to specify the ``order of the fit'' (stored in .fordfitcoa) and the changes in volume and c/a. The resulting 2Doptimize.job script should be adapted (eg. use min_lapw instead of run_lapw; insert switches,...) and executed. Finally

ana2D_lapw

can be executed and will analyze the results. It uses a set of case.Vconst* files (produced by 2Doptimize.job and stored also in subdirectory Vconst) and the numbvcoa file. You can modify the order of the fits in .fordfitcoa and rerun ana2D_lapw to check the sensitivity of the results with the fitting. Note: Fits of high order (and few ``data points'') may lead to artificial results due to unphysical oszillations of the fit.

You can see results for
- energy vs. c/a for each volume,
- energy vs. volume (with optimized c/a) and
- c/a vs. volume.

Optionally you can specify more cases by rerunning set2D_lapw. Specify also your ``old'' volume and c/a points again (or leave them out on purpose in case they were very bad (eg. very far from the minimum). The old results will then be taken automatically into account without recalculation (unless you modify 2Doptimize.job, see the comments at the top of this file). Thus a ``good'' strategy is to use only 3x3 points (order of fit = 3) and in a second step you add points where they are needed.

When you want to rerun such an optimization with different parameters (RKmax, k-mesh, XC-potentials) modify the top of 2Doptimize.job and set answscf=no and a new savename (eg. "_pbe_rk8_1000k").

2 Minimization of internal parameters (min_lapw)

Most of the more complicated structures have free internal structural parameters, which can either be taken from experiment or optimized using the calculated forces on the nuclei.

Starting with WIEN2k_11.1 there are two possibilities to determine the equilibrium position of all individual atoms automatically (obeying the symmetry constraints of a certain space group). One can use either

the shell script min_lapw, together with the program mini, which will run a scf-cycle, update the positions using the calculated forces and restarts a new scf cycle. This continues until forces drop below a certain value;
or use the normal scf-scripts run_lapw with a special input in case.inm such that the charge density and the positions are simultaneously optimized during the scf-cycle.

At present we recommend the second (new) option, although there are cases where this scheme can be slower or may even fail to converge.

A typical sequence of commands for an optimization of the internal positions would look like:

Generate struct file
init_lapw
run_lapw -fc 1 [another runXX script or additional options are of course also possible] (this may take some time)
Inspect the scf file whether you have significant forces (usually at least .gt. 5 mRy/bohr), otherwise you are more or less at the optimal positions (An experienced user may omit the run_lapw step and proceed directly from init_lapw to the next step)

Now you have to decide which method to use:

min_lapw [options] (this may take some time)
- it will generate a default case.inM (if not present) by:
  - executing ``x pairhess -copy ; cp case.inM_st case.inM '' (i.e. it sets up the PORT minimization option and calculates an approximate starting Hessian).
  - when -nohess is specified, it will generate case.inM from SRC_templates with the NEW1 option (not recommended).
- Without -NI switch min_lapw performs an initialization first:
  - removes "histories" (case.broyd*, case.tmpM) if present;
  - copies .min_hess to .minrestart (if present from previous min_lapw or x pairhess).
or edit case.inm and put MSR1a (or MSEC1a) as ``mixing method''. Then continue with run_lapw -fc 0.5 -ec 0.0001 -cc 0.001 [-it]. It will run x pairhess (unless case.inM is already present) and then run (several hundreds) scf-cycles, simultaneously updating positions and charge densities. Once the forces seem to be smaller than the limit defined in case.inM it will switch to ``mixing method'' MSR1 and finalize the scf-cycle with fixed positions. Because of this, the final forces may not be as small as desired and eventually you have to restart this step using MSR1a again.

When using the second method we recommend you read carefully $WIENROOT/SRC_mixer/Mixer_README_4.1.pdf. Overall the method is very good for semiconductors (or well behaved metals), and allows ``tricks'' like small k-mesh or small RKMax at the beginning of the minimization and using higher accuracy only towards the end.

The following text refers (mainly) to the first method using min_lapw:

When case.scf is not present, an scf-cycle will be performed first, otherwise the corresponding forces are extracted into case.finM and the program mini generates a new case.struct with modified atomic positions. The previous step is saved under case_1/2/3.... Then a new scf-cycle is executed and this loop continues until convergence (default: forces below 2mRy/bohr) is reached.
The last iteration of each geometry step is appended to case.scf_mini, so that this file contains the complete history of the minimization and can be used to monitor the progress (grep :ENE *mini; or :FORxxx ...).

By default (unless switch -noex is specified), min will call the script clmextrapol_lapw after the first geometry step and try to extrapolate the charge density to the new positions. This procedure usually significantly reduces the number of scf-cycles and is thus highly recommended.

mini requires an input file case.inM (see Sec. 8.15) which is created automatically and MUST NOT be changed while min_lapw is running (except the force tolerance, which terminates the optimization).

We recommend the PORT minimization method, a reverse-communication trust-region Quasi-Newton method from the Port library, which seems to be stable, efficient and does not depend too much on the users input (DELTAs, see below with NEWT). The PORT option also uses/produces a file .min_hess, which contains the (approximate) Hessian matrix (lower-triangle Cholesky factor) If you restart a minimization with different k-points, RMT, RKmax, ... or do a similar calculation (eg. for a different volume, ...) it will be copied to .minrestart (unless -nohess is specified), so that you start with a reasonable approximation for the Hessian. The program pairhess, which calculates the first Hessian, also prints out the average Hessian eigenvalue for the symmetric, symmetry-preserving modes in mRyd/au2 as well as the minimum and maximum, and also the vibration frequencies. A list of these is given at the end of case.pairhess. Note that these are not all possible modes, but only the symmetry preserving ones. Therefore if you have prior information about the vibrations of the system you can adjust the rescaling term so the average vibration frequency is about correct. (see the description of pairhess in 9.2). (In addition there is a program eigenhess, which will analyze the Hessian after the minimization has been completed. It also prints vibrational frequencies and may give you hints about dynamical instability of your system. Some more description is given in $WIENROOT/SRC_pairhess/README and at the top of the output file case.outputeig.

When using PORT you may also want to check its progress using

grep :LABEL  case.outputM

where :LABEL is :ENE (should decrease), :GRAD (should also go down, but could sometimes also go up for some time as long as the energy still decreases), :MIN (provides a condensed summary of the progress), :WARN may indicate a problem), :DD (provides information about the step sizes and mode used). Some general explanations are:
1) The algorithm takes steps along what it considers are good directions (using some internal logic), provided that these steps are smaller than what is called the trust-region radius. After a good step (e.g. large energy decrease) it expands the trust-region; after a bad one it reduces it. Sometimes it will try too large a step then have to reduce it, so the energy does not always go down. You can see this by using ":DD'' and ``:MIN" .
2) A grep on :MIN gives a condensed progress output, in which the most significant terms are E (energy in some rescaled units), RELDF (last energy reduction), PRELDF (what the algorithm predicted for the step), RELDX (RMS change in positions in Angstroms) and NPRELDF (predicted change in next cycle). Near the solution RELDF and RELDX should both become small. However, sometimes you can have soft modes in your structure in which case RELDX will take a long time before it becomes small.
3) A warning that the step was reduced due to overlapping spheres if it happens only once (or twice) is not important; the algorithm tested too large a step. However, if it occurs many times it may indicate that the RMT's are too big.
4) A warning "CURVATURE CONDITION FAILED" indicates that you are still some distance from the minimum, and the Hessian is changing a lot. If you see many of these, it may be that the forces and energy are not consistent.

Sometimes PORT gets "stuck" (often because of inconsistencies of energy and forces due to insufficient scf convergence or a very non-harmonic potential energy surface). A good alternative is NEW1, which is a "sophisticated" steepest-descent method with optimized step size. It can be very efficient in certain cases, but can also be rather slow when the potential energy surface is rather flat in one, but steep in another direction (eg. a weakly bound molecule on a surface, but constraining the sensitive parameters, like the bond distance of the molecule, may help).

Another alternative is NEWT, where one must set proper "DELTAs" and a "FRICTION" for each atom. Unfortunately, these DELTAs determine crucially how the minimization performs. Too small values lead to many (unnecessary) "geometry steps", while too large DELTAs can even lead to divergence (and finally to a crash). Thus you MUST control how the minimization performs. We recommend the following sequence after 2-3 geometry steps:

grep :ENE *mini
:ENE  : ********** TOTAL ENERGY IN Ry =        -2994.809124                    
:ENE  : ********** TOTAL ENERGY IN Ry =        -2994.813852                    
:ENE  : ********** TOTAL ENERGY IN Ry =        -2994.818538

Good, since the total energy is decreasing.

grep :FGL001 *mini
:FGL001:  1.ATOM                         0.000          0.000         18.219    
:FGL001:  1.ATOM                         0.000          0.000         12.375    
:FGL001:  1.ATOM                         0.000          0.000          7.876

Good, since the force (only a force along z is present here) is decreasing reasonably fast towards zero. You must check this for every atom in your structure.

When you detect oszillations or too small changes of the forces during geometry optimization, you will have to decrease/increase the DELTAs in case.inM and rm case.tmpM. (NOTE: You must not continue with modified DELTAs but keeping case.tmpM.) Alternatively, stop the minimization (touch .minstop and wait until the last step has finished), change case.inM and restart.

You can get help on its usage with:

min -h or min_lapw -h

PROGRAM:        min

USAGE:          min [OPTIONS]

OPTIONS:
-j JOB ->       job-file JOB (default: run_lapw -I -fc 1. -i 40 )
-noex ->        does not extrapolate the density for next geometry step
-p ->           adds -p (parallel) switch to run_lapw 
-it ->          adds -it (iterative diag.) switch to run_lapw 
-it1 ->         adds -it1 (it.diag. with recreating H_inv) switch to $job
-it2 ->         adds -it2 (it.diag. with reinitialization) switch to $job
-noHinv ->      adds -it -noHinv (it.diag. without H_inv) switch to $job
-sp ->          uses runsp_lapw instead of run_lapw
-nohess ->      removes .minrestart (initial Hessian) from previous minimization
-m ->           extract force-input and execute mini (without JOB) and exit
-mo ->          like -m but without copying of case.tmpM1 to case.tmpM
-h/-H ->        help
-NI ->          without initialization of minimization (eg. continue after a crash)
-i NUMBER ->    max. NUMBER (50) of structure changes
-s NUMBER ->    save_lapw after NUMBER of structure changes

CONTROL FILES:
.minstop        stop after next structure change

For instance for a spin-polarized case, which converges more difficultly, you would use:

min -j ``runsp_lapw -I -fc 1.0 -i 60''

4 Phonon calculations

Calculations of phonons is based on a program PHONON by K.Parlinski, which runs under MS-Windows and must be ordered separately (see http://wolf.ifj.edu.pl/phonon/ )

You would define the structure of your compound in PHONON together with a supercell of sufficient size (e.g. 64 atoms). PHONON will then generate a list of necessary displacements of the individual atoms. The resulting file case.d45 must be transfered to UNIX. Here you would run WIEN2k-scf calculations for all displacements and collect the resulting forces, which will be transfered back to PHONON (case.dat and/or case.dsy). With these force information PHONON calculates phonon at arbitrary q-vectors together with several thermodynamic properties.

1 init_phonon_lapw

init_phonon_lapw uses case.d45 from PHONON and creates subdirectories case_XX and case_XX.struct files for all required displacements. It allows you to define globally RMT values for the different atoms and
- initializes every case individually (batch option of init_lapw is now supported) or
- initializes every second case (useful for pos. and neg. displacements, which have the same symmetry and thus only one initialization is necessary), or
- initializes only the first case and copies the files from the first case to all others. This is most convenient in low symmetry cases with P1 symmetry for all cases and thus just one init_lapw needs to be executed (while for higher symmetry a separate initialization is required (but computational effort is reduced).

Please use mainly nn to reduce equivalent atoms. sgroup might change the unitcell and than the collection of forces into the original supercell is not possible (or quite difficult).

A script run_phonon has been created. Modify it according to your needs (parallelization,....) and run all cases to selfconsistency.

Note that good force convergence is essential (at least 0.1 mRy/bohr) and if your structure has free parameters, either very good equillibrium positions must have been found before, or even better, use both, positive and negative displacements to average out any resulting error from non-equillibrium positions.

2 analyse_phonon_lapw

analyse_phonon_lapw uses the resulting scf files and generates the ``Hellmann-Feynman''-file required by PHONON. When you have positive and negative displacements an automatic averaging will be performed. The resulting case.dat and case.dsy filse should be transfered back to MS-Windows and imported into PHONON.

5 Running programs in parallel mode

This section describes two methods for running WIEN2k on parallel computers.

One method, parallelizing k-points over processors, utilizes c-shell scripts, NFS-file system and passwordless login ((public/private keys). This method works with all standard flavors of Unix without any special requirements. The parallelization is very efficient even on heterogeneous computing environments, e.g. on heterogeneous clusters of workstations, but also on dedicated parallel computers and does NOT need large network bandwidth.

The other parallelization method, which comes new with version WIEN2k_07.3, is based on fine grained methods, MPI and SCALAPACK. It is especially useful for larger systems, if the required memory size is no longer available on a single computer or when more processors than k-points are available. It requires a fast network (at least Gb-Ethernet, better Myrinet or Infiniband) or a shared memory machine. Although not as efficient as the simple k-point parallelization, the current mpi-version has been enhanced a lot and shows very good scaling with the number of processors for most parts. In any case, the number of processors and the size of the problem (number of atoms, matrixsize due to the plane wave basis) must be compatible and typically [NMAT / sqrt(processors)] .gt. 2000 should hold.

The k-point parallelization can use a dynamic load balancing scheme and is therefore usable also on heterogeneous computing environments like networks of workstations or PCs, even if interactive users contribute to the processors' work load.

If your case is large enough, but you still have to use a few k-points, a combination of both parallelization methods is possible (always use k-point parallelism if you have more than 1 k-point).

1 k-Point Parallelization

Parts of the code are executed in parallel, namely LAPW1, LAPWSO, LAPW2, LAPWDM, and OPTIC. These are the numerically intensive parts of most calculations.

Parallelization is achieved on the k-point level by distributing subsets of the k-mesh to different processors and subsequent summation of the results. The implemented strategy can be used both on a multiprocessor architecture and on a heterogeneous (even multiplatform) network.

To make use of the k-point parallelization, make sure that your system meets the following requirements:

NFS:: All files for the calculation must be accessible under the same name and path. Therefore you should set up your NFS mounts in such a way, that on all machines the path names are the same.
Remote login:: rlogin or ssh to all machines must be possible without specifying a password. Therefore you must either edit your .rhosts file to include all machines you intend to use (not necessary for a shared memory machine), or correctly specify public/private keys for ssh. This can be done by running ``ssh-keygen -t rsa'' and copying the id_rsa.pub key into /.ssh/authorized_keys at the remote sites.
The command for launching a remote shell is platform dependent, and usually can be 'ssh', 'rsh' or 'remsh'. It should be specified during installation when siteconfig_lapw is executed (see chapter 11).

2 MPI parallelization

Fine grained MPI parallel versions are available for the programs lapw0, lapw1, and lapw2. This parallelization method is based on parallelization libraries, including MPI, ScaLapack, PBlas and FFTW_2.1.5 (lapw0). The required libraries are not included with WIEN2k. On parallel computers, however, they are usually installed. Otherwise, free versions of these libraries are available.

The parallelization affects the naming scheme of the executable programs: the fine grained parallel versions of lapw0/1/2 are called lapw0_mpi, lapw1[c]_mpi, and lapw2[c]_mpi. These programs are executed by calls to the local execution environments, as in the sequential case, by the scripts x, lapw0para, lapw1para, and lapw2para. On most computers this is done by calling mpirun and should also be configured using siteconfig_lapw.

3 How to use WIEN2k as a parallel program

To start the calculation in parallel, a switch must be set and an input file has to be prepared by the user.

The switch -p switches on the parallelization in the scripts x and run_lapw.
In addition to this switch the file .machines has to be present in the current working directory. In this file the machine names on which the parallel processes should be launched, and their respective relative speeds must be specified.

If the .machines file does not exist, or if the -p switch is omitted, the serial versions of the programs are executed.

Generation of all necessary files, starting of the processes and summation of the results is done by the appropriate scripts lapw1para, lapwsopara,lapwdmpara and lapw2para (when using -p), and parallel programs lapw0_mpi, lapw1_mpi, and lapw2_mpi (when using fine grained parallelization has been selected in the .machines file).

4 The `.machines` file

The following .machines file describes a simple example. We assume to have 5 computers, (alpha, ... epsilon), where epsilon has 4, and delta and gamma 2 cpus. In addition, gamma, delta and epsilon are 3 times faster than alpha and beta.:

# This is a valid .machines file
#
granularity:1
1:alpha
1:beta
3:gamma:2 delta
3:delta:1 epsilon:4
residue:delta:2
lapw0:gamma:2 delta:2 epsilon:4

To each set of processors, defined by a single line in this file, a certain number of k-points is assigned, which are computed in parallel. In each line the weight (relative speed) and computers are specified in the following form:

weight:machine_name1:number1 machine_name2:number2 ...

where weight is an integer (e.g. a three times more powerful machine should have a three times higher weight). The name of the computer is machine_name[1/2/...], and the number of processors to be used on these computers are number[1/2/...]. If there is only one processor on a given computer, the :1 may be omitted. Empty lines are skipped, comment lines start with #.

Assuming there are 8 k-points to be distributed in the above example, they are distributed as follows. The computers alpha and beta get 1 each. Two processors of computer gamma and one processor of computer delta cooperate in a fine grained parallelization on the solution of 3 k-points, and one processor of computer delta plus four processors of computer epsilon cooperate on the solution of 3 k-points. If there were additional k-points, they would be calculated by the first processor (or set of processors) becoming available. With higher numbers of k-points, this method ensures dynamic load balancing. If a processor is busy doing other (e.g., interactive) work, the overall calculation will not stall, but most of its work will be done by other processors (or sets of processors using MPI). This is, however, not an implementation for fail safety: if a process does not terminate (e.g., due to shutdown of a computer) the calculation will never terminate. It is up to the user to handle with such hardware failures by modifying the .machines file and restarting the calculation at the appropriate point.

During the run of lapw1para the file .processes is generated. This file is used by lapw2para to determine which case.vector* to read.

By default lapw1para will generate approximately 3 vector-files per processor, if enough k-points are available for distribution. The factor 3 is called ``granularity'' and allows for some load balancing in heterogeneous environments. If you can be sure that load balancing is not an issue (eg. because you use a queuing-system and can be sure that you will get 100% of the cpus for your jobs) it is recommended to set

granularity:1

for best performance.

On shared memory machines it is advisable to add a ``residue machine'' to calculate the surplus (residual) k-points (given by the expression $\mbox{MOD}(klist, \sum_j{newweight_j}$ ) and rely on the operating system's load balancing scheme. Such a ``residue machine'' is specified as

residue:machine_name:number

in the .machines file.

Alternatively, it is also possible to distribute the remaining k-points one-by-one (and not in one junk) over all processors. The option

extrafine:1

can be set in the .machines file.

When using ``iterative diagonalization'' or the $SCRATCH variable (set to a local directory), the k-point distribution must be ``fixed''. This means, the ratio (k-points / processors) must be integer (sloppy called ``commensurate'' at other places in the UG) and granularity:1 should be set.

The line

lapw0:gamma:2 delta:2 epsilon:4

defines the computers used for running lapw0_mpi. In this example the 6 processors of the computers gamma, delta, and epsilon run lapw0_mpi in parallel.

If fine grained parallelization is used, each set of processors defined in the .machines file is converted to a single file .machine[1/2/...], which is used in a call to mpirun (or another parallel execution environment).

When using a queuing system (like PBS, LoadLeveler or SUN-Gridengine) one can only request the NUMBER of processors, but does not know on which nodes the job will run. Thus a ``static'' .machines file is not possible. On can write a simple shell script, which will generate this file on the fly once the job has been started and the nodes are assigned to this job. Examples can be found at our web-site ``http://www.wien2k.at/reg_users/faq''.

5 How the list of k-points is split

In the setup of the k-point parallel version of LAPW1 the list of k-points in case.klist (Note, that the k-list from case.in1 cannot be used for parallel calculations) is split into subsets according to the weights specified in the .machines file:

$\begin{displaymath}newweight_i=\left\lfloor{weight_i*klist\over {granularity * \sum_j{weight_j}}}\right\rfloor\end{displaymath}$

where is the number of k-points to be calculated on processor i. is always set to a value greater equal one.

A loop over all processors is repeated until all k-points have been processed.

Speedup in a parallel program is intrinsically dependent on the serial or parallel parts of the code according to Amdahl's law:

$\begin{displaymath}speedup={1 \over { (1 - P ) + {P \over N}}}\end{displaymath}$

whereas N is the number of processors and P the percentage of code executed in parallel.

In WIEN2k usually only a small part of time is spent in the programs lapw0, lcore and mixer which is very small (negligible) in comparison to the times spent in lapw1 and lapw2. The time for waiting until all parallel lapw1 and lapw2 processes have finished is important too. For a good performance it is therefore necessary to have a good load balancing by estimating properly the speed and availability of the machines used. We encourage the use of testpara_lapw or ``Utils. $\rightarrow$ testpara'' from w2web to check the k-point distribution over the machines before actually running the programs in parallel.

While running lapw1 and lapw2 in parallel mode, the scripts testpara1_lapw (see 5.2.14) and testpara2_lapw (see 5.2.15) can be used to monitor the succession of parallel execution.

6 Flow chart of the parallel scripts

To see how files are handled by the scripts lapw1para and lapw2para refer to figures 5.1 and 5.2. After the lapw2 calculations are completed the densities and the informations from the case.scf2_x files are summarized by sumpara.

Note: parallel lapw2 and sumpara take two command line arguments, namely the case.def file but also a number_of_processor indicator.

**Figure 5.1:** Flow chart of `lapw1para`
$\begin{figure}\begin{center} \leavevmode \rotatebox{0}{\epsfig{figure=figs/lapw1para, width=7cm}} \end{center}\end{figure}$

**Figure 5.2:** Flow chart of `lapw2para`
$\begin{figure}\begin{center} \leavevmode \rotatebox{0}{\epsfig{figure=figs/lapw2para, width=7cm}} \par\end{center}\end{figure}$

7 On the fine grained parallelization

The following parallel programs use different parallelization strategies:

lapw0_mpi

is parallelized over the number of atoms. This method leads to good scalability as long as there are more atoms than processors. For very many processors, however, the speedup is limited, which is usually not at all critical, since the overall computing time of lapw0_mpi is quite small.

lapw1_mpi

uses a two-dimensional processor setup to distribute the Hamilton and overlap matrices. For higher numbers of processors two-dimensional communication patterns are clearly preferable to one-dimensional communication patterns.

Let us assume, for example, 64 processors. In a given processing step, one of these processors has to communicate with the other 63 processors if a one-dimensional setup was chosen. In the case of a two-dimensional processor setup it is usually sufficient to communicate with the processors of the same processor row (7) or the same processor column (7), i.e. with 14 processors.

In general the processor array $P\times Q$ is chosen as follows: $P=\left\lfloor\sqrt{\mbox{number of processors}}\right\rfloor$ , $Q=\left\lfloor \frac{\mbox{number of processors}}{P}\right\rfloor$ . Because of SCALAPACK, often $P\times P$ arrays (i.e. 4, 9, 16,... processors) give best performance and of course it is not recommended to use eg. 17 processors.

lapw2_mpi

is parallelized in two main parts: (i) The density inside the spheres is parallelized over atoms, and (ii) the fast Fourier transforms are done in parallel.

In addition the density calculation for each atom can be further parallelized by distributing the eigenvector on a certain subset of processors (ususlly 2-4). This is not so efficient, but most usefull if the memory requirement is too big otherwise. You set it in .machines using

lapw2_vector_split:2

If more than one k-point is distributed at once to lapw1_mpi or lapw2_mpi, these will be treated consecutively.

Depending on the parallel computer system and the problem size, speedups will vary to some extend. Matrix setup in lapw1 should scale nearly perfect, while diagonalization (using SCALAPACK) will not. Usually, ``iterative'' scales better than ``full'' diagonalization and is preferred for large scale computations. Scalability over atoms will be very good if processor and atom numbers are compatible. Running the fine grained parallelization over a 100Mbit/s Ethernet network is not recommended, even for large problem sizes.

6 Getting on-line help

As mentioned before, all WIEN2k csh-shell scripts have a ``help``-switch -h, which gives a brief summary of all options for the respective script.
To obtain online help on input-parameters, program description, ...use
help_lapw

which opens the pdf-version of the users guide (using acroread or what is defined in $PDFREADER). You can search for a specific keyword using `` $^{\wedge}$ f keyword''. This procedure substitutes an ``Index'' and should make it possible to find a specific information without reading through the complete users guide.
In addition there is a html-version of the UG and its starting page is:
$WIENROOT/SRC_usersguide_html/usersguide.html
When using the user interface w2web, you have access to the html and pdf-version (the latter requires an X-windows environment) of the usersguide.
At our webserver we put informations for the registered user:
- A ''FAQ'' page with answers to some common problems.
- Update information: When you think the program has an error, please check wether newer versions are available, which might have fixed the problem you encounter.
- A mailing list:
  
  Please check the ''digest''!
  
  In many cases your questions may have been answered before.
  
  Locate your problem:
  
  If a calculation crashes, please locate the problem. Check the content of files like case.dayfile, *.error, case.scf, case.scfX, case.outputX where X specifies the program which crashed.
  
  Posting questions:
  
  Please provide enough information so that somebody can help you. A question like: ``My calculation crashed. Please help me!'' will most likely not be answered.

7 Interface scripts

We have included a few ``interface scripts'' into the current WIEN2k distribution, to simplify the previewing of results. In order to use these scripts the public domain program ``gnuplot'' has to be installed on your system.

1 eplot_lapw

The script eplot_lapw plots total energy vs. volume or total energy vs. c/a-ratio using the file case.analysis. The latter should have been created with grepline (using :VOL and :ENE labels) or the ``Analysis $\rightarrow$ Analyze multiple SCF-files'' menu of w2web and the file names must be generated (or compatible) with ``optimize.job''.

For a description of how to use the script for batch like execution call the script using

eplot_lapw -h

2 parabolfit_lapw

The script parabolfit_lapw is an interface for a harmonic fitting of E vs. 2-4-dim lattice parameters by a non-linear least squares fit (eosfit6) using PORT routines. Once you have several scf calculations at different lattice parameters (usually generated with optimize.job) it generates the required case.ene and case.latparam from your scf files. Using

parabolfit_lapw [ -t 2/3/4 ] [ -f FILEHEAD ] [ -scf '*xxx*.scf' ]

you can optionally specify the dimensionality of the fit or the specific scf-filenames.

3 dosplot_lapw

The script dosplot_lapw plots total or partial Density of States depending on the input used by case.int and the interactive input. A more advanced plotting interface is provided by dosplot2_lapw, see below.

For a description of how to use the script for batch like execution call the script using

dosplot_lapw -h

4 dosplot2_lapw

The script dosplot2_lapw plots total or partial Density of States depending on the input used by case.int and the interactive input. It can plot up to 4 DOS curves into one plot, and simultaneously plot spin-up/dn DOS.

It was provided by Morteza Jamal (m_jamal57@yahoo.com).

For a description of how to use the script for batch like execution call the script using

dosplot2_lapw -h

5 Curve_lapw

The script Curve_lapw plots x,y data from a file specified interactively. It asks for additional interactive input. It can plot up to 4 curves into one plot and is a simple gnuplot interface.

It was provided by Morteza Jamal (m_jamal57@yahoo.com).

6 specplot_lapw

specplot_lapw provides an interface for plotting X-ray spectra from the output of the xspec or txspec program.

For a description of how to use the script for batch like execution call the script using

specplot_lapw -h

7 rhoplot_lapw

The script rhoplot_lapw produces a surface plot of the electron density from the file case.rho created by lapw5.

Note: To use this script you must have installed the C-program reformat supplied in SRC_reformat.

8 opticplot_lapw

The script opticplot_lapw produces XY plots from the output files of the optics package using the case.joint, case.epsilon, case.eloss, case.sumrules or case.sigmak. For a description of how to use the script for batch like execution call the script using

opticplot_lapw -h

9 addjoint-updn_lapw

The script addjoint-updn_lapw adds the files case.jointup and case.jointdn together and produces case.joint. It uses internally the program add_columns. It should be called for spin-polarized optics calculations after x joint -up and x joint -dn, because the Kramers-Kronig transformation to the real part of the dielectric function () is not a simple additive quantity concerning the spin (see Ambrosch-Draxl 06). The KK transformation should then be done non-spinpolarized (x kram) resulting in files: case.epsilon, case.eloss, case.sumrules or case.sigmak.

Next: 6 Initialization Up: 2 Detailed description of Previous: 4 Files and Program Contents

pblaha 2011-03-22

-h	help
-a	save all input files as well
-f	force save_lapw to overwrite previous saves
-d directory	save calculation in directory specified
-s	silent operation (no output)

-h	help
-f	force restore_lapw to overwrite previous files
-d directory	restore calculation from directory specified
-s	silent operation (no output)
-t	only test which files would be restored

1 Job control (c-shell scripts)

1 Main execution script (x_lapw)

2 Job control for initialization (init_lapw)

3 Job control for iteration (run_lapw or runsp_lapw)

1 Save a calculation (save_lapw)

2 Restoring a calculation (restore_lapw)

3 Remove unnecessary files (clean_lapw)

4 Migrate a case to/from a remote computer (migrate_lapw)

5 Generate case.inst (instgen_lapw)

6 Set R-MT values in your case.struct file (setrmt_lapw)

7 Create case.int file (for DOS) (configure_int_lapw)

8 Check for running WIEN jobs (check_lapw)

9 Cancel (kill) running WIEN jobs (cancel_lapw)

10 Extract critical points from a Bader analysis (extractaim_lapw)

11 scfmonitor_lapw

12 analyse_lapw

13 Check parallel execution (testpara_lapw)

14 Check parallel execution of lapw1 (testpara1_lapw)

15 Check parallel execution of lapw2 (testpara2_lapw)

16 grepline_lapw

17 initso_lapw

18 vec2old_lapw

19 clmextrapol_lapw

3 Structure optimization

1 Lattice parameters (Volume, c/a, lattice parameters)

2 Minimization of internal parameters (min_lapw)

4 Phonon calculations

1 init_phonon_lapw

2 analyse_phonon_lapw

5 Running programs in parallel mode

4 The .machines file

6 Getting on-line help

7 Interface scripts

1 eplot_lapw

2 parabolfit_lapw

3 dosplot_lapw

4 dosplot2_lapw

5 Curve_lapw

6 specplot_lapw

7 rhoplot_lapw

8 opticplot_lapw

9 addjoint-updn_lapw

4 The `.machines` file