
0 Bibliography


  1. [AWK:awk] Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. The Awk Programming Language. Addison-Wesley Series in Computer Science. Addison-Wesley Publ., 1988. ISBN 020107981X, 9780201079814.

  2. [amd:law] G. Amdahl. The validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the AFIPS Computing Conference, volume~30, pages 483--485, 1967.

  3. [AxBa:febook] O. Axelsson and A.V. Barker. {\em Finite element solution of boundary value problems. Theory and computation}. Academic Press, Orlando, Fl., 1984.

  4. [AxPo:dd2] Owe Axelsson and Ben Polman. Block preconditioning and domain decomposition methods {II}. J. Comp. Appl. Math., 24:55--72, 1988.

  5. [BarnesHut] Josh Barnes and Piet Hut. A hierarchical {$O(N \log N)$} force-calculation algorithm. Nature, 324:446--449, 1986.

  6. [Ba:templates] Richard Barrett, Michael Berry, Tony F. Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Charles Romine, and Henk {van der Vorst}. {\em Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods}. SIAM, Philadelphia PA, 1994. {\tt http://www.netlib.org/templates/}.

  7. [Batcher:85a] K.E. Batcher. {MPP}: A high speed image processor. In Algorithmically Specialized Parallel Computers. Academic Press, New York, 1985.

  8. [Bell:outlook] Gordon Bell. The outlook for scalable parallel processing. Decision Resources, Inc, 1994.

  9. [BePl:book] Abraham Berman and Robert J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM, 1994. Originally published by Academic Press, New York, 1979.

  10. [BGSm:96] Petter E. Bj{\o}rstad, William Gropp, and Barry Smith. {\em Domain decomposition: parallel multilevel methods for elliptic partial differential equations}. Cambridge University Press, 1996.

  11. [BlackScholes] Fischer Black and Myron S. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637--654, May-June 1973.

  12. [scalapack-users-guide] L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley. {\em {ScaLAPACK} Users' Guide}. SIAM, 1997.

  13. [reference-blas] Netlib.org {BLAS} reference implementation. \url{http://www.netlib.org/blas}.

  14. [Blelloc:segmented-report] Guy E. Blelloch, Michael A. Heroux, and Marco Zagha. Segmented operations for sparse matrix computation on vector multiprocessors. Technical Report CMU-CS-93-173, CMU, 1993.

  15. [Bohr:30yearDennard] Mark Bohr. A 30 year retrospective on {Dennard}'s {MOSFET} scaling paper. Solid-State Circuits Newsletter, IEEE, 12(1):11--13, Winter 2007.

  16. [Bohr:ISSCC2009] Mark Bohr. The new era of scaling in an {SoC} world. In ISSCC, pages 23--28, 2009.

  17. [Bolz:GPUsparse] Jeff Bolz, Ian Farmer, Eitan Grinspun, and Peter Schr\"{o}der. Sparse matrix solvers on the {GPU}: conjugate gradients and multigrid. ACM Trans. Graph., 22(3):917--924, July 2003.

  18. [boost:interval-arithmetic] {BOOST} interval arithmetic library. \url{http://www.boost.org/libs/numeric/interval/doc/interval.htm}.

  19. [papi] S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A portable programming interface for performance evaluation on modern processors. {\em International Journal of High Performance Computing Applications}, 14:189--204, Fall 2000.

  20. [Burks:discussion] A. W. Burks, H. H. Goldstine, and J. von Neumann. Preliminary discussion of the logical design of an electronic computing instrument. Technical report, Institute for Advanced Study, Princeton, 1946.

  21. [ButtEijkLang:spmvp] Alfredo Buttari, Victor Eijkhout, Julien Langou, and Salvatore Filippone. Performance optimization and modeling of blocked sparse kernels. Int. J. High Perf. Comput. Appl., 21:467--484, 2007.

  22. [Campbell:octree] P. M. Campbell, K. D. Devine, J. E. Flaherty, L. G. Gervasio, and J. D. Teresco. Dynamic octree load balancing using space-filling curves, 2003.

  23. [Chan2007Collective] Ernie Chan, Marcel Heimlich, Avi Purkayastha, and Robert van de Geijn. Collective communication: theory, practice, and experience. Concurrency and Computation: Practice and Experience, 19:1749--1783, 2007.

  24. [Chandrakasa:transformations] A. P. Chandrakasan, R. Mehra, M. Potkonjak, J. Rabaey, and R. W. Brodersen. Optimizing power using transformations. {\em IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems}, pages 13--32, January 1995.

  25. [Chapel:homepage] Chapel programming language homepage. \url{http://chapel.cray.com/}.

  26. [Chapman2008:OpenMPbook] Barbara Chapman, Gabriele Jost, and Ruud van der Pas. {\em Using {OpenMP}: Portable Shared Memory Parallel Programming}, volume~10 of Scientific Computation Series. MIT Press, ISBN 0262533022, 9780262533027, 2008.

  27. [ChenDoolen:LBM] Shiyi Chen and Gary D. Doolen. Lattice {Boltzmann} method for fluid flows. Annual Review of Fluid Mechanics, 30(1):329--364, 1998.

  28. [Choi:scalapack] Jaeyoung Choi, Jack J. Dongarra, Roldan Pozo, and David W. Walker. {ScaLAPACK}: a scalable linear algebra library for distributed memory concurrent computers. In {\em Proceedings of the fourth symposium on the frontiers of massively parallel computation (Frontiers '92), McLean, Virginia, Oct 19--21, 1992}, pages 120--127, 1992.

  29. [ChGe:sstep] A. Chronopoulos and C.W. Gear. {$s$}-step iterative methods for symmetric linear systems. Journal of Computational and Applied Mathematics, 25:153--168, 1989.

  30. [Cipra:Ising] Barry A. Cipra. An introduction to the {Ising} model. The American Mathematical Monthly, pages 937--959, 1987.

  31. [Clos1953] Charles Clos. A study of non-blocking switching networks. Bell System Technical Journal, 32:406--424, 1953.

  32. [CuMcK:reducing] E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In ACM proceedings of the 24th National Conference, 1969.

  33. [DAzevedo2005:vector-mvp] Eduardo F. D'Azevedo, Mark R. Fahey, and Richard T. Mills. Vectorized sparse matrix multiply for compressed row storage format. {\em Lecture Notes in Computer Science, Computational Science – ICCS 2005}, pages 99--106, 2005.

  34. [DAzEijRo:ppscicomp] E.F. D'Azevedo, V.L. Eijkhout, and C.H. Romine. A matrix framework for conjugate gradient methods and some variants of {CG} with less synchronization overhead. In {\em Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing}, pages 644--646, Philadelphia, 1993. SIAM.

  35. [Google:mapreduce] Jeffrey Dean and Sanjay Ghemawat. {MapReduce}: Simplified data processing on large clusters. In {\em OSDI'04: Sixth Symposium on Operating System Design and Implementation}, 2004.

  36. [dehevo92:acta] J. Demmel, M. Heath, and H. {Van der Vorst}. Parallel numerical linear algebra. In Acta Numerica 1993. Cambridge University Press, Cambridge, 1993.

  37. [Demmel2008IEEE:avoiding] James Demmel, Mark Hoemmen, Marghoob Mohiyuddin, and Katherine Yelick. Avoiding communication in sparse matrix computations. In {\em IEEE International Parallel and Distributed Processing Symposium}, 2008.

  38. [DemEtAl:ieeeproc2004] Jim Demmel, Jack Dongarra, Victor Eijkhout, Erika Fuentes, Antoine Petitet, Rich Vuduc, R. Clint Whaley, and Katherine Yelick. Self adapting linear algebra algorithms and software. Proceedings of the IEEE, 93:293--312, February 2005.

  39. [Dennard:scaling] R.H. Dennard, F.H. Gaensslen, V.L. Rideout, E. Bassous, and A.R. LeBlanc. Design of ion-implanted {MOSFET}'s with very small physical dimensions. Solid-State Circuits, IEEE Journal of, 9(5):256--268, October 1974.

  40. [Deshpande92efficientparallel] Ashish Deshpande and Martin Schultz. Efficient parallel programming with {Linda}. In Supercomputing '92 Proceedings, pages 238--244, 1992.

  41. [Dijkstra:semaphores] E. W. Dijkstra. Cooperating sequential processes. \url{http://www.cs.utexas.edu/users/EWD/transcriptions/EWD01xx/EWD123.html}. Technological University, Eindhoven, The Netherlands, September 1965.

  42. [Dijkstra1974Programming] Edsger W. Dijkstra. Programming as a discipline of mathematical nature. Am. Math. Monthly, 81:608--612, 1974.

  43. [EWD:EWD117] Edsger W. Dijkstra. Programming considered as a human activity. published as {EWD:EWD117pub}, n.d.

  44. [pvm-1] J. Dongarra, A. Geist, R. Manchek, and V. Sunderam. {Integrated PVM Framework Supports Heterogeneous Network Computing}. Computers in Physics, 7(2):166--75, April 1993.

  45. [Dongarra1987LinpackBenchmark] J. J. Dongarra. {\em The {LINPACK} benchmark: An explanation}, volume 297, chapter Supercomputing 1987, pages 456--474. Springer-Verlag, Berlin, 1988.

  46. [BLAS3] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1--17, March 1990.

  47. [BLAS2] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of {FORTRAN} basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1--17, March 1988.

  48. [OReilly:sedawk] Dale Dougherty and Arnold Robbins. sed \& awk. O'Reilly Media, 2nd edition. Print ISBN: 978-1-56592-225-9, ISBN 10: 1-56592-225-5; Ebook ISBN: 978-1-4493-8700-6, ISBN 10: 1-4493-8700-4.

  49. [DobbsComplex] {Dr. Dobbs}. Complex arithmetic: in the intersection of {C} and {C++}. \url{http://www.ddj.com/cpp/184401628}.

  50. [Duff:harwellboeingformat] I. S. Duff, R. G. Grimes, and J. G. Lewis. Users' guide for the {H}arwell-{B}oeing sparse matrix collection (release {I}). Technical Report RAL 92-086, Rutherford Appleton Laboratory, 1992.

  51. [Flame:PBMD-report] C. Edwards, P. Geng, A. Patra, and R. van de Geijn. Parallel matrix distributions: have we been doing it all wrong? Technical Report {TR}-95-40, {D}epartment of {C}omputer {S}ciences, {T}he {U}niversity of {T}exas at {A}ustin, 1995.

  52. [Eij:general] Victor Eijkhout. A general formulation for incomplete blockwise factorizations. Comm. Appl. Numer. Meth., 4:161--164, 1988.

  53. [Eijkhout2010ICCS-krylov] Victor Eijkhout, Paolo Bientinesi, and Robert van de Geijn. Towards mechanical derivation of {Krylov} solver libraries. Procedia Computer Science, 1(1):1805--1813, 2010. Proceedings of ICCS 2010, \url{http://www.sciencedirect.com/science/publication?issn=18770509&volume=1&issue=1}.

  54. [Erdos:randomgraph] Paul Erd\H{o}s and A. R\'enyi. On the evolution of random graphs. {\em Publications of the Mathematical Institute of the Hungarian Academy of Sciences}, 5:17--61, 1960.

  55. [FaberManteuffel:conditions-for-existence] V. Faber and T. Manteuffel. Necessary and sufficient conditions for the existence of a conjugate gradient method. SIAM J. Numer. Anal., 21:352--362, 1984.

  56. [Falgout:scalable-hypre] R.D. Falgout, J.E. Jones, and U.M. Yang. Pursuing scalability for hypre's conceptual interfaces. Technical Report UCRL-JRNL-205407, Lawrence Livermore National Lab, 2004. submitted to ACM Transactions on Mathematical Software.

  57. [Fiedler:75-property] M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its applications to graph theory. Czechoslovak Mathematical Journal, 25:618--633, 1975.

  58. [Fisher:fastparallel] D. C. Fisher. Your favorite parallel algorithms might not be as fast as you think. IEEE Trans. Computers, 37:211--213, 1988.

  59. [flynn:taxonomy] M. Flynn. Some computer organizations and their effectiveness. IEEE Trans. Comput., C-21:948, 1972.

  60. [Fortress:homepage] Project fortress homepage. \url{http://projectfortress.sun.com/Projects/Community}.

  61. [frenkel-smit] D. Frenkel and B. Smit. Understanding molecular simulations: From algorithms to applications, 2nd edition. 2002.

  62. [FrNa:qmr] Roland W. Freund and No\"el M. Nachtigal. {QMR}: a quasi-minimal residual method for non-{H}ermitian linear systems. Numer. Math., 60:315--339, 1991.

  63. [Frigo:oblivious] M. Frigo, Charles E. Leiserson, H. Prokop, and S. Ramachandran. Cache oblivious algorithms. In {\em Proc. 40th Annual Symposium on Foundations of Computer Science}, 1999.

  64. [pvm-2] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. {\em {PVM}: A Users' Guide and Tutorial for Networked Parallel Computing}. MIT Press, 1994. The book is available electronically, the url is {\tt ftp://www.netlib.org/pvm3/book/pvm-book.ps}.

  65. [Gelernter85generativecommunication] David Gelernter. Generative communication in {Linda}. ACM Transactions on Programming Languages and Systems, 7:80--112, 1985.

  66. [Linda-CACM] David Gelernter and Nicholas Carriero. Coordination languages and their significance. Commun. ACM, 35(2):97--107, 1992.

  67. [gmplib] {GNU} multiple precision library. \url{http://gmplib.org/}.

  68. [Goedeker:performance-book] Stefan Goedecker and Adolfy Hoisie. Performance Optimization of Numerically Intensive Codes. SIAM, 2001.

  69. [Goldberg:arithmetic] D. Goldberg. Computer arithmetic. Appendix in [HennessyPatterson].

  70. [goldberg:floatingpoint] David Goldberg. What every computer scientist should know about floating-point arithmetic. Computing Surveys, March 1991.

  71. [GolubOleary:cg-history] G. H. Golub and D. P. O'Leary. Some history of the conjugate gradient and {L}anczos algorithms: 1948--1976. SIAM Review, 31:50--102, 1989.

  72. [golo83] G. H. Golub and C. F. {Van Loan}. Matrix Computations. North Oxford Academic, Oxford, 1983.

  73. [GoVL:matcomp] Gene H. Golub and Charles F. {Van Loan}. Matrix Computations. The Johns Hopkins University Press, Baltimore, second edition, 1989.

  74. [GotoGeijn:2008:Anatomy] Kazushige Goto and Robert A. van de Geijn. Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw., 34(3):1--25, 2008.

  75. [Gray:graycodepatent] F. Gray. Pulse code communication. U.S. Patent 2,632,058, March 17, 1953 (filed Nov. 1947).

  76. [Greenberg89randomizedrouting] Ronald I. Greenberg and Charles E. Leiserson. Randomized routing on fat-trees. In Advances in Computing Research, pages 345--374. JAI Press, 1989.

  77. [Gropp:UsingMPI1] W. Gropp, E. Lusk, and A. Skjellum. {\em Using {MPI}}. {T}he {MIT} {P}ress, 1994.

  78. [mpi-2-reference] William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. {\em {MPI}: The Complete Reference, Volume 2 - The {MPI}-2 Extensions}. MIT Press, 1998.

  79. [Gropp:BeowulfBook] William Gropp, Thomas Sterling, and Ewing Lusk. Beowulf Cluster Computing with Linux, 2nd Edition. MIT Press, 2003.

  80. [Hadoop-wiki] Hadoop wiki. \url{http://wiki.apache.org/hadoop/FrontPage}.

  81. [HaYo:applied] Louis A. Hageman and David M. Young. Applied Iterative Methods. Academic Press, New York, 1981.

  82. [Hartstein:cache-sqrt] A. Hartstein, V. Srinivasan, T. R. Puzak, and P. G. Emma. Cache miss behavior: is it $\sqrt{2}$? In Proceedings of the 3rd conference on Computing frontiers, CF '06, pages 313--320, New York, NY, USA, 2006. ACM.

  83. [Heath:scicomp] Michael T. Heath. Scientific Computing: an introductory survey; second edition. McGraw Hill, 2002.

  84. [He:surveyparallel] Don Heller. A survey of parallel algorithms in numerical linear algebra. SIAM Review, 20:740--777, 1978.

  85. [HeWo:94] B. A. Hendrickson and D. E. Womble. The torus-wrap mapping for dense matrix calculations on massively parallel computers. SIAM J. Sci. Comput., 15(5):1201--1226, 1994.

  86. [HennessyPatterson] John L. Hennessy and David A. Patterson. Computer Architecture, A Quantitative Approach. Morgan Kaufmann Publishers, 1990; 3rd edition, 2003.

  87. [HestenesStiefel1952:cg] M.R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Nat. Bur. Stand. J. Res., 49:409--436, 1952.

  88. [Higham:2002:ASN] Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 2002.

  89. [BSPlib] Jonathan M. D. Hill, Bill McColl, Dan C. Stefanescu, Mark W. Goudreau, Kevin Lang, Satish B. Rao, Torsten Suel, Thanasis Tsantilas, and Rob H. Bisseling. {BSPlib}: The {BSP} programming library. Parallel Computing, 24(14):1947--1980, 1998.

  90. [Hoare1969axiomatic] C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, pages 576--580, October 1969.

  91. [Hoefler:2010:SCP] Torsten Hoefler, Christian Siebert, and Andrew Lumsdaine. Scalable communication protocols for dynamic sparse data exchange. SIGPLAN Not., 45(5):159--168, January 2010.

  92. [ieee754-webpage] {IEEE} 754: Standard for binary floating-point arithmetic. \url{http://grouper.ieee.org/groups/754}.

  93. [wikipedia:interval-arithmetic] Interval arithmetic. \url{http://en.wikipedia.org/wiki/Interval_(mathematics)}.

  94. [DAP:79a] C.R. Jesshope and R.W. Hockney, editors. The {DAP} approach, volume 2, pages 311--329. Infotech Intl. Ltd., Maidenhead, 1979.

  95. [jopl94] M. T. Jones and P. E. Plassmann. The efficient parallel iterative solution of large sparse linear systems. In A. George, J.R. Gilbert, and J.W.H. Liu, editors, {\em Graph Theory and Sparse Matrix Computations}, IMA Vol 56. Springer Verlag, Berlin, 1994.

  96. [charmpp] L. V. Kale and S. Krishnan. Charm++: Parallel programming with message-driven objects. In G. V. Wilson and P. Lu, editors, {\em Parallel Programming using C++}, pages 175--213. MIT Press, 1996.

  97. [Karbo:book] Michael Karbo. {PC} architecture. \url{http://www.karbosguide.com/books/pcarchitecture/chapter00.htm}.

  98. [KarpZhang88] R.M. Karp and Y. Zhang. A randomized parallel branch-and-bound procedure. In {\em Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, 2-4 May 1988}, pages 290--300. ACM Press, 1988.

  99. [Katzenelson:nbody] J. Katzenelson. Computational structure of the n-body problem. SIAM Journal on Scientific and Statistical Computing, 10:787--815, July 1989.

  100. [KSRallcache] {Kendall Square Research}. \url{http://en.wikipedia.org/wiki/Kendall_Square_Research}.

  101. [Knuth:vol2] Donald Knuth. {\em The Art of Computer Programming, Volume 2: Seminumerical Algorithms}. Addison-Wesley, Reading MA, 3rd edition, 1998.

  102. [KopkaDaly] Helmut Kopka and Patrick W. Daly. {\em A Guide to {\LaTeX}}. Addison-Wesley, first published 1992.

  103. [Kulisch:2011:VFE] Ulrich Kulisch. Very fast and exact accumulation of products. Computing, 91(4):397--405, April 2011.

  104. [Kulish:dotproduct] Ulrich Kulisch and Van Snyder. The exact dot product as basic tool for long interval arithmetic. Computing, 91(3):307--313, 2011.

  105. [Kulkami:howmuch] Milind Kulkarni, Martin Burtscher, Rajasekhar Inkulu, Keshav Pingali, and Calin Cascaval. How much parallelism is there in irregular applications? In Principles and Practices of Parallel Programming (PPoPP), 2009.

  106. [Kumar:parcomp-book] Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Introduction to Parallel Computing. Benjamin Cummings, 1994.

  107. [Kung:pegasus2009] U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. In Proc. Intl. Conf. Data Mining, pages 229--238, 2009.

  108. [Lamport:LaTeX] L. Lamport. {\em {\LaTeX}, a Document Preparation System}. Addison-Wesley, 1986.

  109. [Lanczos1952:solution_of_systems] C. Lanczos. Solution of systems of linear equations by minimized iterations. Journal of Research, Nat. Bur. Stand., 49:33--53, 1952.

  110. [Landau:comp-phys] Rubin H. Landau, Manuel Jos\'e P\'aez, and Cristian C. Bordeianu. A Survey of Computational Physics. Princeton University Press, 2008.

  111. [Langou:thesis] J. Langou. {\em Iterative methods for solving linear systems with multiple right-hand sides.} {P}h.{D}. dissertation, INSA Toulouse, June 2003. CERFACS TH/PA/03/24.

  112. [Langville2005eigenvector] Amy N. Langville and Carl D. Meyer. A survey of eigenvector methods for web information retrieval. SIAM Review, 47(1):135--161, 2005.

  113. [Lawson:blas] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for {Fortran} usage. ACM Trans. Math. Softw., 5(3):308--323, September 1979.

  114. [LEcuyer:multiple-random] P. L'Ecuyer. Combined multiple recursive generators. Operations Research, 44, 1996.

  115. [Leiserson:fattree] Charles E. Leiserson. Fat-{T}rees: Universal networks for hardware-efficient supercomputing. IEEE Trans. Comput, C-34:892--901, 1985.

  116. [LinCohen:PIC] F. Lin and W.W. Cohen. Power iteration clustering. In {\em Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel}, 2010.

  117. [LiRoTa:dissection] Richard J. Lipton, Donald J. Rose, and Robert Endre Tarjan. Generalized nested dissection. SIAM J. Numer. Anal., 16:346--358, 1979.

  118. [LiTa:separator] Richard J. Lipton and Robert Endre Tarjan. A separator theorem for planar graphs. SIAM J. Appl. Math., 36:177--189, 1979.

  119. [Little:law] J.D.C. Little. A proof of the queueing formula {$L=\lambda W$}. Oper. Res., pages 383--387, 1961.

  120. [Liu:cudasw2009] Yongchao Liu, Douglas L. Maskell, and Bertil Schmidt. {CUDASW++}: optimizing {Smith-Waterman} sequence database searches for {CUDA}-enabled graphics processing units. BMC Res Notes, 2:73, 2009. PMID: 19416548.

  121. [Luby:parallel] M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, 15(4):1036--1053, 1986.

  122. [Pregel:podc2009] Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In {\em Proceedings of the 28th ACM symposium on Principles of distributed computing}, PODC '09, pages 6--6, New York, NY, USA, 2009. ACM.

  123. [Mascagni:SPRNG] M. Mascagni and A. Srinivasan. Algorithm 806: {SPRNG}: A scalable library for pseudorandom number generation. ACM Transactions on Mathematical Software, 26:436--461, 2000.

  124. [OReilly-GnuMake] Robert Mecklenburg. {\em Managing Projects with {GNU} {Make}}. O'Reilly Media, 3rd edition, 2004. Print ISBN: 978-0-596-00610-5, ISBN 10: 0-596-00610-1; Ebook ISBN: 978-0-596-10445-0, ISBN 10: 0-596-10445-6.

  125. [MevdVo:itsol] J.A. Meijerink and H.A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric {M}-matrix. Math Comp, 31:148--162, 1977.

  126. [Metropolis] N. {Metropolis}, A. W. {Rosenbluth}, M. N. {Rosenbluth}, A. H. {Teller}, and E. {Teller}. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087--1092, June 1953.

  127. [Me:multicg] G\'erard Meurant. Multitasking the conjugate gradient method on the {CRAY} {X-MP/48}. Parallel Computing, 5:267--280, 1987.

  128. [Me:dd] G\'erard Meurant. Domain decomposition methods for partial differential equations on parallel computers. Int. J. Supercomputing Appls., 2:5--12, 1988.

  129. [LaTeXcompanion] Frank Mittelbach, Michel Goossens, Johannes Braams, David Carlisle, and Chris Rowley. {\em The {\LaTeX} Companion, 2nd edition}. Addison-Wesley, 2004.

  130. [Moreland:formalmetrics2015] Kenneth Moreland and Ron Oldfield. Formal metrics for large-scale parallel performance. In Julian M. Kunkel and Thomas Ludwig, editors, {\em High Performance Computing}, volume 9137 of Lecture Notes in Computer Science, pages 488--496. Springer International Publishing, 2015.

  131. [NeedlemanWunsch] Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443--453, 1970.

  132. [Oetiker:LaTeXintro] Tobi Oetiker. The not so short introduction to {\LaTeX}{}. \url{http://tobi.oetiker.ch/lshort/}.

  133. [matrix-market] National Institute of Standards and Technology. Matrix market. \url{http://math.nist.gov/MatrixMarket}.

  134. [OgAi:sparsestorage] Andrew T. Ogielski and William Aiello. Sparse matrix computations on parallel processor arrays. SIAM J. Sci. Stat. Comput. in press.

  135. [mpi-ref] S. Otto, J. Dongarra, S. Huss-Lederman, M. Snir, and D. Walker. Message Passing Interface: The Complete Reference. The {MIT} Press, 1995.

  136. [Overton:754book] Michael L. Overton. {\em Numerical Computing with {IEEE} Floating Point Arithmetic}. SIAM, Philadelphia PA, 2001.

  137. [PageBrin:PageRank] Larry Page, Sergey Brin, R. Motwani, and T. Winograd. The {PageRank} citation ranking: Bringing order to the web, 1998.

  138. [Pa:combinations] V. Ya. Pan. New combinations of methods for the acceleration of matrix multiplication. Comp. \& Maths. with Appls., 7:73--125, 1981.

  139. [papi-homepage] Performance application programming interface. \url{http://icl.cs.utk.edu/papi/}.

  140. [plimpton] S. Plimpton. Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys., 117:1--19, 1995.

  141. [Elemental:TOMS] Jack Poulson, Bryan Marker, Jeff R. Hammond, and Robert van de Geijn. Elemental: a new framework for distributed memory dense matrix computations. ACM Transactions on Mathematical Software. submitted.

  142. [spiral] M. P{\"u}schel, B. Singer, J. Xiong, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, and R. W. Johnson. {SPIRAL}: A generator for platform-adapted libraries of signal processing algorithms. Int'l Journal of High Performance Computing Applications, 18(1):21--45, 2004.

  143. [tacc:ranger] {Texas Advanced Computing Center: Sun Constellation Cluster: Ranger}. \url{http://www.tacc.utexas.edu/resources/hpc/constellation}.

  144. [Rao:1978:cache] Gururaj S. Rao. Performance analysis of cache memories. J. ACM, 25:378--395, July 1978.

  145. [Reid1971:cg] J.K. Reid. On the method of conjugate gradients for the solution of large sparse systems of linear equations. In J.K. Reid, editor, Large sparse sets of linear equations, pages 231--254. Academic Press, London, 1971.

  146. [saad96] Y. Saad. Iterative methods for sparse linear systems. PWS Publishing Company, Boston, 1996.

  147. [Sato2004] Tetsuya Sato. The {Earth Simulator}: Roles and impacts. Nuclear Physics B - Proceedings Supplements, 129--130:102--108, 2004. Lattice 2003.

  148. [canfar-lecture] David Schade. {CANFAR}: Integrating cyberinfrastructure for astronomy. \url{https://wiki.bc.net/atl-conf/display/BCNETPUBLIC/CANFAR+-+Integrating+Cyberinfrastructure+for+Astronomy}.

  149. [Schreiber:scalability92] R. Schreiber. Scalability of sparse direct solvers. In A. George, J.R. Gilbert, and J.W.-H. Liu, editors, {\em Sparse Matrix Computations: Graph Theory Issues and Algorithms (An IMA Workshop Volume)}. Springer-Verlag, New York, 1993. Also: Technical Report RIACS TR 92.13, NASA Ames Research Center, Moffett Field, Calif., May 1992.

  150. [shaw] D. E. Shaw. A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions. J. Comput. Chem., 26:1318--1328, 2005.

  151. [TAU:ijhpca] S. Shende and A. D. Malony. The {TAU} parallel performance system. {\em International Journal of High Performance Computing Applications}, 20:287--331, 2006.

  152. [Skillicorn96questionsand] D. B. Skillicorn, Jonathan M. D. Hill, and W. F. McColl. Questions and answers about {BSP}, 1996.

  153. [mpi-reference] Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. MPI: The Complete Reference, Volume 1, The MPI-1 Core. MIT Press, second edition, 1998.

  154. [Spielman:spectral-graph-theory] Dan Spielman. Spectral graph theory, fall 2009. \url{http://www.cs.yale.edu/homes/spielman/561/}.

  155. [Stewart90] {G. W.} Stewart. Communication and matrix computations on large message passing systems. Parallel Computing, 16:27--40, 1990.

  156. [St:gaussnotoptimal] V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354--356, 1969.

  157. [UKTeXFAQ] {\TeX} frequently asked questions.

  158. [UPC:homepage] {Unified Parallel C at George Washington University}. \url{http://upc.gwu.edu/}.

  159. [Valiant:1990:BSP] Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33:103--111, August 1990.

  160. [PLAPACK:UG] R. van de Geijn, Philip Alpatov, Greg Baker, Almadena Chtchelkanova, Joe Eaton, Carter Edwards, Murthy Guddati, John Gunnels, Sam Guyer, Ken Klimkowski, Calvin Lin, Greg Morrow, Peter Nagel, James Overfelt, and Michelle Pal. Parallel linear algebra package ({PLAPACK}): Release r0.1 (beta) users' guide. 1996.

  161. [PLAPACK] Robert A. van de Geijn. {\em Using {PLAPACK}: Parallel Linear Algebra Package}. The MIT Press, 1997.

  162. [TSoPMC] Robert A. van de Geijn and Enrique S. Quintana-Ort\'{\i}. The Science of Programming Matrix Computations. {\tt www.lulu.com}, 2008.

  163. [vdVorst1992:bicgstab] Henk {van der Vorst}. {Bi-CGSTAB}: a fast and smoothly converging variant of {Bi-CG} for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 13:631--644, 1992.

  164. [Varga:iterative-analysis] Richard S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, NJ, 1962.

  165. [oski] R. Vuduc, J. Demmel, and K. Yelick. {OSKI}: A library of automatically tuned sparse matrix kernels. In {\em Proceedings of SciDAC 2005, Journal of Physics: Conference Series, to appear}, 2005.

  166. [vuduc:thesis] Richard W. Vuduc. Automatic Performance Tuning of Sparse Matrix Kernels. PhD thesis, University of California Berkeley, 2003.

  167. [atlas-parcomp] R. Clint Whaley, Antoine Petitet, and Jack J. Dongarra. Automated empirical optimization of software and the {ATLAS} project. Parallel Computing, 27(1--2):3--35, 2001. Also available as University of Tennessee LAPACK Working Note \#147, UT-CS-00-448, 2000 ({\tt www.netlib.org/lapack/lawns/lawn147.ps}).

  168. [Wi:fastseparable] O. Widlund. On the use of fast methods for separable finite difference equations for the solution of general elliptic problems. In D.J. Rose and R.A. Willoughby, editors, {\em Sparse matrices and their applications}, pages 121--134. Plenum Press, New York, 1972.

  169. [Wilkinson:roundoff] J.H. Wilkinson. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs, N.J., 1963.

  170. [Williams:2009:roofline] Samuel Williams, Andrew Waterman, and David Patterson. Roofline: an insightful visual performance model for multicore architectures. Commun. ACM, 52:65--76, April 2009.

  171. [Wulf:memory-wall] Wm. A. Wulf and Sally A. McKee. Hitting the memory wall: implications of the obvious. SIGARCH Comput. Archit. News, 23(1):20--24, March 1995.

  172. [YandBrent:bicgstab] L. T. Yang and R. Brent. The improved {BiCGStab} method for large and sparse unsymmetric linear systems on parallel distributed memory architectures. In {\em Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing}. IEEE, 2002.

  173. [YatesFixedPoint] Randy Yates. Fixed point: an introduction. \url{http://www.digitalsignallabs.com/fp.pdf}, 2007.

  174. [Yoo:2005:scalable-bfs] Andy Yoo, Edmond Chow, Keith Henderson, William McLendon, Bruce Hendrickson, and Umit Catalyurek. A scalable distributed parallel breadth-first search algorithm on {BlueGene/L}. In {\em Proceedings of the 2005 ACM/IEEE conference on Supercomputing}, SC '05, pages 25--, Washington, DC, USA, 2005. IEEE Computer Society.

  175. [Young:thesis] David M. Young. {\em Iterative methods for solving partial difference equations of elliptic type}. PhD thesis, Harvard Univ., Cambridge, MA, 1950.