Publications

Note: The papers below were supported in part by the HokieSpeed resource (and related accelerator-based resources).


  • Renaissance


  • Parallel Programming with Pictures is a Snap!
    Annette Feng, Mark Gardner, Wu-chun Feng.
    In Journal of Parallel and Distributed Computing, 105: 150-162, January 2017.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • pDindel: Accelerating InDel Detection on a Multicore CPU Architecture with SIMD.
    Da Zhang, Hao Wang, Kaixi Hou, Jing Zhang, Wu-chun Feng.
    In Proceedings of the 5th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), Miami, FL, USA, October 2015.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • CentroidBLAST: Accelerating Sequence Search via Clustering.
    Wu-chun Feng, Konstantinos Krommydas, Liqing Zhang.
    In Proceedings of the 7th International Conference on Bioinformatics and Computational Biology, Honolulu, Hawaii, March 2015.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows.
    Nabeel Mohamed, Nabanita Maji, Jing Zhang, Nataliya Timoshevskaya, Wu-chun Feng.
    In Proceedings of the 3rd IEEE International Conference on Big Data Science and Engineering (BDSE), Beijing, China, September 2014.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • SDAFT: A Novel Scalable Data Access Framework for Parallel BLAST.
    Jiangling Yin, Junyao Zhang, Jun Wang, Wu-chun Feng.
    In Parallel Computing, 40 (10): 697–709, August 2014.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU.
    Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng.
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Phoenix, Arizona, USA, May 2014.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU.
    Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng.
    In GPU Technology Conference (GTC), San Jose, CA, USA, March 2014.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures.
    Jing Zhang, Heshan Lin, Pavan Balaji, Wu-chun Feng.
    In Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Delft, Netherlands, May 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Accelerating Protein Sequence Search in a Heterogeneous Computing System.
    Shucai Xiao, Heshan Lin, Wu-chun Feng.
    In Proceedings of the 25th International Parallel and Distributed Processing Symposium, Anchorage, Alaska, USA, May 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Towards Accelerating Molecular Modeling via Multi-Scale Approximation on a GPU.
    Mayank Daga, Wu-chun Feng, Thomas Scogland.
    In Proceedings of the 1st IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), Orlando, Florida, USA, February 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • When High-Performance Computing Meets Bioinformatics.
    Wu-chun Feng.
    In Wake Forest University – School of Medicine, Winston Salem, NC, June 2010.
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • Towards Chip-on-Chip Neuroscience: Fast Mining of Neuronal Spike Streams Using Graphics Hardware.
    Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, Naren Ramakrishnan.
    In Proceedings of the 7th ACM International Conference on Computing Frontiers, Bertinoro, Italy, pp. 1-10, May 2010.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Towards Chip-on-Chip Neuroscience: Fast Mining of Neuronal Spike Streams Using Graphics Hardware.
    Yong Cao, Debprakash Patnaik, Sean Ponce, Wu-chun Feng, Naren Ramakrishnan.
    In 7th ACM International Conference on Computing Frontiers, ACM, New York, NY, USA, May 2010.
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • Accelerating Electrostatic Surface Potential Calculation with Multi-Scale Approximation on Graphics Processing Units.
    Ramu Anandakrishnan, Tom R.W. Scogland, Andrew T. Fenley, John C. Gordon, Wu-chun Feng, Alexey V. Onufriev.
    In Journal of Molecular Graphics and Modelling, 28: 904-910, April 2010.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • Missing Genes in the Annotation of Prokaryotic Genomes.
    Andrew S. Warren, Jeremy Archuleta, Wu-chun Feng, Joao Carlos Setubal.
    In BMC Bioinformatics 2010, 11 (131): 1-12, March 2010.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • Massively Parallel Genomic Sequence Search on the Blue Gene/P Architecture.
    Heshan Lin, Pavan Balaji, Ruth Poole, Carlos Sosa, Xiaosong Ma, Wu-chun Feng.
    In Proceedings of the ACM/IEEE SC|08: The International Conference on High-Performance Computing, Networking, Storage, and Analysis, Austin, Texas, November 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Optimizing Performance, Cost, and Sensitivity in Pairwise Sequence Search on a Cluster of PlayStations.
    Ashwin M. Aji, Wu-chun Feng.
    In Proceedings of the IEEE International Conference on BioInformatics and BioEngineering, Athens, Greece, October 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Exploiting Multigrain Parallelism in Pairwise Sequence Search on Emergent CMP Architectures.
    Ashwin Aji.
    Virginia Tech, August 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • A Maintainable Software Architecture for Fast and Modular Bioinformatics Sequence Search.
    Jeremy S. Archuleta, Eli Tilevich, Wu-chun Feng.
    In Proceedings of the 23rd IEEE International Conference on Software Maintenance, Paris, France, October 2007.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • A Pluggable Framework for Parallel Pairwise Sequence Search.
    Jeremy S. Archuleta, Wu-chun Feng, Eli Tilevich.
    In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, August 2007.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Parallel Genomic Sequence-Search on a Massively Parallel System.
    Oystein Thorsen, Brian Smith, Carlos P. Sosa, Karl Jiang, Heshan Lin, Amanda Peters, Wu-chun Feng.
    In Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, May 2007.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Parallel Genomic Sequence-Searching on an Ad-Hoc Grid: Experiences, Lessons Learned, and Implications.
    Mark K. Gardner, Wu-chun Feng, Jeremy S. Archuleta, Heshan Lin, Xiaosong Ma.
    In Proceedings of the ACM/IEEE SC|06: The International Conference on High-Performance Computing, Networking, Storage, and Analysis, Tampa, FL, November 2006.
    Best Paper Nominee
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Systems


  • A Framework for Fast and Fair Evaluation of Automata Processing Hardware.
    Xiaodong Yu, Kaixi Hou, Hao Wang, Wu-chun Feng.
    In IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, October 2017.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • AutoMatch: An Automated Framework for Relative Performance Estimation and Workload Distribution on Heterogeneous HPC Systems.
    Ahmed E. Helal, Wu-chun Feng, Changhee Jung, Yasser Y. Hanafy.
    In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, October 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Fast Segmented Sort on GPUs.
    Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng.
    In Proceedings of the International Conference on Supercomputing, Chicago, IL, June 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Demystifying Automata Processing: GPUs, FPGAs or Micron’s AP?
    Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu-chun Feng, Michela Becchi.
    In Proceedings of the International Conference on Supercomputing, Chicago, IL, June 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • An Enhanced Image Reconstruction Tool for Computed Tomography on GPUs.
    Xiaodong Yu, Hao Wang, Wu-chun Feng, Hao Gong, Guohua Cao.
    In Proceedings of the ACM Computing Frontiers, Siena, Italy, May 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • PaPar: A Parallel Data Partitioning Framework for Big Data Applications.
    Hao Wang, Jing Zhang, Da Zhang, Sarunya Pumma, Wu-chun Feng.
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Orlando, Florida, May 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Directive-Based Partitioning and Pipelining for Graphics Processing Units.
    Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-chun Feng.
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Orlando, Florida, May 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors.
    Kaixi Hou, Wu-chun Feng, Shuai Che.
    In Proceedings of the 7th International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Orlando, Florida, May 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the ACM Computing Frontiers, Siena, Italy, May 2017.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Accelerating 3D-Structured Grid on FPGA via OpenCL: A Case Study with OpenDwarfs.
    Anshuman Verma, Wu-chun Feng.
    In International Symposium on Code Generation and Optimization (CGO), Austin, TX, February 2017.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • Telescoping Architectures: Evaluating Next-Generation Heterogeneous Computing.
    Konstantinos Krommydas, Wu-chun Feng.
    In Proceedings of the 23rd IEEE International Conference on High Performance Computing, Hyderabad, India, December 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-core Clusters.
    Ahmed E. Helal, Paul Sathre, Wu-chun Feng.
    In Proceedings of the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Salt Lake City, Utah, USA, November 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Characterizing Performance and Power Towards Efficient Synchronization of GPU Kernels.
    Islam Harb, Wu-chun Feng.
    In Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), London, England, September 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Parallel Transposition of Sparse Data Structures.
    Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng.
    In Proceedings of the 30th International Conference on Supercomputing (ICS), Istanbul, Turkey, June 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-based Multi- and Many-core Processors.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, USA, May 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Online Power Estimation of Graphics Processing Units.
    Vignesh Adhinarayanan, Balaji Subramaniam, Wu-chun Feng.
    In Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Cartagena, Colombia, May 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs.
    Xiaodong Yu, Hao Wang, Wu-chun Feng, Hao Gong, Guohua Cao.
    In Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Cartagena, Colombia, May 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • An Automated Framework for Characterizing and Subsetting GPGPU Workloads.
    Vignesh Adhinarayanan, Wu-chun Feng.
    In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Uppsala, Sweden, April 2016.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • CoreTSAR: Core Task-Size Adapting Runtime.
    Thomas R. W. Scogland, Wu-chun Feng, Barry Rountree, Bronis R. de Supinski.
    In IEEE Transactions on Parallel and Distributed Systems, 26 (11): 2970-2983, November 2015.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • Fast Detection of Transformed Data Leaks.
    Xiaokui Shu, Jing Zhang, Danfeng (Daphne) Yao, Wu-chun Feng.
    In IEEE Transactions on Information Forensics and Security (TIFS), PP (99): November 2015.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • Block-Based Programming Abstractions for Explicit Parallel Computing.
    Annette Feng, Eli Tilevich, Wu-chun Feng.
    In Proceedings of the Blocks and Beyond: Lessons and Directions for First Programming Environments, Atlanta, GA, USA, October 2015. A VL/HCC 2015 Workshop.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures.
    Konstantinos Krommydas, Wu-chun Feng, Christos D. Antonopoulos, Nikolaos Bellas.
    In Journal of Signal Processing Systems, 1-20, October 2015.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.
    Jing Zhang, Hao Wang, Wu-chun Feng.
    In IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), PP (99): October 2015.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • GLAF: A Visual Programming and Auto-Tuning Framework for Parallel Computing.
    Konstantinos Krommydas, Ruchira Sasanka, Wu-chun Feng.
    In Proceedings of the International Conference on Parallel Processing, Beijing, China, September 2015.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL.
    Ashwin M. Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng.
    In Proceedings of the IEEE Cluster, Chicago, Illinois, September 2015.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • MPI-ACC: Accelerator-Aware MPI for Scientific Applications.
    Ashwin M. Aji, Lokendra Panwar, Feng Ji, Karthik Murthy, Milind Chabbi, Pavan Balaji, Keith Bisset, James Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, Rajeev Thakur.
    In IEEE Transactions on Parallel and Distributed Systems, PP (99): June 2015.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the 29th ACM International Conference on Supercomputing, Newport Beach, California, USA, June 2015.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Directive-Based GPU Programming for Computational Fluid Dynamics.
    Brent P. Pickering, Charles W. Jackson, Thomas R.W. Scogland, Wu-chun Feng, Christopher Roy.
    In Computers and Fluids, 114: 242-253, March 2015.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • Rapid Screening of Transformed Data Leaks with Efficient Algorithms and Parallel Computing.
    Xiaokui Shu, Jing Zhang, Danfeng (Daphne) Yao, Wu-chun Feng.
    In ACM Conference on Data and Application Security and Privacy (CODASPY), San Antonio, TX, USA, March 2015.
    Best Poster Award
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures.
    Thomas R. W. Scogland, Wu-chun Feng.
    In Proceedings of the International Conference on Performance Engineering (ICPE), Austin, TX, USA, January 2015.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • MetaMorph: A Modular Library for Democratizing the Acceleration of Parallel Computing across Heterogeneous Devices.
    Paul Sathre, Wu-chun Feng.
    In ACM/IEEE International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC|14), New Orleans, LA, November 2014.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the 7th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Minneapolis, Minnesota, September 2014.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Locality-Aware Memory Association for Multi-Target Worksharing in OpenMP.
    Thomas R. W. Scogland, Wu-chun Feng.
    In 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT), Alberta, Canada, August 2014.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • CoreTSAR: Adaptive Worksharing for Heterogeneous Systems.
    Thomas R. W. Scogland, Wu-chun Feng, Barry Rountree, Bronis R. de Supinski.
    In Proceedings of the International Supercomputing Conference, Leipzig, Germany, June 2014.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms.
    Konstantinos Krommydas, Wu-chun Feng, Muhsen Owaida, Christos D. Antonopoulos, Nikolaos Bellas.
    In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Zurich, Switzerland, June 2014.
    Best Paper Finalist
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Towards a Performance-Portable FFT Library for Heterogeneous Computing.
    Carlo del Mundo, Wu-chun Feng.
    In Proceedings of the ACM International Conference on Computing Frontiers (CF), Cagliari, Italy, May 2014.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems.
    James E. McClure, Hao Wang, Jan F. Prins, Cass T. Miller, Wu-chun Feng.
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Phoenix, Arizona, May 2014.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • A Power-Measurement Methodology for Large-Scale, High-Performance Computing.
    Thomas Scogland, Craig Steffen, Torsten Wilde, Florent Parent, Susan Coghlan, Natalie Bates, Wu-chun Feng, Erich Strohmaier.
    In Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering (ICPE), Dublin, Ireland, March 2014.
    Nominated for Best Industrial Paper Award
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors.
    Vignesh Adhinarayanan, Wu-chun Feng.
    In Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), Seoul, Korea, December 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Online Performance Projection for Clusters with Heterogeneous GPUs.
    Lokendra S. Panwar, Ashwin M. Aji, Jiayuan Meng, Pavan Balaji, Wu-chun Feng.
    In Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), Seoul, Korea, December 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Programmability and Performance of Heterogeneous Platforms.
    Konstantinos Krommydas, Thomas R.W. Scogland, Wu-chun Feng.
    In Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), Seoul, Korea, December 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator.
    Mark Gardner, Paul Sathre, Wu-chun Feng, Gabriel Martinez.
    In Parallel Computing, 39 (12): 769-786, December 2013.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms.
    Konstantinos Krommydas, Muhsen Owaida, Christos D. Antonopoulos, Nikolaos Bellas, Wu-chun Feng.
    In 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), Seoul, Korea, December 2013.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems.
    Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng.
    In Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), Zhangjiajie, China, November 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Enabling Efficient Intra-Warp Communication for Fourier Transforms in a Many-Core Architecture.
    Carlo del Mundo, Wu-chun Feng.
    In ACM/IEEE International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC|13), Denver, CO, November 2013.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.
    Palden Lama, Yan Li, Ashwin M. Aji, Pavan Balaji, James Dinan, Shucai Xiao, Yunquan Zhang, Wu-chun Feng, Rajeev Thakur, Xiaobo Zhou.
    In Proceedings of the 33rd International Conference on Distributed Computing Systems, Philadelphia, USA, July 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Accelerating Fast Fourier Transform for Wideband Channelization.
    Carlo del Mundo, Vignesh Adhinarayanan, Wu-chun Feng.
    In Proceedings of the International Conference on Communications (ICC), Budapest, Hungary, June 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Synchronization and Ordering Semantics in Hybrid MPI GPU Programming.
    Ashwin M. Aji, Pavan Balaji, James Dinan, Wu-chun Feng, Rajeev Thakur.
    In Proceedings of the 3rd International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Boston, USA, June 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Efficacy of GPU-Integrated MPI for Scientific Applications.
    Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Milind Chabbi, Karthik Murthy, Pavan Balaji, Keith R. Bisset, James Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, Rajeev Thakur.
    In Proceedings of the ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), New York, USA, June 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • A MapReduce Framework for Heterogeneous Computing Architectures.
    Marwa Elteir.
    Virginia Tech, June 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Complexity of Robust Source-to-Source Translation from CUDA to OpenCL.
    Paul D. Sathre.
    Virginia Tech, April 2013.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Characterization and Exploitation of GPU Memory Systems.
    Kenneth Lee.
    Virginia Tech, October 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation.
    Paul Sathre, Mark Gardner, Wu-chun Feng.
    In Proceedings of the 5th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Pittsburgh, PA, September 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Forget about the Clouds, Shoot for the MOON.
    Wu-chun Feng.
    In BioIT World Cloud Computing Summit, San Francisco, CA, September 2012.
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • CU2CL: An Automated CUDA-to-OpenCL Source-to-Source Translator.
    Wu-chun Feng.
    In AMD Fusion Developer Summit, Bellevue, WA, June 2012.
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • DMA-Assisted, Intranode Communication in GPU Accelerated Systems.
    Feng Ji, Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Rajeev Thakur, Wu-chun Feng, Xiaosong Ma.
    In Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, Liverpool, UK, June 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems.
    Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Keith R. Bisset, Rajeev Thakur.
    In Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, Liverpool, UK, June 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Performance Characterization of Data-Intensive Kernels on AMD Fusion Architectures.
    Kenneth Lee, Heshan Lin, Wu-chun Feng.
    In Proceedings of the International Supercomputing Conference (ISC), Hamburg, Germany, June 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Efficient Intranode Communication in GPU-Accelerated Systems.
    Feng Ji, Ashwin Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Xiaosong Ma.
    In Proceedings of the 2nd IEEE International Workshop on Accelerators and Hybrid Exascale Systems (in conjunction with the 26th IEEE International Parallel and Distributed Processing Symposium), Shanghai, China, May 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Heterogeneous Task Scheduling for Accelerated OpenMP.
    Thomas R. W. Scogland, Barry Rountree, Wu-chun Feng, Bronis R. de Supinski.
    In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, May 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • An Ecosystem for the New HPC: Heterogeneous Parallel Computing.
    Wu-chun Feng.
    In 2nd International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Shanghai, China, May 2012.
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • Transparent Accelerator Migration in a Virtualized GPU Environment.
    Shucai Xiao, Pavan Balaji, James Dinan, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, Wu-chun Feng.
    In Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, May 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems.
    Shucai Xiao, Wu-chun Feng.
    In Proceedings of the PhD Forum at the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units.
    Shucai Xiao, Pavan Balaji, Qian Zhu, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, Wu-chun Feng.
    In Proceedings of the IEEE Innovative Parallel Computing (InPar2012), San Jose, CA, May 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • OpenCL and the 13 Dwarfs: A Work In Progress.
    Wu-chun Feng, Heshan Lin, Tom Scogland, Jing Zhang.
    In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering (ICPE), Boston, MA, April 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • NUMA Data-Access Bandwidth Characterization and Modeling.
    Ryan Braithwaite.
    Virginia Tech, January 2012.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures.
    Gabriel Martinez, Mark Gardner, Wu-chun Feng.
    In Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, Tainan, Taiwan, December 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Architecture-Aware Mapping and Optimization on a 1600-Core GPU.
    Mayank Daga, Tom Scogland, Wu-chun Feng.
    In Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, Tainan, Taiwan, December 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • StreamMR: An Optimized MapReduce Framework for AMD GPUs.
    Marwa Elteir, Heshan Lin, Wu-chun Feng, Tom Scogland.
    In Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, Tainan, Taiwan, December 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Characterizing the Impact of Memory-Access Techniques on AMD Fusion.
    Kenneth Lee, Heshan Lin, Wu-chun Feng.
    In ACM/IEEE SC|11: The International Conference on High-Performance Computing, Networking, Storage, and Analysis, Seattle, Washington, USA, November 2011.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • Spectral Method Characterization on FPGA and GPU Accelerators.
    Karl Pereira, Peter Athanas, Heshan Lin, Wu-chun Feng.
    In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), Cancun, Mexico, November 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • How to Run Your CUDA Program Anywhere.
    Wu-chun Feng.
    In NVIDIA Theater, ACM/IEEE International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC), Seattle, WA, November 2011.
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • An Ecosystem for Heterogeneous Parallel Computing.
    Wu-chun Feng.
    In AMD Research, Austin, TX, September 2011.
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Three P's of Heterogeneous Computing with Accelerators.
    Wu-chun Feng.
    In Workshop on Parallel Programming on Accelerator Clusters (PPAC) at IEEE Cluster, September 2011. Keynote Talk
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • Performance Characterization and Optimization of Atomic Operations on AMD GPUs.
    Marwa Elteir, Heshan Lin, Wu-chun Feng.
    In Proceedings of the IEEE Cluster 2011, Austin, TX, USA, September 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Efficacy of a Fused CPU+GPU Processor for Parallel Computing.
    Mayank Daga, Ashwin Aji, Wu-chun Feng.
    In Proceedings of the Symposium on Application Accelerators in High-Performance Computing, Knoxville, Tennessee, USA, July 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures.
    Gabriel Martinez.
    Virginia Tech, July 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • OpenCL and the 13 Dwarfs.
    Wu-chun Feng.
    In AMD Fusion Developer Summit, Bellevue, WA, June 2011. Invited Talk
      Paper    Presentation (PDF)               Citations:  [ BibTeX    XML    PlainText ]   
  • Bounding the Effect of Partition Camping in GPU Kernels.
    Ashwin M. Aji, Mayank Daga, Wu-chun Feng.
    In Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, May 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems.
    Mayank Daga.
    Virginia Tech, April 2011.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Temporal Data Mining for Neuroscience.
    Wu-chun Feng, Yong Cao, Debprakash Patnaik, Naren Ramakrishnan.
    In GPU Computing Gems, February 2011. Emerald Edition
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors.
    Ashwin M. Aji, Liqing Zhang, Wu-chun Feng.
    In Proceedings of the 13th IEEE International Conference on Computational Science and Engineering, Hong Kong, China, December 2010.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Accelerating Molecular Modeling using GPUs.
    Mayank Daga, Wu-chun Feng.
    In GPU Technology Conference, San Jose, California, September 2010.
      Poster               Citations:  [ BibTeX    XML    PlainText ]   
  • To GPU Synchronize or Not GPU Synchronize?
    Wu-chun Feng, Shucai Xiao.
    In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Paris, France, May 2010.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Coordinating Computation and I/O in Massively Parallel Sequence Search.
    Heshan Lin, Xiaosong Ma, Wu-chun Feng, Nagiza Samatova.
    In IEEE Transactions on Parallel and Distributed Systems, PP (99): 1-14, May 2010.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • Inter-Block GPU Communication via Fast Barrier Synchronization.
    Shucai Xiao, Wu-chun Feng.
    In Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Atlanta, Georgia, USA, April 2010.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit.
    Shucai Xiao, Ashwin M. Aji, Wu-chun Feng.
    In Proceedings of the 15th International Conference on Parallel and Distributed Systems (ICPADS), Shenzhen, China, December 2009.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Tools and Environments for Multi- and Many-Core Architectures.
    Wu-chun Feng, Pavan Balaji.
    In IEEE Computer, 42 (12): 26-27, December 2009.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Energy Efficiency of Graphics Processing Units for Scientific Computing.
    Song Huang, Shucai Xiao, Wu-chun Feng.
    In Proceedings of the 5th IEEE Workshop on High-Performance, Power-Aware Computing (in conjunction with the 23rd International Parallel and Distributed Processing Symposium (IPDPS)), Rome, Italy, 3801-3804, June 2009.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Multi-Dimensional Characterization of Temporal Data Mining on Graphics Processors.
    Jeremy Archuleta, Yong Cao, Tom Scogland, Wu-chun Feng.
    In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rome, Italy, May 2009.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation.
    Thomas Scogland, Pavan Balaji, Wu-chun Feng, Ganesh Narayanaswamy.
    In Proceedings of the ACM/IEEE SC|08: The International Conference on High-Performance Computing, Networking, Storage, and Analysis, Austin, Texas, USA, November 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer.
    P. Balaji, W. Feng, H. Lin, J. Archuleta, S. Matsuoka, A. Warren, J. Setubal, E. Lusk, R. Thakur, I. Foster, D. S. Katz, S. Jha, K. Shinpaugh, S. Coghlan, D. Reed.
    In Proceedings of the International Supercomputing Conference, Dresden, Germany, June 2008. Distinguished Paper Award
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine.
    Ashwin M. Aji, Wu-chun Feng, Filip Blagojevic, Dimitrios S. Nikolopoulos.
    In Proceedings of the 5th ACM International Conference on Computing Frontiers, Ischia, Italy, May 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • High-Performance Computing Using Accelerators.
    Wu-chun Feng, Dinesh Manocha.
    In Parallel Computing, 33 (10-11): 645-647, November 2007.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]