Publications

Note: The papers below were supported in part by the HokieSpeed resource (and related accelerator-based resources).


  • 2021


  • IterML: Iterative Machine Learning for Intelligent Parameter Pruning and Tuning in Graphics Processing Units.
    Xuewen Cui, Wu-chun Feng.
    In Journal of Signal Processing Systems, 93 (1): 391-403, April 2021.
  • 2020


  • A Feasibility Study for MPI over HDFS.
    Wu-chun Feng, Da Zhang, Jing Zhang, Kaixi Hou, Sarunya Pumma, Hao Wang.
    In Proceedings of the 24th IEEE High-Performance Extreme Computing Conference (HPEC), Waltham, MA, September 2020.
  • Exploring FPGA Optimizations in OpenCL for Breadth-First Search on Sparse Graph Datasets.
    Atharva Gondhalekar, Wu-chun Feng.
    In Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, Gothenburg, Sweden, September 2020.
  • SparkLeBLAST: Scalable Parallelization of BLAST Sequence Alignment Using Spark.
    Karim Youssef, Wu-chun Feng.
    In Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Melbourne, Victoria, Australia, May 2020.
  • 2019


  • Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs.
    Ahmed E. Helal, Ashwin M. Aji, Michael L. Chu, Bradford M. Beckmann, Wu-chun Feng.
    In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, Seattle, WA, September 2019.
  • Iterative Machine Learning (IterML) for Effective Parameter Pruning and Tuning in Accelerators.
    Xuewen Cui, Wu-chun Feng.
    In Proceedings of the 16th ACM International Conference on Computing Frontiers, Alghero, Sardinia, Italy, April 2019.
  • On the Portability of GPU-Accelerated Applications via Automated Source-to-Source Translation.
    Paul Sathre, Mark Gardner, Wu-chun Feng.
    In Proceedings of the HPC Asia: International Conference on High Performance Computing in Asia-Pacific Region, Guangzhou, China, January 2019.
  • 2018


  • A Composable Workflow for Productive Heterogeneous Computing on FPGAs via Whole-Program Analysis and Transformation.
    Paul Sathre, Ahmed E. Helal, Wu-chun Feng.
    In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), Cancun, Mexico, December 2018.
  • Exploring FPGA-specific Optimizations for Irregular OpenCL Applications.
    Mohamed W. Hassan, Ahmed E. Helal, Peter M. Athanas, Wu-chun Feng, Yasser Y. Hanafy.
    In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), Cancun, Mexico, December 2018.
  • CommAnalyzer: Automated Estimation of Communication Cost and Scalability on HPC Clusters from Sequential Code.
    Ahmed E. Helal, Changhee Jung, Wu-chun Feng, Yasser Y. Hanafy.
    In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing (ACM HPDC 2018), Tempe, Arizona, USA, June 2018.
  • Highly Efficient Compensation-based Parallelism for Wavefront Loops on GPUs.
    Kaixi Hou, Hao Wang, Wu-chun Feng, Jeffrey S. Vetter, Seyong Lee.
    In Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, British Columbia, Canada, May 2018.
  • Taming Irregular Applications via Advanced Dynamic Parallelism on GPUs.
    Jing Zhang, Ashwin M. Aji, Michael L. Chu, Hao Wang, Wu-chun Feng.
    In Proceedings of the 15th ACM International Conference on Computing Frontiers (CF), Ischia, Italy, May 2018.
  • GPU Power Prediction via Ensemble Machine Learning for DVFS Space Exploration.
    Bishwajit Dutta, Vignesh Adhinarayanan, Wu-chun Feng.
    In Proceedings of the 15th ACM International Conference on Computing Frontiers (CF), Ischia, Italy, May 2018.
  • A Framework for the Automatic Vectorization of Parallel Sort on x86-based Processors.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In IEEE Transactions on Parallel and Distributed Systems, 29 (5): 958-972, May 2018.
  • GPU-Based Iterative Medical CT Image Reconstructions.
    Xiaodong Yu, Hao Wang, Wu-chun Feng, Hao Gong, Guohua Cao.
    In Journal of Signal Processing Systems, 91 (3-4): 321-338, March 2018.
  • 2017


  • Robotomata: A Framework for Approximate Pattern Matching of Big Data on an Automata Processor.
    Xiaodong Yu, Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the IEEE International Conference on Big Data, Boston, MA, December 2017.
  • Portable Parallel Design of Weighted Multi-Dimensional Scaling for Real-Time Data Analysis.
    Sajal Dash, Anshuman Verma, Chris North, Wu-chun Feng.
    In Proceedings of the IEEE International Conference on High Performance Computing and Communications (HPCC), Bangkok, Thailand, December 2017.
    Best Paper Finalist
  • AutoMatch: An Automated Framework for Relative Performance Estimation and Workload Distribution on Heterogeneous HPC Systems.
    Ahmed E. Helal, Wu-chun Feng, Changhee Jung, Yasser Y. Hanafy.
    In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, October 2017.
  • Demystifying Automata Processing: GPUs, FPGAs or Micron’s AP?
    Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu-chun Feng, Michela Becchi.
    In Proceedings of the 31st ACM International Conference on Supercomputing, Chicago, IL, June 2017.
  • Fast Segmented Sort on GPUs.
    Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng.
    In Proceedings of the 31st ACM International Conference on Supercomputing, Chicago, IL, June 2017.
  • Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors.
    Kaixi Hou, Wu-chun Feng, Shuai Che.
    In Proceedings of the 7th International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Orlando, Florida, May 2017.
  • Directive-Based Partitioning and Pipelining for Graphics Processing Units.
    Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-chun Feng.
    In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, Florida, May 2017.
  • An Enhanced Image Reconstruction Tool for Computed Tomography on GPUs.
    Xiaodong Yu, Hao Wang, Wu-chun Feng, Hao Gong, Guohua Cao.
    In Proceedings of the ACM Computing Frontiers, Siena, Italy, May 2017.
  • GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the ACM Computing Frontiers, Siena, Italy, May 2017.
  • PaPar: A Parallel Data Partitioning Framework for Big Data Applications.
    Hao Wang, Jing Zhang, Da Zhang, Sarunya Pumma, Wu-chun Feng.
    In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, Florida, May 2017.
  • Parallel Programming with Pictures is a Snap!
    Annette Feng, Mark Gardner, Wu-chun Feng.
    In Journal of Parallel and Distributed Computing, 105: 150-162, January 2017.
  • 2016


  • Telescoping Architectures: Evaluating Next-Generation Heterogeneous Computing.
    Konstantinos Krommydas, Wu-chun Feng.
    In Proceedings of the 23rd IEEE International Conference on High Performance Computing, Hyderabad, India, December 2016.
  • MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-core Clusters.
    Ahmed E. Helal, Paul Sathre, Wu-chun Feng.
    In Proceedings of the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Salt Lake City, Utah, USA, November 2016.
  • Multiscale Approximation with Graphical Processing Units for Multiplicative Speedup in Molecular Dynamics.
    Ramu Anandakrishnan, Mayank Daga, Alexey Onufriev, Wu-chun Feng.
    In Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Seattle, WA, USA, October 2016.
  • Characterizing Performance and Power Towards Efficient Synchronization of GPU Kernels.
    Islam Harb, Wu-chun Feng.
    In Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), London, England, September 2016.
  • Bridging the FPGA Programmability-Portability Gap via Automatic OpenCL Code Generation and Tuning.
    Konstantinos Krommydas, Ruchira Sasanka, Wu-chun Feng.
    In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), London, England, July 2016.
  • Parallel Transposition of Sparse Data Structures.
    Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng.
    In Proceedings of the 30th International Conference on Supercomputing (ICS), Istanbul, Turkey, June 2016.
  • Online Power Estimation of Graphics Processing Units.
    Vignesh Adhinarayanan, Balaji Subramaniam, Wu-chun Feng.
    In Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Cartagena, Colombia, May 2016.
  • cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs.
    Xiaodong Yu, Hao Wang, Wu-chun Feng, Hao Gong, Guohua Cao.
    In Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Cartagena, Colombia, May 2016.
  • AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-based Multi- and Many-core Processors.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the 30th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, USA, May 2016.
  • Bridging the Performance-Programmability Gap for FPGAs via OpenCL: A Case Study with OpenDwarfs. (Poster Paper, see "Posters" section for poster).
    Konstantinos Krommydas, Ahmed Helal, Anshuman Verma, Wu-chun Feng.
    In Proceedings of the 24th IEEE International Symposium on Field-Programmable Custom Computing Machines, Washington, DC, May 2016.
  • An Automated Framework for Characterizing and Subsetting GPGPU Workloads.
    Vignesh Adhinarayanan, Wu-chun Feng.
    In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Uppsala, Sweden, April 2016.
  • 2015


  • CoreTSAR: Core Task-Size Adapting Runtime.
    Thomas R. W. Scogland, Wu-chun Feng, Barry Rountree, Bronis R. de Supinski.
    In IEEE Transactions on Parallel and Distributed Systems, 26 (11): 2970-2983, November 2015.
  • Fast Detection of Transformed Data Leaks.
    Xiaokui Shu, Jing Zhang, Danfeng (Daphne) Yao, Wu-chun Feng.
    In IEEE Transactions on Information Forensics and Security (TIFS), PP (99): November 2015.
  • Block-Based Programming Abstractions for Explicit Parallel Computing.
    Annette Feng, Eli Tilevich, Wu-chun Feng.
    In Proceedings of the Blocks and Beyond: Lessons and Directions for First Programming Environments, Atlanta, GA, USA, October 2015. A VL/HCC 2015 Workshop.
  • OpenDwarfs: On the Characterization of Computation and Communication Patterns on Fixed and Reconfigurable Architectures.
    Konstantinos Krommydas, Wu-chun Feng, Christos D. Antonopoulos, Nikolaos Bellas.
    In Journal of Signal Processing Systems, 1-20, October 2015.
  • pDindel: Accelerating InDel Detection on a Multicore CPU Architecture with SIMD.
    Da Zhang, Hao Wang, Kaixi Hou, Jing Zhang, Wu-chun Feng.
    In Proceedings of the 5th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), Miami, FL, USA, October 2015.
  • cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.
    Jing Zhang, Hao Wang, Wu-chun Feng.
    In IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), PP (99): October 2015.
  • GLAF: A Visual Programming and Auto-Tuning Framework for Parallel Computing.
    Konstantinos Krommydas, Ruchira Sasanka, Wu-chun Feng.
    In Proceedings of the International Conference on Parallel Processing, Beijing, China, September 2015.
  • Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL.
    Ashwin M. Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng.
    In Proceedings of the IEEE Cluster, Chicago, Illinois, September 2015.
  • MPI-ACC: Accelerator-Aware MPI for Scientific Applications.
    Ashwin M. Aji, Lokendra Panwar, Feng Ji, Karthik Murthy, Milind Chabbi, Pavan Balaji, Keith Bisset, James Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, Rajeev Thakur.
    In IEEE Transactions on Parallel and Distributed Systems, PP (99): June 2015.
  • ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the 29th ACM International Conference on Supercomputing, Newport Beach, California, USA, June 2015.
  • Directive-Based GPU Programming for Computational Fluid Dynamics.
    Brent P. Pickering, Charles W. Jackson, Thomas R. W. Scogland, Wu-chun Feng, Christopher Roy.
    In Computers and Fluids, 114: 242-253, March 2015.
  • CentroidBLAST: Accelerating Sequence Search via Clustering.
    Wu-chun Feng, Konstantinos Krommydas, Liqing Zhang.
    In Proceedings of the 7th International Conference on Bioinformatics and Computational Biology, Honolulu, Hawaii, March 2015.
  • Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures.
    Thomas R. W. Scogland, Wu-chun Feng.
    In Proceedings of the International Conference on Performance Engineering (ICPE), Austin, TX, USA, January 2015.
  • 2014


  • Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows.
    Nabeel Mohamed, Nabanita Maji, Jing Zhang, Nataliya Timoshevskaya, Wu-chun Feng.
    In Proceedings of the 3rd IEEE International Conference on Big Data Science and Engineering (BDSE), Beijing, China, September 2014.
  • Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study.
    Kaixi Hou, Hao Wang, Wu-chun Feng.
    In Proceedings of the 7th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Minneapolis, Minnesota, September 2014.
  • SDAFT: A Novel Scalable Data Access Framework for Parallel BLAST.
    Jiangling Yin, Junyao Zhang, Jun Wang, Wu-chun Feng.
    In Parallel Computing, 40 (10): 697-709, August 2014.
  • CoreTSAR: Adaptive Worksharing for Heterogeneous Systems.
    Thomas R. W. Scogland, Wu-chun Feng, Barry Rountree, Bronis R. de Supinski.
    In Proceedings of the International Supercomputing Conference, Leipzig, Germany, June 2014.
  • On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms.
    Konstantinos Krommydas, Wu-chun Feng, Muhsen Owaida, Christos D. Antonopoulos, Nikolaos Bellas.
    In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Zurich, Switzerland, June 2014.
    Best Paper Finalist
  • Accelerating Bio-Inspired MAV Computations using GPUs.
    Amit Amritkar, Danesh Tafti, Paul Sathre, Kaixi Hou, Sriram Chivukula, Wu-chun Feng.
    In Proceedings of the AIAA Aviation and Aeronautics Forum and Exposition 2014, Atlanta, Georgia, June 2014.
  • Towards a Performance-Portable FFT Library for Heterogeneous Computing.
    Carlo del Mundo, Wu-chun Feng.
    In Proceedings of the ACM International Conference on Computing Frontiers (CF), Cagliari, Italy, May 2014.
  • cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU.
    Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng.
    In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Phoenix, Arizona, USA, May 2014.
  • Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems.
    James E. McClure, Hao Wang, Jan F. Prins, Cass T. Miller, Wu-chun Feng.
    In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Phoenix, Arizona, May 2014.
  • A Power-Measurement Methodology for Large-Scale, High-Performance Computing.
    Thomas R. W. Scogland, Craig Steffen, Torsten Wilde, Florent Parent, Susan Coghlan, Natalie Bates, Wu-chun Feng, Erich Strohmaier.
    In Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering (ICPE), Dublin, Ireland, March 2014.
    Nominated for Best Industrial Paper Award
  • 2013


  • On the Programmability and Performance of Heterogeneous Platforms.
    Konstantinos Krommydas, Thomas R. W. Scogland, Wu-chun Feng.
    In Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), Seoul, Korea, December 2013.
  • Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors.
    Vignesh Adhinarayanan, Wu-chun Feng.
    In Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), Seoul, Korea, December 2013.
  • Online Performance Projection for Clusters with Heterogeneous GPUs.
    Lokendra S. Panwar, Ashwin M. Aji, Jiayuan Meng, Pavan Balaji, Wu-chun Feng.
    In Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), Seoul, Korea, December 2013.
  • Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator.
    Mark Gardner, Paul Sathre, Wu-chun Feng, Gabriel Martinez.
    In Parallel Computing, 39 (12): 769-786, December 2013.
  • Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems.
    Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng.
    In Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), Zhangjiajie, China, November 2013.
  • pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.
    Palden Lama, Yan Li, Ashwin M. Aji, Pavan Balaji, James Dinan, Shucai Xiao, Yunquan Zhang, Wu-chun Feng, Rajeev Thakur, Xiaobo Zhou.
    In Proceedings of the 33rd International Conference on Distributed Computing Systems, Philadelphia, USA, July 2013.
  • Synchronization and Ordering Semantics in Hybrid MPI GPU Programming.
    Ashwin M. Aji, Pavan Balaji, James Dinan, Wu-chun Feng, Rajeev Thakur.
    In Proceedings of the 3rd International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Boston, USA, June 2013.
  • Accelerating Fast Fourier Transform for Wideband Channelization.
    Carlo del Mundo, Vignesh Adhinarayanan, Wu-chun Feng.
    In Proceedings of the International Conference on Communications (ICC), Budapest, Hungary, June 2013.
  • On the Efficacy of GPU-Integrated MPI for Scientific Applications.
    Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Milind Chabbi, Karthik Murthy, Pavan Balaji, Keith R. Bisset, James Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, Rajeev Thakur.
    In Proceedings of the ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), New York, USA, June 2013.
  • Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures.
    Jing Zhang, Heshan Lin, Pavan Balaji, Wu-chun Feng.
    In Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Delft, Netherlands, May 2013.
  • 2012


  • Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation.
    Paul Sathre, Mark Gardner, Wu-chun Feng.
    In Proceedings of the 5th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Pittsburgh, PA, September 2012.
  • Performance Characterization of Data-Intensive Kernels on AMD Fusion Architectures.
    Kenneth Lee, Heshan Lin, Wu-chun Feng.
    In Proceedings of the International Supercomputing Conference (ISC), Hamburg, Germany, June 2012.
  • DMA-Assisted, Intranode Communication in GPU Accelerated Systems.
    Feng Ji, Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Rajeev Thakur, Wu-chun Feng, Xiaosong Ma.
    In Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, Liverpool, UK, June 2012.
  • MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems.
    Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Keith R. Bisset, Rajeev Thakur.
    In Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, Liverpool, UK, June 2012.
  • Efficient Intranode Communication in GPU-Accelerated Systems.
    Feng Ji, Ashwin Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Xiaosong Ma.
    In Proceedings of the 2nd IEEE International Workshop on Accelerators and Hybrid Exascale Systems (in conjunction with the 26th IEEE International Parallel and Distributed Processing Symposium), Shanghai, China, May 2012.
  • VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units.
    Shucai Xiao, Pavan Balaji, Qian Zhu, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, Wu-chun Feng.
    In Proceedings of the IEEE Innovative Parallel Computing (InPar2012), San Jose, CA, May 2012.
  • Transparent Accelerator Migration in a Virtualized GPU Environment.
    Shucai Xiao, Pavan Balaji, James Dinan, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, Wu-chun Feng.
    In Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, May 2012.
  • Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems.
    Shucai Xiao, Wu-chun Feng.
    In Proceedings of the PhD Forum at the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.
  • Heterogeneous Task Scheduling for Accelerated OpenMP.
    Thomas R. W. Scogland, Barry Rountree, Wu-chun Feng, Bronis R. de Supinski.
    In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.
  • OpenCL and the 13 Dwarfs: A Work In Progress.
    Wu-chun Feng, Heshan Lin, Thomas R. W. Scogland, Jing Zhang.
    In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering (ICPE), Boston, MA, April 2012.
  • 2011


  • StreamMR: An Optimized MapReduce Framework for AMD GPUs.
    Marwa Elteir, Heshan Lin, Wu-chun Feng, Thomas R. W. Scogland.
    In Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, Tainan, Taiwan, December 2011.
  • Architecture-Aware Mapping and Optimization on a 1600-Core GPU.
    Mayank Daga, Thomas R. W. Scogland, Wu-chun Feng.
    In Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, Tainan, Taiwan, December 2011.
  • CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures.
    Gabriel Martinez, Mark Gardner, Wu-chun Feng.
    In Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, Tainan, Taiwan, December 2011.
  • Spectral Method Characterization on FPGA and GPU Accelerators.
    Karl Pereira, Peter Athanas, Heshan Lin, Wu-chun Feng.
    In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), Cancun, Mexico, November 2011.
  • Performance Characterization and Optimization of Atomic Operations on AMD GPUs.
    Marwa Elteir, Heshan Lin, Wu-chun Feng.
    In Proceedings of the IEEE Cluster 2011, Austin, TX, USA, September 2011.
  • On the Efficacy of a Fused CPU+GPU Processor for Parallel Computing.
    Mayank Daga, Ashwin Aji, Wu-chun Feng.
    In Proceedings of the Symposium on Application Accelerators in High-Performance Computing, Knoxville, Tennessee, USA, July 2011.
  • Accelerating Protein Sequence Search in a Heterogeneous Computing System.
    Shucai Xiao, Heshan Lin, Wu-chun Feng.
    In Proceedings of the 25th International Parallel and Distributed Processing Symposium, Anchorage, Alaska, USA, May 2011.
  • Bounding the Effect of Partition Camping in GPU Kernels.
    Ashwin M. Aji, Mayank Daga, Wu-chun Feng.
    In Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, May 2011.
  • Towards Accelerating Molecular Modeling via Multi-Scale Approximation on a GPU.
    Mayank Daga, Wu-chun Feng, Thomas R. W. Scogland.
    In Proceedings of the 1st IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), Orlando, Florida, USA, February 2011.
  • 2010


  • GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors.
    Ashwin M. Aji, Liqing Zhang, Wu-chun Feng.
    In Proceedings of the 13th IEEE International Conference on Computational Science and Engineering, Hong Kong, China, December 2010.
  • Coordinating Computation and I/O in Massively Parallel Sequence Search.
    Heshan Lin, Xiaosong Ma, Wu-chun Feng, Nagiza Samatova.
    In IEEE Transactions on Parallel and Distributed Systems, PP (99): 1-14, May 2010.
  • Towards Chip-on-Chip Neuroscience: Fast Mining of Neuronal Spike Streams Using Graphics Hardware.
    Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, Naren Ramakrishnan.
    In Proceedings of the 7th ACM International Conference on Computing Frontiers, Bertinoro, Italy, pp. 1-10, May 2010.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • To GPU Synchronize or Not GPU Synchronize?
    Wu-chun Feng, Shucai Xiao.
    In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Paris, France, May 2010.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Inter-Block GPU Communication via Fast Barrier Synchronization.
    Shucai Xiao, Wu-chun Feng.
    In Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Atlanta, Georgia, USA, April 2010.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Accelerating Electrostatic Surface Potential Calculation with Multi-Scale Approximation on Graphics Processing Units.
    Ramu Anandakrishnan, Thomas R. W. Scogland, Andrew T. Fenley, John C. Gordon, Wu-chun Feng, Alexey V. Onufriev.
    In Journal of Molecular Graphics and Modelling, 28: 904-910, April 2010.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • Missing Genes in the Annotation of Prokaryotic Genomes.
    Andrew S. Warren, Jeremy Archuleta, Wu-chun Feng, Joao Carlos Setubal.
    In BMC Bioinformatics, 11 (131): 1-12, March 2010.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • 2009


  • Tools and Environments for Multi- and Many-Core Architectures.
    Wu-chun Feng, Pavan Balaji.
    In IEEE Computer, 42 (12): 26-27, December 2009.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit.
    Shucai Xiao, Ashwin M. Aji, Wu-chun Feng.
    In Proceedings of the 15th International Conference on Parallel and Distributed Systems (ICPADS), Shenzhen, China, December 2009.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • On the Energy Efficiency of Graphics Processing Units for Scientific Computing.
    Song Huang, Shucai Xiao, Wu-chun Feng.
    In Proceedings of the 5th IEEE Workshop on High-Performance, Power-Aware Computing (in conjunction with the 23rd International Parallel and Distributed Processing Symposium (IPDPS)), Rome, Italy, 3801-3804, June 2009.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Multi-Dimensional Characterization of Temporal Data Mining on Graphics Processors.
    Jeremy Archuleta, Yong Cao, Thomas R. W. Scogland, Wu-chun Feng.
    In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rome, Italy, May 2009.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • 2008


  • Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation.
    Thomas R. W. Scogland, Pavan Balaji, Wu-chun Feng, Ganesh Narayanaswamy.
    In Proceedings of the ACM/IEEE SC|08: The International Conference on High-Performance Computing, Networking, Storage, and Analysis, Austin, Texas, USA, November 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Massively Parallel Genomic Sequence Search on the Blue Gene/P Architecture.
    Heshan Lin, Pavan Balaji, Ruth Poole, Carlos Sosa, Xiaosong Ma, Wu-chun Feng.
    In Proceedings of the ACM/IEEE SC|08: The International Conference on High-Performance Computing, Networking, Storage, and Analysis, Austin, Texas, USA, November 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Optimizing Performance, Cost, and Sensitivity in Pairwise Sequence Search on a Cluster of PlayStations.
    Ashwin M. Aji, Wu-chun Feng.
    In Proceedings of the IEEE International Conference on BioInformatics and BioEngineering, Athens, Greece, October 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer.
    P. Balaji, W. Feng, H. Lin, J. Archuleta, S. Matsuoka, A. Warren, J. Setubal, E. Lusk, R. Thakur, I. Foster, D. S. Katz, S. Jha, K. Shinpaugh, S. Coghlan, D. Reed.
    In Proceedings of the International Supercomputing Conference, Dresden, Germany, June 2008. Distinguished Paper Award
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine.
    Ashwin M. Aji, Wu-chun Feng, Filip Blagojevic, Dimitrios S. Nikolopoulos.
    In Proceedings of the 5th ACM International Conference on Computing Frontiers, Ischia, Italy, May 2008.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • 2007


  • High-Performance Computing Using Accelerators.
    Wu-chun Feng, Dinesh Manocha.
    In Parallel Computing, 33 (10-11): 645-647, November 2007.
      Preprint               Citations:  [ BibTeX    XML    PlainText ]   
  • A Maintainable Software Architecture for Fast and Modular Bioinformatics Sequence Search.
    Jeremy S. Archuleta, Eli Tilevich, Wu-chun Feng.
    In Proceedings of the 23rd IEEE International Conference on Software Maintenance, Paris, France, October 2007.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • A Pluggable Framework for Parallel Pairwise Sequence Search.
    Jeremy S. Archuleta, Wu-chun Feng, Eli Tilevich.
    In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, August 2007.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • Parallel Genomic Sequence-Search on a Massively Parallel System.
    Oystein Thorsen, Brian Smith, Carlos P. Sosa, Karl Jiang, Heshan Lin, Amanda Peters, Wu-chun Feng.
    In Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, May 2007.
      Paper               Citations:  [ BibTeX    XML    PlainText ]   
  • 2006


  • Parallel Genomic Sequence-Searching on an Ad-Hoc Grid: Experiences, Lessons Learned, and Implications.
    Mark K. Gardner, Wu-chun Feng, Jeremy S. Archuleta, Heshan Lin, Xiaosong Ma.
    In Proceedings of the ACM/IEEE SC|06: The International Conference on High-Performance Computing, Networking, Storage, and Analysis, Tampa, FL, November 2006.
    Best Paper Finalist
      Paper               Citations:  [ BibTeX    XML    PlainText ]