CIS565-Fall-2014 · foxking0416 · Sep 27, 2014 · Sep 27, 2014 · Sep 28, 2014 · Sep 28, 2014
diff --git a/Project Description.md b/Project Description.md
@@ -0,0 +1,133 @@
+Project-2
+=========
+
+A Study in Parallel Algorithms : Stream Compaction
+
+# INTRODUCTION
+Many of the algorithms you have learned thus far in your career have typically
+been developed from a serial standpoint.  When it comes to GPUs, we are mainly
+looking at massively parallel work.  Thus, it is necessary to reorient our
+thinking.  In this project, we will be implementing a couple of different versions
+of prefix sum.  We will start with a simple single-threaded serial CPU version,
+and then move to a naive GPU version.  Each part of this homework is meant to
+follow the logic of the previous parts, so please do not do this homework out of
+order.
+
+This project will serve as a stream compaction library that you may use (and
+will want to use) in your future projects.  For that reason, we suggest you
+create proper header and CUDA files so that you can reuse this code later.  You
+may want to create a separate cpp file that contains your main function so that
+you can test the code you write.
+
+# OVERVIEW
+Stream compaction is broken down into two parts: (1) scan, and (2) scatter.
+
+## SCAN
+Scan, or prefix sum, computes a running total over an array: each element of the
+output is the sum of the input elements that precede it.  A prefix sum can be
+either inclusive, meaning each output element is the sum of all input elements
+before it plus the element itself, or exclusive, meaning each output element is
+the sum of all input elements before it, excluding the element itself.
+
+Inclusive:
+
+In : [ 3 4 6 7 9 10 ]
+
+Out : [ 3 7 13 20 29 39 ]
+
+Exclusive:
+
+In : [ 3 4 6 7 9 10 ]
+
+Out : [ 0 3 7 13 20 29 ]
+
+Note that the resulting prefix sum always has n elements, the same length as the
+input array, as the examples above show.  The first element of the exclusive
+prefix sum will always be 0.  In the following sections, all references to
+prefix sum will be to the exclusive version of prefix sum.
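+
+The two definitions above can be sketched as a pair of serial loops.  This is
+only a minimal illustration, assuming `std::vector<int>` inputs; the function
+names `inclusiveScan` and `exclusiveScan` are hypothetical and are not the
+signatures declared in the project's header.
+
```cpp
#include <cstddef>
#include <vector>

// Inclusive scan: out[i] = in[0] + ... + in[i].
// (Hypothetical helper name, used only for illustration.)
std::vector<int> inclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];   // add the current element first...
        out[i] = running;   // ...so the output includes it
    }
    return out;
}

// Exclusive scan: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
// (Hypothetical helper name, used only for illustration.)
std::vector<int> exclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // write the sum of everything before i...
        running += in[i];   // ...then fold in the current element
    }
    return out;
}
```
+
+The only difference between the two loops is whether the running total is
+written before or after the current element is added.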
+
+## SCATTER
+The scatter section of stream compaction takes the results of the previous scan
+in order to reorder the elements to form a compact array.
+
+For example, let's say we have the following array:
+[ 0 0 3 4 0 6 6 7 0 1 ]
+
+We would only like to consider the non-zero elements in this array, so we would
+like to compact it into the following array:
+[ 3 4 6 6 7 1 ]
+
+We can perform a transform on the input array to turn it into a boolean array:
+
+In :  [ 0 0 3 4 0 6 6 7 0 1 ]
+
+Out : [ 0 0 1 1 0 1 1 1 0 1 ]
+
+Performing a scan on the output, we get the following array :
+
+In :  [ 0 0 1 1 0 1 1 1 0 1 ]
+
+Out : [ 0 0 0 1 2 2 3 4 5 5 ]
+
+Notice that wherever the boolean array holds a 1, the scan output gives the
+index at which the corresponding input element belongs in the compacted array.
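+
+The three steps above (transform, scan, scatter) can be sketched serially as
+follows.  This is an illustrative sketch only, assuming `std::vector<int>`
+inputs and a "keep non-zero elements" predicate; the name `compactNonZero` is
+hypothetical.
+
```cpp
#include <cstddef>
#include <vector>

// Serial stream compaction that keeps the non-zero elements of `in`.
// (Hypothetical helper name, used only for illustration.)
std::vector<int> compactNonZero(const std::vector<int>& in) {
    const std::size_t n = in.size();

    // Step 1: transform the input into a boolean (0/1) array.
    std::vector<int> flags(n);
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = (in[i] != 0) ? 1 : 0;
    }

    // Step 2: exclusive scan of the flags gives each kept element's
    // destination index in the compacted array.
    std::vector<int> indices(n);
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        indices[i] = running;
        running += flags[i];
    }

    // After the scan, `running` is the number of kept elements.
    std::vector<int> out(static_cast<std::size_t>(running));

    // Step 3: scatter each kept element to its destination index.
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i] != 0) {
            out[static_cast<std::size_t>(indices[i])] = in[i];
        }
    }
    return out;
}
```
+
+Each of the three loops touches element i independently of the others (apart
+from the scan itself), which is what makes the approach a good fit for the GPU.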
+
+# PART 1 : REVIEW OF PREFIX SUM
+Given the definition of exclusive prefix sum, please write a serial CPU version
+of prefix sum.  You may write this in the cpp file to separate this from the
+CUDA code you will be writing in your .cu file. 
+
+# PART 2 : NAIVE PREFIX SUM
+We will now parallelize the previous section's code.  Recall from lecture
+that we can parallelize this using a series of kernel calls.  In this portion,
+you are NOT allowed to use shared memory.
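+
+One way to picture "a series of kernel calls" is the sketch below, which
+simulates the passes on the CPU.  It assumes the naive algorithm is the
+Hillis-Steele style scan, which may differ from what was presented in lecture.
+Each iteration of the outer loop stands in for one kernel launch, the inner
+loop body is the work of one thread, and the two buffers play the role of two
+global-memory arrays that are swapped between launches.  The name
+`naiveScanSimulated` is hypothetical.
+
```cpp
#include <cstddef>
#include <utility>
#include <vector>

// CPU simulation of a naive (Hillis-Steele style) exclusive scan.
// (Hypothetical helper name; this is not CUDA code.)
std::vector<int> naiveScanSimulated(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> src(n, 0);
    std::vector<int> dst(n, 0);

    // Shift the input right by one so that the result is exclusive.
    for (std::size_t i = 1; i < n; ++i) {
        src[i] = in[i - 1];
    }

    // Each pass doubles the distance each element reaches back.
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        // One simulated kernel launch: "thread" i reads only from src
        // and writes only to dst, so no thread sees a partial update.
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = (i >= offset) ? src[i] + src[i - offset] : src[i];
        }
        std::swap(src, dst);  // ping-pong the buffers between launches
    }
    return src;
}
```
+
+Reading from one buffer and writing to the other is what avoids the race that
+would occur if every thread updated a single array in place.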
+
+### Questions 
+* Compare this version to the serial version of exclusive prefix scan. Please
+  include a table of how the runtimes compare on different lengths of arrays.
+* Plot a graph of the comparison and write a short explanation of the phenomenon you
+  see here.
+
+# PART 3 : OPTIMIZING PREFIX SUM
+The previous section did not take shared memory into account: we kept
+everything in global memory, which is much slower than shared memory.
+
+## PART 3a : Write prefix sum for a single block
+Shared memory is accessible to all threads within a block. Please write a
+version of prefix sum that works on a single block.
+
+## PART 3b : Generalizing to arrays of any length
+Taking the previous portion, please write a version that generalizes prefix sum
+to arbitrary length arrays, including arrays that will not fit in one block.
+
+### Questions
+* Compare this version to the parallel prefix sum using global memory.
+* Plot a graph of the comparison and write a short explanation of the phenomenon
+  you see here.
+
+# PART 4 : ADDING SCATTER
+First create a serial version of scatter by expanding the serial version of
+prefix sum.  Then create a GPU version of scatter.  Combine the function calls
+so that, given an array, you can call stream compact and it will compact the
+array for you.  Finally, write a version using thrust.
+
+### Questions
+* Compare your version of stream compact to your version using thrust.  How do
+  they compare?  How might you optimize yours more, or how might thrust's stream
+  compact be optimized?
+
+# EXTRA CREDIT (+10)
+For extra credit, please optimize your prefix sum for work efficiency and to
+avoid bank conflicts.  Information on this can be found in the GPU Gems
+chapter listed in the references.
+
+# SUBMISSION
+Please answer all the questions in each of the subsections above and write your
+answers in the README by overwriting the README file.  In future projects, we
+expect your analysis to be similar to the one we have led you through in this
+project.  Like other projects, please open a pull request and email Harmony.
+
+# REFERENCES
+"Parallel Prefix Sum (Scan) with CUDA." GPU Gems 3.
diff --git a/Project2/.gitignore b/Project2/.gitignore
@@ -0,0 +1,35 @@
+# General
+builds/
+
+# Compiled objects
+*.o
+*.obj
+
+# Compiled dynamic libraries
+*.dll
+*.so
+
+# Compiled static libraries
+*.lib
+
+# Windows specific
+[Rr]elease/
+[Dd]ebug/
+*.suo
+*.pdb
+*.sdf
+*.opensdf
+*.user
+*.deps
+*.ipch
+
+# OSX specific
+.DS_Store
+*/.DS_Store
+
+# VIM swap files
+*.swp
+
+# Exceptions
+!*/shared/*
+
diff --git a/Project2/PROJ_WIN/CUDA_ProjectInitial.sln b/Project2/PROJ_WIN/CUDA_ProjectInitial.sln
@@ -0,0 +1,20 @@
+
+Microsoft Visual Studio Solution File, Format Version 11.00
+# Visual Studio 2010
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "CUDA_ProjectInitial", "CUDA_ProjectInitial\CUDA_ProjectInitial.vcxproj", "{D7BEFF7A-4902-4B7E-922B-B0417A66864C}"
+EndProject
+Global
+	GlobalSection(SolutionConfigurationPlatforms) = preSolution
+		Debug|Win32 = Debug|Win32
+		Release|Win32 = Release|Win32
+	EndGlobalSection
+	GlobalSection(ProjectConfigurationPlatforms) = postSolution
+		{D7BEFF7A-4902-4B7E-922B-B0417A66864C}.Debug|Win32.ActiveCfg = Debug|Win32
+		{D7BEFF7A-4902-4B7E-922B-B0417A66864C}.Debug|Win32.Build.0 = Debug|Win32
+		{D7BEFF7A-4902-4B7E-922B-B0417A66864C}.Release|Win32.ActiveCfg = Release|Win32
+		{D7BEFF7A-4902-4B7E-922B-B0417A66864C}.Release|Win32.Build.0 = Release|Win32
+	EndGlobalSection
+	GlobalSection(SolutionProperties) = preSolution
+		HideSolutionNode = FALSE
+	EndGlobalSection
+EndGlobal
diff --git a/Project2/PROJ_WIN/CUDA_ProjectInitial/CUDA_ProjectInitial.vcxproj b/Project2/PROJ_WIN/CUDA_ProjectInitial/CUDA_ProjectInitial.vcxproj
@@ -0,0 +1,108 @@
+<?xml version="1.0" encoding="utf-8"?>
+<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+  <ItemGroup Label="ProjectConfigurations">
+    <ProjectConfiguration Include="Debug|Win32">
+      <Configuration>Debug</Configuration>
+      <Platform>Win32</Platform>
+    </ProjectConfiguration>
+    <ProjectConfiguration Include="Release|Win32">
+      <Configuration>Release</Configuration>
+      <Platform>Win32</Platform>
+    </ProjectConfiguration>
+  </ItemGroup>
+  <PropertyGroup Label="Globals">
+    <ProjectGuid>{D7BEFF7A-4902-4B7E-922B-B0417A66864C}</ProjectGuid>
+    <RootNamespace>Project3</RootNamespace>
+    <ProjectName>CUDA_ProjectInitial</ProjectName>
+  </PropertyGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
+    <ConfigurationType>Application</ConfigurationType>
+    <UseDebugLibraries>true</UseDebugLibraries>
+    <CharacterSet>MultiByte</CharacterSet>
+    <PlatformToolset>v100</PlatformToolset>
+  </PropertyGroup>
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
+    <ConfigurationType>Application</ConfigurationType>
+    <UseDebugLibraries>false</UseDebugLibraries>
+    <WholeProgramOptimization>true</WholeProgramOptimization>
+    <CharacterSet>MultiByte</CharacterSet>
+    <PlatformToolset>v100</PlatformToolset>
+  </PropertyGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
+  <ImportGroup Label="ExtensionSettings">
+    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 6.5.props" />
+  </ImportGroup>
+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
+  </ImportGroup>
+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
+  </ImportGroup>
+  <PropertyGroup Label="UserMacros" />
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
+    <LinkIncremental>false</LinkIncremental>
+  </PropertyGroup>
+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
+    <ClCompile>
+      <WarningLevel>Level3</WarningLevel>
+      <Optimization>Disabled</Optimization>
+      <AdditionalIncludeDirectories>$(SolutionDir)/shared/glew/include/;$(SolutionDir)/shared/freeglut/include/;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
+      <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
+    </ClCompile>
+    <Link>
+      <GenerateDebugInformation>true</GenerateDebugInformation>
+      <AdditionalLibraryDirectories>$(SolutionDir)/shared/glew/lib;$(SolutionDir)/shared/freeglut/lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
+      <AdditionalDependencies>opengl32.lib;glut32.lib;glew32.lib;freeglut.lib;cudart.lib;%(AdditionalDependencies)</AdditionalDependencies>
+      <SubSystem>Console</SubSystem>
+      <EntryPointSymbol>mainCRTStartup</EntryPointSymbol>
+    </Link>
+    <CudaCompile>
+      <Include>$(CudaToolkitIncludeDir)</Include>
+      <CompileOut>$(ProjectDir)$(Platform)/$(Configuration)/%(Filename)%(Extension).obj</CompileOut>
+      <GPUDebugInfo>true</GPUDebugInfo>
+      <GenerateLineInfo>true</GenerateLineInfo>
+      <HostDebugInfo>true</HostDebugInfo>
+      <CodeGeneration>compute_20,sm_20;compute_30,sm_30</CodeGeneration>
+    </CudaCompile>
+  </ItemDefinitionGroup>
+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
+    <ClCompile>
+      <WarningLevel>Level3</WarningLevel>
+      <Optimization>MaxSpeed</Optimization>
+      <FunctionLevelLinking>true</FunctionLevelLinking>
+      <IntrinsicFunctions>true</IntrinsicFunctions>
+      <AdditionalIncludeDirectories>$(SolutionDir)/shared/glew/include/;$(SolutionDir)/shared/freeglut/include/;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
+    </ClCompile>
+    <Link>
+      <GenerateDebugInformation>true</GenerateDebugInformation>
+      <EnableCOMDATFolding>true</EnableCOMDATFolding>
+      <OptimizeReferences>true</OptimizeReferences>
+      <AdditionalDependencies>opengl32.lib;freeglut.lib;glew32.lib;cudart.lib;%(AdditionalDependencies)</AdditionalDependencies>
+      <AdditionalLibraryDirectories>$(SolutionDir)/shared/glew/lib;$(SolutionDir)/shared/freeglut/lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
+    </Link>
+    <CudaCompile>
+      <CompileOut>$(ProjectDir)$(Platform)/$(Configuration)/%(Filename)%(Extension).obj</CompileOut>
+    </CudaCompile>
+    <CudaCompile>
+      <Include>$(CudaToolkitIncludeDir)</Include>
+    </CudaCompile>
+  </ItemDefinitionGroup>
+  <ItemGroup>
+    <ClCompile Include="..\..\src\main.cpp" />
+  </ItemGroup>
+  <ItemGroup>
+    <CudaCompile Include="..\..\src\kernel.cu">
+      <FileType>Document</FileType>
+      <CodeGeneration Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">compute_20,sm_20</CodeGeneration>
+    </CudaCompile>
+  </ItemGroup>
+  <ItemGroup>
+    <ClInclude Include="..\..\src\kernel.h" />
+    <ClInclude Include="..\..\src\main.h" />
+  </ItemGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
+  <ImportGroup Label="ExtensionTargets">
+    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 6.5.targets" />
+  </ImportGroup>
+</Project>
diff --git a/Project2/PROJ_WIN/CUDA_ProjectInitial/CUDA_ProjectInitial.vcxproj.filters b/Project2/PROJ_WIN/CUDA_ProjectInitial/CUDA_ProjectInitial.vcxproj.filters
@@ -0,0 +1,29 @@
+<?xml version="1.0" encoding="utf-8"?>
+<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+  <ItemGroup>
+    <ClInclude Include="..\..\src\kernel.h">
+      <Filter>Header</Filter>
+    </ClInclude>
+    <ClInclude Include="..\..\src\main.h">
+      <Filter>Header</Filter>
+    </ClInclude>
+  </ItemGroup>
+  <ItemGroup>
+    <Filter Include="Resource">
+      <UniqueIdentifier>{a80fd6d7-5fd1-4847-8559-ab28bf5e7f41}</UniqueIdentifier>
+    </Filter>
+    <Filter Include="Header">
+      <UniqueIdentifier>{ad476b75-28a6-4e18-ac67-edb7036b0a5b}</UniqueIdentifier>
+    </Filter>
+  </ItemGroup>
+  <ItemGroup>
+    <ClCompile Include="..\..\src\main.cpp">
+      <Filter>Resource</Filter>
+    </ClCompile>
+  </ItemGroup>
+  <ItemGroup>
+    <CudaCompile Include="..\..\src\kernel.cu">
+      <Filter>Resource</Filter>
+    </CudaCompile>
+  </ItemGroup>
+</Project>
diff --git a/Project2/PROJ_WIN/CUDA_ProjectInitial/scan_scatter_CPU.cpp b/Project2/PROJ_WIN/CUDA_ProjectInitial/scan_scatter_CPU.cpp
@@ -0,0 +1,3 @@
+#include "main.h"
+
+
diff --git a/Project2/PROJ_WIN/CUDA_ProjectInitial/scan_scatter_CPU.h b/Project2/PROJ_WIN/CUDA_ProjectInitial/scan_scatter_CPU.h
@@ -0,0 +1,14 @@
+#ifndef SCAN_SCATTER_CPU_H
+#define SCAN_SCATTER_CPU_H
+
+#include <iostream>
+#include <sstream>
+#include <fstream>
+#include <ctime>
+
+using namespace std;
+
+int* serialPrefixSumInclusive(int* , int);
+int* serialPrefixSumExclusive(int* , int);
+int* serialScatter(int* arr, int N);
+
+double diffclock( clock_t clock1, clock_t clock2 );
+
+#endif
diff --git a/Project2/PROJ_WIN/CUDA_ProjectInitial/test.cpp b/Project2/PROJ_WIN/CUDA_ProjectInitial/test.cpp
diff --git a/Project2/PROJ_WIN/shared/freeglut/Copying.txt b/Project2/PROJ_WIN/shared/freeglut/Copying.txt
@@ -0,0 +1,27 @@
+
+  Freeglut Copyright
+  ------------------
+
+  Freeglut code without an explicit copyright is covered by the following 
+  copyright:
+
+  Copyright (c) 1999-2000 Pawel W. Olszta. All Rights Reserved.
+  Permission is hereby granted, free of charge,  to any person obtaining a copy 
+  of this software and associated documentation files (the "Software"), to deal
+  in the Software without restriction,  including without limitation the rights 
+  to use, copy,  modify, merge,  publish, distribute,  sublicense,  and/or sell 
+  copies or substantial portions of the Software.
+
+  The above  copyright notice  and this permission notice  shall be included in 
+  all copies or substantial portions of the Software.
+
+  THE SOFTWARE  IS PROVIDED "AS IS",  WITHOUT WARRANTY OF ANY KIND,  EXPRESS OR 
+  IMPLIED,  INCLUDING  BUT  NOT LIMITED  TO THE WARRANTIES  OF MERCHANTABILITY, 
+  FITNESS  FOR  A PARTICULAR PURPOSE  AND NONINFRINGEMENT.  IN  NO EVENT  SHALL 
+  PAWEL W. OLSZTA BE LIABLE FOR ANY CLAIM,  DAMAGES OR OTHER LIABILITY, WHETHER 
+  IN  AN ACTION  OF CONTRACT,  TORT OR OTHERWISE,  ARISING FROM,  OUT OF  OR IN 
+  CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+  Except as contained in this notice,  the name of Pawel W. Olszta shall not be 
+  used  in advertising  or otherwise to promote the sale, use or other dealings 
+  in this Software without prior written authorization from Pawel W. Olszta.