Tuesday, 31 March 2009

More Videos

Here are two videos showing the Happy Buddha scene (1024x2048x1024).
High quality video here: Buddha avi [mirror]

The updated demo download from today (right side, first position in the links)
also includes the endless Buddha executable.



Monday, 30 March 2009

Video

For those of you who cannot run the demo for some reason, I just captured a short video of it. You can watch it in the window below, or download the larger, higher-quality version to see more detail.

Landscape AVI [mirror]

Saturday, 28 March 2009

CUDA optimizations II

Today I would like to share a couple of interesting references about optimizing CUDA. There are many similarities among these presentations, but they are still interesting, as reading through them gives you new ideas about what is possible.

1.) Optimization Techniques for Large Data Structures on CUDA
2.) AstroGPU - CUDA Optimization Part I
3.) AstroGPU - CUDA Optimization Part II
4.) CUDA Programming Notes
5.) NVISION08: Advanced CUDA: Optimizing to Get 20x Performance
6.) Top 5 Optimization Strategies for CUDA
7.) CUDA at MIT - IAP2009

Looking at slide 3 of the first presentation, using the GPU should give an average speedup of about a factor of 10 compared to the CPU, provided the algorithm can be fully SIMD-parallelized (GPU: GTX280, 933 GFlops, 141.7 GB/s memory bandwidth; CPU: Intel Core 2 QX9650, 96 GFlops, 12.8 GB/s memory bandwidth).
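The factor of 10 follows directly from the peak figures quoted above. A minimal sketch of the arithmetic in Python; the variable and function names are my own, and peak numbers are only an upper bound, not measured performance.

```python
# Peak figures as quoted from the presentation (not measured values).
gpu = {"gflops": 933.0, "bandwidth_gb_s": 141.7}  # GTX280
cpu = {"gflops": 96.0, "bandwidth_gb_s": 12.8}    # Intel Core 2 QX9650


def peak_ratios(gpu_peak, cpu_peak):
    """Return the GPU/CPU ratio for each peak figure."""
    return {key: gpu_peak[key] / cpu_peak[key] for key in gpu_peak}


ratios = peak_ratios(gpu, cpu)
for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
```

This gives roughly 9.7x for compute and 11.1x for memory bandwidth, so a speedup far beyond 10x cannot come from peak throughput alone.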

Now, looking at NVidia's CUDA page, I am often surprised to see that some algorithms seem to have been sped up 100x or even more compared to the CPU. Taking the numbers above into account, this is rather hard to believe.

Monday, 23 March 2009

New Benchmark Version

Today I ported the CUDA version to the CPU (multicore). It is included in the updated demo.

[-Download-] (CUDA 2.1 Required - Driver version 181.20 or newer )

The first results so far are:

CPU (3 GHz Pentium D) - Single/Repeated/Repeated 2xAA: 3/1.2/0.6 fps
CPU (Intel Core2 Quad Q6600, 4x 3 GHz) - Single/Repeated/Repeated 2xAA: 15/8/5 fps
GPU (8800GTS) - Single/Repeated/Repeated 2xAA: 33/24/17 fps
GPU (285GTX) - Single/Repeated/Repeated 2xAA: 44/34/36 fps

This time the scene is the complex version of the one shown in the pictures below
(spherescape_complex.rle4).

The low CPU performance is mostly due to the many floating-point operations, I guess. Changing the calculations to integer might improve the speed. As it stands, however, this is the fairest possible comparison, since CPU and GPU execute the same C++ code.

Thursday, 19 March 2009

CPU vs. GPU

Today I compared CPU and GPU to see whether it was really worth the work to write everything in CUDA rather than for the CPU. [detailed pics] [-CPU-Demo-]

The opponents:
CPU: 3.0 GHz Pentium D, 1GB vs.
GPU: NVidia GTX285, 1GB

In the first round the CPU performs well compared to the GPU: the GPU is just 3x faster than the CPU.

In the second round, however, the GPU already beats the CPU by a factor of 7.3:1.

In the third round the CPU loses all ground and the GPU wins by about 20:1 (47.5:2.4).

Finally, it would be interesting to know why the GPU does not scale linearly at all. I have no idea why the framerate does not halve when the computations are doubled, or vice versa.
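One possible explanation, which I have not verified, is that every frame carries a fixed cost (kernel launch, memory transfer, display) on top of the cost that grows with the amount of work. The sketch below only illustrates that arithmetic in Python; the function name and all numbers are hypothetical and are not measurements from the demo.

```python
def fps(work_units, fixed_ms, ms_per_unit):
    """Frames per second for a frame with a fixed cost plus a per-unit cost."""
    frame_ms = fixed_ms + work_units * ms_per_unit
    return 1000.0 / frame_ms


# Hypothetical numbers, chosen only for illustration.
single = fps(work_units=1, fixed_ms=15.0, ms_per_unit=5.0)
double = fps(work_units=2, fixed_ms=15.0, ms_per_unit=5.0)
print(single, double, double / single)
```

With these assumed numbers, doubling the work lowers the framerate from 50 to 40 fps rather than to 25 fps, because the fixed part of the frame time dominates.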

Wednesday, 18 March 2009

Demo with 2x AA

Small update: the demo linked below now also includes 2xAA (not 2x2!), which significantly reduces the aliasing of distant pixels. On the GTS 8800 it is quite slow right now, but on the GTX285 I found almost no difference to the normal version.
For the GTS I will perhaps think about applying AA only to distant geometry to increase the speed.

Tuesday, 17 March 2009

Now the algorithm works entirely on the GPU

Today I finished moving the ray generation part to the GPU, saving another 1-4 ms as well as an unnecessary memcopy. Silhouette smoothing is also working well, together with basic anti-aliasing (so far only for GTX2xx cards).

As for the smoothing, I tried two variants (left) and found that the one in the middle looks best so far. The unsmoothed original (top) is too edgy, and the one on the bottom smooths too much for the tree scene, which makes nearby geometry look like a 2D impostor.

The updated demo is here: [-download-] (CUDA 2.1)
It now also contains softening for the Buddha and dragon scenes.

For the experienced among you, the shader folder contains the shader in GLSL (soft.frag). You can experiment a bit by modifying the smoothing.

Sunday, 15 March 2009

Silhouette Smoothing

Today I experimented with a new shader that smooths the silhouette based on the depth buffer. It does not look bad, but it is difficult to find the optimal parameters.
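The general idea of depth-based silhouette smoothing can be sketched on the CPU as follows. This is not the actual shader, only a minimal Python illustration under my own assumptions: a pixel counts as a silhouette pixel if its depth differs from a 4-neighbour by more than a threshold, and only those pixels are averaged with their neighbours. The function name, the threshold and the sample data are hypothetical.

```python
def smooth_silhouettes(color, depth, threshold):
    """Blur only the pixels that sit on a depth discontinuity."""
    height, width = len(color), len(color[0])
    out = [row[:] for row in color]
    for y in range(height):
        for x in range(width):
            # Collect the 4-neighbours that lie inside the image.
            candidates = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            neighbours = [(ny, nx) for ny, nx in candidates
                          if 0 <= ny < height and 0 <= nx < width]
            on_edge = any(abs(depth[ny][nx] - depth[y][x]) > threshold
                          for ny, nx in neighbours)
            if on_edge:
                samples = [color[y][x]] + [color[ny][nx] for ny, nx in neighbours]
                out[y][x] = sum(samples) / len(samples)
    return out


# One scanline: a far dark object on the left, a near bright one on the right.
colors = [[0.0, 0.0, 1.0, 1.0]]
depths = [[10.0, 10.0, 1.0, 1.0]]
result = smooth_silhouettes(colors, depths, threshold=1.0)
print(result)
```

The threshold is exactly the kind of parameter that is hard to tune: too low and flat surfaces get blurred, too high and real silhouettes are missed.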


Saturday, 14 March 2009

Soft Voxels II

Today I improved the filtering a bit. The softening looks nicer than yesterday (it is also a little slower). [-dl-new shaders-]

I am still not sure whether soft voxels look better than hard-edged voxels in general. They give the impression of missing detail and low resolution, both of which are unwanted.
Real filtering to approximate the surface would be better.

Friday, 13 March 2009

Soft Voxels

Today I added depth of field to soften the edgy voxels a little bit. It is very simple: the smoothing radius just depends on the actual depth.
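A depth-dependent smoothing radius can be illustrated on a single scanline. The following Python sketch is not the demo's shader; the linear relation between depth and radius, the box filter and all parameter names are assumptions made for the example.

```python
def depth_blur_row(color, depth, radius_per_depth, max_radius):
    """Box-blur one scanline with a radius that grows with depth."""
    out = []
    n = len(color)
    for x in range(n):
        # Assumed mapping: the radius grows linearly with depth, up to a cap.
        radius = min(max_radius, int(depth[x] * radius_per_depth))
        lo, hi = max(0, x - radius), min(n - 1, x + radius)
        window = color[lo:hi + 1]
        out.append(sum(window) / len(window))
    return out


row_color = [0.0, 1.0, 0.0, 1.0, 0.0]
row_depth = [0.0, 0.0, 4.0, 4.0, 4.0]
blurred = depth_blur_row(row_color, row_depth, radius_per_depth=0.25, max_radius=2)
print(blurred)
```

Near pixels (depth 0) keep their original value, while the distant ones are averaged with their neighbours.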

Thursday, 12 March 2009

New Release

Today it is time for a new release. Major mapping bugs are fixed and the colors look better now (I hope).

[-Demo Version v2-] ( Cuda 2.1 )

I also posted the demo as IOTD on GDev, as I think it is worth seeing.
[-link-]

Tuesday, 10 March 2009

Happy Buddha reloaded

This time with shading, which looks nicer.
From far away it is not possible to tell whether it is polygons or voxels; only a close-up reveals what our Buddha is made of.

Any limit?

View distance set to 4,000,000 - still interactive (18 fps). Having unique voxels everywhere is a problem in this case, however.

Here we can also see an advantage of the RLE structure: it is very easy to generate procedural mountains. It might be possible with octree raycasting too, but right now I have no idea how this could be done easily.
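Why RLE suits procedural terrain can be shown with a small sketch: each (x, z) column is just one solid run followed by one air run, and it depends only on its own coordinates. The Python code below is an illustration under my own assumptions; the run layout, the material constants and the height function are hypothetical and do not describe the actual rle4 file format.

```python
import math

AIR, ROCK = 0, 1  # hypothetical material ids


def height_at(x, z, max_height):
    """Hypothetical height function: smooth hills from two sine waves."""
    h = 0.5 + 0.25 * math.sin(x * 0.1) + 0.25 * math.cos(z * 0.1)
    return max(1, min(max_height, int(h * max_height)))


def column_runs(x, z, max_height):
    """One column as runs of (length, material), from bottom to top."""
    solid = height_at(x, z, max_height)
    runs = [(solid, ROCK)]
    if solid < max_height:
        runs.append((max_height - solid, AIR))
    return runs


def build_terrain(size_x, size_z, max_height):
    """Generate every column independently from its coordinates."""
    return {(x, z): column_runs(x, z, max_height)
            for x in range(size_x) for z in range(size_z)}


terrain = build_terrain(8, 8, 64)
print(terrain[(0, 0)])
```

Because a column is a pure function of (x, z), columns could in principle be generated on demand for very large view distances instead of being stored.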

Monday, 9 March 2009

Anti-Aliasing

Here we can see the four anti-aliasing variants. For the quality, the distance at which the next mipmap level is switched to is also very important.
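How the switching distance affects the chosen mipmap level can be sketched as follows. This is only an assumed scheme, not necessarily the one used in the demo: level 0 is used up to a base distance, and each doubling of the distance beyond it moves to the next coarser level. The function and parameter names are hypothetical.

```python
import math


def mip_level(distance, switch_distance, max_level):
    """Pick a mipmap level; each doubling of distance adds one level."""
    if distance <= switch_distance:
        return 0
    level = int(math.log2(distance / switch_distance)) + 1
    return min(level, max_level)


for d in (50.0, 150.0, 300.0, 1000000.0):
    print(d, mip_level(d, switch_distance=100.0, max_level=4))
```

A smaller `switch_distance` moves to coarser levels earlier, which reduces aliasing in the distance at the cost of visible detail nearby.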

Friday, 6 March 2009

Maximum complexity?

Here is another very complex scene.
RLE elements total: 15.4M
RLE elements processed: 5.8M
RLE elements rendered: 1M
Visible pixels: 0.66M

Thursday, 5 March 2009

Better Performance

Today I wrote a converter for PLY files. The first result can be seen on the left.

After several optimizations, the framerate also increased by about 10% on average and, depending on the scene, by up to 50%.