Research on Methods for Traversing Two-Level BVH Trees on Graphics Processors

Мұқаба

Дәйексөз келтіру

Толық мәтін

Ашық рұқсат Ашық рұқсат
Рұқсат жабық Рұқсат берілді
Рұқсат жабық Рұқсат ақылы немесе тек жазылушылар үшін

Аннотация

A key part of the most common ray tracing methods is the traversal/search for intersections with a hierarchical structure – BVH, which describes the scene geometry. This paper presents a comparative analysis of the performance of several BVH tree traversal methods on stationary and mobile graphics processors. We investigated BVH trees with varying depths and numbers of child nodes, implemented several stack-based traversal algorithms, and two different stackless traversal algorithms; we proposed our own stackless traversal variant, which is more performant than existing ones in some cases. We proposed our own compression method for BVH trees with 2 nodes, losing no more than 15% performance. We identified a common problem that occurs in almost all algorithms when implemented on graphics processors. We believe that our analysis will help developers of ray tracing hardware accelerators create more economical hardware solutions, not limited solely to ray tracing. More specifically, our experimental results suggest that up to a 5x speedup can be achieved by changing the L2 cache mechanism, and this has likely already been implemented in stationary GPUs with hardware-accelerated ray tracing, not only within the ray tracing mechanism itself but also in a more general context.

Негізгі сөздер

Толық мәтін

Рұқсат жабық

Авторлар туралы

L. Smirnov

Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics

Хат алмасуға жауапты Автор.
Email: lyovasmirnov@gmail.com
ORCID iD: 0000-0001-5385-4841
Ресей, Moscow, 119991 Russia

V. Frolov

Institute of Artificial Intelligence, Moscow State University Leninskie Gory; Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics; Keldysh Institute of Applied Mathematics, Russian Academy of Sciences

Email: vladimir.frolov@graphics.cs.msu.ru
ORCID iD: 0000-0001-8829-9884
Ресей, Moscow, 119899; Moscow, 119991; Moscow, 125047

Y. Kryachko

Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics

Email: yuri.kryachko@gmail.com
ORCID iD: 0009-0009-5676-0475
Ресей, Moscow, 119991

A. Voloboy

Keldysh Institute of Applied Mathematics, Russian Academy of Sciences

Email: voloboy@gin.keldysh.ru
ORCID iD: 0000-0003-1252-8294
Ресей, Moscow, 125047

Әдебиет тізімі

  1. Beets K. The six levels of ray tracing acceleration. Imagination white paper. https://forums.macrumors.com/attachments/imagination-raytracing-primer-sept2020-pdf.1973926/
  2. Meister D., Bittne J. Performance Comparison of Bounding Volume Hierarchies for GPU Ray Tracing, Journal of Computer Graphics Techniques (JCGT). 2022. V. 11. № 4. Р. 1–19. http://jcgt.org/published/0011/04/01/
  3. Meister D., Ogaki S., Benthin C., Michael J., Doyle M.J., Guthe М., Bittner J. A Survey on Bounding Volume Hierarchies for Ray Tracing. Computer Graphics Forum. 2021. V. 40. P. 683–712.
  4. Aila T., Laine S. Understanding the Efficiency of Ray Traversal on GPUs. In Proceedings of HPG. 2009. Р. 145–149. 16, 17, 23.
  5. Aila T., Karras T. Architecture Considerations for Tracing Incoherent Rays. In Proceedings of PG. 2010. Р. 113–122. 18, 19.
  6. Aila T., Karras T., Laine S. On Quality Metrics of Bounding Volume Hierarchies. HPG. 2013. Р. 101–108. 4, 5, 10.
  7. Lier A., Stamminger M., Selgrad K. CPU-style SIMD ray traversal on GPUs. In Proceedings of the Conference on High-Performance Graphics, Association for Computing Machinery, 7:1–7:4. 2018. https://doi.org/10.1145/3231578. 3231583. 7, 8, 11, 15
  8. Ernst M., Greiner G. Early Split Clipping for Bounding Volume Hierarchies. In Proceedings of Symposium on Interactive Ray Tracing. 2007. Р. 73–78. 9, 10.
  9. Stich M., Friedrich H., Dietrich A. Spatial Splits in Bounding Volume Hierarchies. In Proceedings of the High-Performance Graphics. 2009. Р. 7–13. 10, 22, 23.
  10. Segivia B., Ernst M. Memory Efficient Ray Tracing with Hierarchical Mesh Quantization. In Proceedings of Graphics Interface. 2010. Р. 153–160. 12, 13.
  11. Mahovsky J., Wyvill B. Memory-Conserving Bounding Volume Hierarchies with Coherent Raytracing. Computer Graphics Forum 25. 2006. № 2. Р. 173–182. 12.
  12. Liktor G., Vaidyanathan K. Bandwidth-efficient BVH Layout for Incremental Hardware Traversal. In Proceedings of High-Performance Graphics. 2016. Р. 51–61. 8, 18, 19.
  13. Kalojanov J., Billeter M., Slusallek P. Two‐level grids for ray tracing on GPUs // Computer Graphics Forum. Oxford, UK: Blackwell Publishing Ltd. 2011. V. 30. № 2. P. 307–314.
  14. Bartels P., Harada T. Combining GPU Tracing Methods within a Single Ray Query. In SIGGRAPH Asia 2022 Technical Communications (SA '22). Association for Computing Machinery, New York, NY, USA, 2022. Article 17, 1–4. https://doi.org/10.1145/3550340.3564231
  15. Zellmann S., Wu Q., Ma K.L., Wald I. Memory-Efficient GPU Volume Path Tracing of AMR Data Using the Dual Mesh. 2023.
  16. Garanzha K., Bel A., Premoze S., Galaktionov V. (2011). Out-of-core GPU ray tracing of complex scenes. In ACM SIGGRAPH 2011 Talks (pp. 1-1).
  17. Roberto T. et al. Ray casting using a roped BVH with CUDA. Spring conference on Computer graphics. 2009.
  18. Binder N., Keller A. Efficient Stackless Hierarchy Traversal on GPUs with Backtracking in Constant Time. In Proceedings of High-Performance Graphics. 2016. Р. 41–50. 17.
  19. Laine S., Karra T., Aila T. Megakernels Considered Harmful: Wavefront Path Tracing on GPUs. High-Performance Graphics 2013.
  20. Wald I., Woop S., Benthin C., Johnson G.S. & Ernst M. (2014). Embree: a kernel framework for efficient CPU ray tracing. ACM Transactions on Graphics (TOG). 2014. № 33(4). Р. 1–8.
  21. Wald I., Benthin C., Boulos S. Getting Rid of Packets – Efficient SIMD Single-Ray Traversal using Multi-Branching BVHs. In Symposium on Interactive Ray Tracing. 2008. Р. 49–57. 10, 14, 22, 23.
  22. Popov S., Georgiev I., Dimov R., Slusallek P. Object Partitioning Considered Harmful: Space Subdivision for BVHs. In Proceedings of High-Performance Graphics. 2009. Р. 15–22. 4, 5, 10, 22.
  23. Ylitie H., Karras T., Laine S. Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs. In Proceedings of High-Performance Graphics. 2017. Р. 4: 1–4: 13, 10, 12, 16, 17, 22, 23
  24. Yoon S.E., Manocha D. Cache-Efficient Layouts of Bounding Volume Hierarchies. Computer Graphics Forum. 2006. 8.
  25. Wachter C., Keller A. Instant Ray Tracing: The Bounding Interval Hierarchy. In Proceedings Eurographics Symposium on Rendering. 2006. Р. 139–149. 13.
  26. Eisemann M., Woizischke C., Magnor M. Ray Tracing with the Single Slab Hierarchy. In Proceedings of Vision, Modeling, and Visualization. 2008. Р. 373–381. 13.
  27. Lin D., Vasiou E., Yuksel C., Kopta D., Brunvand E. Hardware-Accelerated Dual-Split Trees. Proceedings of the ACM on Computer Graphics and Interactive Techniques 3, 2 (2020). 13.
  28. Weier P., Rath A., Michel É., Georgiev I., Slusallek P., Boubekeur T. N-BVH: Neural ray queries with bounding volume hierarchies. In ACM SIGGRAPH 2024 Conference Papers (SIGGRAPH '24). Association for Computing Machinery, New York, NY, USA, Article 99, 1–11. doi: 10.1145/3641519.3657464.
  29. Frolov V., Sanzharov V., Garifullin A., Raenchuk M., Voloboy A. CrossRT: A cross platform programming technology for hardware-accelerated ray tracing in CG and CV applications // arXiv:2409.12617

Қосымша файлдар

Қосымша файлдар
Әрекет
1. JATS XML
2. Fig. 1. Comparison of the number of arithmetic operations per ray for the Cry Sponza scene. Yellow (first column) is the LBVH builder implemented in the Embree library, green and purple (second and third) are SBVH variants. Green is our own builder, purple is the implementation from the Intel Embree library. The left image is data for the Cry Sponza scene. The right one is for the Bistro scene.

Жүктеу (1MB)
3. Fig. 2. Comparison of the average number of arithmetic operations per ray for scenes with geometrically complex Cry Sponza and Bistro. For scenes with simpler geometry, no such difference is observed.

Жүктеу (778KB)
4. Fig. 3. CALBVH (top) / COLBVH (bottom). Average ray tracing speed for different triplet variants, %, from the fastest depending on different triplet types and memory layouts. Since the measurements were made on a mobile GPU, the most efficient approach was the one where the entire triplet fits into 64 bytes (SuperTreeletAlignedMerged2).

Жүктеу (860KB)
5. Fig. 4. Proposed 6-bit scheme for encoding plane coordinates. The regular grid inside the reference frame shows the possible positions of the planes of the inner frames. The blue dots mark the minimum and maximum, stored with half precision.

Жүктеу (2MB)
6. Fig. 5. Schematic representation of the vertex preservation algorithm. The basic algorithm is on top, the modified one is on the bottom.

Жүктеу (568KB)
7. Fig. 6. Images of test scenes. The first number (before '/') for each scene represents the number of unique triangles stored in memory, the second (after '/') represents the number of visible (instanced) triangles. The number after the comma is the number of instances. Next are the scene characteristics: (1) instanced_objects: 170K/62M pixels, 601 insts; (2) dragon: 871K/871K pixels, 1 inst. (3) bonsai: 650K/650K pixels/ 1 inst; (4) sponza_c: 66K/66K pixels, 1 inst. (5) cry_sponza_c: 244K/262K pixels, 17 insts; (6) cry_sponza: 244K/262K pixels, 281 insts. (7) hero: 28K/28K t., 4 inst; (8) human_sophi: 16.5K/16.5K t., 4 inst.

Жүктеу (5MB)
8. Fig. 6.2 Test scene images. The first number (before '/') for each scene represents the number of unique triangles stored in memory, the second (after 'right slash') represents the number of visible (instanced) triangles. The number after the comma is the number of instances. Next are the scene characteristics: (3) bonsai: 650K/650K pixels/1 inst; (4) sponza_c: 66K/66K pixels, 1 inst. (5) cry_sponza_c: 244K/262K pixels, 17 inst; (6) cry_sponza: 244K/262K pixels, 281 inst. (7) hero: 28K/28K pixels, 4 inst; (8) human_sophi: 16.5K/16.5K pixels, 4 inst.

Жүктеу (8MB)
9. Fig. 7. Number of millions of rays per second on Adreno 660 (left) and Mali-G57 (right) at different resolutions in different scenes.

Жүктеу (498KB)
10. Fig. 8. Number of millions of rays per second on Nvidia 2070 at different resolutions in different scenes.

Жүктеу (275KB)
11. Fig. 9. Number of MRays/s on Adreno 660 with different traversal algorithms on two scenes.

Жүктеу (562KB)
12. Fig. 10. Number of MRays/s on Mali-G57 with different traversal algorithms on two scenes.

Жүктеу (672KB)
13. Fig. 11. Average MRays/s on Adreno 660 with different bypass algorithms on other scenes.

Жүктеу (1MB)
14. Fig. 12. Average MRays/s on NV2070 with different traversal algorithms on other scenes.

Жүктеу (782KB)
15. Fig. 13. MRays/s on Mali-G57 using one large core (WhittedRT) or separate cores (WhittedRTSeparate) in two different implementations of BVH4 tree traversal.

Жүктеу (681KB)
16. Figure 14. MRays/s on Nvidia 2070 using one large core (WhittedRT) or separate cores (WhittedRTSeparate) in two different implementations of BVH4 tree traversal.

Жүктеу (706KB)
17. Fig. 15. Number of MRays/s on Adreno 660 using one large core (WhittedRT) or separate cores (WhittedRTSeparate) in two different implementations of BVH4 tree traversal.

Жүктеу (307KB)
18. Fig. 16. Comparison of the developed solution (BVH2Fat32, red column) and [2] (Hippie, blue column) on the Nvidia 3070M video card in MRays/s.

Жүктеу (1MB)
19. Fig. 17. Comparison of rendering time (primary ray tracing) in ms for the developed solution (BVH2Fat32, left part of the picture) and (right part) on NV3070 RTX, 1400 × 1400. Table 3 shows data in MRays/s.

Жүктеу (1MB)

© Russian Academy of Sciences, 2025