Current search engines such as Censys or Shodan give everyone an insight into the global Internet. Unfortunately, they do not provide a comprehensive view of the Internet, because you cannot access the raw data. Consequently, you have to scan the Internet yourself. Anyone can perform a one-shot scan with masscan & Co. Building an infrastructure for regular Internet scans that is not blocked after a short time by intrusion detection systems and spam blacklists, however, is not easy. The following questions must be answered: Which scanning strategy should be used (centralized, distributed, BGP prefix hit lists)? How can scan traffic be reduced? How is the data processed in the long term (up to 600 GB per scan)? With which further data are the scans enriched for later analyses (BGP prefixes, inetnum objects)? How do you build a server without bottlenecks, and how do you connect it to the Internet (rent a server, or become a RIPE member / your own ISP with a /22 IPv4 and a /32 IPv6 block)?
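The idea behind a BGP prefix hit list, one of the scan-traffic-reduction questions above, can be sketched as follows: instead of sweeping the whole address space, sample a few addresses from each announced prefix and shuffle the probes across networks. This is a minimal illustrative sketch, not the talk's actual tooling; the example prefixes and the `per_prefix` parameter are assumptions.

```python
import ipaddress
import random

def targets_from_prefixes(prefixes, per_prefix=4, seed=0):
    """Pick a few random addresses from each announced prefix
    instead of sweeping every host, then shuffle the result so
    probes are spread across networks over time."""
    rng = random.Random(seed)
    targets = []
    for p in prefixes:
        net = ipaddress.ip_network(p)
        # sample random host offsets within this prefix
        for _ in range(min(per_prefix, net.num_addresses)):
            offset = rng.randrange(net.num_addresses)
            targets.append(str(net[offset]))
    rng.shuffle(targets)
    return targets

# documentation prefixes used as placeholders, not a real BGP table
print(targets_from_prefixes(["192.0.2.0/24", "198.51.100.0/24"], per_prefix=2))
```

In a real deployment the prefix list would come from a BGP feed or routing table dump, and the sampled targets would be handed to the scanner in randomized order.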
In the first half of the talk we will address these questions. In the second half we will discuss real scan data. We will concentrate on analyzing the network topology and the distribution of BGP prefixes, whois blocks, and network services of well-known autonomous systems on the Internet. As a further example, we will look at the network structure of a large, well-known German hoster, which gives a good overview of its internal organization of data centers and other services. Finally, we will look at some of the data and analyses from a security perspective.