Hacking 101: Mastering Information Gathering Techniques | Practical Guide 2024
Estimated read time: 1:20
Summary
This video by The House of Hackers delves into the intricate realm of information gathering techniques essential for mastering web penetration testing. Taking a structured approach based on guides like OWASP, the session emphasizes the importance of sequentially performing reconnaissance, exploitation, and reporting stages. Information gathering is highlighted as a crucial phase, featuring tools and techniques for uncovering IP addresses, subdomains, and technology stacks, among other components. Additionally, exercises on using whois services, Google dorks, and other intelligence tools equip viewers with practical skills to enhance their pentesting prowess.
Highlights
Information gathering is the backbone of effective pentesting.
Utilize the whois service to uncover critical domain details.
Google's advanced search capabilities are a pentester's secret weapon.
Understanding server information and port configurations is key.
Discovering subdomains and hosts can reveal hidden vulnerabilities.
Key Takeaways
Master the art of information gathering to boost your pentesting skills.
Follow a structured approach to web penetration testing for thorough assessments.
Leverage tools like whois, nmap, and Google dorks to gather vital target information.
Understand the importance of exploring attack surfaces and technology stacks.
Recognize the power of search engines and crawlers in uncovering hidden data.
Overview
Dive into the fascinating world of hacking with The House of Hackers as they unveil the secrets of information gathering techniques. This session is tailored for aspiring pentesters eager to learn the nuances of web penetration testing. The video emphasizes a methodical approach, urging learners to follow a structured methodology based on the OWASP testing guide.
Information gathering, or reconnaissance, is portrayed as the pivotal step in the penetration testing process. Viewers are introduced to various tools and methods that help in collecting valuable data about the target application, such as IP addresses, subdomains, and server configurations. This phase ensures pentesters have a comprehensive understanding of their target, paving the way for more informed testing decisions.
Practical demonstrations of tools like whois, Fierce, and The Harvester dominate the video, providing viewers with firsthand experience in executing searches and analyzing collected data. The video also highlights the role of search engines like Google in hacking, showcasing methods to exploit advanced search features for gathering sensitive information. This all-encompassing guide aims to equip viewers with the essential skills needed for a successful pentesting journey.
Chapters
00:00 - 00:30: Introduction to Hacking and Methodology The chapter titled 'Introduction to Hacking and Methodology' is about transitioning from theoretical understanding to practical application in hacking. It teases the excitement of getting into actual hacking practices after having grasped the basics of web applications, their interconnectivity, specifications, and standards. The chapter's primary focus is to prepare the reader for hands-on practice in a lab setting, hinting at a methodology to approach hacking effectively.
00:30 - 01:00: Stages of Web Penetration Testing The chapter discusses the process of web penetration testing, which is frequently divided into various stages as outlined in methodologies like the OWASP Testing Guide. The typical stages involved in web penetration testing are reconnaissance, exploitation, and reporting. Additionally, the course created by the speaker breaks down the penetration testing procedures into seven stages, aligning with common practices accepted in the industry.
01:00 - 01:30: Importance of Information Gathering The chapter emphasizes the significance of meticulous information gathering in penetration testing. It advises the reader to approach the process methodically, ideally following a sequential order to ensure thoroughness. The objective is to identify as many vulnerabilities as possible, covering every aspect of the application comprehensively. This ensures that no potential issue is overlooked, thereby leaving 'no stone unturned' in the assessment.
01:30 - 02:00: Information Gathering Techniques The chapter titled "Information Gathering Techniques" delves into the initial and crucial phase of pentesting, which involves information gathering, reconnaissance, or discovery. It underlines the importance of extracting information about the target intended for hacking, setting the groundwork for the subsequent phases of penetration testing.
02:00 - 02:30: Using Whois Service The chapter 'Using Whois Service' delves into the significance of gathering comprehensive information about a target application during a web penetration test. It emphasizes the importance of understanding every aspect of the application, including the database server and user information, to explore all possible vulnerabilities. The chapter underscores that having detailed information enhances the tester's options and strategies for attempting to penetrate the web application.
02:30 - 03:00: Tools for Discovering Subdomains and Hosts This chapter focuses on the tools and techniques used for discovering subdomains and hosts of a target website. It emphasizes the importance of identifying the IP addresses, subdomains, and related information by gathering data from publicly available resources. These resources include search engines like Google, Bing, and Yahoo, as well as websites like archive.org. Additionally, it highlights the role of social networking sites such as Facebook and Twitter in identifying individuals related to the target.
03:00 - 03:30: Understanding Hosts and Subdomains In this chapter, the focus is on the initial steps of penetrating a web application by spidering it and creating a sitemap to comprehend its flow. It emphasizes the importance of meticulously gathering information at this early stage, as seemingly trivial details may prove crucial for exploitation in later phases. The overall success of the penetration test is heavily reliant on the quality of information collected during this stage.
03:30 - 04:00: Using Tools for Subdomain Scanning The chapter titled 'Using Tools for Subdomain Scanning' discusses the process of conducting web penetration tests, which are typically akin to blackbox testing. Testers often start with minimal information such as a URL, IP address, or domain name. The initial step involves digging around the provided IP or domain name to gather necessary information for the test.
04:00 - 04:30: Port and Service Scanning with Nmap In this section, we begin by extracting domain registration information using the WHOIS service. Next, we utilize the extracted data to identify subdomains of the target, as well as other hosts within the same network. Additionally, when feasible, we attempt to discover applications that are being served on the same server or service.
04:30 - 05:00: Expanded Nmap Scans and Techniques This chapter discusses the WHOIS service in the context of registering a domain. It explains that the domain owner must provide personal information, such as name, phone number, and email address, to the domain registrar. This information is public due to the nature of the WHOIS service, allowing anyone to view the name, address, phone number, and email address of the person or entity who registered the domain by querying the WHOIS database.
05:00 - 05:30: Technology Stack and Vulnerability Identification The chapter titled 'Technology Stack and Vulnerability Identification' discusses the process of obtaining information about domain ownership and registration. It highlights how some registrars offer services that conceal the actual owner's information by replacing it with the registrar's details. The chapter explains the use of the whois protocol, which operates on TCP port 43, to access and query registration details from the various whois servers available on the internet.
05:30 - 06:00: Extracting Directory Structure by Crawling The chapter discusses the operation of servers managed by Regional Internet Registries, which are responsible for providing information about domains and associated contact information. The process is conducted via a terminal command in Kali, which has a whois client, simplifying the retrieval of this data. It emphasizes the ease of use provided by developers who have integrated this functionality into the terminal interface, allowing users to access helpful information via a simple whois command.
06:00 - 06:30: Using Tools for Crawling and Mapping Applications The chapter discusses using tools for crawling and mapping applications. It starts by using Google.com as an example of a target domain. To find information about the domain, the text suggests using a 'whois' query by typing 'whois google.com' and analyzing the output. This output contains long registration information, and it identifies Mark Monitor as the registrar for Google.com.
06:30 - 07:00: Identifying Vulnerabilities and Gathering Information The chapter focuses on identifying vulnerabilities and gathering information about domains. It begins with the dates of the validity period displayed in a WHOIS query output for a domain like google.com. The WHOIS query also reveals DNS servers associated with google.com, which could be instrumental in finding additional hosts within the domain. The chapter proceeds with a demonstration of how to perform another WHOIS query, starting with obtaining the IP address for google.com using the host command. The host command then executes a reverse IP lookup, facilitating the expansion of information about the domain.
07:00 - 07:30: Using Search Engines for Information Gathering In this chapter, the focus is on how to effectively use tools like search engines for gathering information. The chapter provides practical examples, such as using whois queries to find domain details and corresponding IP addresses. This includes pointing to specific whois servers, demonstrating the use of parameters to fine-tune queries, and analyzing the outputs to obtain detailed domain information. This step-by-step guide helps users familiarize themselves with the basic to advanced features of these search tools, enhancing their proficiency in information retrieval.
07:30 - 08:00: Advanced Search Techniques with Google The chapter discusses advanced search techniques using Google, focusing on the use of whois queries. These queries are essential for retrieving detailed information about a domain name, including network range, phone numbers, and addresses. The chapter recommends revisiting online whois services for obtaining authoritative DNS server addresses, which can help identify additional hosts for a domain.
08:00 - 08:30: Google Hacking and Search Operators The chapter discusses the concept of discovering services within a domain such as FTP servers and mail servers. It also mentions using search engines to query and extract information about subdomains and hosts. The speaker promises to clarify the difference between a host and subdomain.
08:30 - 09:00: Shodan and Internet Archive for Reconnaissance The chapter titled 'Shodan and Internet Archive for Reconnaissance' explains the concept of a subdomain using the example 'x.example.com'. It describes how a subdomain (denoted as 'X') is part of the main domain (example.com) and can function as a host if it is associated with an IP address that resolves to a specific computer. The chapter notes that there are multiple ways to extract subdomain information for reconnaissance purposes.
09:00 - 09:30: Using Foca for Metadata and Network Information This chapter explores the use of tools like Fierce and the Harvester for extracting host and subdomain information. Fierce utilizes brute force techniques to discover subdomains and performs reverse lookups to identify additional hosts once a valid host is found.
09:30 - 10:00: Demonstrating Foca and Extracting Metadata In this chapter titled 'Demonstrating Foca and Extracting Metadata', the focus is on how to conduct a basic scan using a tool called 'Fierce'. The process involves specifying a domain name to scan, in this example 'google.com', and setting parameters such as 'threads' for multi-threading, and specifying an output file to save the results. The chapter aims to demonstrate the practical steps and commands used to perform such metadata extraction tasks.
10:00 - 10:30: Maltego for Relationship Analysis The chapter 'Maltego for Relationship Analysis' introduces Maltego, a tool designed for data mining and relationship linking within computer security investigations, used to analyze relationships and networks and gain detailed insights into connections between data entities. The transcript snippet itself talks about optimizing scanning performance by increasing the number of threads with the `threads` parameter to run scans faster, and using the `file` parameter to save scan results, a step that is often crucial when analyzing large sets of data or subdomains.
10:30 - 11:00: Demonstrating Maltego Fierce first tries to find the DNS servers for the target domain.
00:00 - 00:30 so finally we're going to get into the actual hacking part I know you're impatient no I'm just kidding I know you're eager and that's a good thing so you've got the idea of web and web applications and the interconnectivity and all that the specifications and how all the standards work and what they actually mean to us so now it's time to start practicing real hacking in the lab now following a
00:30 - 01:00 methodology like the OWASP testing guide is really useful and for the most part in these guides you will see testing procedures that are divided into different stages such as reconnaissance exploitation and reporting so here when I was creating the course I pretty much did the same and divided the web penetration testing procedures into seven different stages and these are mostly accepted in the
01:00 - 01:30 field you can start from wherever you want to I don't want to force you but to me I really do advise you to follow along sequentially every part and then perform all of the different steps because you have to remember your aim is to identify as many bugs as you can as a penetration tester so that way you can cover every aspect of the application no stone unturned so to speak
01:30 - 02:00 so let's get started with information gathering reconnaissance and discovery so information gathering or reconnaissance or discovery it all means the same thing to us it's to me the crucial stage of pentesting so allow me to define this phase like this it's a phase in which we will extract information regarding the target that we're attempting to hack and then this
02:00 - 02:30 information can be anything directly or indirectly about the target application or the customer remember the video about attack surfaces when doing a web pen test we need to explore all the possibilities of breaking into the web application that's why we need to know about the application the database server as well as the users so the more information we gather about the target the more options we will have while we're
02:30 - 03:00 testing now to be more specific this phase includes identifying the IP addresses subdomains and related information accumulating information about the target website from publicly available resources such as Google Bing Yahoo archive.org and Shodan identifying people related to the target with the help of social networking sites such as Facebook or Twitter
03:00 - 03:30 spidering the web application and creating site maps to understand the flow of the application you really should consider any information gathered at this stage important because even a small bit of information that looks like it could be nothing may help you exploit in the later stages of the test the success of the penetration test depends on the quality of the information gathered in this stage
03:30 - 04:00 cool domain and host related info so the web penetration tests we conduct are generally close to black-box testing as we mentioned before so we don't have well hardly any information about the target if any maybe just a URL IP address or a domain name is given to us so to start testing and gathering information we'll need to dig around that IP or domain name no problem so in
04:00 - 04:30 this section we're going to first extract domain registration information by using the whois service then we're going to use the extracted information to get subdomains of the target and even the other hosts in the network and then if it's possible we're going to discover the applications that are served on the same server or same service all right so let's start
04:30 - 05:00 with the whois service when registering a domain the domain owner needs to provide his personal information to the domain registrar such as name phone number other contact information and all this is public information due to the nature of the whois service so that means that you can view the name address phone number and email address of the person or entity who registered the domain if you query the registrar's whois
05:00 - 05:30 service you can get this information but sometimes the registrars can hide this if they have a service to replace the owner information with theirs whois records hold the registration details provided by the domain owner to the domain registrar yes I said whois whois is a protocol that works on TCP port 43 and there are multiple whois servers on the internet around the
05:30 - 06:00 world these servers are operated by regional internet registries so they are used to extract information about the domains and the associated contact information okay so open your terminal in Kali and thanks to our developers Kali has a whois client and it's very easy to use okay so type whois to see the help file and we're going to use
06:00 - 06:30 that so let's think of google.com as your target domain and you've got to wonder about Google's registration info right so type whois google.com and hit enter okay so this is the output for this domain it's long so scroll up and as you're seeing this with me the registrar for this domain is MarkMonitor
06:30 - 07:00 the dates of the validity period are also displayed the output of a whois query also points out the DNS servers for google.com and those will help us to find additional hosts in the domain so now I will run another whois query but first let me get the google.com IP address host google.com and the host command performs a DNS lookup
07:00 - 07:30 so when you give it the domain it returns the corresponding IP address so type whois -h whois.arin.net and then the IP address I just copied with the -h parameter you can point to a specific whois server to query so hit enter and again you get a long
07:30 - 08:00 output but it's got a lot of information such as network range phone number and address so whois queries are very handy to investigate a domain name there are also online whois services and you've probably had a look at them you may want to look at them again now
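For reference, here is a minimal sketch of the whois and host commands dictated above, assuming the google.com example; the IP placeholder stands for whatever address the host command returns for you:

whois google.com                        # registrar, validity dates, name servers
host google.com                         # resolve the domain to its IP address
whois -h whois.arin.net <ip-address>    # query a specific whois server (ARIN) about that IP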
08:00 - 08:30 once we get the authoritative DNS server address by using whois we can identify any additional hosts in the domain such as an FTP server mail server and so on if there are any other services in this domain we can also extract information from them now another way to discover subdomains and hosts is to query search engines and then you can compare their results okay so here you might be confused about hosts and subdomains so let me give you a quick
08:30 - 09:00 explanation if you think of the address x.example.com now x is the subdomain of example.com and x.example.com can be a host if it is connected to an IP address and resolves to a computer when one goes to x.example.com cool so anyway there are multiple ways
09:00 - 09:30 to extract additional host and subdomain information so that means there are lots of tools out there for us to use for this purpose and I'm going to use two tools that are already present in Kali Fierce and theHarvester now Fierce is a really cool tool besides that it uses brute-force methods to get subdomains also after it finds a valid host then it performs a reverse lookup to uncover additional hosts and here are the
09:30 - 10:00 options for Fierce okay so first we're going to run a basic scan so type fierce -dns google.com -threads 10 -file /root/Desktop/google_info.txt and hit enter so the -dns parameter specifies the domain that you want to scan so in our example it is
10:00 - 10:30 google.com by default Fierce runs in single-thread mode so because of this I can add the -threads parameter to increase the speed so that makes the scan run faster and then the -file parameter helps us to save the results to a file
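As a sketch, the Fierce command dictated here would look like the lines below; this matches the older Perl version of Fierce that shipped with Kali at the time (the newer Python rewrite uses --domain instead of -dns), and the output filename is reconstructed from the narration:

fierce -dns google.com -threads 10 -file /root/Desktop/google_info.txt
fierce -dns google.com -wordlist /path/to/custom_wordlist.txt   # optional custom wordlist (path is a placeholder)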
10:30 - 11:00 and now while scanning the hosts or subdomains what happens in the background so Fierce first tries to find the DNS servers for the target domain and next as I'm showing you on the screen it attempts to do a zone transfer now at this point if a zone transfer is successful Fierce will stop running and then you can take that information that you got from the zone transfer now if the zone transfer fails as
11:00 - 11:30 it has in our scan it checks if wildcard DNS is enabled then it performs a Brute Force against the domain using its built-in word list okay so now the scan is complete and as you can see once the scan is finished the found subdomains and discovered subnets are listed we can also View and Save the file but the content is not that
11:30 - 12:00 different in fact it's not different at all from the output on our screen it just makes it convenient and by default Fierce uses its own built-in wordlist but it also provides the ability to use a custom wordlist that you can build and sometimes different wordlists can uncover new subdomains so the second tool is the
12:00 - 12:30 Harvester it is another subdomain scanner and it gathers public information such as employee names emails subdomains banners and other similar information but for now we're just going to deal with subdomains and hosts and type theharvester and you'll see the options now it's quite easy to use this tool so let's just run it quickly type theharvester -d and the
12:30 - 13:00 domain that you want to search so in my case of course it's google.com now it might be strange because I will be using Bing to search for google.com but by using the -b parameter you can provide a data source the -l parameter limits the search output so theHarvester will analyze the first 500 search results of Bing about
13:00 - 13:30 google.com and finally the -f parameter helps us to save the results to a file okay so hit enter and run and first the search is conducted then it will analyze the results okay so here's what we expected to see
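A minimal sketch of the theHarvester run described here; older Kali releases use the single-dash options and lowercase command name shown, and the output filename is only an example since the narration does not give one:

theharvester -d google.com -b bing -l 500 -f /root/Desktop/google_results.html   # -f filename is illustrative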
13:30 - 14:00 so now go to the saved file directory I'm going to go to the desktop and here's the file so let's have a look so it's not very different from the output that's on the screen at least we can have smaller graphics great so you can run these tools or some other ones if you wish for your target you can go ahead and practice that now let's have a look at applications on
14:00 - 14:30 the same service so this is not just all about the target domain because in general while we're conducting a penetration test we set the scope to the URL the address of the web application and especially in an intranet network it's quite common to see unusual URL addresses in the port and service detection part I'm going to show you a couple of things but
14:30 - 15:00 sometimes you will see just an HTTP service on a port with different applications and the URLs may be in the form of http://domain.com/app1 http://domain.com/app2 etc so in a situation like that you'll need to detect these applications to get extra information about your target
15:00 - 15:30 there's really no tool to use in a situation like that so you will have to manually discover these kinds of applications ports and services on the web server web applications are generally served on ports 80 and 443 but it's not limited to just these port numbers port numbers are configurable so it's not uncommon to see a web application served on a
15:30 - 16:00 non-standard port such as 8080 okay so basically a URL or an IP address will be provided to you when you start a penetration test but sometimes due to your contract you may need to run a black-box testing approach with nothing you wouldn't even have an IP or a URL to test in both of these scenarios you
16:00 - 16:30 would need to identify and then map the target network at some level so the first thing to do is to identify the target network with whois so we've already done that the second thing to do is to map the network and that's what we're going to do now and we will use Kali Linux it's got a great tool in there to help us out nmap so nmap is short for Network
16:30 - 17:00 Mapper it's a powerful open-source network scanning tool perfect for conducting reconnaissance and enumeration so here we are going to benefit from nmap so now open up your terminal in Kali and type nmap you can see there are a few options nmap can perform many tasks it can identify live hosts scan TCP and UDP
17:00 - 17:30 open ports detect firewalls get the service versions running on remote hosts and even with the use of scripts find and exploit vulnerabilities so why don't we just start with a basic scan so at this point let's assume I was given or I have found the target IP or URL so I'm going to use the bee-box IP address in the example
17:30 - 18:00 simply type nmap 192.168.20.10 and hit enter and here's a basic scan so nmap will scan the target IP with its default options and the result shows the open ports and corresponding service names running on these ports so now let's touch on some of these other parameters to perform a detailed
18:00 - 18:30 scan now nmap has several approaches for scanning open ports it sends raw network packets to several TCP or UDP ports of the target and checks to see if there's a response and if there is depending on the type of response it will define the port as to whether it's open or not and remembering that HTTP uses TCP for transmitting packets nmap will play
18:30 - 19:00 with these packets and then different scan types come up so a regular TCP connection between two IPs is called a three-way handshake so first a SYN flag reaches the destination and then it sends the SYN-ACK flags back to the source then the source sends back the ACK flag to
19:00 - 19:30 start the data transmission so this is the basic and fully qualified TCP connection right so type nmap 192.168.20.10 -sS and the -sS parameter will do a SYN scan on the target so that means that nmap will not complete the three-way handshake it's going to replace the last ACK
19:30 - 20:00 packet with a reset packet but the result is not going to be different than the first one because the SYN scan is the default option for nmap now type nmap 192.168.20.10 -sT and this is a TCP connect scan if the -sT parameter is used so that
20:00 - 20:30 means nmap completes the three-way handshake and that way nmap can actually connect to the target port and then the connection is logged by the server and the result is the same as the previous one but it provides a more accurate state of the port
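The three scans discussed so far, as a quick sketch using the bee-box IP from the example (note the default scan is a SYN scan only when nmap runs with root privileges, otherwise it falls back to a connect scan):

nmap 192.168.20.10        # default scan
nmap -sS 192.168.20.10    # SYN (half-open) scan: SYN, SYN-ACK, then RST instead of ACK
nmap -sT 192.168.20.10    # TCP connect scan: completes the three-way handshake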
20:30 - 21:00 okay so you can see that nmap is a really clever tool it first identifies the live hosts on the target network and then scans the hosts for open ports so now let's type nmap 192.168.20.10 -sn and with the -sn flag you can force nmap to only check to see if the host is alive or not and then of course the opposite is also possible change -sn to
21:00 - 21:30 -Pn and then this time nmap will not check to see if the host is live or not it will only perform a port scan okay so now let's be a little bit more specific about these ports so you can use the -p flag to define a port number port list or port range for nmap to scan so here I'm going to give it ports 80 and
21:30 - 22:00 443 and of course you can provide service names like http or https this time it will find the ports running these services so if you want to scan all the ports just put a dash after the -p it's going to take a lot longer but if you're not looking for a specific port you can limit the number of ports by using --top-ports as a
22:00 - 22:30 parameter it will look for nmap's top 100 ports now if you only want the open ports just add the --open parameter the closed ports are gone also you can add the --reason parameter to display the reason a port is in its particular state
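A sketch of the host-discovery and port-selection options just mentioned (--top-ports takes the number of ports as its value, which is why 100 appears explicitly below):

nmap -sn 192.168.20.10                              # ping scan only: is the host alive
nmap -Pn 192.168.20.10                              # skip host discovery, go straight to the port scan
nmap -p 80,443 192.168.20.10                        # scan specific ports
nmap -p http,https 192.168.20.10                    # or name the services instead
nmap -p- 192.168.20.10                              # scan all 65535 ports (slow)
nmap --top-ports 100 --open --reason 192.168.20.10  # top 100 ports, only open ones, with reasons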
22:30 - 23:00 so until now we've got open ports and the names of the services right so thankfully nmap can show the software versions running on the open ports and just add the -sV parameter to detect versions and it might take a little while to run this kind of scan and here are the results with their version banners so now you can look for these version numbers to see if they have vulnerabilities or not and then the
23:00 - 23:30 last thing that we'll do here is determining the operating system running on the target host so just add the -O parameter to the previous command and this could also take a little more time and nmap will analyze the information collected from the open ports and versions and guess the probable OS you can also use nmap scripts and also you can compose your own script by
23:30 - 24:00 using the Nmap Scripting Engine so here are some script names that you can use to help you when you're doing penetration tests I won't show you how to use them simply because I'm going to be using some other tools and techniques for the same purpose using scripts is certainly not complicated and I do advise you to use them
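The version- and OS-detection scans as dictated, plus one illustrative NSE invocation; the http-enum script is my example, not one the video names, so pick whichever scripts suit your test:

nmap -sV 192.168.20.10                        # detect service versions on the open ports
nmap -sV -O 192.168.20.10                     # add OS detection (requires root)
nmap --script http-enum -p 80 192.168.20.10   # example NSE script that enumerates common web paths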
24:00 - 24:30 review technology and architecture information now as we talked about previously the technology stack behind an application can vary widely so that means in the real world it's very common to see complex applications everywhere you turn different technologies different vendors different versions and so on and so forth so naturally these technologies can have vulnerabilities or configuration well shall we say shortcomings so in order to discover these vulnerabilities extracting the
24:30 - 25:00 technology information is a number one priority it's absolutely critical for your career as a pentester so that's why we need to detect the type and the version of the server software and the application framework or application platform this information will help you to shape the payloads that you're going to use and deploy and it will also bring
25:00 - 25:30 you the awareness that you need to have about known vulnerabilities if the framework has any as well because that's part of your research so suppose you detect a jQuery library or an old server version which has some insecure functions or vulnerabilities yes what a coincidence because it just happens to be perfect in order to compromise the application which is what we're about to do so first open up your terminal in
25:30 - 26:00 Kali and we are going to use WhatWeb to get some information so see the options for the tool and then before using the tool I want to just tell you a few more things so it's not hard to detect the server and the framework information right you can get this information from the HTTP headers or cookies error messages whatever so the first place to look is the HTTP response headers so
26:00 - 26:30 type whatweb -a 3 http://192.168.20.10/bwapp/ so the -a parameter determines the aggression level of the tool and then the target URL comes after and WhatWeb will analyze some HTTP headers for you it has a very colorful
26:30 - 27:00 output and the tool also follows the redirections so that's a very good feature and here you can see the Server header for the server software information and the application framework and platform information from the X-Powered-By header
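A sketch of the WhatWeb command, plus a curl alternative of my own if you just want to eyeball the raw response headers without opening Burp (the login.php path assumes a default bWAPP install):

whatweb -a 3 http://192.168.20.10/bwapp/
curl -s -D - -o /dev/null http://192.168.20.10/bwapp/login.php   # dump only the response headers; look at Server and X-Powered-By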
27:00 - 27:30 now there may be some other HTTP headers specific to some particular technology so you always need to look out for the headers okay so now I'm going to just minimize the terminal now you can also view headers manually in Burp so open up your browser and Burp and enable FoxyProxy to send traffic to Burp now request the login page of bWAPP I'm going to forward the request and here are the
27:30 - 28:00 headers the Server and X-Powered-By headers so you see it's very easy however these headers are configurable so that means the administrator can easily change these entries from the configuration of the server and application or even some security products can do this that's why you need to dig into the application and the environment to gain more clues about the server framework and the platform so
28:00 - 28:30 let's say for example a different web server software can have different HTTP header orders and then these servers can behave in a different way if you send some malformed requests also you can look at HTTP headers cookies HTML sources file types and extensions and the error messages are a great way to detect things as
28:30 - 29:00 well extracting directory structure crawling so mapping the application layout and its structure is another very important task although there are some single-page applications the applications you will test can generally consist of multiple web pages and by multiple I mean a lot and these pages can be independent or they can be linked to one another
29:00 - 29:30 there's actually no magic to get the structure unless you're the actual developer of the application otherwise it's just hours and hours of work so the best way to extract the structure of the application is to visit every page and click every link and fill every form then observe all the URLs so you can manually walk through the
29:30 - 30:00 application and identify web pages from an authenticated and an unauthenticated user's perspective so this whole process is called crawling or spidering the application now you might think and you would be right that it's not really possible to do everything manually this really is a time-consuming process so gosh darn it wouldn't you just love it if somebody created a tool for crawling and how about some
30:00 - 30:30 scripts for good measure well I'm glad you asked because now let's go over to Kali and open up your browser and Burp I know if you were raised like me it's rude but I'm going to ask you to do it again open your browser and Burp okay so sorry go to the dashboard and make sure that capturing is actually live then disable master interception
30:30 - 31:00 from this button okay so see it's off but in your browser redirect the traffic to Burp by enabling FoxyProxy so now Burp will passively intercept all traffic coming from Firefox then what we're going to do here is visit pages and click links see so that way Burp will create a structure of the application from the clicked URLs by intercepting
31:00 - 31:30 passively right so I'm going to click on some links here on the page and then log in open up a few other pages and okay that's enough to show you so okay look at the history in Burp so all the requests are
31:30 - 32:00 here and okay so the aim here is to see a sort of a broad overview of the application layout so there are a few more things in this step to perform but I want you to just have a look here I'm sure that you've heard about robots.txt I know almost every website uses this file to allow or disallow directories to be crawled by bots so robots.txt is a
32:00 - 32:30 file that uses a specification called the Robots Exclusion Protocol now it's not really something that you need to consider but you might want to have a look at it later but yeah displaying this file is very handy if you want to see the sensitive pages and directories easily so what am I saying yeah go to the
32:30 - 33:00 robots.txt file of bWAPP and here are some directories that are not allowed so I'm going to visit each of them the admin directory documents and images and then wait what's this a passwords directory
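If you prefer the terminal, fetching the same robots.txt is a one-line curl (a sketch; the path assumes the bWAPP application root used throughout this demo):

curl http://192.168.20.10/bwapp/robots.txt   # lists the Disallow entries, including that passwords directory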
33:00 - 33:30 now go to Burp click the Target tab click site map on the left pane you can see the site structure and also there are filter options above you know it this is great I hope you get as excited as I do so by clicking show all resources like CSS and images they're all going to show up on the map as well now I think the commercial version of Burp will do this task automatically but I really do want you to you
33:30 - 34:00 know get your hands dirty as I was saying before I know it's not the most efficient way but this is how you learn so we need something a bit more intrusive now I should think so Kali Linux has a number of tools for this job DirBuster is one of them so open up the terminal and simply type dirbuster and the GUI will come
34:00 - 34:30 up DirBuster is a directory brute forcer for web applications so now let's provide the target URL which is http://192.168.20.10/bwapp/ and here's a starting point for DirBuster and it will automate the tedious task of cataloging the pages within the application that sounds good
34:30 - 35:00 huh so it works by requesting a web page parsing through it for links and then sending requests to these new links until all the web pages are mapped so then let's increase the number of threads here to 20 and then choose a list browse now there are several lists in the DirBuster directory under word
35:00 - 35:30 lists and I'm going to choose the medium directory list okay and click Start
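DirBuster itself is a GUI, but the same idea works from the command line with dirb, which also ships with Kali; this is an alternative I'm adding rather than something shown in the video, and the wordlist path is the usual Kali location for DirBuster's lists:

dirb http://192.168.20.10/bwapp/ /usr/share/wordlists/dirbuster/directory-list-2.3-medium.txt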
35:30 - 36:00 oh and another good thing is to identify administrative and test pages these pages can contain sensitive information and provide entry points to perform attacks such as a brute-force attack and it's also possible to see old and backup files in the directory structure don't laugh I've seen it many times in real-world situations so if the old version of the application still functions and has any vulnerabilities bingo you can own the entire system besides the folders and files belonging to the application there may be meta files and folders of the server software as well as the application framework and I mean known files and folders
36:00 - 36:30 such as phpMyAdmin and so forth so DirBuster also looks for this kind of stuff and here as you can see it detects Drupal phpMyAdmin and SQLite okay so at this point I'm going to stop the scan and you can take a report full text report browse for the location to save and give it a name I'm just going to
36:30 - 37:00 type bwapp and generate the report so now you can go to that folder and here's the report we saved so now you can analyze it minimum information principle so let me tell you there's an interesting belief out there in the development community that they should explain everything about the application in
37:00 - 37:30 detail now it's not necessarily bad but sometimes this really good behavior goes beyond its good intention and that means that it provides some important information for a pentester so this kind of information sometimes can help us in well particularly vital situations there may be different ways to get this kind of information but I want to give you a few
37:30 - 38:00 tips so the first thing is to read just about all the HTML source files or at least write your own script to investigate the sources for certain special tags and words you can find an HTML comment that contains information about the back end of the application such as passwords or usernames that's golden
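As a quick sketch of that write-your-own-script idea, a wget mirror plus a grep one-liner can surface HTML comments and interesting keywords; the mirror depth, directory, and keyword list are all just illustrative, and note that bWAPP needs a login, so an unauthenticated mirror only reaches the public pages:

wget -q -r -l 2 http://192.168.20.10/bwapp/ -P /tmp/mirror        # mirror a couple of levels of the app
grep -rniE '<!--|password|passwd|user|todo' /tmp/mirror           # hunt for comments and telling words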
38:00 - 38:30 also look through the help pages use the demo users if the application has one now once I tested an application where I could see an unauthenticated help document which contained an administrative demo user so it's these types of errors that are well decreasing constantly because companies are moving faster and faster to better deployment and better development environments but that doesn't mean they're not out there also there may be some error and warning
38:30 - 39:00 directives that are helpful to the general users such as your password is wrong see but for a pentester this means brute-forcing a password because you've already got some usernames right and sometimes errors are caused in the back end and those can be directly reflected to the user and again for a general user it doesn't have any meaning but the hacker or
39:00 - 39:30 pentester is not the same as a general user and the one last thing that I want to mention to you sometimes we can observe all the information about all the employees on the website of the company seriously all board members employees their phone numbers names emails even way more info than that and I'm not telling you that that's a vulnerability or you've got to hide this
39:30 - 40:00 information but I really think that it shouldn't be that easy to find anyway it really does help for the social engineering purposes so in the previous video we talked about crawlers but don't forget search engines are the best crawlers they work almost exactly as we want them to and they have a huge amount of data about the publicly exposed web
40:00 - 40:30 applications see when a search engine bot crawls a web application it indexes the pages according to some rules that are associated with the page and its content they can index almost anything within a website including sensitive information so they have a complex working style and they always update the way they crawl but at the end of the day they provide us with so much good information
40:30 - 41:00 from everything from error messages to vulnerable files and servers okay so now go to Kali and open up your browser and for the majority of our generation Google is one of the first search engines that comes to mind but it is by no means alone there are several other search engines such as Bing and Yandex and Yahoo and of course
41:00 - 41:30 anybody could go on but we are going to conduct the Google queries to get more so for the average person Google is just a search engine used to find text images videos it's even a spell checker for some however for pen testers Google is a very useful hacking tool so go ahead type google.com and you can run Google search queries from this interface but besides this simple
41:30 - 42:00 interface Google has an advanced search functionality so go to settings and click on advanced search and you can use this page for more detailed queries also Google search engine has its own built-in query language and I'll give you a list of these search operators so you can also use these search operators to get detailed results so in order to benefit from Google more
42:00 - 42:30 using these operators can come in quite handy so let's run some searches with some of these advanced operators okay so let's find New York Times subdomains type site:nytimes.com -site:www.nytimes.com so the site operator will bring the results that contain only nytimes.com
42:30 - 43:00 and the dash before the second site will exclude the results that contain www.nytimes.com okay so look at the number of results now if you add inurl:login it will bring us the results that contain login pages but be careful Google doesn't necessarily want us using the advanced
43:00 - 43:30 search for our purposes Google will start blocking your connection if you connect from a single static IP okay so it will ask for CAPTCHA challenges to prevent automated queries so I'm going to fill in this CAPTCHA box it's always a favorite thing to do now look at the number of results it decreases a lot now add OR inurl:signup to bring
43:30 - 44:00 signup pages so this time the number of results is increased so you can also look for a vulnerable version of any of the web technologies so type inurl:phpmyadmin/index.php and intitle:"phpMyAdmin 2.11" and look at the results it's going to bring up all of the pages that are
44:00 - 44:30 running version 2.11 of phpMyAdmin so also we can perform the same search for SQLiteManager just type intitle:SQLiteManager inurl:sqlite intext:"Welcome to" and look at that here are the SQLiteManager pages now there are a few pages
44:30 - 45:00 listed but Google does not only index the HTTP-based servers it also indexes open FTP servers so if you type intitle:"index of" inurl:ftp open FTP servers will be listed
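Collected in one place, the dorks used in this walkthrough look like this; type them straight into the Google search box, and expect your result counts to differ:

site:nytimes.com -site:www.nytimes.com
site:nytimes.com -site:www.nytimes.com inurl:login
site:nytimes.com -site:www.nytimes.com inurl:login OR inurl:signup
inurl:phpmyadmin/index.php intitle:"phpMyAdmin 2.11"
intitle:SQLiteManager inurl:sqlite intext:"Welcome to"
intitle:"index of" inurl:ftp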
45:00 - 45:30 now I know you might say ah this is not enough well you can go with the prepared queries that are curated by the hacking community why didn't I say that before because these queries are stored in the Google Hacking Database bet you didn't know it existed now you do so open this page the GHDB is also served by Offensive Security so here every query is called a Google dork so you can apply any dork to your target application and server just click on any dork and from here just click to see the Google search result now you can analyze
45:30 - 46:00 the results so imagine that a Drupal vulnerability is announced you can create a Google query to identify the servers or applications that have this vulnerability or you can check the Google Hacking Database to find it and use it so now the world is up to your imagination and you Shodan is a search engine for internet-connected devices Shodan gathers information about all devices directly
46:00 - 46:30 connected to the internet if a device is directly hooked up to the internet then Shodan queries it for various publicly available information the types of devices that are indexed can vary tremendously ranging from small desktops up to nuclear power plants and everything in between how is it different from Google the most fundamental difference is that Shodan crawls the internet whereas Google crawls the World Wide Web however the devices powering the World Wide Web only make up
46:30 - 47:00 a tiny fraction of what's actually connected to the internet Shodan's goal is to provide a complete picture of the internet you can use the search parameters displayed in the slide country use the country code you want to look for for example use GB for Great Britain US for the United States etc city filters the results according to the specified city geo searches in a given location hostname filters the
47:00 - 47:30 results according to the host or domain name given net searches in a given IP or subnet range os filters the results according to the operating system port searches for specific ports before and after filter the results according to the date showing the results which are before or after the date given as a plus you can see the current Shodan scans on ics-radar.shodan.io
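A few example Shodan queries built from those filters; the cisco one is the search used next, and the others are just illustrations of the syntax with placeholder values (substitute your own hostname and netblock):

cisco last-modified country:"GB"
port:21 country:US
hostname:example.com os:"Windows Server 2008"
net:203.0.113.0/24 port:80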
47:30 - 48:00 let's try to find out the accessible Cisco device interfaces in Great Britain in the search box search for the cisco and last-modified words and use the country search parameter with the GB code now you can use the country code in double quotes otherwise don't let any space occur between the country: parameter and the GB
48:00 - 48:30 code when you start the search you'll see the results in a few seconds surf the results and you will probably face the login pages of Cisco interfaces you've already found something you can perform for example a brute-force attack on the login
48:30 - 49:00 page if you're luckier you can find a Cisco device manager interface like the one seen on the slide it's a good approach to look at the archives or backups of the systems you'll probably find some valuable data in those archives the Internet Archive is a San Francisco-based nonprofit digital library with the stated mission of universal access to all knowledge it
49:00 - 49:30 provides free public access to collections of digitized materials including websites software applications and games music movies and videos moving images and nearly 3 million public domain books web.archive.org is the archive of websites served by the Internet Archive library simply write down the URL of the target website
49:30 - 50:00 you'll see a timetable if you select a year from the timetable you'll see the archive dates of the target website in detail on the side you see the dates on which the target website was captured there in green circles suppose that some sensitive data was published on the target website accidentally a few days later admins realized the mistake and removed the data from the website but what if someone has already archived that website with the sensitive data select a
50:00 - 50:30 year click a date which is in a green circle you'll see an old version of your target website that's what the nhs.uk website looked like on the 1st of May 2010
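For reference, Wayback Machine URLs follow a predictable pattern, so you can jump straight to a capture once you know the rough date; this is a sketch, the timestamp format is YYYYMMDDhhmmss, and partial timestamps redirect to the nearest snapshot the archive holds:

https://web.archive.org/web/*/nhs.uk (browse all captures of the target)
https://web.archive.org/web/20100501/http://www.nhs.uk/ (redirects to the snapshot nearest 1 May 2010)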
50:30 - 51:00 footprinting also known as reconnaissance is the technique used for gathering information about computer systems and the entities they belong to to get this information a hacker might use various tools and technologies FOCA fingerprinting organizations with collected archives is a tool used mainly
51:00 - 51:30 to find metadata and hidden information in the documents it scans these documents may be on web pages and can be downloaded and analyzed with FOCA it's capable of analyzing a wide variety of documents with the most common being Microsoft Office OpenOffice or PDF files these documents are searched for using three possible search engines Google Bing and DuckDuckGo here's how you can download and install FOCA you can download FOCA from the Eleven
51:30 - 52:00 Paths website that is seen on this slide FOCA is open source you can download all the sources as well as the executable binary from github.com/ElevenPaths/FOCA however this version requires SQL Server Express installed on the host machine so I prefer to download and use the previous version of FOCA which requires .NET Framework version 3.5 only it's a portable version so you don't need to install it download the zip file
52:00 - 52:30 extract it go to the bin folder and run the foca.exe file that's it to work with FOCA start a new project using the project button on the upper left corner give the project a name enter the website and choose the folder to save the results to when you finish filling the fields click the create button to create a new project after creating a
52:30 - 53:00 new FOCA project we can start a network scan from the tree at the left side select the network node now select the search types the search types listed on the panel are web search where you can choose either Google or Bing DNS search dictionary search to perform a DNS search using a dictionary IP Bing to search for the domain names hosted on the same IP address and Shodan and Robtex queries and click the
53:00 - 53:30 start button to start the scan now we can collect some documents published by the Target domain to collect their metadata from the tree at the left side select metadata node you're supposed to see a panel similar to the one which is seen on the slide select the document types you want to collect and click the search all button to start the document search you can see the documents found under metadata node of the tree you should download the documents
53:30 - 54:00 to be able to extract the metadata right-click the documents you want to download and from the menu select download now you can extract the metadata of the downloaded documents you can tell whether a document is downloaded from the download column of the table select the documents whose metadata you want to collect right-click and select extract metadata from the menu you'll see the results under the metadata node of the
54:00 - 54:30 tree let's see FOCA in action find the ElevenPaths FOCA website on the website you see a download button which brings you to the GitHub page of ElevenPaths you can find the latest release version of FOCA under the FOCA releases folder it requires SQL Server Express installed on the host machine go back to the ElevenPaths website you can find a link to the previous version of
54:30 - 55:00 FOCA read and accept the EULA and download the FOCA Pro zip file extract the zip file go to the bin folder and run the foca.exe
55:00 - 55:30 file on the project menu select new project to create a new project fill the boxes in carefully
55:30 - 56:00 and then click create save the project file for later use now we can start a new scan select the network node from the tree select the search types on the dictionary search panel you have to choose a valid dictionary the default
56:00 - 56:30 path is probably not valid you can find a valid dictionary inside the DNS dictionary folder which is under the bin folder where you found the foca.exe file click the start button to start the scan and let the scan continue for a couple of minutes
56:30 - 57:00 let's collect the documents from the Target website and extract their metadata select the metadata node from the tree select the document types you're interested in and click search all button to find the
57:00 - 57:30 documents let the search continue for a couple of minutes select the documents whose metadata you want to collect right-click and select download select the downloaded documents
57:30 - 58:00 right-click and select extract metadata at this time look at the nodes under the metadata node of the tree and you will see the metadata extracted from the downloaded documents you can examine the metadata of each document one by one
58:00 - 58:30 or you can find valuable data summarized under the metadata summary node usernames of the owners of the documents the operating system where each document was created email addresses collected from the metadata of the documents and more Maltego is an interactive data mining tool that renders directed graphs for link analysis the tool is used in online investigations for finding relationships between pieces of information from various sources located
58:30 - 59:00 on the internet the focus of Maltego is analyzing real-world relationships between information that is publicly accessible on the internet this includes footprinting internet infrastructure as well as gathering information about the people and organizations who own it Maltego can be used to determine the relationships between the following entities people names email addresses and aliases groups of people social networks companies
59:00 - 59:30 organizations websites internet infrastructure such as domains DNS names netblocks and IP addresses affiliations documents and files connections between these pieces of information are found using open-source intelligence OSINT techniques by querying sources such as DNS records whois records search engines social networks various online APIs and extracting metadata Maltego provides the results in
59:30 - 60:00 a wide range of graphical layouts that allow for clustering of information which makes seeing relationships instant and accurate this makes it possible to see hidden connections even if they're three or four degrees of separation apart you can download Maltego CE that's the Community Edition from www.paterva.com and it's embedded in Kali Linux let's see Maltego in action go to Kali Linux and open a terminal screen and
60:00 - 60:30 type maltegoce to run the embedded Maltego Community Edition first we choose a machine to run there are different machine options specified for different purposes for example there's a Twitter Digger machine to work on a Twitter account and analyze the tweets let's choose the Footprint L1 machine this time which is
60:30 - 61:00 a fast and basic footprint of the target domain the second step is to specify the target domain when we click the finish button it's going to start to collect data now in the Community Edition of Maltego the results are limited to 12 entries it shows the results in graph
61:00 - 61:30 mode we can zoom out to see the entire picture or zoom in to focus on specific results if you select a node and right click on it you can see all the transforms you're
61:30 - 62:00 able to run for that node transforms are grouped according to their purposes you can expand a group and select a single transform to run or you can run a group of transforms at once you can configure a transform before the run or save a transform in your favorites list let's run the Mirror: Email addresses found transform this time the results start to come in in about a
62:00 - 62:30 minute let's see some more about Maltego while we're waiting there are different graph types you can choose to see the results if you zoom the graph out under 30% the entities are shown as dots instead of meaningful symbols the colors of the dots point to a specific type email addresses websites domain names and more you can see the color legend in the lower right
62:30 - 63:00 corner the results of the transform we ran a minute ago are here now you can use the toggle full screen button or simply press Alt+Enter to toggle the graph to full screen you can select the email addresses node and see the collected email addresses listed in the detailed view window