Exploring the Multifaceted Role of Compilers

Lecture 02: Introduction (Contd.)

    Summary

    In this lecture, we delve into the various applications and phases of compilers, emphasizing their pivotal role in translating and optimizing across different domains. We explore how compilers not only facilitate compatibility between diverse software tools through format conversion but also aid in transitioning legacy code, like COBOL, to newer languages such as C or C++. They are integral to the silicon compilation process in VLSI, optimizing time-to-market for new products. Additionally, compilers enhance database query processing through efficient query optimization, impacting file operation times. We also touch upon their role in text formatting, highlighting the efficiency of text-based formatting tools over graphical ones. The lecture concludes by detailing the phases of a compiler from lexical analysis to code optimization, emphasizing interphase cooperation to detect errors and streamline code generation.

      Highlights

      • Compilers act as translators between incompatible software tools, making them essential for software integration. 🔄
      • Legacy system updates, like COBOL conversions to new languages, rely heavily on compilers. 💾
      • VLSI design leverages silicon compilation to speed up time-to-market, a critical commercial pressure. 💡
      • Database query optimization through compilers ensures faster, more efficient data operations. ⚡
      • Text formatting systems highlight the flexibility and efficiency of compilers in document management. 📘
      • The phases of compilers illustrate their intricate design, enhancing both the synthesis and execution of code. ⚙️

      Key Takeaways

      • Compilers enable format conversion between incompatible software tools, crucial in domains like VLSI CAD. ⚙️
      • Silicon compilation leverages compilers for efficient chip design, addressing time-to-market pressures. 🕒
      • Legacy code conversion, such as COBOL to C++, is a critical role of compilers, preserving functionality. 🔄
      • Query optimization in databases enhances efficiency, showcasing the compiler's role in fast data retrieval. 📊
      • Text formatting through compilers aids in clear, structured document generation, streamlining user input. 🖋️
      • Phases of a compiler—lexical analysis, syntax analysis, semantic analysis, and more—showcase the complexity of modern software development. 🛠️

      Overview

      Compilers are the unsung heroes in software development, quietly working behind the scenes to convert and optimize code. Whether it's translating older languages like COBOL into modern C++, or ensuring that your VLSI design project meets the accelerated demands of market deadlines, compilers are there to bridge the gaps.

        In the realm of database query optimization, compilers turbocharge the efficiency of data retrieval processes, proving pivotal in managing large-scale data operations. They're the hidden engine that makes complex query processes quick and reliable. Meanwhile, for text formatting, compilers provide a more streamlined method for document preparation, offering an alternative to graphical formatting tools.

          The architecture of compilers showcases a meticulous yet ingenious network where phases such as lexical and syntax analysis work together, exemplifying the high level of cooperation required to handle error detection and facilitate seamless translation. This layered approach ensures that from code creation to execution, every step is optimized, reliable, and precise.

            Chapters

            • 00:00 - 02:00: Format Conversion Between Tools The chapter introduces format conversion as an application of compilers. When the output of one tool must feed another tool from a different vendor, the formats may not match (for example, one tool emits semicolons at the end of lines and the other does not accept them), so a translator is placed in between. This is very common in the VLSI CAD domain.
            • 02:00 - 04:00: Converting Legacy COBOL Programs The chapter discusses converting heavily used programs written in older languages such as COBOL (Common Business Oriented Language) to newer languages such as C and C++. COBOL programs handle files efficiently but execute slowly, and existing software such as payroll systems is converted automatically rather than discarded.
            • 04:00 - 07:30: Silicon Compilation and Hardware Description Languages The chapter contrasts two ways of designing a chip such as a processor: building up from low-level circuits like adders and registers, which is slow and error-prone, or starting from a behavioral description in VHDL or Verilog and synthesizing a netlist of library modules and their connections.
            • 07:30 - 09:30: Time to Market and Optimization Criteria The chapter explains the commercial pressure to reduce time to market and notes that silicon compilation optimizes area, power, and delay, whereas a compiler generating machine code optimizes execution speed and memory.
            • 09:30 - 13:30: Database Query Optimization The chapter describes query optimization as a kind of compilation from a high-level query language such as SQL into file operations. The best evaluation sequence depends on the relative sizes of tables and the availability of indexes, and the goal is to reduce the number of disk accesses.
            • 13:30 - 16:30: Text Formatting Tools The chapter compares graphical formatting tools with tools that take a plain text file containing embedded formatting commands and compile it into formatted text, such as troff, nroff, and LaTeX.
            • 16:30 - 18:00: Phases of a Compiler The chapter explains that a compiler appears monolithic but is divided into phases to simplify its design, even though the phases are not sharply demarcated in practice.
            • 18:00 - 22:00: Alphabet, Words, and Sentences Using English as an example, the chapter describes how a language is built from an alphabet, words, and sentences, with a notion of validity at each level.
            • 22:00 - 24:00: Analysis, Code Generation, and Optimization The chapter walks through lexical analysis, syntax analysis, semantic analysis, intermediate code generation, target code generation, and code optimization.
            • 24:00 - 26:00: Symbol Table Management and Error Handling The chapter covers the symbol table, which records the variables, procedures, and labels in a program, and error handling and recovery, in which the compiler reports as many errors as possible in one run.
            • 26:00 - 27:00: Integration of Compiler Phases The chapter cautions that the phases are not isolated from each other: for example, syntax analysis asks lexical analysis for the next word as it needs it.

            Lecture 02: Introduction (Contd.) Transcription

            • 00:00 - 00:30 Another application that we have for compilers is format conversion. Many times, as I was saying, we have two different software packages, and between the
            • 00:30 - 01:00 software packages, the output of one package should go as input to the other. We need some translation because the input and output formats of the tools may not be compatible, particularly if they come from different vendors. Say this is tool 1: it takes some input and produces some output,
            • 01:00 - 01:30 and then we have tool 2. I need to feed the output of tool 1 to tool 2, but it is not compatible, because tool 2 expects its input in some other format. This is very common. For example, tool 2's input format may not take semicolons at the end of lines, whereas tool 1 produces semicolons at the end, and so on.
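The semicolon mismatch described here can be bridged by a small translator that sits between the two tools. The following is only a minimal sketch in Python: the function name `convert_tool1_to_tool2` and the sample input are hypothetical, and a real converter would have to handle each tool's full format rather than this one rule.

```python
def convert_tool1_to_tool2(tool1_output: str) -> str:
    """Drop the trailing semicolon that (hypothetical) tool 1 puts on each line."""
    converted = []
    for line in tool1_output.splitlines():
        stripped = line.rstrip()
        if stripped.endswith(";"):
            # Remove the semicolon and any blanks that preceded it.
            stripped = stripped[:-1].rstrip()
        converted.append(stripped)
    return "\n".join(converted)


if __name__ == "__main__":
    sample = "x = 1;\ny = x + 2;\nshow y"
    print(convert_tool1_to_tool2(sample))
```

Even a converter this small follows the compiler pattern: it reads text in one format and writes equivalent text in another.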
            • 01:30 - 02:00 Or maybe tool 2 requires the variables to be defined in some fashion, whereas tool 1's output is in a different format. In those cases we put a translator in between. This is another compiler, and it produces output that tool 2 can understand. So we have format converters, and this is very common, particularly in the VLSI CAD domain, where you will find many tools for which we need to do
            • 02:00 - 02:30 a lot of format conversion. So this is compatibility between the input and output formats of tools from different vendors. Next, compilers are also used for converting heavily used programs written in older languages like COBOL to newer languages like C and C++. COBOL, the Common Business Oriented Language, was used very widely at one point in time, particularly for office
            • 02:30 - 03:00 automation purposes, where we used to generate payroll and the like using COBOL programs. Now, COBOL as a language is quite verbose, and programs that run in COBOL are not very efficient as far as execution speed is concerned, but they are very efficient as far as file handling is concerned. Later, languages like C and C++ came, so now
            • 03:00 - 03:30 the new software being developed is not written in COBOL but in some next-level programming language, or even in a 4GL (fourth-generation language). But the point is that a large number of programs already exist. The payroll software that we had exists in an old language like COBOL. We do not want to junk all those programs; rather, we try to
            • 03:30 - 04:00 have a converter so that we can automatically generate C or C++ programs from those old COBOL programs. So programs written in older languages have to be translated into newer languages, and that is the role of a format converter. Here also we have compilers in the picture. Next, we have another application, which is
            • 04:00 - 04:30 known as silicon compilation. Suppose we are trying to design a new integrated circuit chip, for example a new processor. There are two different avenues by which we can do it. In one avenue, we start with the lowest-level digital circuits, like adder, subtractor,
            • 04:30 - 05:00 multiplier, and register designs, and then slowly go up to the full system design. The problem with this approach is that it is time-consuming, and if the user is not very careful, there is a possibility that some bug will be introduced into the process, and then the chip that is fabricated is not correct. On the other hand, there is another avenue by which we can handle this problem, which is to
            • 05:00 - 05:30 start with a behavioral description of the system in a hardware description language like VHDL or Verilog. These are popular hardware description languages in which you can describe the hardware at different levels: behavioral level, structural level, functional level, and so on. At the top we have the behavioral-level description, so you can describe the behavior of the processor in terms of the language constructs, and from there we try to generate the
            • 05:30 - 06:00 circuit. Here, the input to the system is a description in VHDL or Verilog. If this is the silicon compiler, then you have VHDL code as input, and as output you do not have machine code; rather, you
            • 06:00 - 06:30 have another VHDL code, which is basically a structural-level description that we call a netlist. This is the netlist of the library components that we have. This netlist is
            • 06:30 - 07:00 nothing but a collection of the hardware modules that will make up the whole system. For example, your system needs adders, subtractors, registers, etc., and some connection pattern between them. The netlist will have all those module descriptions and the connections between them. This is synthesized automatically. The user does not design the individual adders; they are designed previously, are available in the library, and are taken
            • 07:00 - 07:30 from there. For very complex systems, this is the way we should proceed: start with the behavioral description and then convert it into a netlist based on some library modules. This is known as the silicon compilation process, and it is also a sort of compiler. The complexity of circuits is increasing while time to market is shrinking; this is the
            • 07:30 - 08:00 pressure that we have. Every day we are coming up with newer electronic systems that have more and more functionality, and there is very high pressure on time to market. Say we survey the market and decide to introduce a new cell phone. If it is introduced within the next six months, there will be some profit, but if it is introduced after one year,
            • 08:00 - 08:30 maybe that profit will be half, and if we introduce it after, say, one and a half years, maybe the profit will become zero. So there is very high pressure on reducing the time to market. If you start at the lowest level and try to build the whole system very efficiently, it will take time. So we go the other way: we start with the high-level behavioral description of the system and from there try to come to the circuit. The optimization
            • 08:30 - 09:00 criteria are also different. When you are writing programs for execution by a processor, the optimization criteria are the speed of execution and the memory requirement. But when you are doing silicon compilation, the optimization criteria are the overall area required by the circuit, the power consumption of the circuit, and the delay that the circuit will
            • 09:00 - 09:30 have. We want to reduce area, power, and delay, so the measures are totally different. A compiler targeted towards, say, machine code generation will not do well if you put it into a silicon compilation process; the challenges are totally different. The next application is in the domain of database query processing, and it is known as query optimization.
            • 09:30 - 10:00 In a database, a major operation is to answer user queries, and once the database has been created, its content is more or less static. For example, look at the employee database of an organization. Initially there will be additions, but after the database has stabilized, it is not that every day lots of
            • 10:00 - 10:30 employees are joining and leaving the company. So the database is more or less static in nature. The queries that you get on the database are like this: maybe finding the average salary, or the salary processing that we have to do every month. So queries that retrieve information from the database are much more frequent, while changes to the basic data values are much less frequent. So it is very
            • 10:30 - 11:00 important that these queries are processed and answered quite fast. Now, how do we reduce this search time? If we look into database theory, you will find that, depending on the sizes of the tables we handle, we can manipulate the tables in a certain order to reduce the overall time requirement. So given a query, what is the exact sequence of operations that we should perform
            • 11:00 - 11:30 on the database to find the answers quickly? That is a big challenge, and it is known as the process of query optimization. It is also a sort of compiler, because the query is given in some high-level language like SQL and is converted into file operations; ultimately, the operating system's file routines are to be called. How this translation is done is a big question, and we have these
            • 11:30 - 12:00 compilers to help us. As noted here, we want to optimize the search time. More than one evaluation sequence may exist for each query, and the cost depends on the relative sizes of the tables, the availability of indexes, etc. For the same query, if the table sizes are different and you process it with a different sequence of operations, the time requirement will be different. So that particular ordering
            • 12:00 - 12:30 has to be found. The sizes of the tables dictate the sequence in which the query should be evaluated, so we generate a proper sequence of operations suitable for the fastest query processing. That is very important. Here the optimization is not in terms of space requirement or execution time; it is in terms of retrieval from the database. These are file operations, and whenever you have
            • 12:30 - 13:00 file operations involved, they take a very large amount of time compared to the processing. Once an employee's salary record has been retrieved, maybe we just want to increase the basic pay by some amount. That is just an addition or at most a multiplication, but the major time is spent in getting the record from secondary storage. So the optimization is in terms of the number of disk accesses and the like.
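To make the point about evaluation sequences concrete, here is a rough Python sketch that tries every order of joining three tables and keeps the cheapest. Everything in it is an assumption made for illustration: the table names and row counts, the fixed `SELECTIVITY`, and the cost model (the sum of estimated intermediate result sizes, used as a stand-in for disk accesses). Real optimizers use far richer statistics and do not enumerate every order.

```python
from itertools import permutations

# Hypothetical table sizes in rows; the numbers are illustrative only.
TABLE_ROWS = {"employee": 10_000, "department": 50, "project": 200}

# Assumed fraction of row pairs that survive each join (made up for this sketch).
SELECTIVITY = 0.001


def plan_cost(order, table_rows=TABLE_ROWS, selectivity=SELECTIVITY):
    """Sum of estimated intermediate result sizes for a left-to-right join order."""
    current_rows = table_rows[order[0]]
    cost = 0.0
    for name in order[1:]:
        current_rows = current_rows * table_rows[name] * selectivity
        cost += current_rows  # rows that would be written out and read back
    return cost


def best_join_order(table_rows=TABLE_ROWS, selectivity=SELECTIVITY):
    """Try every evaluation sequence and keep the cheapest one."""
    return min(
        permutations(table_rows),
        key=lambda order: plan_cost(order, table_rows, selectivity),
    )


if __name__ == "__main__":
    for order in permutations(TABLE_ROWS):
        print(" -> ".join(order), plan_cost(order))
    print("cheapest:", " -> ".join(best_join_order()))
```

Under this toy cost model, the orders that join the two small tables first are the cheapest, which matches the remark that the relative table sizes dictate the evaluation sequence.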
            • 13:00 - 13:30 The optimization criteria are totally different for database operations compared to the situation we have for, say, machine code, that is, the execution of normal programs. Another very interesting application of compilers is in text formatting. There are many tools with which you can format text files. Today there are many such formatting
            • 13:30 - 14:00 tools. These formatting tools can be classified into two categories. In some of these text formatting tools, we are provided with a graphical interface in which the fonts and everything are predefined, and we just choose the appropriate font and enter
            • 14:00 - 14:30 the text in that format. You immediately get feedback: you are writing in such-and-such a font and font size, and you immediately see how it looks. You can choose some other format, or make some part bold, by applying it immediately to the text. The other category works like this: you write a file, and in that file you put your
            • 14:30 - 15:00 normal text, and before this text you also embed some formatting commands in the same file. The whole thing is then given to a compiler, which applies the formatting commands to the text and converts it into formatted text. In the case
            • 15:00 - 15:30 of the graphical tool, you immediately see the formatted text as visual output. But in this case, the file that we have is nothing but a simple text file, and it is converted by a compiler into a formatted text file. The formatting commands are taken care of in this process. For example, a formatting command may say
            • 15:30 - 16:00 that the next few lines should be in italics, and the compiler will do that accordingly. This has many advantages. The file is very portable: because it is a simple text file, you can transfer it very easily. After some practice and experience, you may find that these formatting tools are much better than the graphics-based tools.
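The idea of a plain text file with embedded formatting commands can be sketched in a few lines of Python. The command names `.italic` and `.bold`, the optional line count after them, and the HTML-like output tags are all hypothetical choices for this sketch; they are not the command set of any real formatter.

```python
def format_text(source: str) -> str:
    """Apply embedded formatting commands and return HTML-like formatted text."""
    tags = {".italic": "i", ".bold": "b"}  # hypothetical command names
    output = []
    pending_tag, pending_lines = None, 0
    for line in source.splitlines():
        parts = line.split()
        if parts and parts[0] in tags:
            # A command line: remember the style and how many lines it covers.
            pending_tag = tags[parts[0]]
            pending_lines = int(parts[1]) if len(parts) > 1 else 1
            continue
        if pending_lines > 0:
            output.append(f"<{pending_tag}>{line}</{pending_tag}>")
            pending_lines -= 1
        else:
            output.append(line)
    return "\n".join(output)


if __name__ == "__main__":
    document = "Plain line\n.italic 2\nfirst\nsecond\nthird"
    print(format_text(document))
```

The input stays an ordinary, portable text file; only the translation step produces the formatted result.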
            • 16:00 - 16:30 These text formatting tools come under compilers. This is also a sort of compiler, because it accepts input in a text file format and produces the formatted, graphical result as output. So text formatting tools accept an ordinary text file as input, with formatting commands embedded in it, and generate the formatted text. Some examples: in the UNIX operating
            • 16:30 - 17:00 system you have troff and nroff, and there is another very well-known package, LaTeX, which are used for this type of translation. They come under the broad heading of formatting tools, and that is also a type of compiler. Next, we will look into the phases of a compiler. Suppose I have a big job to do, like translating from one
            • 17:00 - 17:30 language to another: maybe a high-level language like C to target machine code, or VHDL to silicon output, or the text formatting tools, or whatever application we think about. This transformation process is quite complex. I could have a monolithic piece of software doing this translation, and the compiler appears to be monolithic. But in the design phase, just to make the compiler
            • 17:30 - 18:00 design process simpler, we can think of it as divided into a number of phases. Practically speaking, there is no hard demarcation; the modules are there, but they are not clearly demarcated. For our understanding and for our discussion in the course, though, we will divide them into a number of phases. The first phase is known as the lexical analysis phase. Before going into this, let us try
            • 18:00 - 18:30 to understand what a language is. Take any language, for example English. The first thing that we have is the alphabet; any language starts with the alphabet. In English, the alphabet set is capital A, B, etc., up to capital Z, then small a, small b, small c, etc., up to small
            • 18:30 - 19:00 z. There are many more symbols, like the digits 0, 1, 2, up to 9, and whatever special symbols we allow in today's English text. They all come under the alphabet set. So any language starts with the alphabet. The alphabet set is commonly represented by the symbol Σ (sigma). Then some of these symbols in the alphabet
            • 19:00 - 19:30 are combined to form words. So from the alphabet we get words. A word is nothing but a collection of symbols from the alphabet, and words are separated by some special separator; most commonly, we use the blank as the separator in English sentences. So any collection of alphabet symbols that you take is a
            • 19:30 - 20:00 word. Now this word may be a valid word or an invalid word, so you can classify words into two categories: valid and invalid. For example, in English, C-A-T, "cat", is a valid word as far as we know, but, say, C-T-A
            • 20:00 - 20:30 is possibly not a valid word. I am not that knowledgeable in English, so I cannot say very clearly whether CTA is a valid English word or not, but assuming that it is not, it goes into the invalid category. So that is the second level. Now take the alphabet set: if English does not allow Greek symbols like alpha, beta, etc., and I get the symbol beta somewhere in the text, then I will say
            • 20:30 - 21:00 that this beta is not in the alphabet set, so there is a problem with the alphabet. But even if the alphabet part is correct, CTA is not a valid word, so that is another level of invalidity. The next level of invalidity comes when we consider sentences. A sentence is a collection of words. Now if we
            • 21:00 - 21:30 consult the grammar of the English language, there are certain rules that tell us which sentences are valid and which are invalid. It is a very complex process to say whether a statement that I have made in English is grammatically correct or not. But if we take some simpler language, maybe the grammar rules will be
            • 21:30 - 22:00 simple and specific, and it will be easy to tell, given a sentence, whether it is valid for the language or not. Accordingly, a sentence can be a valid sentence or an invalid sentence. If a sentence is valid, we say that the sentence belongs to the language; if the sentence is invalid, we say that it does not
            • 22:00 - 22:30 belong to the language. So any language starts with the alphabet, then goes to words, then to sentences, and at each level we have the issue of validity and invalidity. In the lexical analysis phase, we try to see whether we have valid words or not. The alphabet we can check immediately: whether any foreign character is present in the
            • 22:30 - 23:00 description. If there is no foreign character, it will try to see which words of the language appear in the program. It does the processing and finds the words; that is the job of lexical analysis. Once lexical analysis is done, we have the phase of syntax analysis, in which we do the grammatical check. Then we have
            • 23:00 - 23:30 semantic analysis, where we try to find out whether there is any problem with the meaning, for example some variable being undefined. That is the semantic analysis phase. Then we have the intermediate code generation phase. Up to the semantic analysis phase, or the syntax analysis phase you could say, the work is based on automata theory. The phases after that will use the outcomes of this
            • 23:30 - 24:00 automata-theory-based analysis, but they will be augmented significantly by other techniques so that we can do the code generation part. The first phase of that is intermediate code generation. The intermediate code is then translated into target code, which is target code generation. After the code has been generated, we could stop there, but most compilers go into a code optimization phase. Since the entire
            • 24:00 - 24:30 process is automated, there is a very high chance that the code produced by the code generation process is not well optimized. So the compiler optimizes it, and then maybe we get faster code. That is the code optimization phase. Apart from that, there are some other points that are also important for compiler design. One is symbol table management. The symbol table is a table of
            • 24:30 - 25:00 all the symbols that appear in the program: all the variables, procedures, and labels come under the symbol table. The symbol table has to be managed, because many times the compiler modules refer to it during checking and code generation. We have another very important phase, which is known as error handling and recovery.
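The alphabet check, the splitting of text into words, and the symbol table described so far can be put together in a small Python sketch. The toy language here is an assumption made purely for illustration: its alphabet, the two keywords, the token categories, and the decision to store only the first line on which each identifier appears are not taken from the lecture.

```python
import string

# Assumed alphabet (sigma) of a toy language: letters, digits, blanks, a few operators.
OPERATORS = "=+-*/();"
ALPHABET = set(string.ascii_letters + string.digits + " \t" + OPERATORS)
KEYWORDS = {"int", "print"}  # hypothetical keywords


def analyze(source: str):
    """Return (tokens, symbol_table, errors) for a toy source program."""
    tokens, symbol_table, errors = [], {}, []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Alphabet check: report foreign characters but keep scanning.
        for ch in line:
            if ch not in ALPHABET:
                errors.append((line_no, f"character {ch!r} is not in the alphabet"))
        cleaned = "".join(ch if ch in ALPHABET else " " for ch in line)
        # Pad operators with blanks so that splitting on blanks finds the words.
        for op in OPERATORS:
            cleaned = cleaned.replace(op, f" {op} ")
        for word in cleaned.split():
            if word in KEYWORDS:
                tokens.append(("keyword", word))
            elif word[0].isalpha() and word.isalnum():
                tokens.append(("identifier", word))
                symbol_table.setdefault(word, line_no)  # first line where it appears
            elif word.isdigit():
                tokens.append(("number", word))
            elif word in OPERATORS:
                tokens.append(("operator", word))
            else:
                errors.append((line_no, f"invalid word {word!r}"))
    return tokens, symbol_table, errors


if __name__ == "__main__":
    program = "int x = 10;\ny = x + 9z;\nprint x;"
    tokens, table, errors = analyze(program)
    print(table)
    print(errors)
```

The sketch separates the two levels of invalidity from the discussion: a character outside the alphabet is reported on its own, while a word such as `9z` uses only valid characters and is still rejected as a word.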
            • 25:00 - 25:30 Whenever there is some error in the source language program, the compiler has to detect it, and after detecting it, it has to proceed. If there is an error at line number 10, it is not advisable that the compiler just prints that there is an error in line number 10 and comes out. Rather, it should check the remaining part of the program also and come up with a list of all the lines where the compiler thinks there is an error. Maybe there are errors
            • 25:30 - 26:00 at lines 10, 15, 18, 19, 20, and so on. If we can list as many of those lines as possible, then in the next go the programmer may correct all those errors and come up with the corrected version of the program. That way the programmer's life will be simpler. Otherwise the programmer corrects line number 10, gives it for compilation, and finds that the compiler has again given an error, this time in line number 15. That is not very
            • 26:00 - 26:30 much advisable. So error handling and recovery is important. The recovery part will be clear when we go into the details of error analysis; it is not very understandable now. There is a caution, of course: you should not think that all these modules are totally delinked from each other. It is not that the output of one module is given entirely to the second module and then the second module starts. There is not much demarcation between the
            • 26:30 - 27:00 modules, and they work hand in hand, interspersed. It is like this: the syntax analysis phase may find that it needs the next word from the program, so it will ask the lexical analysis phase to give it the next word. Similarly, when the code generation phase tries to generate code, it may ask the syntax analysis phase to tell it what is the next
            • 27:00 - 27:30 rule by which I should proceed, what is the next grammar rule by which I should proceed. That way all these modules go hand in hand. The later parts, like code optimization, are a bit independent, but up to the intermediate code generation phase I should say that they are all interlinked with each other. Next, this diagram depicts the whole thing: your source language program
            • 27:30 - 28:00 enters the lexical analysis phase. The lexical analysis phase does many things. Normally source language programs have a lot of comments, so the comments are removed, because comments are never translated into machine code. Then, between two words somebody may put one blank and somebody may put ten blanks, so all those extraneous blank spaces will be removed. And many times we have some header files; we
            • 28:00 - 28:30 include some header files; for example, in the C language we have the #include directive. These are all expanded, and this expansion will be done by the lexical analysis phase. Ultimately the lexical analysis phase generates tokens. These tokens are used by the syntax analysis phase. The syntax analysis phase generates a parse tree, which is used for semantic analysis. If the program is semantically correct, it goes into the intermediate code generation phase,
            • 28:30 - 29:00 then from there the intermediate code is translated into target code in the target code generation phase, and from the target code it goes into the optimization phase. Symbol table management is used by all the modules, from lexical analysis to code optimization. Some of the phases put entries into the symbol table, and some of the phases use the values that are present in the symbol table. And there are error handling and
            • 29:00 - 29:30 recovery routines, which are again integrated with all these phases. If some error has occurred, they will try to get information from the different phases and flash an appropriate error message to the user, so that the user can correct the program in a better fashion. So these are the different phases by which the compiler generates the optimized target code from the source language program.
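
The front-end ideas described above can be sketched in a few lines of Python. This is only an illustrative toy, not the lecture's own design: the statement form `name = operand (op operand)* ;`, the `//` comment style, and the names `lex`, `check`, and `ParseError` are all assumptions made for the example. It shows a lexer that drops comments and extra blanks, a parser that asks the lexer for one token at a time, a symbol table shared by the checks, and error recovery that keeps going so that several error lines are reported in one run.

```python
import re

# Hypothetical token classes for a toy language (assumed, not from the lecture).
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),      # comments are never translated, so drop them
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),           # one blank or ten blanks: all removed
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/;]"),
    ("MISMATCH", r"."),            # any foreign character
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class ParseError(Exception):
    """Raised by the toy parser when a statement breaks the grammar."""


def lex(source, errors):
    """Yield (kind, text, line) tokens lazily; record foreign characters."""
    line = 1
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1
        elif kind == "MISMATCH":
            errors.append((line, f"unexpected character {text!r}"))
        elif kind not in ("COMMENT", "SKIP"):
            yield kind, text, line
    yield "EOF", "", line


def check(source):
    """Return (sorted error list, symbol table) for the toy language."""
    errors = []
    symbols = {}                   # symbol table: name -> line where it was defined
    tokens = lex(source, errors)
    tok = next(tokens)             # the parser pulls the next word on demand

    while tok[0] != "EOF":
        stmt_line = tok[2]
        try:
            if tok[0] != "IDENT":
                raise ParseError(f"expected a name, got {tok[1]!r}")
            target = tok[1]
            tok = next(tokens)
            if tok[1] != "=":
                raise ParseError(f"expected '=', got {tok[1]!r}")
            tok = next(tokens)
            while True:
                if tok[0] == "IDENT":
                    # semantic check: a variable must be defined before use
                    if tok[1] not in symbols:
                        errors.append((tok[2], f"undefined variable {tok[1]!r}"))
                elif tok[0] != "NUMBER":
                    raise ParseError(f"expected an operand, got {tok[1]!r}")
                tok = next(tokens)
                if tok[1] in ("+", "-", "*", "/"):
                    tok = next(tokens)
                else:
                    break
            if tok[1] != ";":
                raise ParseError(f"expected ';', got {tok[1]!r}")
            symbols.setdefault(target, stmt_line)
            tok = next(tokens)
        except ParseError as exc:
            errors.append((tok[2], str(exc)))
            # recovery: skip to just past the next ';' and keep checking
            while tok[0] != "EOF" and tok[1] != ";":
                tok = next(tokens)
            if tok[1] == ";":
                tok = next(tokens)
    return sorted(errors), symbols


if __name__ == "__main__":
    program = "x = 1;  // fine\ny = x + ;\nz = w * 2;\nq = x $ 3;\n"
    found, table = check(program)
    for line, message in found:
        print(f"line {line}: {message}")
    print("symbol table:", table)
```

Run on the sample program, the sketch reports problems on lines 2, 3, and 4 in a single pass instead of stopping at the first one. Skipping to the next `;` is one simple recovery strategy chosen here for brevity; real compilers use more careful schemes.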