CIT 594 Module 7 Programming Assignment
CSV Slicer
In this assignment you will read files in a format known as “comma separated values” (CSV), interpret the formatting and output the content in the structure represented by the file.
Learning Objectives
In completing this assignment, you will:
- Implement a method to extract content from an input CSV
- Read and understand a formal specification for the CSV format
- Use a “state machine”
Background and Getting Started
Applications have long used delimited text files to store and transmit tables. The simplicity of the format means these files are human readable and editable as well as relatively easy to support in code. The breadth of support from nearly any general purpose application or tabular data loading library continues to make these files particularly popular for publishing data despite their limitations and inefficiencies. You will see more of this in later assignments where you will be required to support reading publicly available datasets.
One of the more common delimiter choices is to use commas to separate fields within a row of the table and line breaks to separate rows of the table. The common name for these comma-based file formats is “comma separated values” (CSV).
Supporting field values that include delimiter characters (e.g., a comma that is part of the data, not a field separator) requires some added complexity. There is no single, universally accepted specification for CSV files, so for this assignment we will focus on one widely accepted format, RFC 4180.
To get started, carefully read sections 1 and 2 of RFC 4180: RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files
That document uses a precise syntax to express an exact formal specification. If you are unsure how to interpret the ABNF grammar, you can reference RFC 2234.
CSV Format For This Assignment
You will write a method to process CSV files character-by-character. The exact format is a relaxation of RFC 4180.
Adjustments and clarifications of the descriptive rules:
- Formatting characters describe structure; they are not part of the content. For example, the comma that separates two fields should not be included in either field. Consider the following line (easy3.csv):
"example of using "" in a field",1
This should result in a single row equivalent to the array constructed with the Java expression:
new String[]{ "example of using \" in a field", "1" }
- The same applies for the escape character in an escaped field. Two double quotes in the middle of an escaped field count as a single double quote character in the content.
- You may ignore rule number 4 (for this assignment). Instead, follow these rules:
- No special treatment should be applied to the first row. You should not treat a header any differently than a record.
- You will not need to check if all rows have the same number of fields.
- Commas at the end of a line signify empty fields. For example, "a,b,c," results in a row with four fields: ["a", "b", "c", ""] (see footnote 1).
- An empty line terminated by a line break is a valid row if it’s outside an escaped field. Inside an escaped field, it is just part of the content of that field.
- Clarification of rule 2: There is no additional record if the file ends at the start of a line. That includes, but is not limited to, the example in rule 1, where the file ends with CRLF followed by EOF.
Adjustments to the grammar:
CRLF = [CR] LF
TEXTDATA = x00-09 / x0B-0C / x0E-21 / x23-2B / x2D-7F
The first rule change makes the carriage return (CR) optional everywhere CRLF is used. This is a relaxation seen in many places to adjust for the fact that many systems and applications choose to omit the carriage return and only use a single line feed character for line breaks.
The second rule change expands TEXTDATA (see footnote 2) to all characters that are not a comma, double quote, carriage return, or line feed.
Note: these are relaxations, therefore all documents that are valid for the strict interpretation of RFC 4180 are valid for this format.
For the purposes of the assignment there are five classes of characters for you to consider:
Common Name | RFC Name | RFC Code (hex) | Decimal | Java Character |
Line Feed | LF | x0A | 10 | \n |
Carriage Return | CR | x0D | 13 | \r |
Double Quote | DQUOTE | x22 | 34 | " |
Comma | COMMA | x2C | 44 | , |
Anything Else | TEXTDATA | | | [^\r\n,"] (see footnote 3) |
1. The statement in rule 4 of the RFC that "The last field in the record must not be followed by a comma" refers to the rule that each line should contain the same number of fields. A comma at the end of a line would insert an additional, empty field. The statement indicates that an extra comma that would increase the number of fields beyond the expected limit should not be dropped or ignored.
2. … (Integer.MAX_VALUE) in TEXTDATA if you wish. That will cover many more special characters and allow you to process Unicode values.
3. This is just a regular expression that matches any character other than the four listed.
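The five character classes in the table can be expressed as a small classification helper. The following is only a sketch: the names `CharClassSketch`, `CharClass`, and `classify` are hypothetical and are not part of the provided starter code. It assumes the caller has already checked for end of input before classifying a character.

```java
// Illustrative sketch; all names here are hypothetical, not part of the starter code.
final class CharClassSketch {
    enum CharClass { LF, CR, DQUOTE, COMMA, TEXTDATA }

    // Maps one character code to one of the five classes from the table.
    // The caller is assumed to have handled end of input (-1) beforehand.
    static CharClass classify(int c) {
        switch (c) {
            case '\n': return CharClass.LF;       // x0A, decimal 10
            case '\r': return CharClass.CR;       // x0D, decimal 13
            case '"':  return CharClass.DQUOTE;   // x22, decimal 34
            case ',':  return CharClass.COMMA;    // x2C, decimal 44
            default:   return CharClass.TEXTDATA; // anything else
        }
    }
}
```

Classifying first and then branching on the class keeps the later parsing logic free of raw character comparisons.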
Activity
Implement the readRow method in CSVReader.java.
You may also add supporting fields and helper methods to the CSVReader object as needed. Each call to readRow must return one row of data from reader until the input is exhausted.
If there is a format error in the input, you should raise a CSVFormatException. You are welcome to use the optional fields available in CSVFormatException for your own informative error messages (potentially quite useful for debugging). Any extra values and messages you use will not be evaluated by the grader.
Performance matters. The overall runtime for reading through the entire input should be O(n) where n is the number of characters in the input. As a reminder, this does mean that certain seemingly convenient operations and data structures may not be appropriate for this assignment. Choose your data structures carefully.
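As one illustration of the runtime point, accumulating characters in a `java.lang.StringBuilder` keeps the total work linear, whereas repeated `String` concatenation in a loop does not. This is a minimal sketch: `LinearBufferSketch` and `readAll` are hypothetical names, and it uses the standard `java.io.Reader` as a stand-in for the provided `CharacterReader`, whose exact methods are not shown here.

```java
import java.io.IOException;
import java.io.Reader;

// Illustrative sketch; the class and method names are hypothetical.
final class LinearBufferSketch {
    // Collects every character from the reader in O(n) total time:
    // each StringBuilder.append call is amortized constant time.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) { // -1 signals end of input
            sb.append((char) c);
        }
        return sb.toString();
    }
    // By contrast, writing `s += (char) c` in the same loop copies the whole
    // string on every iteration, so the total work grows quadratically.
}
```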
You will process the input one character at a time (hence the provided CharacterReader which is more restricted than the standard java.io.Reader). Along those lines, you may wish to organize your code in a “state machine”. The term “state machine” just implies that a program may respond differently to a given input based on a current “state” or context. A more detailed explanation is included with the assignment as supplemental reading.
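To make the idea of a state machine concrete, here is a minimal sketch of one possible organization, not the required design. Everything named in it is an assumption made for illustration: the class `CsvStateMachineSketch`, the five states, and `SketchFormatException` (a stand-in for the provided `CSVFormatException`). It reads from a standard `java.io.Reader` rather than the provided `CharacterReader`, and it returns `null` at end of input, which may differ from what the starter code expects.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; class, state, and exception names are hypothetical.
final class CsvStateMachineSketch {
    // Stand-in for the assignment's CSVFormatException.
    static final class SketchFormatException extends Exception {
        SketchFormatException(String message) { super(message); }
    }

    private enum State { FIELD_START, UNQUOTED, QUOTED, QUOTE_SEEN, AFTER_CR }

    // Returns the next row, or null once the input is exhausted.
    static String[] readRow(Reader in) throws IOException, SketchFormatException {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        State state = State.FIELD_START;
        boolean sawAny = false; // stays false while we are at the start of a line

        while (true) {
            int c = in.read(); // -1 signals end of input
            switch (state) {
                case FIELD_START:
                case UNQUOTED:
                    if (c == -1) {
                        if (!sawAny) return null; // file ended at the start of a line
                        fields.add(field.toString());
                        return fields.toArray(new String[0]);
                    }
                    sawAny = true;
                    if (c == '"') {
                        if (state == State.UNQUOTED) {
                            throw new SketchFormatException("quote inside a non-escaped field");
                        }
                        state = State.QUOTED; // opening quote is structure, not content
                    } else if (c == ',') {
                        fields.add(field.toString());
                        field.setLength(0);
                        state = State.FIELD_START;
                    } else if (c == '\r') {
                        state = State.AFTER_CR;
                    } else if (c == '\n') {
                        fields.add(field.toString());
                        return fields.toArray(new String[0]);
                    } else {
                        field.append((char) c);
                        state = State.UNQUOTED;
                    }
                    break;
                case QUOTED:
                    if (c == -1) throw new SketchFormatException("unterminated escaped field");
                    if (c == '"') state = State.QUOTE_SEEN; // closing quote or first half of ""
                    else field.append((char) c);            // commas and line breaks are content here
                    break;
                case QUOTE_SEEN:
                    if (c == '"') {
                        field.append('"'); // two double quotes count as one in the content
                        state = State.QUOTED;
                    } else if (c == ',') {
                        fields.add(field.toString());
                        field.setLength(0);
                        state = State.FIELD_START;
                    } else if (c == '\r') {
                        state = State.AFTER_CR;
                    } else if (c == '\n' || c == -1) {
                        fields.add(field.toString());
                        return fields.toArray(new String[0]);
                    } else {
                        throw new SketchFormatException("text after closing quote");
                    }
                    break;
                case AFTER_CR:
                    if (c != '\n') throw new SketchFormatException("CR not followed by LF");
                    fields.add(field.toString());
                    return fields.toArray(new String[0]);
            }
        }
    }
}
```

The same character leads to different actions depending on the current state: a comma ends a field in `UNQUOTED` but is appended as content in `QUOTED`. Because each row ends exactly at its line feed, no lookahead has to be carried between calls in this sketch.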
Before You Submit
Please be sure that:
- your classes are in the default package, i.e. there is no “package” declaration at the top of the source code
- your classes compile and you have not changed the signature of the methods we have provided
- you have not created any additional .java files
- you have not overloaded any existing method names
- you have filled out the required academic integrity signature in the comment block at the top of your submission files
How to Submit
After you have finished implementing the CSVReader class, go to the “Module 7 Programming Assignment submission” item and click the “Open Tool” button to go to the Codio platform.
Once you are logged into Codio, read the submission instructions in the README file. Be sure you upload your code to the “submit” folder.
To test your code before submitting, click the “Run Test Cases” button in the Codio toolbar.
Unlike the other assignments, you will have different sets of tests you can run in Codio. Most have limits, so use them carefully: once you use up a test, you will not get to try it again. There will be no exceptions.
The test cases we provide here are “sanity check” tests to make sure that you have the basic functionality working correctly, but it is up to you to ensure that your code satisfies all of the requirements described in this document.
Assessment
Your code will be evaluated against test cases that stress the limits of the specification including tricky but valid inputs and invalid inputs. There is no ambiguity in the specification. If something does not fit the grammar it is not valid.
Grading will be roughly:
- 0% trivial CSV inputs
- 20% simple valid CSV inputs
- 60% tricky valid inputs
- 20% invalid inputs
After submitting your code for grading, you can go back to this assignment in Codio and view the “results.txt” file, which should be listed in the Filetree on the left. This file will describe any failing test cases.
FAQ and Hints
- “What about. . . ” → RFC 4180.
- java.lang.StringBuilder
- java.io.Reader
- Characters in C and Java are written with single quotes ('a'). Double quotes are for strings ("this is a string").
- The switch statement is your friend (at least for this assignment). See the provided StateMachineIntro.pdf for an example.
Processing trivial CSV files (e.g., a file without quotes) can be as simple as:
Files.lines(Path.of(filename)).map(line -> line.split(",")).toArray()
This might help you get started with some initial testing. This is also just the baseline functionality for your implementation. You should expect no credit if your implementation does not correctly handle more sophisticated tests.
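The short sketch below shows two inputs from this document where a plain `split(",")` diverges from the rules above. The class name `SplitPitfallsSketch` and the method `naiveSplit` are hypothetical and exist only for this illustration.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class SplitPitfallsSketch {
    // The baseline approach: split one line on every comma.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // String.split drops trailing empty strings, so the empty fourth field is lost.
        System.out.println(naiveSplit("a,b,c,").length);      // prints 3
        // A negative limit keeps trailing empty strings.
        System.out.println("a,b,c,".split(",", -1).length);   // prints 4
        // A comma inside an escaped field is wrongly treated as a separator.
        System.out.println(naiveSplit("\"x,y\",1").length);   // prints 3, but the row has 2 fields
    }
}
```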